Research on a Content-Based Method for Evaluating the Credibility of Agricultural Web Information
Published: 2018-12-16 07:38
[Abstract]: With the spread of network technology, information technology has developed rapidly, and agriculture is likewise becoming informatized along with the rest of society. Farmers are the main actors in agriculture, yet in agricultural information services they often cannot judge whether the information they find online is true and reliable, because their level of education is generally limited and their economic means are modest. To address these problems, this thesis studies how to evaluate the credibility of agricultural web information. The main work is as follows. (1) Because the traditional TF-IDF topic extraction method ignores where a word appears on a web page, a TF-IDF method based on word position weights is proposed for extracting the topics of agricultural web information; experiments show that the improved method extracts topics more accurately than traditional TF-IDF. (2) Because search engines do not take credibility into account when retrieving candidate pages, a content-based method for evaluating the credibility of agricultural web information is proposed, built on an indicator system with four layers. The first layer judges the authority of a page: since no classification or quantification standard for web page authority currently exists, a custom table that assigns authority weights to websites is defined, and it distinguishes pages of differing authority well. The second layer judges the timeliness of a page: a new method builds a specific time decay function from the publication date of the content, which better reflects how timeliness affects the credibility of agricultural web information. The third layer judges the relevance of a page: the vector space model (VSM) is used to generate a term-frequency vector for each candidate page, and the degree of relevance between the page content and the keywords is computed. The fourth layer judges the influence of a page: combining page links and user behavior, the site's PR value, Page View value, and Time on Page value are introduced to quantify page influence. (3) Different topics are set up to examine the relationship between the number of query words and topic relevance; the results show that with 4 query words the average topic relevance of the candidate pages is 77.4%, which is the best result. (4) Three rankings are constructed to verify the credibility of candidate pages: the search engine's natural ranking, a ranking that omits the relevance indicator, and the ranking produced by the proposed content-based method. Under the natural ranking, credibility values vary widely from one position to the next; the ranking without the relevance indicator places some off-topic information near the top; the proposed method's ranking selects pages that are both on topic and highly credible and presents them to the user first. This indicates that the content-based evaluation method is reasonably effective and practical for assessing the credibility of agricultural web information.
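The abstract names the building blocks of the method but does not give their formulas. As a rough illustration only, the Python sketch below shows one way such pieces could fit together: a TF-IDF score in which each occurrence of a term is weighted by its position on the page, a time decay factor based on page age, a VSM cosine relevance score, and a weighted sum over the four indicators. The position weights, the exponential form and half-life of the decay, the smoothed IDF, the equal indicator weights, and all function and parameter names are assumptions made for this sketch and are not taken from the thesis.

```python
import math
from collections import Counter

# Hypothetical position weights; the thesis's actual values are not given.
POSITION_WEIGHT = {"title": 3.0, "heading": 2.0, "body": 1.0}


def position_weighted_tfidf(page, corpus):
    """Score the terms of one page.

    `page` maps a position name to a list of tokens found there.
    `corpus` is a list of token sets, one per document, used for IDF.
    """
    weighted_tf = Counter()
    for position, tokens in page.items():
        for token in tokens:
            # Each occurrence counts for its position weight, not for 1.
            weighted_tf[token] += POSITION_WEIGHT[position]
    total = sum(weighted_tf.values())
    n_docs = len(corpus)
    scores = {}
    for term, weight in weighted_tf.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        # Smoothed IDF (an assumed variant) so unseen terms do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[term] = (weight / total) * idf
    return scores


def time_decay(age_days, half_life_days=180.0):
    """Exponential decay in (0, 1]; form and half-life are assumptions."""
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def cosine_relevance(page_tokens, query_tokens):
    """Cosine similarity between term-frequency vectors (VSM)."""
    a, b = Counter(page_tokens), Counter(query_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def credibility(authority, timeliness, relevance, influence,
                weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of four indicators, each assumed to be scaled to [0, 1]."""
    parts = (authority, timeliness, relevance, influence)
    return sum(w * p for w, p in zip(weights, parts))
```

In this sketch the influence indicator is passed in as a single number already scaled to [0, 1]; how the PR, Page View, and Time on Page values would be normalized and combined is left open because the abstract does not say. With the assumed half-life, `time_decay(180)` returns 0.5, so a page published 180 days ago contributes half the timeliness of a page published today.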
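[Degree-granting institution]: Hunan Agricultural University
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: S126
Article ID: 2382007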
Article link: https://www.wllwen.com/kejilunwen/nykj/2382007.html