
Research on Web Information Source Discovery for Forest Product Commerce

Published: 2018-11-28 08:57
[Abstract]: To cope with "information overload" and to meet the information needs of specialized domains, topic-oriented Web information integration has become an active research area. Web information integration gathers topic-specific information scattered across different information sources and provides vertical information services on top of it. The supply and demand information published on forest product commerce websites is an important forestry information resource, and integrating this scattered information from across the Internet is the basis for an efficient forest product commerce information service. Existing studies have integrated information from several forest product commerce websites, but the information sources themselves were found by hand. Manual searching is labor-intensive and yields only a limited number of sources. Because forest product commerce information sources are numerous and widely distributed, an automatic method for discovering them is needed.

This thesis first reviews existing website discovery methods and related techniques. Then, based on the characteristics of forest product commerce websites, it proposes an automatic discovery method for forest product commerce information sources that splits the task into a web search process and a website classification process. The web search process starts from a set of "seed websites" and tries to find links to other sites while crawling as few pages as possible, thereby expanding the seed set. The website classification process separates qualifying forest product commerce websites from all other websites. For classification, the thesis builds a feature lexicon for forest product commerce websites, proposes an improved vector space model based on key resources to describe a website's topic, and uses an SVM-based classifier to decide whether a website belongs to the target class. Finally, a forest product commerce information source discovery module was designed and implemented. Experiments with this module discovered 110 forest product commerce websites, which verifies the effectiveness of the proposed method and shows that it can address the information source discovery problem in forest product commerce Web information integration.
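The web search process described above, which starts from seed websites and looks for off-site links while fetching as few pages as possible, can be sketched as a breadth-first crawl with a page budget. This is a minimal sketch under assumptions: the abstract does not describe how the thesis chooses which pages to fetch, so plain breadth-first order stands in for that strategy, and `discover_external_hosts`, `LinkExtractor`, `max_pages` and the injected `fetch` function (URL to HTML, so the sketch runs without network access) are hypothetical names.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Set
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_external_hosts(seed_url: str,
                            fetch: Callable[[str], str],
                            max_pages: int = 20) -> Set[str]:
    """Crawl breadth-first inside the seed site, fetching at most
    `max_pages` pages, and return the hosts of all off-site links seen.
    `fetch` is a hypothetical callable mapping a URL to its HTML."""
    seed_host = urlparse(seed_url).netloc
    queue = deque([seed_url])
    visited: Set[str] = set()
    external: Set[str] = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        extractor = LinkExtractor()
        extractor.feed(fetch(url))
        for href in extractor.hrefs:
            parsed = urlparse(urljoin(url, href))
            if parsed.scheme not in ("http", "https"):
                continue  # ignore mailto:, javascript: and similar links
            if parsed.netloc == seed_host:
                # in-site page: queue it, dropping any #fragment
                queue.append(parsed._replace(fragment="").geturl())
            else:
                external.add(parsed.netloc)  # candidate new website
    return external


if __name__ == "__main__":
    # A two-page fake site stands in for real HTTP fetching.
    pages = {
        "http://seed.example/":
            '<a href="/links.html">Links</a> <a href="http://a.example/x">A</a>',
        "http://seed.example/links.html":
            '<a href="https://b.example/">B</a> <a href="mailto:info@seed.example">Mail</a>',
    }
    hosts = discover_external_hosts("http://seed.example/", lambda u: pages.get(u, ""))
    print(sorted(hosts))  # ['a.example', 'b.example']
```

Hosts are compared exactly, so `www.seed.example` and `seed.example` would count as different sites; a real crawler would probably normalize hosts before comparing them.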
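The abstract does not define the "key resources" of the improved vector space model. One plausible reading, assumed here only for illustration, is that text from especially informative parts of a website (for instance the page title or meta keywords) should count more than ordinary body text. The sketch below builds a unit-length website vector over a feature lexicon with such per-source weights; `LEXICON`, `SOURCE_WEIGHTS`, their values and `site_vector` are hypothetical and are not the thesis's lexicon or weighting scheme.

```python
import math
from typing import Dict, List

# Hypothetical feature lexicon; the thesis's own lexicon is not reproduced here.
LEXICON: List[str] = ["木材", "板材", "供应", "求购", "价格"]

# Hypothetical weights: text from "key resources" (here title and meta keywords)
# counts more than ordinary body text.
SOURCE_WEIGHTS: Dict[str, float] = {"title": 3.0, "keywords": 2.0, "body": 1.0}


def site_vector(sources: Dict[str, str],
                lexicon: List[str] = LEXICON,
                weights: Dict[str, float] = SOURCE_WEIGHTS) -> List[float]:
    """Map a website's texts (source name -> text) to a unit-length vector
    with one component per lexicon term."""
    vec = []
    for term in lexicon:
        # Weighted term frequency: each occurrence is scaled by its source's
        # weight; sources missing from `weights` get weight 1.0.
        tf = sum(weights.get(name, 1.0) * text.count(term)
                 for name, text in sources.items())
        vec.append(math.log1p(tf))  # sublinear scaling damps very frequent terms
    norm = math.sqrt(sum(v * v for v in vec))
    # Cosine-style normalization; an all-zero vector is returned unchanged.
    return [v / norm for v in vec] if norm > 0 else vec
```

Counting substrings with `str.count` avoids the need for a Chinese word segmenter, at the cost of also matching terms that occur inside longer words; a production system would likely segment the text first.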
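For the classification step the abstract only states that an SVM-based classifier is used; the kernel and parameters are not given. As a stand-in, the sketch below trains a linear SVM with Pegasos-style stochastic sub-gradient descent on the hinge loss, using only the Python standard library. The names `train_linear_svm` and `predict`, the +1/-1 label convention (+1 for forest product commerce websites) and the defaults for `lam` and `epochs` are assumptions; in practice an established SVM library would normally be used instead.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Linear SVM trained with Pegasos-style stochastic sub-gradient descent
    on the L2-regularized hinge loss. Labels must be +1 or -1. A constant
    1.0 feature is appended, so the last weight acts as a (regularized) bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularizer shrinks w
            if margin < 1.0:  # hinge loss active: move toward label * x
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (target class) or -1 for a feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

Website feature vectors (for example, lexicon-based vectors as described in the abstract) would be passed as `X`. A linear kernel is chosen here only because it keeps the sketch short and dependency-free; the thesis may use a different kernel.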
[Degree-granting institution]: Beijing Forestry University
[Degree]: Master's
[Year conferred]: 2014
[CLC classification numbers]: TP393.092; F326.2; F724.6
