Rapid Search and Integration of Forestry Dynamic Information

Published: 2018-08-18 10:03
[Abstract]: Forestry is a foundational industry of China's national economy, charged with building the ecological environment and promoting sustainable social development. People are the main beneficiaries of forestry development, and once the structure of the forestry industry takes shape, they each play a different role within it. In recent years, the advance of forestry informatization has promoted the sharing of forestry information resources, made access more convenient for the public, and spurred the development of the forestry industry. Further development of forestry informatization nevertheless remains necessary: to make better use of forestry information resources in the service of researchers, educators, and forest farmers in China's forestry science community, rapid search and integration of forestry information is urgently needed.
Quickly finding the information a user needs within a mass of information has become the main problem the public faces when searching for information in a specific domain. Forestry information on the Internet is increasingly voluminous and disordered, and general-purpose search engines can no longer meet the public's demand for personalized information. Users who query forestry topics with a general-purpose search engine must spend a great deal of time and effort to find what they need, and both the recall and the precision of topic-relevant results are fairly low, so their needs go unmet. The public therefore urgently needs a forestry topical search engine with precise classification, comprehensive data, and timely updates.
The research in this thesis comes from a key project of the Hunan Provincial Science and Technology Program (2010nk2004) led by the author's supervisor. Guided by theories from systems science, forestry, informatics, and statistics, the thesis presents a comprehensive study of forestry dynamic information search and integration, including a review of research on the subject in China and abroad. The study addresses three aspects: demand analysis and classification of forestry dynamic information, the topical crawler, and the text classifier. The main work is as follows:
(1) The thesis comprehensively analyzes existing theory and practice on search engines in China and abroad, demonstrates the importance and necessity of building a forestry topical search engine now, and studies the key technologies involved in depth. It divides the forestry topical search engine into three layers (data collection, data storage, and data presentation) and discusses and summarizes the methods relevant to each layer.
(2) Using information published on web pages, together with the demand of government departments and the public for forestry dynamic information, the thesis identifies the categories of forestry dynamic information that are of real practical value to those departments and to the public. It then classifies and partitions the required information to make each kind concrete, arriving at seven categories: forestry science and technology information, forestry means of production, forest product market supply and demand information, flower information, forestry policies and regulations, forestry labor information, and meteorological and environmental information.
(3) Based on the category system established for forestry dynamic information, relevant specialist forestry websites are gathered to fix the sources for information collection. The domain names of the websites that provide the required data are collected, along with the content under those domains, and the category of each collected website is identified, so that forestry dynamic information sources are both collected and classified.
(4) A new search strategy that combines content-based analysis with link-structure analysis is applied. Through a combined evaluation, the pages that candidate URLs point to are judged for topic relevance and the candidate URLs are ranked, yielding an optimized forestry topical crawler that downloads pages in decreasing order of relevance to the forestry topic and of importance.
(5) Automatic text classification based on SVM (support vector machines) is adopted: a classifier is trained on sample data and then used to classify and store the forestry dynamic information gathered by the topical crawler, thereby optimizing the construction of the search engine's data collection layer.
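The URL ordering described in item (4) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, which the abstract does not detail. The topic vocabulary, the cosine-similarity content score, the in-degree link score, the weight `alpha`, and the example URLs are all assumptions chosen for illustration.

```python
import heapq
import math
from collections import Counter

# Hypothetical topic vocabulary; a real crawler would derive it from labeled forestry pages.
TOPIC_TERMS = Counter({"forestry": 3, "forest": 2, "timber": 1, "seedling": 1})

def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def content_score(text):
    """Content-analysis part: similarity of the link's text to the topic."""
    return cosine_similarity(Counter(text.lower().split()), TOPIC_TERMS)

def link_score(url, inlinks):
    """Link-analysis part: a crude in-degree score squashed into [0, 1)."""
    n = len(inlinks.get(url, ()))
    return n / (n + 1)

def rank_candidates(candidates, inlinks, alpha=0.7):
    """Return candidate URLs in decreasing order of combined score.

    candidates maps URL -> text describing the link (anchor text, nearby words).
    alpha weights content against link structure; 0.7 is an arbitrary choice.
    """
    heap = []
    for url, text in candidates.items():
        score = alpha * content_score(text) + (1 - alpha) * link_score(url, inlinks)
        heapq.heappush(heap, (-score, url))  # negate because heapq is a min-heap
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# Made-up candidate links and inlink lists, for illustration only.
candidates = {
    "http://example.org/timber-prices": "timber market forestry prices",
    "http://example.org/sports": "football scores today",
    "http://example.org/forest-policy": "forestry forest policy",
}
inlinks = {
    "http://example.org/sports": ["a", "b", "c"],
    "http://example.org/forest-policy": ["a"],
}
order = rank_candidates(candidates, inlinks)
print(order)
```

With these made-up inputs the forest-policy link is fetched first and the off-topic sports link last, even though the sports page has the most inlinks. That is the intended effect of weighting content relevance above link popularity.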
Forestry dynamic information search and integration builds on the study and optimization of existing search and integration techniques and incorporates the public's demand for forestry dynamic information, so that the accuracy, completeness, and success rate with which the public obtains such information are all markedly improved. As science and technology develop rapidly, new theories, methods, and technologies will be further applied to the rapid search and integration of forestry dynamic information, and forestry information management and services will reach a new level.
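The SVM classification step in item (5) can likewise be sketched in a few lines of Python. The abstract does not say which SVM variant, features, or library the thesis used, so everything here is an assumption: a toy one-vs-rest linear SVM trained by subgradient descent on the hinge loss, bag-of-words features, invented training sentences, and three of the seven categories. A production system would more likely rely on an established SVM library.

```python
from collections import Counter

def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Binary linear SVM via subgradient descent on
    lam/2 * ||w||^2 + max(0, 1 - y * (w.x + b)); y holds +1/-1 labels."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active: step toward the sample
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # only the regularizer contributes
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def train_one_vs_rest(docs, labels):
    """Train one binary SVM per category (that category vs. all others)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    X = [featurize(d, vocab) for d in docs]
    models = {}
    for category in sorted(set(labels)):
        y = [1 if label == category else -1 for label in labels]
        models[category] = train_linear_svm(X, y)
    return vocab, models

def classify(text, vocab, models):
    """Pick the category whose SVM gives the largest decision value."""
    x = featurize(text, vocab)
    def decision(category):
        w, b = models[category]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=decision)

# Invented training snippets for three of the seven categories.
docs = [
    "new seedling breeding technology research",
    "forest pest control technology trial",
    "timber price market supply rises",
    "bamboo market demand price falls",
    "forest law regulation revised",
    "new forestry policy regulation issued",
]
labels = [
    "science and technology", "science and technology",
    "market supply and demand", "market supply and demand",
    "policies and regulations", "policies and regulations",
]
vocab, models = train_one_vs_rest(docs, labels)
print(classify("timber market price report", vocab, models))
```

Words that are not in the training vocabulary, such as "report" above, are simply ignored by `featurize`. Realistic text would need proper tokenization (including Chinese word segmentation) and term weighting, neither of which is shown here.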
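[Degree-granting institution]: Central South University of Forestry and Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: S712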
