
Research on an Ontology-Based Vertical Search Engine for Food Safety

Published: 2018-07-12 11:58

  Topics: vertical search engine + ontology; Source: Zhejiang University of Technology, master's thesis, 2013


[Abstract]: Recently, a series of food safety incidents has drawn public attention to food safety. Although the state runs a unified food safety information publishing platform, in its current operation it falls short of providing comprehensive information, timely updates, and interaction with the public. It is therefore necessary to build a public information platform that promptly collects the food safety information scattered across the Internet and offers it to the public as an information service.

With the rapid development of search engine technology and changing demands across industries, many domain-specific vertical search engines have appeared in China and abroad, but few of them can search food information. Practice has shown that natural language processing techniques such as ontologies can improve search precision, so many vertical search engines have begun to adopt ontology technology. As food safety laws and regulations have been enacted, food safety standards have become fairly detailed, which paves the way for building a food safety ontology and for using ontology technology to improve the system's food safety detection capability. The focused crawler is the key module through which a search engine acquires information, so the focus of this thesis is how to make the focused crawler more effective in the food safety domain in order to achieve satisfactory precision and recall.

To that end, this thesis first analyzes the differing characteristics of official websites and general forum websites and proposes a different page search method for each. After analyzing the algorithms commonly used by focused crawlers, it proposes a combined search algorithm to address the poor retrieval results caused by the large volume and fine-grained categorization of food safety information. The strategy uses the ontology for more efficient link analysis and relevance computation. The thesis adopts an optimized Fish-Search algorithm, uses the ontology to filter and expand keywords, and performs relevance analysis in steps: it first computes similarity with the vector space model, then combines the result with the ontology to obtain a more accurate document similarity, and finally classifies the documents. The k-nearest neighbor algorithm and the Bayes algorithm serve as classifiers in this process, so that topic filtering proceeds from coarse to fine. Finally, on this basis, the thesis carries out crawling and retrieval experiments. The results show that the ontology-based information retrieval method clearly improves both the crawling efficiency of the web spider and the precision of food safety information search.
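The stepwise relevance analysis described in the abstract (vector space model first, then ontology-assisted refinement) might look roughly like the sketch below. It is not the thesis's implementation: the toy `ONTOLOGY` dictionary, the 0.5 weight given to ontology-related terms, the function names, and the whitespace tokenization are all assumptions made for illustration. Chinese text would need a word segmenter in place of `split()`.

```python
import math
from collections import Counter

# Hypothetical toy ontology: each concept maps to related terms.
# A real food safety ontology would be far larger and built from standards.
ONTOLOGY = {
    "additive": ["preservative", "sweetener", "colorant"],
    "contamination": ["pesticide", "melamine", "salmonella"],
}


def expand_keywords(keywords, ontology, related_weight=0.5):
    """Build a weighted query vector: seed keywords get weight 1.0,
    ontology-related terms get the (assumed) lower related_weight."""
    weights = {}
    for keyword in keywords:
        weights[keyword] = 1.0
        for related in ontology.get(keyword, []):
            weights.setdefault(related, related_weight)
    return weights


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse term->weight vectors."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance(text, keywords, ontology):
    """Return (plain VSM score, ontology-expanded score) for a page text."""
    doc_vec = Counter(text.lower().split())  # raw term frequencies
    plain_query = {keyword: 1.0 for keyword in keywords}
    expanded_query = expand_keywords(keywords, ontology)
    return (
        cosine_similarity(doc_vec, plain_query),
        cosine_similarity(doc_vec, expanded_query),
    )


plain, expanded = relevance(
    "melamine found in milk powder", ["contamination"], ONTOLOGY
)
print(plain, expanded)
```

In this toy case the page never mentions the seed keyword, so the plain vector space score is zero, while the expanded query still matches through the related term. That is the kind of gain keyword expansion is meant to provide. A crawler could feed the second score into its link-priority queue or pass it to a downstream classifier.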
[Degree-granting institution]: Zhejiang University of Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TS201.6;TP391.3



