Research and Application of an Enterprise-Level Meta Search Engine
Published: 2018-03-06 07:37
Topic: enterprise-level  Focus: meta search engine  Source: Fudan University, master's thesis, 2012  Type: degree thesis
[Abstract]: As global informatization has advanced rapidly over many years, information of every kind has accumulated explosively as electronic files on people's storage devices, and expectations for retrieving that information have risen accordingly. Finding the information one needs quickly and accurately within such a vast collection of electronic documents is a hard problem. Search engines address it to a degree, but they have limits: no single search engine can be expected to satisfy users' changing search needs across different scenarios.

This thesis addresses enterprise search within a corporate group. Each branch company in the group runs its own search engine that provides full-text search over the branch's internal documents, while the group as a whole also needs to search across the documents of all branches. This calls for a group-oriented, enterprise-level meta search engine. Unlike a web meta search engine, the focus here is the algorithm for ranking meta-search results in this specific enterprise setting.

The thesis surveys classic ranking algorithms for full-text search engines and for meta search engines in breadth and depth, analyzing and summarizing the characteristics and applicable scenarios of each. It then examines Lucene's document scoring mechanism in detail and proposes a normalization formula for the meta-search scenario that removes the inappropriate local weighting applied to document scores by the Lucene-based member search engines. Finally, drawing on classic ideas from score-based algorithms, weighting-based algorithms, and the HITS algorithm, it proposes a hybrid weighting algorithm that iteratively reweights document scores in the meta-search environment, adjusting document relevance scores to improve the final ranking. Building on this research, a full-text search engine system for the branch companies and a meta search engine system for the group were implemented.
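The abstract gives neither the thesis's normalization formula nor its weighting rule, so the following is only a hypothetical sketch of the general idea, not the author's method. Lucene's classic scoring depends on index-local statistics such as idf and the query normalization factor, so raw scores from separate branch indexes are not on a common scale. As a stand-in for the thesis's Lucene-specific normalization, the sketch uses plain min-max normalization per member engine. It then fuses results with a weighted sum (CombSUM-style) and sets each engine's weight by a HITS-like mutual reinforcement between engines and documents. The names `min_max_normalize`, `fuse`, and `hybrid_merge`, and the sample branch data, are illustrative assumptions.

```python
from collections import defaultdict


def min_max_normalize(results):
    """Rescale one member engine's raw scores into [0, 1] (assumed stand-in
    for the thesis's own normalization). `results` maps doc_id -> raw score."""
    if not results:
        return {}
    lo, hi = min(results.values()), max(results.values())
    if hi == lo:
        # All scores equal: treat every returned document as fully relevant.
        return {doc: 1.0 for doc in results}
    return {doc: (s - lo) / (hi - lo) for doc, s in results.items()}


def fuse(norm, weights):
    """Weighted CombSUM: add each engine's normalized score times its weight."""
    scores = defaultdict(float)
    for engine, docs in norm.items():
        for doc, s in docs.items():
            scores[doc] += weights[engine] * s
    return dict(scores)


def hybrid_merge(engine_results, max_iter=50, tol=1e-9):
    """Hypothetical HITS-style reweighting of member engines.

    Documents receive a weighted sum of normalized scores; each engine's
    weight is then made proportional to the score mass it gives to documents
    the fused ranking rates highly. Repeat until the weights stabilize.
    Returns (ranking, weights).
    """
    norm = {e: min_max_normalize(r) for e, r in engine_results.items()}
    if not norm:
        return [], {}
    weights = {e: 1.0 / len(norm) for e in norm}  # start uniform
    for _ in range(max_iter):
        scores = fuse(norm, weights)
        # Engine "authority": agreement of its scores with the fused scores.
        raw = {e: sum(s * scores[d] for d, s in docs.items())
               for e, docs in norm.items()}
        total = sum(raw.values())
        if total == 0:
            break  # no signal to reweight on; keep current weights
        new_weights = {e: v / total for e, v in raw.items()}  # L1-normalize
        delta = max(abs(new_weights[e] - weights[e]) for e in norm)
        weights = new_weights
        if delta < tol:
            break
    scores = fuse(norm, weights)
    # Sort by fused score, breaking ties by doc_id for a stable order.
    ranking = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranking, weights


# Usage with two hypothetical branch engines returning raw scores.
branches = {
    "A": {"d1": 10.0, "d2": 5.0, "d3": 0.0},
    "B": {"d1": 3.0, "d4": 1.0},
}
ranking, weights = hybrid_merge(branches)
print([doc for doc, _ in ranking])  # ['d1', 'd2', 'd3', 'd4']
```

Because each update sets the engine weights proportional to S Sᵀ w (where S holds the normalized engine-by-document scores), the loop is a power iteration, mirroring the hub update in HITS: engines whose results agree with the consensus gain weight, and their documents rise in the merged list.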
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year awarded]: 2012
[Classification number]: TP391.3