Research on Result Merging Algorithms for Meta-Search Engines

Published: 2019-02-12 08:16

[Abstract]: Search engines make information retrieval far more convenient for users, but studies show that their resource coverage still falls short of demand and that their precision also needs improvement. A meta-search engine integrates multiple independent search engines: it invokes its member engines to carry out the user's query and then processes the returned result sets in a unified way. This addresses some of the problems of individual search engines to a certain extent, and meta-search engines have come into wide use. Current research on the core technologies of meta-search engines covers query analysis and transformation, member-engine scheduling algorithms, and result merging algorithms. This thesis focuses on the result merging mechanism of meta-search engines and studies its two main parts: web page deduplication and fused ranking of results. Deduplication and ranking are critical to meta-search engine performance, yet existing approaches to both still have many shortcomings. The main work of the thesis in addressing these problems is as follows:
(1) It systematically studies the architecture and working principles of search engines and meta-search engines, analyzes the state of research on each in China and abroad, and describes the key technologies of meta-search engines in detail.
(2) It compares and analyzes the web page deduplication algorithms commonly used in existing search engines and meta-search engines and examines their strengths and weaknesses. Based on the form in which meta-search engines receive results, it proposes a deduplication algorithm that uses the URL, title, and snippet of each returned result, with a different matching method for each of the three fields according to its characteristics, which makes deduplication more accurate.
(3) It studies the classic result ranking algorithms used in meta-search engines and summarizes the strengths and weaknesses of each, with particular attention to Borda voting. To address the shortcomings of Borda ranking, it proposes an improved algorithm that combines result position with query similarity, and it improves both the normalization of result positions and the similarity calculation.
(4) It presents a prototype meta-search engine system, runs experiments on the proposed deduplication and ranking algorithms on this system, and analyzes the experimental results to verify the algorithms' performance.
The thesis closes with a summary of its main work, its contributions, and the experimental process, and discusses directions for the development of meta-search engines and open problems for future research.
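To make the two stages named in the abstract concrete, the following is a minimal Python sketch of deduplication by URL, title, and snippet followed by a classic Borda count. It is not the thesis's algorithm: the abstract does not give the matching rules or the improved ranking formula, so the URL normalization, the 0.9 similarity threshold, and the function names `normalize_url`, `is_duplicate`, and `borda_merge` are all assumptions made for illustration. The improvement described in item (3), which adds query similarity and a revised position normalization, is not implemented here.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to a comparable key (hypothetical rules, not from the thesis)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")  # scheme and trailing slash are ignored
    if parts.query:
        key += "?" + parts.query
    return key


def is_duplicate(a, b, threshold=0.9):
    """Assume two results are the same page if their normalized URLs match
    or their title + snippet text is nearly identical."""
    if normalize_url(a["url"]) == normalize_url(b["url"]):
        return True
    text_a = a["title"] + " " + a["snippet"]
    text_b = b["title"] + " " + b["snippet"]
    return SequenceMatcher(None, text_a, text_b).ratio() >= threshold


def borda_merge(result_lists, threshold=0.9):
    """Classic Borda count: in a list of n results, position i (0-based)
    earns n - i points; a result missing from a list earns 0 from it."""
    merged = []  # one representative per distinct page, in first-seen order
    scores = []  # Borda points, parallel to `merged`
    for results in result_lists:
        n = len(results)
        for position, result in enumerate(results):
            points = n - position
            for i, seen in enumerate(merged):
                if is_duplicate(seen, result, threshold):
                    scores[i] += points  # same page seen again: add its votes
                    break
            else:
                merged.append(result)
                scores.append(points)
    # sorted() is stable, so ties keep first-seen order
    order = sorted(range(len(merged)), key=lambda i: -scores[i])
    return [(merged[i], scores[i]) for i in order]
```

Each result is assumed to be a dict with `url`, `title`, and `snippet` keys, and `result_lists` holds one ranked list per member engine. The pairwise duplicate check is quadratic in the number of results, which is tolerable for the few dozen results a meta-search engine typically merges per query.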
[Degree-granting institution]: Harbin Engineering University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.3

[References]