Research on Result Merging Algorithms for Meta-Search Engines
[Abstract]: Search engines make information retrieval far more convenient for users, but studies show that the resource coverage of a single search engine still falls short of user needs and that retrieval accuracy needs to improve. A meta-search engine integrates several independent search engines: it invokes its member engines to carry out the user's query and then processes the returned result sets in a uniform way. This alleviates some of the problems of individual search engines, and meta-search engines are widely used. The core technologies of a meta-search engine currently include the analysis and transformation of retrieval requests, the scheduling of member engines, and the merging of retrieval results. This thesis focuses on the result merging mechanism, and in particular on two of its parts: web page deduplication and fused ranking of results. Both are critical to the performance of a meta-search engine, yet meta-search engines still have many shortcomings in these areas. The main work of this thesis is as follows: (1) The architecture and working principles of search engines and meta-search engines are studied systematically, the state of research in China and abroad is analyzed, and the key technologies of meta-search engines are introduced in detail. (2) The web page deduplication algorithms commonly used in existing search engines and meta-search engines are compared, and their advantages and disadvantages are analyzed. Taking into account the characteristics of the results returned to a meta-search engine, a deduplication algorithm based on URL, title, and summary is proposed, with a different discrimination method for each of the three fields according to its characteristics, which makes deduplication more accurate.
(3) The classic result ranking algorithms used in meta-search engines are studied, and the advantages and disadvantages of the different algorithms are analyzed and summarized, with particular attention to the Borda voting method. To address the shortcomings of Borda ranking, an improved algorithm that combines result position with query similarity is proposed, and both the normalization of result positions and the similarity calculation are improved. (4) A meta-search engine prototype system is presented. Experiments on the proposed algorithms are carried out on this system, the experimental results are analyzed, and the performance of the algorithms is verified. The thesis closes by summarizing its main work, contributions, and experimental process, and by discussing directions for the development of meta-search engines and open problems for future research.
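For readers unfamiliar with the Borda voting method mentioned in the abstract, the sketch below shows the classic Borda count applied to ranked lists from several member engines. It is not the improved algorithm proposed in the thesis, whose position normalization and query-similarity terms are not given in the abstract. The names `normalize_url` and `borda_fuse`, the URL normalization rules, and the convention that an unreturned result earns 0 points are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical dedup key: lowercase scheme and host, drop the
    fragment and any trailing slash on the path."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def borda_fuse(ranked_lists: list[list[str]]) -> list[tuple[str, int]]:
    """Classic Borda count over several ranked URL lists.

    With n distinct candidates overall, the item at 0-based rank r in a
    list earns n - r points from that list. A list gives 0 points to
    items it did not return (one common convention, assumed here).
    """
    # Collapse duplicates inside each list, keeping the best rank.
    deduped = []
    for urls in ranked_lists:
        seen = set()
        keys = []
        for url in urls:
            key = normalize_url(url)
            if key not in seen:
                seen.add(key)
                keys.append(key)
        deduped.append(keys)

    n = len({key for keys in deduped for key in keys})
    scores = defaultdict(int)
    for keys in deduped:
        for rank, key in enumerate(keys):
            scores[key] += n - rank

    # Highest score first; ties broken by URL so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up result lists from three hypothetical member engines.
engine_a = ["http://Example.com/a/", "http://example.com/b", "http://example.com/c"]
engine_b = ["http://example.com/b", "http://example.com/a", "http://example.com/d"]
engine_c = ["http://example.com/b", "http://example.com/d"]

for url, score in borda_fuse([engine_a, engine_b, engine_c]):
    print(score, url)
```

Because plain Borda scoring uses only rank positions, a result returned by a single engine at rank 1 can be outscored by a mediocre result that several engines return. This is the kind of limitation that position normalization and query similarity, as named in the abstract, are meant to address.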
[Degree-granting institution]: Harbin Engineering University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.3