Research on Optimization Techniques for Max-Score Query Processing

Published: 2018-08-07 07:43
[Abstract]: With the rapid growth of the Internet, the volume of information available online has increased dramatically. Faced with massive data, massive query loads, and real-time response requirements, search engines must answer user queries efficiently and in real time. One important approach is to improve the retrieval efficiency of the whole system by optimizing query processing performance on a single machine. This thesis first introduces the theory behind query processing over inverted indexes, including the structure of inverted indexes, query processing strategies, and dynamic index pruning. The document-at-a-time (DAAT) Max-Score algorithm is one of the classic top-k query processing algorithms. Existing Max-Score algorithms start with an initial threshold of 0, which causes a "slow start" problem: almost nothing can be pruned until the threshold rises. To address this, the thesis proposes two DAAT Max-Score variants, one based on query-term partitioning and one based on a two-tier index structure. The query-term-partitioning variant partitions the query terms according to the characteristics of the user's query and uses term-at-a-time (TAAT) processing to evaluate the resulting short query set quickly, selecting candidate documents and raising the initial threshold. The two-tier-index variant exploits the structure of a two-tier index: when the index is built, the global maximum scores of query terms in the lower-tier index are greatly reduced, and TAAT processing over the upper-tier index is likewise used to select candidate documents and raise the initial threshold. Both variants effectively reduce the number of documents that enter the candidate set without ending up in the final top-k, thereby improving query processing performance. Finally, the thesis combines the two improved algorithms and designs and implements an index retrieval system on the Terrier platform.
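
The abstract describes the threshold-seeding idea only in prose. The sketch below is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes an in-memory toy index with precomputed, non-negative per-term scores, and the names `TOY_INDEX`, `taat_seed_threshold`, and `max_score_daat` are hypothetical. It shows how a TAAT pass over a caller-chosen subset of query terms can give DAAT Max-Score a nonzero initial threshold instead of the zero-threshold slow start. The abstract does not specify how the thesis partitions query terms, so the choice of seed terms is left to the caller.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical toy index: term -> postings sorted by doc id, each (doc_id, score).
# Per-term scores are assumed precomputed and non-negative (e.g. BM25 impacts).
Postings = List[Tuple[int, float]]
TOY_INDEX: Dict[str, Postings] = {
    "a": [(1, 2.0), (3, 1.0), (5, 3.0), (7, 0.5)],
    "b": [(1, 1.0), (2, 4.0), (5, 1.0), (8, 2.0)],
    "c": [(2, 0.5), (3, 2.5), (5, 0.75), (7, 3.0)],
}


def taat_seed_threshold(index: Dict[str, Postings], seed_terms: List[str], k: int) -> float:
    """TAAT pass over a subset of query terms to obtain an initial threshold.

    With non-negative scores, a partial sum over some terms never exceeds a
    document's full score, so the k-th largest partial score is a safe lower
    bound on the final k-th score.
    """
    acc: Dict[int, float] = {}
    for term in seed_terms:
        for doc, s in index.get(term, []):
            acc[doc] = acc.get(doc, 0.0) + s
    if len(acc) < k:
        return 0.0  # too few candidates: fall back to the usual cold start
    return heapq.nlargest(k, acc.values())[-1]


def max_score_daat(index: Dict[str, Postings], terms: List[str], k: int,
                   theta: float = 0.0) -> List[Tuple[float, int]]:
    """DAAT Max-Score top-k; theta is the initial threshold (0.0 = slow start)."""
    lists = [index[t] for t in terms if index.get(t)]
    lists.sort(key=lambda pl: max(s for _, s in pl))  # ascending upper bound
    ub = [max(s for _, s in pl) for pl in lists]
    prefix = [sum(ub[: i + 1]) for i in range(len(ub))]
    lookup = [dict(pl) for pl in lists]  # random access into non-essential lists
    pos = [0] * len(lists)
    heap: List[Tuple[float, int]] = []  # min-heap of (score, doc_id)

    while True:
        # Lists 0..first_ess-1 are non-essential: their summed upper bounds stay
        # below theta, so a document found only in them can never qualify.
        first_ess = 0
        while first_ess < len(lists) and prefix[first_ess] < theta:
            first_ess += 1
        live = [lists[i][pos[i]][0] for i in range(first_ess, len(lists))
                if pos[i] < len(lists[i])]
        if not live:
            break
        doc = min(live)  # next candidate comes only from essential lists
        score = 0.0
        for i in range(first_ess, len(lists)):
            if pos[i] < len(lists[i]) and lists[i][pos[i]][0] == doc:
                score += lists[i][pos[i]][1]
                pos[i] += 1
        # Probe non-essential lists from the largest bound down; stop once even
        # the remaining bounds cannot lift the document to theta.
        for i in range(first_ess - 1, -1, -1):
            if score + prefix[i] < theta:
                break
            score += lookup[i].get(doc, 0.0)
        if len(heap) < k:
            if score >= theta:
                heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
        if len(heap) == k:
            theta = max(theta, heap[0][0])
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    theta0 = taat_seed_threshold(TOY_INDEX, ["b"], k=2)
    print(theta0, max_score_daat(TOY_INDEX, ["a", "b", "c"], k=2, theta=theta0))
    # prints: 2.0 [(4.75, 5), (4.5, 2)]
```

The seeded threshold is safe because a partial sum over a subset of non-negative term scores never exceeds a document's full score: it lets Max-Score mark lists as non-essential earlier, but it does not change the top-k scores returned.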
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year awarded]: 2014
[Classification number]: TP391.3
