Verbose Query Processing Based on Dependency Relations and Learning to Rank
Published: 2018-07-08 19:48
Topics: verbose queries + query expansion; Source: Dalian University of Technology, master's thesis, 2013
[Abstract]: Users typically rely on search engines to find the information they need, but a search engine returns many results, and not every result matches the user's need. Moreover, because users differ in educational and cultural background, users with the same search intent often submit very different query keywords. As a result, returning the same results to every user who enters the same keywords does not satisfy all of them. Information retrieval systems generally use query expansion to improve retrieval performance. Users sometimes enter their full, detailed information need into the retrieval system, that is, they submit a verbose query. This puts pressure on retrieval systems, which must keep improving to handle such queries. Existing search engines perform worse on verbose queries: the results fail to focus on the query topic, and the returned information does not meet the user's need.

This thesis proposes two methods for processing verbose queries: a verbose query reformulation model based on dependency relations, and a verbose query processing method based on semantics and learning to rank.

The dependency-based reformulation model differs from keyword-based approaches in that it exploits a characteristic of verbose queries themselves: the terms in a verbose query stand in well-formed grammatical relations to one another. The method performs dependency parsing on the documents. Because there are many dependency relation types and some of them introduce noise, the relation types are filtered and only useful relation pairs are extracted, which shortens the query. Each remaining relation pair is then assigned a weight according to its importance in the reformulation model, which re-weights the query. Experiments confirm that the method improves retrieval performance, particularly in low-recall cases, with large gains on both MAP and P@N.

The method based on semantics and learning to rank uses the distribution of documents over different topic spaces, computes their Shannon distance, and applies learning to rank to re-rank the original retrieval results. This indicates that learning to rank can contribute substantially to verbose query processing. The experimental results show that, unlike in traditional query expansion, the terms of a verbose query cannot be treated as mutually independent; retrieval performance improves only when the dependency and semantic information specific to verbose queries is fully exploited.

All experiments use the public TREC standard corpora, and the results are evaluated in several ways. They show that the two proposed verbose query processing techniques considerably improve retrieval system performance.
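The filter-then-weight idea behind the dependency-based reformulation can be illustrated with a minimal sketch. It is not the thesis's actual model: the relation labels, the weights in `RELATION_WEIGHTS`, the `reformulate` helper, and the hand-written parse are all hypothetical, and the rule that a term inherits the largest weight of any kept pair is an assumption made only for illustration. The input is a list of (head, relation, dependent) triples such as any dependency parser might produce.

```python
from collections import defaultdict

# Hypothetical whitelist of dependency relation types and their weights.
# The abstract does not state which types are kept or how they are weighted.
RELATION_WEIGHTS = {
    "nsubj": 1.0,
    "dobj": 1.0,
    "nn": 0.8,
    "amod": 0.6,
}


def reformulate(triples, relation_weights=RELATION_WEIGHTS):
    """Filter dependency pairs by relation type, then weight the query terms.

    triples: iterable of (head, relation, dependent) tuples.
    Returns (kept_pairs, term_weights).
    """
    kept_pairs = []
    term_weights = defaultdict(float)
    for head, relation, dependent in triples:
        weight = relation_weights.get(relation)
        if weight is None:
            continue  # relation type is treated as noise and dropped
        kept_pairs.append((head, dependent, weight))
        # Assumption: a term takes the largest weight of any kept pair it is in.
        for term in (head, dependent):
            term_weights[term] = max(term_weights[term], weight)
    return kept_pairs, dict(term_weights)


# Hand-written parse of a toy verbose query (illustrative only).
triples = [
    ("cause", "nsubj", "pollution"),
    ("pollution", "amod", "industrial"),
    ("cause", "dobj", "damage"),
    ("cause", "aux", "does"),
    ("damage", "det", "the"),
]
pairs, weights = reformulate(triples)
```

In this toy case the `aux` and `det` pairs are discarded, so function words such as "does" and "the" drop out of the query (the shortening effect), while the surviving terms carry different weights (the re-weighting effect).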
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3
Article ID: 2108558
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2108558.html