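Research on Search Result Diversification Based on Query Subintent Identification
Topic: information retrieval + query subintent; Reference: Harbin Institute of Technology, 2012 master's thesis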
[Abstract]: The rapid growth of the Internet has caused the total volume of human information to increase exponentially, and information retrieval technology emerged to help users find information relevant to their current needs quickly and accurately among massive resources. Search engines, an important application of information retrieval, have become an indispensable tool in daily work and life. A search engine retrieves documents for the query terms a user submits to express a search intent, and returns them ordered by their similarity to the query. However, the same query may stand for different intents for different users, for two main reasons: the query may be ambiguous, or it may cover multiple subintents. A retrieval approach that considers similarity alone therefore leaves some users' needs unmet, so retrieval results should account for users' diverse needs. To that end, this thesis studies search result diversification and proposes a diversification method based on query subintent identification, which takes into account both the relevance of the returned document set to the user's query intent and the diversity among the returned documents.

The proposed method builds on traditional explicit and implicit diversification methods. It combines the explicit coverage of the different subintents of the original query, as in explicit methods, with the reduction of redundancy in the returned document set, as in implicit methods. It has three main components: identifying the different subintents of the original query, predicting the relative weights of those subintents, and ranking results for diversity using the weighted subintents.

The thesis accordingly addresses the following:

1. Explicit mining of the subintents of the original query. The Related queries and Suggested queries that commercial search engines provide for the original query are used as candidate subqueries, and the candidates are grouped into subintents by manual annotation. A performance comparison with three other ways of mining candidate subqueries demonstrates the effectiveness of this approach.
2. Weight prediction for the subintent categories. From six months of browser user logs, 32 subintent-related features are extracted, and an SVM ranking model is used to predict the weights of the different subintent categories.
3. Analysis of the search result diversification problem. The thesis proposes the diversification method based on query subintent identification and gives the general procedure of the algorithm. Its effectiveness is demonstrated by comparison with the performance upper bound of traditional explicit and implicit diversification methods and with a variant of explicit diversification. The relationship between the method's performance and the number of subintent categories is also analyzed.

Validation on the dataset of the NTCIR9 subintent mining task shows that the subintent mining approach performs well, which lays a foundation for other work that needs query subintents. Comparison with other diversification methods on the dataset of the NTCIR9 diversified result ranking task shows that the proposed method better meets users' needs for diversified search results.
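The combination described in the abstract, weighted explicit coverage of subintents together with reduced redundancy among the returned documents, can be illustrated with a greedy re-ranker. The sketch below is not the thesis's algorithm, which the abstract does not spell out. It assumes an xQuAD-style coverage term and an MMR-style redundancy penalty, and every name (`diversify`, `lam`, `mu`), score, and weight in it is hypothetical illustration data.

```python
from typing import Dict, FrozenSet, List


def diversify(
    relevance: Dict[str, float],
    intent_rel: Dict[str, Dict[str, float]],
    intent_weight: Dict[str, float],
    similarity: Dict[FrozenSet[str], float],
    k: int,
    lam: float = 0.5,
    mu: float = 0.3,
) -> List[str]:
    """Greedily select k documents, trading off relevance to the query,
    coverage of still-unsatisfied subintents, and similarity to documents
    that were already selected (all three terms are assumptions)."""
    selected: List[str] = []
    # uncovered[i]: how much of subintent i is still unsatisfied (1.0 = fully).
    uncovered = {i: 1.0 for i in intent_weight}
    candidates = set(relevance)

    def score(doc: str) -> float:
        per_intent = intent_rel.get(doc, {})
        coverage = sum(
            w * per_intent.get(i, 0.0) * uncovered[i]
            for i, w in intent_weight.items()
        )
        redundancy = max(
            (similarity.get(frozenset((doc, s)), 0.0) for s in selected),
            default=0.0,
        )
        return (1 - lam) * relevance[doc] + lam * coverage - mu * redundancy

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Discount each subintent by how well the chosen document covers it.
        for i, r in intent_rel.get(best, {}).items():
            if i in uncovered:
                uncovered[i] *= 1.0 - r
    return selected


# Hypothetical example: an ambiguous query with two subintents.
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.6, "d4": 0.5}
intent_rel = {
    "d1": {"car": 1.0},
    "d2": {"car": 0.9},
    "d3": {"animal": 1.0},
    "d4": {"animal": 0.8},
}
intent_weight = {"car": 0.6, "animal": 0.4}
similarity = {frozenset(("d1", "d2")): 0.9, frozenset(("d3", "d4")): 0.8}

ranking = diversify(relevance, intent_rel, intent_weight, similarity, k=3)
print(ranking)  # ['d1', 'd3', 'd2']
```

With `lam=0` and `mu=0` the function reduces to ranking by relevance alone, which returns `d1`, `d2`, `d3` and leaves the "animal" subintent out of the top two results. In the thesis the subintent weights come from the SVM ranking model; here they are fixed by hand.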
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.3
Article ID: 1819469
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1819469.html