Optimizing Microblog Search Results Based on Relevance Feedback
Published: 2018-08-09 20:07
Abstract: Microblogs are a highly popular form of social media and provide massive amounts of real-time text, such as news events, trending comments, and hot topics. Microblog search differs substantially from traditional web search. Because of its real-time and social nature, user demand for microblog search keeps growing, and optimizing its results has become a research focus. Relevance feedback, a key technique for improving the performance of query expansion, has an important influence on result optimization. Working on a microblog corpus, this thesis studies relevance feedback and proposes a re-ranking algorithm based on it. The main contributions are as follows. First, the thesis proposes a feedback algorithm based on an improved relevance model, which modifies the traditional relevance model to improve retrieval performance for topic expansion; it also designs a feedback algorithm based on word activation force, which builds a word network and mines the expansion terms activated by the topic words. Second, taking into account the characteristics of microblogs and of the corpus, it takes the novel approach of incorporating the expansion-term results as features in a learning-to-rank model, rather than using them directly for a second retrieval pass. It separately analyzes how expansion-term features and URL-content features affect the ranking, and proposes a re-ranking algorithm that fuses multiple features to optimize the search results. Experiments on the Twitter corpus of the TREC 2011-2013 Microblog track show that the method substantially improves retrieval metrics such as P@30 and MAP. Finally, a microblog search system based on relevance feedback is designed and implemented.
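The abstract names two standard building blocks without spelling them out: relevance-model expansion over feedback tweets, and the P@30 and MAP metrics used to report results. Below is a minimal Python sketch of both. It assumes the classic RM1 relevance model with Dirichlet smoothing and treats the top-ranked tweets as pseudo-relevant. It is not the thesis's improved relevance model or its word-activation-force method. The function names, the Laplace-smoothed collection model, and the default `mu` are illustrative assumptions.

```python
import math
from collections import Counter


def relevance_model(query, feedback_docs, collection, mu=10.0, n_terms=10):
    """RM1 expansion terms: P(w|R) ~ sum_D P(w|D) * prod_q P(q|D), uniform P(D).

    query: list of query tokens; feedback_docs: top-ranked tweets (token lists)
    assumed pseudo-relevant; collection: all tweets (token lists).
    mu is small because tweets are short (an assumption, not a tuned value).
    """
    coll_counts = Counter(t for doc in collection for t in doc)
    coll_len = sum(coll_counts.values())
    vocab_size = len(coll_counts)

    def p_coll(t):
        # Laplace-smoothed collection model so unseen terms never get probability 0
        return (coll_counts[t] + 1) / (coll_len + vocab_size)

    def p_doc(t, counts, length):
        # Dirichlet-smoothed document model P(t|D)
        return (counts[t] + mu * p_coll(t)) / (length + mu)

    vocab = sorted({t for doc in feedback_docs for t in doc})
    weights = Counter()
    for doc in feedback_docs:
        counts, length = Counter(doc), len(doc)
        # query likelihood P(Q|D), accumulated in log space to avoid underflow
        ql = math.exp(sum(math.log(p_doc(q, counts, length)) for q in query))
        for w in vocab:
            weights[w] += p_doc(w, counts, length) * ql

    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    # normalise over the feedback vocabulary, then drop the original query terms
    return [(w, v / total) for w, v in ranked if w not in query][:n_terms]


def precision_at_k(ranked_ids, relevant, k=30):
    """P@k: relevant hits in the top k, divided by k (TREC convention)."""
    return sum(1 for d in ranked_ids[:k] if d in relevant) / k


def average_precision(ranked_ids, relevant):
    """AP: mean of precision at each relevant hit, over all relevant items."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked_ids, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over topics; runs is a list of (ranked_ids, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    feedback = [["earthquake", "japan", "tsunami"],
                ["earthquake", "japan", "magnitude"],
                ["earthquake", "japan"]]
    corpus = feedback + [["football", "match", "goal"], ["weather", "rain"]]
    print(relevance_model(["earthquake"], feedback, corpus, n_terms=3))
    print(precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=4))
    print(mean_average_precision([(["a", "b", "c"], {"a", "c"})]))
```

In the design the abstract describes, expansion-term weights like these would feed a learning-to-rank model as features instead of driving a second retrieval pass; the abstract does not specify how those features are constructed.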
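Degree-granting institution: Beijing University of Posts and Telecommunications
Degree level: Master's
Year degree awarded: 2015
Classification number (CLC): TP391.1; TP391.3
Document ID: 2175174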
Link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2175174.html