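Application of an Improved WAF Algorithm to Query Expansion Based on Semantic Analysis
Published: 2018-04-27 00:14
Topics: query expansion + word activation force; Source: Beijing University of Posts and Telecommunications, master's thesis, 2012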
[Abstract]: Query expansion is an important technique in information retrieval and an effective way to help users get more out of search engines. However, as Internet content grows more complex and diverse, and especially with the rapid growth of social platforms such as Weibo and WeChat, traditional query expansion algorithms, which ignore the semantic relations between words in a document, can no longer recommend effective keywords for non-standard short texts. The term-independence assumption of traditional retrieval models, combined with the sparse information in short texts, leaves existing query expansion algorithms without enough semantic information, so they cannot resolve the synonymy and polysemy problems that are common in user searches.

To address these problems, this thesis surveys classical information retrieval models and query expansion methods and concludes that the root cause of the query expansion problem is the lack of effective semantic analysis. It proposes applying the word activation force (WAF) algorithm to topic-based query expansion, with the aim of improving query expansion through precise semantic analysis.

Building on a detailed study of WAF theory, the thesis proposes a new WAF-based query expansion algorithm. The main work is as follows:

First, extensive comparative experiments between WAF and traditional word association algorithms on a Weibo corpus show that WAF has substantial advantages in semantic analysis and word network modeling, particularly in expanding topic core words and mining high-value words.

Second, to cope with the non-standard form and sparse information of short texts, the thesis adjusts how word activation force is computed in WAF so that it makes full use of short-text characteristics and weakens the influence of noisy features on core semantic analysis. To improve the quality of WAF word expansion, it further proposes re-ranking the list of associated words according to the overall distribution of word affinity, on top of the word network model.

Third, the thesis combines WAF semantic analysis with topic clustering to design a fairly complete query expansion algorithm, embeds it in the overall framework of a Weibo monitoring project, and applies it to retrieval over a Weibo corpus. Comparative experiments against query expansion based on the BM25 weighting scheme show that the word network model generated by WAF has great potential for query expansion.
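The abstract does not state the WAF formula or the thesis's modified version of it. As a rough illustration only, the sketch below assumes the commonly cited baseline definition, waf(i, j) = (f_ij / f_i) * (f_ij / f_j) / d_ij^2, where f_i and f_j are word frequencies, f_ij counts how often word i precedes word j within a window, and d_ij is their mean distance in those co-occurrences. The function names, the `window` parameter, and the ranking rule in `expand_query` are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def word_activation_forces(docs, window=5):
    """Compute directed WAF scores from tokenized documents.

    Assumed baseline definition (not the thesis's modified variant):
        waf(i, j) = (f_ij / f_i) * (f_ij / f_j) / d_ij ** 2
    """
    freq = Counter()              # f_i: occurrences of each word
    co_count = Counter()          # f_ij: times i precedes j within the window
    dist_sum = defaultdict(int)   # summed distances, used for the mean d_ij

    for tokens in docs:
        freq.update(tokens)
        for pos, word in enumerate(tokens):
            for offset in range(1, window + 1):
                if pos + offset >= len(tokens):
                    break
                later = tokens[pos + offset]
                if later == word:
                    continue  # skip self-pairs
                co_count[(word, later)] += 1
                dist_sum[(word, later)] += offset

    waf = {}
    for (i, j), f_ij in co_count.items():
        mean_dist = dist_sum[(i, j)] / f_ij
        waf[(i, j)] = (f_ij / freq[i]) * (f_ij / freq[j]) / mean_dist ** 2
    return waf


def expand_query(term, waf, top_k=3):
    """Rank candidate expansion terms by the stronger of the two directed forces.

    This ranking rule is an illustrative assumption, not the thesis's method.
    """
    scores = defaultdict(float)
    for (i, j), force in waf.items():
        if i == term:
            scores[j] = max(scores[j], force)
        elif j == term:
            scores[i] = max(scores[i], force)
    # Sort by descending force, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]
```

With tokenized short texts as input, `expand_query("a", word_activation_forces(docs))` would return the words most strongly linked to the query term under this assumed definition. The affinity-based re-ranking and topic clustering described in the abstract are not modeled here.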
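[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.1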
Article ID: 1808313
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1808313.html