
Methods and Implementation of Personalized Information Retrieval Based on Language Models

Posted: 2018-03-02 01:32

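  Keywords: information retrieval; language model; query expansion; user model. Source: Inner Mongolia University, 2013 master's thesis. Document type: dissertation.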


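【Abstract】: With the rapid growth of the Internet, identifying a user's true intent amid vast and heterogeneous information, and accurately finding the needed information in enormous information resources, has become a major concern in information retrieval. Today's search engines, built on fairly mature technology, already achieve good recall and response speed, but their precision still often fails to satisfy users.

The main purpose of information retrieval is to find, among many documents, those that match the user's query need. Traditional query expansion emphasizes expanding the original query but overlooks the many unnecessary terms the expanded query contains. These terms in turn reduce the accuracy of the expanded query, so it cannot fundamentally express the user's search intent. This thesis studies query expansion from the perspective of user personalization.

For personalized retrieval, the thesis proposes two methods: a user query expansion model, and a stop-word-list method that removes expansion terms. Both grow out of query optimization: the first expands the user's query, the second prunes query terms. The user model expands an individual user's query with the subject domains that user is involved in, and the expanded query can improve both precision and recall for that user. The stop-word method works on the new query obtained by pseudo-relevance-feedback expansion of the original query: stop-word lists for queries are compiled for different domains and used to remove unnecessary terms from the new query, and the probability mass assigned to those terms is redistributed, raising the probability of the original query terms.

Building on the language-modeling framework and existing mature techniques, the thesis studies query expansion from a new angle, refines the query-formulation method through experiments, and uses a user-interest model to improve retrieval results. Query expansion methods in various retrieval models are discussed in detail. Experiments validate the proposed user query expansion and the domain-specific stop-word lists.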
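The pruning step described above (remove unnecessary expansion terms, then give their probability mass back to the original query terms) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `prune_expanded_query`, the toy expanded query model and stop-word list, and the rule that freed mass is shared among the original terms in proportion to their current weights are all hypothetical.

```python
from typing import Dict, Iterable, Set


def prune_expanded_query(
    query_model: Dict[str, float],
    original_terms: Iterable[str],
    domain_stopwords: Set[str],
) -> Dict[str, float]:
    """Remove expansion terms that appear in a domain stop-word list and
    move their probability mass onto the original query terms.

    Assumption (not specified in the thesis abstract): the freed mass is
    split among the original terms in proportion to their current weights.
    """
    original = set(original_terms)
    kept: Dict[str, float] = {}
    freed = 0.0
    for term, p in query_model.items():
        # Only expansion terms can be pruned; original query terms always stay.
        if term not in original and term in domain_stopwords:
            freed += p
        else:
            kept[term] = p

    # Redistribute the freed mass over the original terms present in the model.
    orig_mass = sum(p for t, p in kept.items() if t in original)
    if orig_mass > 0:
        for t in original:
            if t in kept:
                kept[t] += freed * kept[t] / orig_mass

    # Renormalize so the weights again sum to 1.
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()} if total > 0 else kept


# Hypothetical PRF-expanded model for the query "apple" and a toy stop-word list.
expanded = {"apple": 0.4, "fruit": 0.2, "news": 0.3, "price": 0.1}
pruned = prune_expanded_query(expanded, ["apple"], {"news"})
print({t: round(p, 3) for t, p in pruned.items()})
# {'apple': 0.7, 'fruit': 0.2, 'price': 0.1}
```

Proportional redistribution keeps the relative balance among the original terms unchanged; a uniform split over the original terms would be an equally plausible reading of the abstract.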
[Abstract]: Because of the rapid development of the Internet, how to identify the user's real intention and find the needed information in vast, heterogeneous information resources has become a major concern in the field of information retrieval. Search engines built on mature technology already do well on recall and response speed, but their precision has always been hard to make satisfactory to users. The main purpose of information retrieval is to find, among many documents, those that meet the user's needs. Traditional query expansion emphasizes expanding the original query but ignores the many unnecessary words in the expanded query, which reduce its accuracy, so the expanded query cannot fundamentally express the user's intent. In this thesis, query expansion is studied from the perspective of user personalization. Two retrieval methods are proposed for personalization: a user query expansion model and a method that removes expansion terms with a stop-word list. The basic idea of both methods comes from query optimization. The user model extends a query by combining the subject areas of the individual user, and the expanded query can improve the user's precision and recall. For a query expanded by pseudo-relevance feedback, stop-word lists are compiled for different domains to remove unnecessary words from the new query; the probability mass of the removed words is redistributed, increasing the probability of the original query words. On the basis of the language model, this thesis uses existing mature techniques to study query expansion from a new perspective, further improves the query method through experiments, and uses a user interest model to improve retrieval results. The methods of query expansion in various retrieval models are discussed in detail. Experiments verify the proposed user query expansion and the domain-specific stop-word lists.
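The abstract places both methods inside the language-modeling framework. As a hedged sketch of what that can look like, the code below ranks documents with a query model that interpolates the original query with a user-interest model, scoring each document by the cross-entropy form of KL-divergence ranking over a Dirichlet-smoothed document model. The interpolation weight `alpha`, the smoothing parameter `mu`, the function names, and the toy "jaguar" data are all assumptions chosen for illustration; the abstract does not specify the scoring function or smoothing method.

```python
import math
from collections import Counter
from typing import Dict, List


def user_expanded_query_model(query: List[str], user_model: Dict[str, float],
                              alpha: float = 0.7) -> Dict[str, float]:
    """Hypothetical user-model expansion by linear interpolation:
    P(w|q') = alpha * P_ml(w|q) + (1 - alpha) * P(w|user)."""
    counts = Counter(query)
    n = len(query)
    vocab = set(counts) | set(user_model)
    return {w: alpha * counts[w] / n + (1 - alpha) * user_model.get(w, 0.0)
            for w in vocab}


def kl_score(query_model: Dict[str, float], doc: List[str],
             coll: Counter, coll_len: int, mu: float = 2000.0) -> float:
    """Rank-equivalent KL-divergence score: sum_w P(w|q') * log P(w|d),
    where P(w|d) is Dirichlet-smoothed with the collection model."""
    tf = Counter(doc)
    dlen = len(doc)
    score = 0.0
    for w, qw in query_model.items():
        p_coll = coll[w] / coll_len
        if qw == 0 or p_coll == 0:
            continue  # terms absent from the collection cannot be smoothed; skip
        p_doc = (tf[w] + mu * p_coll) / (dlen + mu)
        score += qw * math.log(p_doc)
    return score


# Toy collection; the user-interest model below is a hypothetical car enthusiast.
docs = {
    "d1": "jaguar car engine speed".split(),
    "d2": "jaguar cat jaguar forest".split(),
}
coll = Counter(w for d in docs.values() for w in d)
coll_len = sum(coll.values())

plain = user_expanded_query_model(["jaguar"], {}, alpha=1.0)
personal = user_expanded_query_model(["jaguar"], {"car": 0.5, "engine": 0.5})
for name, qm in [("plain", plain), ("personalized", personal)]:
    ranking = sorted(docs, key=lambda d: kl_score(qm, docs[d], coll, coll_len, mu=10.0),
                     reverse=True)
    print(name, ranking)
# plain ['d2', 'd1']
# personalized ['d1', 'd2']
```

With the plain query, the document that mentions "jaguar" twice ranks first; once the hypothetical user's car-related interests are mixed into the query model, the car document moves to the top. `mu=10` is used only because the toy documents are four words long; the default of 2000 is a commonly used starting point for real collections.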
【Degree-granting institution】: Inner Mongolia University
【Degree level】: Master's
【Year degree awarded】: 2013
【Classification number】: TP391.3

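【Citing works】

Related master's theses (first 2)

1 Li Yunfei; Research on Dynamic Query Expansion Based on Query Logs [D]; Inner Mongolia University; 2016

2 Ding Kaichao; Research and Implementation of Virtual-Domain Re-ranking Techniques in Information Retrieval [D]; Inner Mongolia University; 2014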


