Research on Personalized Information Retrieval Based on Machine Learning

Published: 2018-05-24 06:20

Topic: information retrieval + personalization; source: Jilin University, 2017 master's thesis

【Abstract】: In recent years, the rapid development of the Internet has caused the volume of information resources to surge, and people have become increasingly dependent on the network. With the fast pace of modern life, it has become essential to find the desired information quickly and accurately in a cluttered network, and search engines, as the most important entry point through which ordinary users look for online resources, are growing steadily in importance. As more and more users rely on search engines, the quality of the search experience now significantly affects people's lives, and the most important factor in that experience is how relevant the retrieved results are to the user's need. Judging by their current state of development, search engines are still far from returning resources that fully match what users want.

The key technology that determines the relevance between returned results and user needs is the search engine's retrieval model. Early research on retrieval models focused mainly on ranking relevant documents according to the keywords the user entered. Research has revealed two problems with this approach: first, users may not be clear about what resource they are looking for; second, the keywords users enter usually do not fully express their needs. In response, researchers have proposed applying machine learning to the retrieval model, but this approach is still at the research stage. The purpose of this thesis is to study how to apply machine learning to the retrieval model so as to improve retrieval accuracy and shorten query time. The application of machine learning to information retrieval is called learning to rank, and common learning-to-rank methods fall into three classes: pointwise (single-document) methods, pairwise (document-pair) methods, and listwise (document-list) methods. Listwise methods are considered the most effective and the most promising for research, and the most effective listwise method at present is LambdaMART, proposed by Christopher J. C. Burges.

This thesis proposes using personalized user information to improve the accuracy of retrieval results, that is, personalized information retrieval, which compensates for the inability of traditional search engines to capture the user's search intent accurately. To incorporate personalized information into the ranking of search results, the thesis improves on the LambdaMART algorithm by combining it with the user's personal information, including gender, age, occupation, address, and web browsing history; it then predicts the user's search intent from the search keywords and fuses the prediction into the ranking. LambdaMART uses iteratively boosted decision trees as its framework and takes the negative gradient direction derived from RankNet and LambdaRank as the direction of each iteration; this gradient has a clear physical interpretation. The algorithm's greatest advantage is that it can incorporate the evaluation metrics used in information retrieval, which makes it more effective in practice. The thesis adds the user's personalized information to the features used when training the boosted trees and, for the case in which LambdaMART has no initial model, optimizes the learning rate of each iteration to achieve fast convergence, addressing the original algorithm's inability to train without an initial model.

The thesis then compares RankNet, GBDT, and the LambdaMART algorithm experimentally and concludes from the MAP and NDCG metrics that LambdaMART, as a listwise algorithm, has a considerable advantage in information retrieval. Next, it adds personalized information to LambdaMART to form the proposed personalized information retrieval model and compares this model with the original LambdaMART, RankNet, and GBDT. By the MAP and NDCG metrics, retrieval accuracy improves substantially once personalized information is added, especially in strongly topical domains. Beyond proposing the algorithm, the thesis gives its detailed procedure, provides experimental validation, and finally reports results from a practical application. The results show that the personalized information retrieval model improves markedly on the original algorithm in both retrieval accuracy and user satisfaction. Personalized retrieval is the future direction of information retrieval, and the proposed algorithm and the design and implementation of the system offer a useful reference for future work on personalized retrieval.
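
To illustrate how an evaluation metric can enter the ranking gradient, the following minimal Python sketch computes NDCG and LambdaRank-style lambdas for a single query. It is not the thesis's implementation: the function names (`dcg`, `ndcg`, `lambda_gradients`), the `sigma` parameter, the exponential gain `2^rel - 1`, and the `log2` discount are assumptions chosen for illustration, following the commonly published formulation of LambdaRank. Personalized features and the tree-fitting step are omitted.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of graded relevance labels in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def lambda_gradients(scores, labels, sigma=1.0):
    """Return one lambda per document for a single query.

    For every pair (i, j) with labels[i] > labels[j], the RankNet pairwise
    term sigma / (1 + exp(sigma * (s_i - s_j))) is scaled by |delta NDCG|,
    the NDCG change obtained by swapping the two documents' positions.
    A positive lambda means the document's score should be pushed up.
    """
    n = len(scores)
    # Current ranking induced by the model scores (highest score first).
    order = sorted(range(n), key=lambda k: scores[k], reverse=True)
    rank_of = {doc: r for r, doc in enumerate(order)}
    ideal = dcg(sorted(labels, reverse=True))
    lambdas = [0.0] * n
    if ideal == 0:
        return lambdas  # no relevant documents, nothing to learn from
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only pairs where i is more relevant than j
            gain_diff = (2 ** labels[i] - 1) - (2 ** labels[j] - 1)
            disc_diff = (1 / math.log2(rank_of[i] + 2)
                         - 1 / math.log2(rank_of[j] + 2))
            delta_ndcg = abs(gain_diff * disc_diff) / ideal
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam = sigma * rho * delta_ndcg
            lambdas[i] += lam  # push the more relevant document up
            lambdas[j] -= lam  # push the less relevant document down
    return lambdas


if __name__ == "__main__":
    # Hypothetical query: three documents with model scores and graded labels.
    print(lambda_gradients([0.2, 0.9, 0.5], [2, 0, 1]))
```

In a LambdaMART-style trainer, each boosting iteration would fit a regression tree to these per-document lambdas, which is how the metric influences the direction of every iteration.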
【Degree-granting institution】: Jilin University
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: TP391.3;TP181

【Similar literature】