An Improvement to the RankNet Learning-to-Rank Algorithm

Published: 2018-11-28 12:07

【Abstract】: With the rapid development of information technology, using search engines to obtain online resources has become part of everyday life. At the same time, the sheer volume of web pages poses great challenges for search engines: how to find the information a user wants quickly and accurately in an ocean of data, and how to present the most useful results first. The key factor in a search engine's performance is its ranking algorithm. Early web page ranking algorithms considered only relatively simple ranking factors, and the accuracy of the retrieved results was hard to guarantee. With the continuing development of artificial intelligence, machine learning and learning to rank have attracted wide attention from researchers in China and abroad in recent years. Learning-to-rank algorithms play an important role in information retrieval (IR), collaborative filtering, natural language processing (NLP), sentiment analysis, online advertising, recommender systems, and other fields, and a growing number of artificial intelligence researchers treat the topic as an active research direction. This thesis studies the learning-to-rank algorithm based on the RankNet neural network, which was first proposed by Chris Burges et al. and is widely adopted in search engines, with the aim of improving the user experience of web search results. The main contents of the thesis are the following three points: (1) The thesis surveys the evolution and current state of research on ranking algorithms and gives an overview of learning-to-rank algorithms. It also examines the evaluation criteria and optimization directions for web search ranking algorithms, which are later used to assess the improved RankNet algorithm. Two improvements are made. (2) First improvement: the thesis improves RankNet by using a loss function that is a linear combination of cross entropy and mean squared error, and proves that this choice of loss function is sound. This addresses the problem that the original algorithm ignores the magnitude of each document's relevance to the query within a sample pair. Second improvement: by adding query weights, the thesis removes the bias introduced into learning when the number of documents differs greatly across queries, so that the trained model is more accurate and queries are treated equally. (3) Finally, a BP (backpropagation) neural network model is used to verify and compare RankNet and the improved algorithm on a Microsoft dataset. The algorithms before and after the improvement are analyzed with several ranking evaluation metrics; the results show that modifying the loss function improves ranking accuracy, which confirms the effect of the improvements.
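The abstract does not give the exact formulas, so the following is only a minimal sketch of how the two improvements could look in code, under stated assumptions. The cross-entropy term is the standard RankNet pairwise loss on the score difference. The mean-squared-error term, the mixing coefficient `alpha`, and the per-query averaging used as the query weight are illustrative assumptions, not the thesis's actual definitions. The function names `pair_loss` and `query_weighted_loss` are hypothetical.

```python
import numpy as np


def pair_loss(s_i, s_j, rel_i, rel_j, alpha=0.5):
    """Combined loss for document pairs (i, j) of one query.

    alpha (assumed) mixes cross entropy and mean squared error linearly.
    """
    diff = s_i - s_j
    # Standard RankNet target: 1 if i is more relevant, 0 if less, 0.5 if tied.
    p_target = 0.5 * (1.0 + np.sign(rel_i - rel_j))
    # Cross entropy against sigmoid(diff), written in a numerically stable form:
    # -p*log(P) - (1-p)*log(1-P) = log(1 + exp(diff)) - p*diff
    ce = np.logaddexp(0.0, diff) - p_target * diff
    # Assumed MSE term: pushes the score gap toward the relevance gap, so the
    # size of the relevance difference is no longer ignored.
    mse = (diff - (rel_i - rel_j)) ** 2
    return alpha * ce + (1.0 - alpha) * mse


def query_weighted_loss(queries, alpha=0.5):
    """Average the pairwise loss within each query, then across queries.

    queries: list of (scores, relevance_labels) sequences, one per query.
    Weighting each query by 1 / (its number of pairs) is an assumption made
    here so that queries with many documents do not dominate training.
    """
    per_query = []
    for scores, rel in queries:
        scores = np.asarray(scores, dtype=float)
        rel = np.asarray(rel, dtype=float)
        i, j = np.triu_indices(len(scores), k=1)  # all unordered pairs
        keep = rel[i] != rel[j]                   # skip tied pairs
        if not keep.any():
            continue
        losses = pair_loss(scores[i][keep], scores[j][keep],
                           rel[i][keep], rel[j][keep], alpha)
        per_query.append(losses.mean())
    return float(np.mean(per_query)) if per_query else 0.0


# Usage: one small query and one large query contribute equally.
small = ([2.0, 0.5], [1, 0])
large = (np.linspace(3.0, 0.0, 50), np.repeat([2, 1, 0], [10, 20, 20]))
total = query_weighted_loss([small, large])
```

With `alpha=1.0` the sketch reduces to plain RankNet cross entropy averaged per query; lowering `alpha` gives more influence to the assumed relevance-gap term.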
【Degree-granting institution】: Jilin University
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: TP18;TP391.3

【Similar literature】