Research on GPU-Based Parallel Learning-to-Rank Algorithms

Published: 2018-03-26 11:44

Topic: learning to rank; Focus: ordered pairs; Source: Harbin Institute of Technology, master's thesis, 2012

[Abstract]: Search engines help users find relevant information in the vast and disorderly body of information on the Internet, so the ranking of retrieval results is critical. Learning to rank, an emerging information retrieval technique, offers a new solution to the Internet information retrieval problem. Traditional learning-to-rank algorithms were developed on small-scale text collections. As the total volume of information on the Internet grows rapidly, large-scale data becomes a bottleneck for these algorithms, and their performance has become a direction for future learning-to-rank research. This thesis therefore proposes a new learning-to-rank algorithm, combines it with Graphics Processing Unit (GPU) parallel computing, and verifies its effectiveness. The main contents are as follows: (1) The relevant theory of learning-to-rank algorithms and of GPU parallel computing is summarized and explained, existing learning-to-rank algorithms are surveyed, and the evaluation metrics for learning to rank and the parallel programming model are described. (2) Based on an in-depth analysis of the characteristics of information retrieval, and on the observation that more relevant information matters more, the thesis adopts the ordered-pair (pairwise) approach to learning to rank. The input data space is re-partitioned so that document pairs ordered by the "greater than" partial-order relation form the input space. (3) A learning-to-rank algorithm based on the Bayesian personalized ranking framework is proposed, namely the Linear Scoring Learning to Rank Model (LSLRM). The model is built by estimating the correct ordering of each input document pair and is used to solve the query ranking problem. Training the ranking model is thereby converted into a binary classification problem, and the features are analyzed to identify those that are decisive for distinguishing relevance levels. (4) The algorithm exploits the GPU parallel programming model and memory model to address the performance bottleneck of learning-to-rank algorithms on large-scale data. (5) Experiments demonstrate the advantages of the GPU-based parallel learning-to-rank algorithm. The proposed algorithm is compared with the benchmark results published by Microsoft Research Asia for RankSVM-Struct and other algorithms. The conclusion is that the proposed algorithm outperforms the other algorithms overall, and on large-scale datasets it achieves a speedup of 10 to 11 times over the CPU.
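The pairwise idea in items (2) and (3) can be illustrated with a minimal sketch. The abstract does not give the thesis's exact LSLRM formulation or its GPU kernels, so the code below is only a generic BPR-style linear scorer written under stated assumptions: documents of one query are turned into ordered pairs (more relevant, less relevant), and a weight vector `w` is fitted by stochastic gradient ascent on the regularized log-likelihood of `sigmoid(w·(x_i - x_j))`. The function names, the toy data, and the hyperparameters (`lr`, `reg`, `epochs`) are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """Linear score w · x for one document feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def make_pairs(docs):
    """Build ordered pairs (more relevant, less relevant) within one query.

    docs is a list of (feature_vector, relevance_label) tuples.
    """
    return [(xi, xj) for xi, yi in docs for xj, yj in docs if yi > yj]


def train(pairs, dim, lr=0.1, reg=0.01, epochs=200, seed=0):
    """Stochastic gradient ascent on a BPR-style objective:
    sum over pairs of ln sigmoid(w·(xi - xj)), minus reg * ||w||^2.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for xi, xj in pairs:
            diff = [a - b for a, b in zip(xi, xj)]
            # d/dw ln sigmoid(w·diff) = sigmoid(-w·diff) * diff
            g = sigmoid(-score(w, diff))
            w = [wk + lr * (g * dk - 2.0 * reg * wk)
                 for wk, dk in zip(w, diff)]
    return w


if __name__ == "__main__":
    # Toy query with three documents: (features, relevance label).
    docs = [([1.0, 0.2], 2), ([0.6, 0.9], 1), ([0.1, 0.5], 0)]
    w = train(make_pairs(docs), dim=2)
    ranked = sorted(docs, key=lambda d: score(w, d[0]), reverse=True)
    print([label for _, label in ranked])
```

Each pair contributes an independent term to the gradient, so a mini-batch variant could evaluate many pairs at once; whether the thesis parallelizes in this way is not stated in the abstract.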
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.3

[References]