推荐系统中矩阵分解算法研究

发布时间：2018-01-05 01:04

本文关键词：推荐系统中矩阵分解算法研究　出处：《中国科学技术大学》2017年硕士论文　论文类型：学位论文

【摘要】：随着在线服务的快速发展,互联网上拥有的信息量呈现爆炸性增长趋势,导致人们很难有效地获取感兴趣的内容。推荐系统是帮助用户发现符合其兴趣偏好的物品,缓解信息过载问题的有效工具之一。矩阵分解算法是目前推荐系统领域研究的前沿之一。该算法将用户对物品的评分矩阵分解为隐因子空间上用户、物品隐因子矩阵,具有理论基础好、预测准确性高等诸多优点。目前,矩阵分解算法仍存在学习速率调参耗时、倾斜数据集上并行效果差和稀疏评分数据集上推荐效果不理想等问题。本文围绕矩阵分解算法,深入分析其存在的三个问题,并提出了相应的改进方案。本文的主要内容和贡献如下:我们提出了一个求解矩阵分解模型的自适应学习速率算法。随机梯度下降算法是求解矩阵分解模型的有效算法之一,其性能很大程度上依赖于训练过程中学习速率的调整方案。在优化矩阵分解算法的目标函数时,由于学习速率选取的不合适,目标函数会出现收敛速度慢、收敛结果不理想等问题。本文在分析各种学习速率方案缺点的基础上,提出了一个求解矩阵分解模型的自适应学习速率算法AALRSMF。该算法来源于ADADELTA算法,不需要手动设置全局学习速率,并且表现出对超参数选择的鲁棒性。和ADADELTA算法相比,AALRSMF算法将空间复杂度从O(k(m+ n))降低为O(m + n),将每次迭代的计算代价减少了 O(10k)。实验结果表明,AALRSMF算法能够显著地减少目标函数收敛的迭代次数。我们提出了一个并行矩阵分解算法。矩阵分解算法的并行化一直是一个研究热点,但是当用户评分矩阵倾斜时,已有的并行矩阵分解算法会导致目标函数出现收敛速度慢、收敛结果不理想等问题。本文在分析已有的并行算法在分解倾斜评分矩阵缺点的基础上,提出了一个基于KD树的并行矩阵分解算法KDMF。该算法利用KD树对用户评分矩阵进行划分,使得每个分区块中的评分数目尽可能相近,然后基于部分匹配查询,设计出一个异步调度算法,最小化调度分区块的时间花费。实验结果表明,KDMF算法能够显著地减少目标函数收敛所需要的时间,并且收敛结果好于其他并行算法。我们提出了一个基于文本上下文的矩阵分解算法。在实际应用中,用户评分矩阵往往是极度稀疏的。在这样的数据集上,传统的矩阵分解算法学习效果往往不理想。研究人员考虑将一些辅助信息加入到矩阵分解模型中,来提高推荐算法的性能。本文基于前人的工作,并针对他们工作中存在的问题,提出了一个基于字符表征信息的矩阵分解算法CharConvMF。该算法将物品的文本内容作为辅助信息,利用深度卷积神经网络从字符的角度提取文本内容的表征信息,然后将提取到的表征信息集成到基于邻居的矩阵分解算法中。实验结果表明,即使是在用户评分矩阵极度稀疏的情况下,CharConvMF算法评分预测的准确性依然好于其他矩阵分解算法。
[Abstract]:With the rapid development of online services, the amount of information on the Internet have explosive growth trend, so it is hard to effectively acquire the content of interest. Recommendation system is to help users find consistent with its preference items, one of the effective tools to alleviate the information overload problem. Matrix decomposition algorithm is one of the forefront of research in the field of recommender system. The algorithm will score matrix of the user on the items of the decomposition for the user implicit factor space, implicit item factor matrix has good theoretical basis and predictive accuracy advantages. At present, there are still learning rate matrix decomposition algorithm to adjust the parameters of time, the inclination of data sets in parallel effect and sparse data sets on the recommendation results score is not ideal. This paper focuses on the matrix decomposition algorithm, in-depth analysis of the three existing problems, and puts forward the corresponding improvement scheme. The main contents of this paper And the contributions are as follows: we propose an adaptive matrix decomposition model learning rate algorithm. The stochastic gradient descent algorithm is one of the most effective algorithms to solve the matrix decomposition model, its performance depends largely on the training process of learning rate adjustment scheme. The optimization objective function matrix decomposition algorithm, the learning rate is selected not appropriate, the objective function will appear slow convergence speed, convergence result is not ideal. The learning rate based on the analysis of various disadvantages, the adaptive learning rate AALRSMF. algorithm this algorithm derived from a ADADELTA algorithm for solving matrix decomposition model, do not need to manually set the global learning rate, and showed the the robustness of parameter selection. Compared with ADADELTA algorithm, AALRSMF algorithm, the space complexity from O (K (m+ n)) decreased to O (M + n), each iteration The computational cost is reduced by O (10K). The experimental results show that AALRSMF algorithm can significantly reduce the number of iterations. The convergence of the objective function we propose a parallel matrix decomposition algorithm. The parallel matrix decomposition algorithm is a hotspot, but when the user score matrix tilt, parallel matrix decomposition algorithm will the target function appeared slow convergence speed and the convergence result is not ideal. Based on the analysis of the existing parallel algorithms in the decomposition of tilt score matrix defects, we propose a parallel matrix decomposition algorithm KDMF. the algorithm uses KD tree to classify user rating matrix based on the KD tree, the number of scores for each partition in the block as close as possible, and then partial match query based on the design of an asynchronous scheduling algorithm, scheduling to minimize the time spent in blocks. The experimental results show that KDMF algorithm Method can significantly reduce the time required for convergence of objective function, and the convergence results are better than other parallel algorithm. We propose a matrix decomposition algorithm based on the context of the text. In practical application, the user score matrix is often very sparse. In this data set, the traditional matrix decomposition algorithm learning effect is often not ideal. The researchers consider some auxiliary information into the matrix decomposition model, to improve the performance of the recommendation algorithm. In this paper, based on the previous work, aiming at the existing problems in their work, proposed a matrix based on the character of the characterization information of the decomposition algorithm of CharConvMF. algorithm the text content items as auxiliary information. Characterization of information extraction of text from the perspective of characters by using the depth of convolutional neural network, and then to extract the characterization of information integration to neighbors based on matrix In the decomposition algorithm, experimental results show that even if the user rating matrix is extremely sparse, the prediction accuracy of CharConvMF algorithm is still better than that of other matrix decomposition algorithms.

【学位授予单位】：中国科学技术大学
【学位级别】：硕士
【学位授予年份】：2017
【分类号】：TP391.3

【相似文献】