A Similarity Algorithm for Cold-Start Users Based on Collaborative Filtering

Published: 2018-03-14 05:00

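Topic: recommender systems; Focus: collaborative filtering; Source: Anhui University of Technology, 2017 master's thesis; Document type: degree thesis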

[Abstract]: With the rapid development of Internet technology and e-commerce, the variety and volume of online service information keep growing, which inevitably leads to information overload. Recommender systems emerged in response. Collaborative filtering is the core of a recommender system: it mines users' interests and preferences from their historical behavior and personal information, helps them find the products or services they are most interested in, and provides high-quality personalized recommendations. In practice, however, the technique faces a series of challenges. Cold start and data sparsity are key problems that collaborative filtering has not yet effectively solved. When computing the similarity between a cold-start user and other users, existing collaborative-filtering user similarity algorithms use only the numerical ratings in the rating matrix. They ignore the effect on user similarity of preference differences in users' co-ratings, each user's own rating preference, and item popularity. The resulting similarities are much less accurate, which makes it hard to predict the target user's interests accurately and efficiently and ultimately lowers the accuracy of the recommendations produced.

This thesis analyzes in detail the problems that current collaborative-filtering user similarity algorithms face under new-user cold start and data sparsity, and proposes improvements. The main contents and contributions are as follows.

1) A heuristic similarity algorithm that accounts for preference differences in users' co-ratings. Building on the ideas of the PIP and MJD similarity algorithms, it uses the differences between users' co-ratings to compute user similarity. First, the preference weights of the co-ratings are computed from the proportion of each rating difference; then three factors (Proximity, Impact, and Popularity) are computed for each difference value; finally, a global user similarity is obtained by weighting. The algorithm considers both the domain-specific meaning of the rating data and the preference differences in users' co-ratings. It avoids unreasonable increases in user similarity and improves the discrimination among similar users, and computing the co-rating preference weights is relatively simple and inexpensive.

2) A heuristic similarity algorithm that accounts for popularity and user rating differences. It consists of three similarity factors (PMSD, SD, and Preference). It considers the effect of item popularity within a particular dataset on user similarity and combines it with the mean squared difference. The algorithm makes full use of user rating information, both numerical and non-numerical, to express the distinct characteristics of users. It also considers preference differences in users' co-ratings and assigns different penalty values according to those differences. Finally, the mean and variance of each user's ratings are introduced to reflect the user's personal rating preference.

Experiments test the performance of the two new algorithms and compare them with other traditional and improved user similarity algorithms. The experimental comparison and theoretical analysis show that, under new-user cold-start and sparse-data conditions, the proposed algorithms perform better on MAE, coverage, precision, and recall, significantly improving the prediction accuracy of collaborative filtering and the recommendation quality of the recommender system.
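
As a rough illustration of the PIP idea that the first algorithm builds on, the sketch below computes the classic PIP (Proximity, Impact, Popularity) similarity as it is commonly formulated in the literature. It is not the thesis's own weighted variant, whose formulas the abstract does not give. The 1-5 rating scale, all function names, and the `difference_proportions` helper (one possible reading of "the proportion of each rating difference") are assumptions made for illustration.

```python
from collections import Counter

R_MIN, R_MAX = 1, 5            # assumed 1-5 rating scale
R_MED = (R_MIN + R_MAX) / 2    # midpoint of the scale


def agreement(r1, r2):
    """False when the two ratings fall on opposite sides of the midpoint."""
    return not ((r1 > R_MED and r2 < R_MED) or (r1 < R_MED and r2 > R_MED))


def pip(r1, r2, item_mean):
    """Proximity * Impact * Popularity for one co-rated item."""
    agree = agreement(r1, r2)
    # Disagreeing ratings are penalized by doubling their distance.
    dist = abs(r1 - r2) if agree else 2 * abs(r1 - r2)
    proximity = (2 * (R_MAX - R_MIN) + 1 - dist) ** 2
    # Strongly expressed ratings weigh more when users agree, less otherwise.
    strength = (abs(r1 - R_MED) + 1) * (abs(r2 - R_MED) + 1)
    impact = strength if agree else 1 / strength
    # Ratings that deviate from the item's average in the same direction
    # carry extra evidence of shared taste.
    same_side = (r1 > item_mean and r2 > item_mean) or (
        r1 < item_mean and r2 < item_mean
    )
    popularity = 1 + ((r1 + r2) / 2 - item_mean) ** 2 if same_side else 1
    return proximity * impact * popularity


def pip_similarity(u, v, item_means):
    """Sum PIP over the items both users rated (u, v: dict item -> rating)."""
    common = u.keys() & v.keys()
    return sum(pip(u[i], v[i], item_means[i]) for i in common)


def difference_proportions(u, v):
    """Hypothetical helper: share of co-rated items at each absolute difference."""
    common = u.keys() & v.keys()
    if not common:
        return {}
    counts = Counter(abs(u[i] - v[i]) for i in common)
    return {d: c / len(common) for d, c in counts.items()}
```

A cold-start user typically shares only a handful of items with any other user, so a measure of this kind draws on the meaning of each individual rating pair rather than on a correlation over many pairs. How the thesis turns the difference proportions into weights for the three factors is not stated in the abstract and is left out here.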
[Degree-granting institution]: Anhui University of Technology
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP391.3;F274
