Research on Collaborative Filtering Recommendation Algorithms for Sparse Data

Published: 2018-03-19 01:32

Topic: recommender systems　Focus: data sparsity　Source: Jilin University, 2017 master's thesis　Document type: degree thesis

[Abstract]: With the rapid growth of the Internet and e-commerce, the volume of information online has expanded dramatically, giving rise to "information overload". Personalized recommendation helps users quickly and accurately find what they need in a disorganized mass of information, which alleviates information overload to some extent. Collaborative filtering is one of the most widely used personalized recommendation techniques and has been very successful in practice, but real-world data is usually very sparse, so collaborative filtering suffers from the data sparsity problem. The cold-start problem can be regarded as an extreme case of data sparsity, and this thesis treats it as part of that problem. Data sparsity seriously degrades the recommendation quality of collaborative filtering algorithms. It arises because the numbers of users and items in a recommender system keep growing while each user rates only a few items, so the user rating matrix is inevitably sparse, and collaborative filtering depends heavily on that matrix. To address data sparsity, researchers have proposed many methods that operate on the user rating matrix. They fall into two main classes: the first fills in the rating matrix to reduce its sparsity; the second decomposes the rating matrix, removing users and items that have little effect on similarity computation in order to reduce its dimensionality. In the second class, the removed information may well include useful information about users, which harms recommendation quality, so this thesis builds on the first class to address data sparsity in recommender systems. The specific work is as follows:
1) For the user cold-start problem, a collaborative filtering algorithm that fuses user features and item relationships is proposed (User-Item-Mix CF). Traditional collaborative filtering ignores the relationships between items when computing the similarity between users, which makes the computed user similarity inaccurate. To address this, the thesis proposes a user similarity measure that incorporates item relationships (Item-Based User Sim), with the aim of improving the accuracy of user similarity computation. Building on this improved measure, user feature attributes are then added to the user similarity computation and fused with the item relationships through a dynamic balancing weight, yielding the User-Item-Mix CF algorithm. Finally, User-Item-Mix CF is compared with the mode method on the MovieLens dataset. The experimental results show that, for every number of new users tested, the mean absolute error (MAE) of User-Item-Mix CF is lower than that of the mode method.
2) For the data sparsity problem, a collaborative filtering algorithm based on user rating prediction is proposed (User-SP CF). When computing the similarity between items, the algorithm uses Item-Based User Sim to compute the similarity between users and fills the unrated entries of the rating matrix using the computed user similarity values, which reduces the sparsity of the matrix. It then finds the nearest-neighbor set of the target item in the filled rating matrix and completes the recommendation. Finally, User-SP CF is compared on the MovieLens dataset with a collaborative filtering algorithm based on item rating prediction and with item-based collaborative filtering. The experimental results show that, for every number of neighbors tested, the MAE of User-SP CF is lower than that of the other two algorithms.
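
The fill-then-evaluate idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the Item-Based User Sim formula, so plain cosine similarity over co-rated items is swapped in as a stand-in, and the function names (`sparsity`, `cosine_sim`, `fill_matrix`, `mae`) and the toy matrix `R` are hypothetical. The final item-neighbor search of User-SP CF is omitted.

```python
from math import sqrt


def sparsity(R):
    """Fraction of entries in the rating matrix that are missing (None)."""
    total = sum(len(row) for row in R)
    missing = sum(1 for row in R for r in row if r is None)
    return missing / total


def cosine_sim(a, b):
    """Cosine similarity of two users over the items both have rated.

    Assumption: a stand-in for the thesis's Item-Based User Sim measure,
    whose formula is not given in the abstract.
    """
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return 0.0
    dot = sum(x * y for x, y in common)
    norm_a = sqrt(sum(x * x for x, _ in common))
    norm_b = sqrt(sum(y * y for _, y in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fill_matrix(R):
    """Return a copy of R in which every missing rating is replaced by a
    similarity-weighted average of other users' ratings for that item.
    Falls back to the user's own mean rating when no neighbor helps."""
    filled = [row[:] for row in R]
    for u, row in enumerate(R):
        rated = [r for r in row if r is not None]
        user_mean = sum(rated) / len(rated) if rated else 0.0
        for i, r in enumerate(row):
            if r is not None:
                continue
            num = den = 0.0
            for v, other in enumerate(R):
                if v != u and other[i] is not None:
                    s = cosine_sim(row, other)
                    num += s * other[i]
                    den += abs(s)
            filled[u][i] = num / den if den else user_mean
    return filled


def mae(predicted, actual):
    """Mean absolute error between predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Toy user-by-item rating matrix (rows: users, columns: items); None = unrated.
R = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [None, 1, 5, 4],
]
filled = fill_matrix(R)
print("sparsity before:", sparsity(R))      # 5 of 16 entries are missing
print("sparsity after: ", sparsity(filled))
```

In an experiment of the kind the abstract reports, some known ratings would be held out, predicted from the filled matrix, and scored with `mae` against their true values; a lower MAE indicates more accurate predictions.
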
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3
