Research on Collaborative Filtering Recommendation Algorithms for the Data Sparsity Problem from a Statistical Perspective

Published: 2018-01-17 21:28

Source: Chongqing Technology and Business University, 2016 master's thesis; Type: degree thesis

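【Abstract】: The Internet has spread widely and e-commerce has grown rapidly, so information resources have grown explosively. As a result, users find it increasingly difficult to locate the information or goods they like quickly and accurately among massive resources. Recommender systems emerged to address this problem, and the recommendation algorithm has always been their core technology. Collaborative filtering is currently the most successful and most widely used of the many recommendation algorithms. It recommends mainly from the ratings that users leave online. In practice, however, the numbers of users and items are very large, and each user rates only a very limited number of the items they have encountered. This leads to severe data sparsity, which is one of the main reasons for the poor accuracy of traditional collaborative filtering. This thesis studies collaborative filtering for the data sparsity problem from a statistical perspective. It implements a simple recommender based on descriptive statistics and explores the effect of applying statistic filling, cluster analysis, matrix factorization and related methods to collaborative recommendation.

The thesis first analyzes in detail the causes of data sparsity and the ways it affects collaborative recommendation. On that basis it proposes filling missing ratings with statistics to alleviate sparsity. Users are clustered with K-Means, and the number of clusters is chosen by the silhouette coefficient. The missing ratings of each cluster's users are then filled with a fixed value, namely a rating statistic of that same cluster. Besides fixed-value filling, the thesis uses singular value decomposition (SVD) for dimensionality reduction to predict ratings. It fills the original matrix with the predicted ratings to form a new user-item rating matrix and then performs collaborative recommendation on it. Finally, to refine the recommendation process itself, it improves the traditional user-user similarity with a weighting scheme. User similarity is computed as a weighted combination of user preference similarity and user rating similarity.

The methods are evaluated on the MovieLens dataset, and mean absolute error (MAE) is used to compare how much each method improves the recommender. The computations were carried out mainly in Excel, with auxiliary programming in R. The experiments show that each proposed method alleviates data sparsity to some extent and thereby improves recommendation quality. Statistic filling, clustering and similarity computation are all basic statistical methods. Applying statistics to recommendation therefore need not focus only on complex models. Adding basic statistical methods to recommender research can also effectively address the problems that recommendation algorithms face. In the future, statistical methods are expected to be applied in more fields and to develop further.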
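
The cluster-based statistic filling described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the thesis's own implementation, which used Excel and R. The abstract does not say which statistic is used, so this sketch makes several assumptions. It assumes the per-item mean within the cluster, falling back to the cluster's overall mean when no member rated the item. It clusters users on their rating rows with gaps temporarily set to each user's own mean, tries 2 to 4 clusters, and requires every user to have rated at least one item. `kmeans`, `silhouette`, `cluster_fill` and `R_demo` are hypothetical names, and `R_demo` is toy data, not MovieLens.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-Means on the rows of X; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each user (row) to its nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to its members' mean; an empty cluster keeps its old centroid.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b)."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; rank such a split last
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to other members (self-distance is 0)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))

def cluster_fill(R, k_range=range(2, 5), seed=0):
    """Fill NaNs in a user-item matrix with per-cluster item means; k is chosen by silhouette."""
    observed = ~np.isnan(R)
    # Assumption: cluster on each user's row with gaps set to that user's own mean rating.
    user_mean = np.nanmean(R, axis=1)
    X = np.where(observed, R, user_mean[:, None])
    best_score, best_k, best_labels = -2.0, None, None
    for k in k_range:
        labels = kmeans(X, k, seed=seed)
        score = silhouette(X, labels)
        if score > best_score:
            best_score, best_k, best_labels = score, k, labels
    filled = R.copy()
    for c in np.unique(best_labels):
        rows = np.where(best_labels == c)[0]
        block = R[rows]
        counts = (~np.isnan(block)).sum(axis=0)
        cluster_mean = np.nanmean(block)  # fallback for items no one in the cluster rated
        item_stat = np.where(counts > 0, np.nansum(block, axis=0) / np.maximum(counts, 1), cluster_mean)
        for r in rows:
            gaps = np.isnan(filled[r])
            filled[r, gaps] = item_stat[gaps]  # fixed value: this cluster's statistic
    return filled, best_k, best_labels

# Usage on a toy 6-user x 5-item matrix (NaN = unrated).
R_demo = np.array([
    [5, 4, np.nan, 1, 1],
    [5, np.nan, 4, 1, 2],
    [4, 5, 5, np.nan, 1],
    [1, 1, 2, 5, np.nan],
    [np.nan, 2, 1, 4, 5],
    [1, 1, np.nan, 5, 4],
])
filled, k, labels = cluster_fill(R_demo)
print(k, labels)
print(filled.round(2))
```

Choosing k by the best mean silhouette, rather than fixing it in advance, matches the abstract's use of the silhouette coefficient. The exact search range of k is an assumption.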
[Abstract]: With the popularity of the network and the rapid development of electronic commerce, information resources are growing explosively. As a result, it is becoming more and more difficult for users to find their favorite information or goods quickly and accurately in a large amount of resources. Recommender systems were developed to solve this problem, and the recommendation algorithm has always been their core technology. Collaborative filtering is the most successful and widely used of the many recommendation algorithms. The numbers of users and items are large, while each user rates only a limited number of the items they have come into contact with, and this leads to a serious data sparsity problem. Data sparsity is one of the main reasons for the poor recommendation accuracy of traditional collaborative filtering algorithms. This thesis studies collaborative filtering for the data sparsity problem from a statistical perspective. A simple recommender based on descriptive statistics is implemented. The thesis also explores applying statistic filling, cluster analysis, matrix decomposition and other methods to collaborative recommendation algorithms.

The thesis first analyzes in detail the causes of the data sparsity problem and its influence on collaborative recommendation. On that basis, a statistic filling method is proposed to alleviate data sparsity. K-Means clustering is used to cluster the users, and the number of user clusters is determined by the silhouette coefficient. The missing ratings of each cluster are then filled with a fixed value, a rating statistic of that cluster. Besides filling missing ratings with fixed values, the thesis uses singular value decomposition (SVD) for dimensionality reduction to predict ratings. The original matrix is filled with the predicted ratings to form a new user-item rating matrix, on which collaborative recommendation is then carried out. Finally, from the point of view of revising the recommendation process, the traditional similarity calculation between users is improved by a weighted method. User similarity is computed as a weighted combination of user preference similarity and user rating similarity.

Experiments with these methods use the MovieLens dataset, and the mean absolute error (MAE) is used to compare how much each method improves the recommendation algorithm. The algorithms were implemented mainly in Excel, with auxiliary programming in R. The methods proposed in this thesis alleviate the data sparsity problem to a certain extent and thus improve recommendation quality. Statistic filling, clustering and similarity calculation are all basic statistical methods. When applying statistical methods to recommendation, we should not focus only on complicated models. Adding basic statistical methods to recommendation research can also solve the problems it faces. In the future, statistical methods are expected to be applied in more fields and to make great progress.
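
The SVD-based filling, the weighted user similarity and the MAE evaluation can be sketched together as below. This is an illustrative Python/NumPy sketch, not the thesis's code, and it rests on assumptions the abstract does not fix. Gaps are first imputed with item means before a rank-`rank` truncated SVD, and predictions are clipped to the 1 to 5 MovieLens rating scale. The rating similarity is taken to be Pearson correlation. The abstract does not define "preference similarity," so it is stood in for by a hypothetical Jaccard overlap of the items each user rated 4 or higher. The weight `alpha = 0.5`, the neighborhood size and the held-out cells are arbitrary. All function names are hypothetical.

```python
import numpy as np

def svd_fill(R, rank=2, lo=1.0, hi=5.0):
    """Fill NaNs in R with a rank-`rank` SVD reconstruction; observed ratings are kept."""
    observed = ~np.isnan(R)
    counts = observed.sum(axis=0)
    # Assumption: impute gaps with item means (global mean for unrated items) before the SVD.
    item_mean = np.where(counts > 0, np.nansum(R, axis=0) / np.maximum(counts, 1), np.nanmean(R))
    A = np.where(observed, R, item_mean[None, :])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]  # best rank-`rank` approximation of A
    return np.where(observed, R, np.clip(approx, lo, hi))

def rating_sim(M, u, v):
    """Pearson correlation over items both users have values for (0 if undefined)."""
    both = ~np.isnan(M[u]) & ~np.isnan(M[v])
    if both.sum() < 2:
        return 0.0
    x = M[u, both] - M[u, both].mean()
    y = M[v, both] - M[v, both].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0

def preference_sim(M, u, v, like=4.0):
    """HYPOTHETICAL stand-in: Jaccard overlap of the items each user rated >= `like`."""
    a = np.nan_to_num(M[u], nan=0.0) >= like
    b = np.nan_to_num(M[v], nan=0.0) >= like
    union = (a | b).sum()
    return float((a & b).sum() / union) if union > 0 else 0.0

def weighted_sim(M, u, v, alpha=0.5):
    """sim(u, v) = alpha * preference similarity + (1 - alpha) * rating similarity."""
    return alpha * preference_sim(M, u, v) + (1 - alpha) * rating_sim(M, u, v)

def predict(M, u, i, alpha=0.5, n_neighbors=3):
    """User-based CF: mean-centered, similarity-weighted average over the top neighbors."""
    means = np.nanmean(M, axis=1)
    cands = [v for v in range(M.shape[0]) if v != u and not np.isnan(M[v, i])]
    top = sorted(((weighted_sim(M, u, v, alpha), v) for v in cands), reverse=True)[:n_neighbors]
    top = [(s, v) for s, v in top if s > 0]  # keep positively similar neighbors only
    den = sum(s for s, _ in top)
    if den == 0:
        return float(means[u])
    return float(means[u] + sum(s * (M[v, i] - means[v]) for s, v in top) / den)

def mae(pred, actual):
    """Mean absolute error between predicted and true ratings."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(actual, float))))

# Usage: hide a few known ratings, fill the rest by SVD, predict the hidden cells, score by MAE.
R_demo = np.array([
    [5, 4, np.nan, 1, 1],
    [5, np.nan, 4, 1, 2],
    [4, 5, 5, np.nan, 1],
    [1, 1, 2, 5, np.nan],
    [np.nan, 2, 1, 4, 5],
    [1, 1, np.nan, 5, 4],
])
test_cells = [(0, 1), (3, 0), (5, 4)]  # hypothetical held-out ratings
train = R_demo.copy()
for u, i in test_cells:
    train[u, i] = np.nan
F = svd_fill(train, rank=2)
preds = [predict(F, u, i) for u, i in test_cells]
print(round(mae(preds, [R_demo[u, i] for u, i in test_cells]), 3))
```

Similarities here are computed on the filled matrix, so every pair of users shares all items. On the raw sparse matrix, many pairs share too few co-rated items for Pearson correlation to be defined or reliable. `rating_sim` still skips NaNs, so it can also be run on unfilled data for comparison.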
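【Degree-granting institution】: Chongqing Technology and Business University
【Degree level】: Master's
【Year conferred】: 2016
【Classification number】: TP391.3; F713.36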

【Related Literature】