Research on a Personalized Recommendation System Based on Spectral Clustering
Published: 2019-05-10 14:25
【Abstract】: With the rapid development of Web 2.0 and e-commerce, information resources are growing exponentially. Recommender systems are currently one effective way to address information overload, and collaborative filtering is the most widely used algorithm in such systems, yet it still suffers from data sparsity, limited scalability, and the cold-start problem. At the same time, most personalized recommendation systems ignore attributes of the users themselves, such as age, gender, and occupation; when user-item rating data are hard to obtain, this seriously degrades recommendation accuracy. After analyzing and comparing the commonly used personalized recommendation algorithms and related techniques, this thesis takes data sparsity and cold start as its starting point and studies personalized recommendation based on spectral clustering, aiming to improve recommendation accuracy and reduce the time complexity of the recommendation algorithm. The main work is as follows:
(1) Spectral clustering is introduced into personalized recommendation. It is improved with weighted kernel fuzzy clustering and an initial-centroid selection algorithm, and the Pearson correlation is modified. The improved spectral clustering and similarity measure are then combined with collaborative filtering, yielding two improved collaborative filtering algorithms based on user spectral clustering. On the MovieLens 100K dataset, the mean absolute error (MAE) and root mean square error (RMSE) of both algorithms are at least 4% lower than those of the traditional K-means clustering collaborative filtering algorithm, and running time is at least halved; on the MovieLens 1M dataset, MAE and RMSE improve by at least 2% and running time falls by 80%.
(2) Preprocessing methods for user age, gender, and occupation are proposed. From the resulting user attribute matrix, a collaborative filtering algorithm based on spectral clustering of user attributes is proposed.
(3) To address the overfitting of the biased singular value decomposition (Bias Singular Value Decomposition, BSVD) algorithm, user attributes and historical user-item ratings are used together: the attribute-based spectral clustering above is combined with the BSVD model, and a new-user check is added to the model to handle cold start, giving an improved recommendation algorithm. Compared with BSVD, this algorithm reduces MAE and RMSE by at least 6% on MovieLens 100K and by at least 2% on MovieLens 1M. The experiments show that the algorithm not only improves recommendation accuracy but is also scalable to some degree.
(4) Several experiments on existing datasets compare the proposed algorithms with traditional ones. The results indicate that applying spectral clustering to personalized recommendation can substantially improve prediction accuracy and the system's real-time response speed, which can ultimately bring greater economic returns to enterprises and merchants.
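For readers unfamiliar with the quantities the abstract reports, the following is a minimal sketch of the standard definitions of the Pearson correlation between two users and of MAE and RMSE. It is not the thesis's modified Pearson measure, whose details the abstract does not give. The function names, the dictionary-based rating format, and the choice to return 0.0 when two users share fewer than two rated items are all assumptions made for illustration.

```python
import math


def pearson(ratings_u, ratings_v):
    """Standard (unmodified) Pearson correlation over co-rated items.

    Each argument is a dict mapping an item id to a numeric rating.
    """
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        # Assumption: too little overlap is treated as "no similarity".
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    cov = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        # A user who gave identical ratings has undefined correlation.
        return 0.0
    return cov / math.sqrt(var_u * var_v)


def mae(predicted, actual):
    """Mean absolute error between predicted and observed ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error between predicted and observed ratings."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


if __name__ == "__main__":
    # Hypothetical toy ratings, not taken from MovieLens.
    alice = {"m1": 5, "m2": 3, "m3": 4}
    bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}
    print("similarity:", pearson(alice, bob))
    print("MAE:", mae([3.5, 4.0, 2.0], [4, 4, 1]))
    print("RMSE:", rmse([3.5, 4.0, 2.0], [4, 4, 1]))
```

Lower MAE and RMSE both indicate more accurate predictions. Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE does, which is why the two are commonly reported together.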
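【Degree-granting institution】: Fujian Agriculture and Forestry University
【Degree level】: Master's
【Year degree conferred】: 2016
【Classification number】: TP391.3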
Article ID: 2473717
Article link: https://www.wllwen.com/jingjilunwen/dianzishangwulunwen/2473717.html