Research on Collaborative Filtering Recommendation Algorithms and a MapReduce Implementation

Published: 2019-03-06 10:00

[Abstract]: With the rapid development of Internet technology, the volume of data has grown explosively, and the Internet has brought humanity into the era of big data. For users, picking out the information they actually need from massive data is like looking for a needle in a haystack. How to quickly mine the key information a user is interested in and push it to that user has become a topic of shared interest in academia and industry. In recent years, recommender systems, as an intelligent personalized information service technology, have risen rapidly in China and abroad and are widely applied in e-commerce, video entertainment, social networking, and other fields. After years of development, recommender systems have given rise to several recommendation techniques, including content-based recommendation, data-mining-based recommendation, and collaborative filtering. Of these, collaborative filtering is the most widely applied. However, collaborative filtering algorithms suffer from data sparsity and low recommendation accuracy, and in a big-data setting both problems are amplified, making them a bottleneck for the development and application of recommender systems. To address this, the thesis makes three contributions. First, to tackle data sparsity in collaborative filtering, it proposes a data-filling method based on expert users and item trust. Using an expert trust value, the method selects users who have rated many items and whose ratings are of high quality as expert users. It also combines each item's number of ratings and the standard deviation of those ratings into an item trust score, treats high-trust items as eligible for filling, and fills the missing entries of those items with the expert users' ratings. This reduces the sparsity of the data while preserving filling quality, and experiments verify the method's effectiveness. Second, combining the K-Means algorithm with item-based collaborative filtering, the thesis proposes a collaborative filtering algorithm based on clustering and an asymmetric-weight hybrid similarity (CFCA). The algorithm first performs K-Means clustering based on rating-stable items, then computes similarities within each cluster using the asymmetric-weight hybrid similarity, and produces recommendations from the result. Because it accounts for both the overlap of users who rated a pair of items and the number of ratings each item has received, it improves the accuracy of the similarity computation and, in turn, the quality of the recommendations. CFCA is compared experimentally with traditional collaborative filtering algorithms under different conditions, and the results show that it improves recommendation accuracy. Third, to improve efficiency and reduce running time, the thesis designs a MapReduce parallel programming model for CFCA and parallelizes four stages under that model: data preprocessing, K-Means clustering based on rating-stable items, computation of the asymmetric-weight hybrid similarity, and rating prediction. Parallel computation thereby addresses the algorithm's processing-efficiency problem.
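The data-filling step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's formulas, so everything concrete here is an assumption made for illustration: the function names (`select_experts`, `item_trust`, `fill_missing`), the use of rating count alone as a stand-in for the expert trust value, the trust score `count / (1 + std)`, and filling with the mean of the expert ratings.

```python
from statistics import mean, pstdev


def select_experts(ratings, top_n):
    """Keep the top_n users with the most ratings.

    Assumption: rating count stands in for the thesis's expert trust
    value, which also accounts for rating quality.
    """
    ranked = sorted(ratings, key=lambda user: (-len(ratings[user]), user))
    return set(ranked[:top_n])


def item_trust(ratings):
    """Hypothetical trust score per item: rating count / (1 + std dev)."""
    scores_by_item = {}
    for user_ratings in ratings.values():
        for item, score in user_ratings.items():
            scores_by_item.setdefault(item, []).append(score)
    return {
        item: len(scores) / (1.0 + pstdev(scores))
        for item, scores in scores_by_item.items()
    }


def fill_missing(ratings, top_n_experts, trust_threshold):
    """Fill missing ratings of high-trust items with the experts' mean rating.

    Returns a new dict; the input ratings are left unchanged.
    """
    experts = select_experts(ratings, top_n_experts)
    trust = item_trust(ratings)
    filled = {user: dict(items) for user, items in ratings.items()}
    for item, score in trust.items():
        if score < trust_threshold:
            continue  # low-trust item: leave its gaps unfilled
        expert_scores = [ratings[e][item] for e in experts if item in ratings[e]]
        if not expert_scores:
            continue  # no expert rated this item, so nothing to fill with
        fill_value = mean(expert_scores)
        for user_items in filled.values():
            user_items.setdefault(item, fill_value)
    return filled


# Toy rating matrix: user -> {item: rating}
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 3},
    "u3": {"a": 5},
    "u4": {"c": 1},
}
filled = fill_missing(ratings, top_n_experts=2, trust_threshold=1.5)
```

With this toy data, items `a` and `b` clear the threshold and have their gaps filled from the two most active users, while item `c`, whose two ratings disagree strongly, is left sparse. A real implementation would replace the count-only expert ranking with a trust value that also reflects rating quality, as the abstract describes.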
[Degree-granting institution]: Sichuan Normal University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.3
