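Design and Implementation of a Collaborative Filtering Algorithm for a Commercial Recommendation Engine Based on the Hadoop Architecture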

Published: 2018-05-19 05:02

Topic: recommendation engine + Hadoop; source: University of Electronic Science and Technology of China, master's thesis, 2016

[Abstract]: Recommender systems are now used throughout the Internet, and the rapid growth of e-commerce in particular depends on personalized recommendation. In recent years, recommender systems have driven e-commerce business growth ever more visibly and strongly. Collaborative filtering is one of the key techniques most widely adopted in today's e-commerce recommender systems. This thesis surveys the main recommendation algorithms in current use, such as content-based and collaborative filtering algorithms, and briefly introduces the Hadoop big data processing platform: how it works, the MapReduce computation model, and the HDFS distributed storage platform.

To address the shortcomings of memory-based collaborative filtering, the thesis proposes several improvements, starting from the similarity measure and the weighted-average method, to raise both recommendation quality and performance. For collaborative filtering based on the Pearson coefficient, the Pearson correlation coefficient performs poorly when two users have few co-rated items; introducing a default prediction value largely resolves this. When an item has been rated by many users, it tends to appear highly similar to other items; this is addressed with TF-IDF weighting. An exponential algorithm is introduced to penalize the weights of low-similarity items and so improve recommendation quality. The Weighted Slope One algorithm improves system performance while maintaining prediction accuracy.

In collaborative filtering, the growth in matrix size caused by a sparse user-item rating matrix is a difficult problem, because a sparse rating matrix greatly increases computation time. The thesis studies matrix dimensionality reduction methods, such as singular value decomposition, non-negative matrix factorization, and other probabilistic and statistical models, to address sparse matrix computation.

The rapid growth of e-commerce platforms has brought tens of millions of users and hundreds of millions of products (for example Amazon, Tmall, and JD.com), which poses a severe challenge to the runtime performance of existing recommender systems. A single-machine recommender cannot handle the computation for that many users and that much data, so e-commerce sites now commonly implement their product recommendation engines on distributed clusters. This thesis attempts to design and implement, on Hadoop, a product recommendation engine that is scalable, highly elastic, highly disaster-tolerant, and stable.
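The abstract names Weighted Slope One without showing it. The sketch below is a minimal single-machine version of the standard algorithm, given only to make the idea concrete; it is not the thesis's Hadoop/MapReduce implementation, which the abstract does not describe. The function names `build_deviations` and `predict` and the toy ratings are assumptions made for illustration.

```python
from collections import defaultdict


def build_deviations(ratings):
    """Compute average rating deviations between item pairs.

    ratings: {user: {item: rating}}
    Returns (dev, count) where dev[j][i] is the mean of (r_j - r_i) over
    users who rated both items, and count[j][i] is how many users did.
    """
    diff_sum = defaultdict(lambda: defaultdict(float))
    pair_count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i == j:
                    continue
                diff_sum[j][i] += r_j - r_i
                pair_count[j][i] += 1
    dev = {j: {i: diff_sum[j][i] / pair_count[j][i] for i in diff_sum[j]}
           for j in diff_sum}
    count = {j: dict(row) for j, row in pair_count.items()}
    return dev, count


def predict(user_ratings, target, dev, count):
    """Weighted Slope One prediction of `target` for one user.

    Each item i the user has rated contributes (dev[target][i] + r_i),
    weighted by the number of users who co-rated target and i.
    Returns None when no co-rated item is available.
    """
    numerator = 0.0
    denominator = 0
    for i, r_i in user_ratings.items():
        if i == target:
            continue
        c = count.get(target, {}).get(i, 0)
        if c == 0:
            continue
        numerator += (dev[target][i] + r_i) * c
        denominator += c
    return numerator / denominator if denominator else None


# Toy data, invented for illustration only.
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
dev, count = build_deviations(ratings)
print(round(predict(ratings["lucy"], "A", dev, count), 2))  # 4.33
```

The deviation table depends only on pairwise item statistics, so it can be precomputed once and reused for every prediction. That is a plausible reason the algorithm suits batch computation on a cluster, though how the thesis distributes the work is not stated here.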
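[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.3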
