A Spark-Based Hybrid Recommendation System

Published: 2018-03-31 20:42

Topic: hybrid recommendation　Focus: Spark　Source: University of Science and Technology of China, 2017 master's thesis

[Abstract]: With the rapid development of information technology, information overload has become a major challenge for the Internet. To ease the growing tension between Internet users and massive volumes of data, researchers proposed the concept of the recommendation system. As an important branch of recommendation systems, a hybrid recommendation system improves performance by combining multiple recommendation algorithms, and is now widely used in e-commerce, social networks, and video sites. However, the rapid growth in users and data places higher demands on hybrid recommendation systems. For example, a video site requires its hybrid recommendation system to recommend videos of all kinds accurately, to train new models as user behavior changes, and to update recommendations promptly. As data volume grows, developers can no longer rely on experience to determine how much each recommendation algorithm should influence the final result, so coarse-grained weight calculation limits the accuracy of a hybrid recommendation system and makes development harder. In addition, because the system trains its feature model on large-scale data and training involves many iterative computations, training a model once takes a day or even several days, which falls short of the efficiency users expect. By analyzing the characteristics of different data sets, recommendation algorithms, and weight calculation methods, this thesis introduces Spark, a general-purpose large-scale data processing platform suited to iterative computation, and designs and implements a Spark-based hybrid recommendation system to improve accuracy, diversity, and efficiency. The main work and contributions are as follows:

1. First, a fine-grained weight calculation method is proposed that extends the weight value of each recommendation algorithm into a weight vector. The method improves the accuracy of rating-prediction recommendation and effectively alleviates the cold-start problem caused by sparse data.

2. Second, a fine-grained weight hybrid subsystem is designed and implemented on the large-scale data processing framework Spark, with the fine-grained weight calculation method at its core. The subsystem uses Spark's distributed computation to reduce model training time and uses fine-grained weights to improve recommendation accuracy. Experimental results show that fine-grained weighted hybrid recommendation is 5% to 30% more accurate than a single recommendation algorithm and 1.5% to 3% more accurate than coarse-grained weighted hybrid recommendation. The system's model training speed is 90% higher than that of a single-machine recommendation system, and its training time improves on that of a Hadoop-based recommendation system by a factor of about 2.

3. Finally, a Spark-based mixed hybrid (交叉调和) recommendation system is designed and implemented. With the fine-grained weight hybrid subsystem at its core, it adds a content-based recommendation algorithm, yielding a hybrid recommendation system that is accurate, efficient, diverse, and scalable.
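
The abstract does not say what the weight vector is indexed by or how it is learned. As a rough sketch only, the following plain-Python example assumes one weight per user per algorithm, fitted from the inverse mean absolute error on held-out ratings. Both choices, and all function names, are assumptions of this example and not the thesis's method. It contrasts a coarse-grained blend (one scalar weight per algorithm) with a fine-grained blend (a per-user weight vector), falling back to the coarse weights for users with no history.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (user_id, true_rating, [prediction from each base algorithm])
Sample = Tuple[str, float, Sequence[float]]


def inverse_error_weights(errors: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Turn per-algorithm mean absolute errors into weights that sum to 1."""
    inv = [1.0 / (e + eps) for e in errors]
    total = sum(inv)
    return [v / total for v in inv]


def fit_weights(samples: Sequence[Sample]) -> Tuple[List[float], Dict[str, List[float]]]:
    """Return (coarse weights, per-user fine-grained weights). Hypothetical helper."""
    n_algos = len(samples[0][2])
    global_err = [0.0] * n_algos
    user_err: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_algos)
    user_cnt: Dict[str, int] = defaultdict(int)
    for user, truth, preds in samples:
        user_cnt[user] += 1
        for k, p in enumerate(preds):
            err = abs(truth - p)
            global_err[k] += err
            user_err[user][k] += err
    # Coarse: one scalar weight per algorithm, shared by every user.
    coarse = inverse_error_weights([e / len(samples) for e in global_err])
    # Fine (assumed form): one weight per algorithm *per user*.
    fine = {
        u: inverse_error_weights([e / user_cnt[u] for e in errs])
        for u, errs in user_err.items()
    }
    return coarse, fine


def blend(user: str, preds: Sequence[float], coarse: Sequence[float],
          fine: Dict[str, List[float]]) -> float:
    """Weighted sum of base predictions; unseen users fall back to coarse weights."""
    weights = fine.get(user, coarse)
    return sum(w * p for w, p in zip(weights, preds))


# Toy validation data: algorithm 0 is exact for alice, algorithm 1 for bob.
validation = [
    ("alice", 5.0, [5.0, 3.0]),
    ("alice", 4.0, [4.0, 2.0]),
    ("bob", 2.0, [4.0, 2.0]),
    ("bob", 1.0, [3.0, 1.0]),
]
coarse, fine = fit_weights(validation)
alice_score = blend("alice", [4.0, 2.0], coarse, fine)  # leans on algorithm 0
bob_score = blend("bob", [4.0, 2.0], coarse, fine)      # leans on algorithm 1
carol_score = blend("carol", [4.0, 2.0], coarse, fine)  # new user: coarse weights
```

In this toy data the two algorithms have the same overall error, so the coarse weights cannot tell them apart, while the per-user weights follow whichever algorithm has been accurate for that user. On Spark, the per-user error sums would typically be computed with a grouped aggregation instead of a Python loop.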
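[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3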
