Research and Implementation of a Personalized Recommendation System Based on the Hadoop Big Data Framework

Published: 2018-11-22 19:42

[Abstract]: Information overload is an increasingly prominent problem. Three relatively mature approaches address it: website directories, search engines, and recommender systems. Website directories collect well-known sites and organize them by category, while search engines build indexes over massive numbers of web pages. Both fall short when users cannot clearly state what they need, which is the case recommender systems handle: they analyze a user's historical behavior and proactively recommend content the user is likely to find interesting. However, as the Internet has grown rapidly, the volume of information has grown geometrically, and traditional recommender systems tend to hit computational bottlenecks on massive data. Traditional systems also do not adequately account for the fact that user interests change over time and are somewhat scattered. To address these problems, this thesis draws on earlier recommender system designs and, taking personalized book recommendation in a search engine setting as its target, studies and implements a hybrid recommendation scheme based on latent semantic analysis and slice-based clustering. It uses the Hadoop big data processing framework to handle the massive data the recommender system must process.
The thesis first studies how to collect user behavior data in a search engine setting. It analyzes the types of user behavior and their characteristics, and applies a collection method and a normalization method suited to each data type. Second, to address the distinctive nature of user behavior in search engines and the variability of user interests, it proposes a latent semantic analysis model and a slice-based clustering model that mine long-term interests and immediate interests, respectively, from large-scale user behavior data. The latent semantic analysis model recommends by content, which alleviates the cold-start problem for both users and books and raises recommendation coverage. The collaborative filtering model based on slice-based clustering partitions user behavior into slices by attribute and content, which extracts a user's interests in different periods, further improves recommendation performance, and gives the results a degree of novelty. For the user similarity computation required during slice-based clustering, the thesis also proposes an improved similarity measure for mixed-type data based on users' search terms.
Finally, the thesis studies how to parallelize user behavior preprocessing and the recommendation algorithms on the Hadoop big data processing framework, and completes the design and implementation of the personalized book recommendation system for search engines. Introducing the Hadoop platform and designing parallel recommendation algorithms greatly improves the system's ability to process massive data. Combining the recommendation model based on latent semantic analysis with the one based on slice-based clustering improves the precision and coverage of personalized book recommendation to some extent. System tests and algorithm experiments verify the correctness of the system.
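The abstract does not give the details of the slicing step or of the mixed-type similarity measure, so the following is only an illustrative sketch and not the thesis's method. It assumes that behavior is sliced by fixed time windows, that search-term overlap is measured with Jaccard similarity, and that numeric attributes are compared with a Gower-style range-normalized difference. All function names, the `term_weight` parameter, and the sample data are hypothetical.

```python
from collections import defaultdict


def slice_by_window(events, window_seconds):
    """Group (timestamp, search_terms) events into fixed-width time slices.

    Assumption: slices are time windows; the thesis slices by attribute
    and content, and its exact criteria are not stated in the abstract.
    """
    slices = defaultdict(set)
    for timestamp, terms in events:
        slices[timestamp // window_seconds].update(terms)
    return dict(slices)


def jaccard(a, b):
    """Overlap of two term sets in [0, 1]; two empty sets score 0.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def numeric_similarity(x, y, ranges):
    """Gower-style similarity: 1 minus the mean range-normalized difference.

    Each entry of `ranges` must be a positive value range for that attribute.
    """
    if not ranges:
        return 0.0
    total = sum(1.0 - abs(xi - yi) / r for xi, yi, r in zip(x, y, ranges))
    return total / len(ranges)


def mixed_similarity(terms_a, terms_b, nums_a, nums_b, ranges, term_weight=0.5):
    """Weighted blend of search-term overlap and numeric-attribute similarity."""
    term_part = jaccard(terms_a, terms_b)
    num_part = numeric_similarity(nums_a, nums_b, ranges)
    return term_weight * term_part + (1.0 - term_weight) * num_part


# Hypothetical usage: two users compared on search terms and two numeric attributes.
events = [(10, {"hadoop"}), (50, {"spark"}), (130, {"novel"})]
user_slices = slice_by_window(events, window_seconds=60)
score = mixed_similarity({"a", "b"}, {"b", "c"}, [1.0, 10.0], [3.0, 10.0], [4.0, 20.0])
```

Under these assumptions, a score of this kind could serve as the user similarity inside each slice's clustering step, with `term_weight` controlling how strongly search terms dominate the numeric attributes.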
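[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.3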

[References]