Research and Application of a Hadoop-Based Friend Recommendation System for Social Networks

Published: 2018-11-03 17:26

[Abstract]: Since 2000, Internet technology has developed rapidly and become part of daily life. Shopping, social networking, and video websites generate large volumes of data every day, leaving people with a serious problem: information overload. Search engines and recommendation systems can both address this problem. Unlike a search engine, a recommendation system does not require users to issue queries manually. Even when users do not know what they need, it can analyze massive amounts of data to mine their interests and surface valuable content. Sina Weibo, the best-known social networking site in China, has many users who post comments, moods, and other content every day. User interests can be extracted from these posts and used to provide personalized friend recommendations. On this basis, this thesis proposes a distributed parallel algorithm based on the MapReduce programming model, and designs and implements a Hadoop-based friend recommendation system. The main work is as follows:
1. The thesis studies the application of content-based recommendation algorithms to friend recommendation, focusing on the TF-IDF algorithm. It identifies shortcomings of TF-IDF and improves its handling of the distribution of feature words, yielding the improved TF-DFI-DFO algorithm. Experiments compare TF-DFI-DFO with the original TF-IDF algorithm to evaluate the improvement.
2. The thesis designs and implements the friend recommendation system, analyzing the data collection, data processing, and recommendation decision modules in detail. The emphasis is on the recommendation decision module, where the distributed MapReduce implementation of TF-DFI-DFO is analyzed.
3. The TF-DFI-DFO algorithm is implemented in distributed form under the MapReduce model. A vector space model is then built from the results, the similarity between texts is computed, and the final recommendations are produced.
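The pipeline described in the abstract (term weighting, a vector space model, text similarity, recommendation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the TF-DFI-DFO formula, so the sketch substitutes standard TF-IDF; the map and reduce steps are simulated in a single Python process rather than run on Hadoop; cosine similarity is assumed as the similarity measure; and the users, tokens, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import groupby

# Hypothetical toy data: user id -> already-tokenized post text.
POSTS = {
    "alice": ["hadoop", "mapreduce", "hadoop", "music"],
    "bob": ["hadoop", "mapreduce", "travel"],
    "carol": ["music", "travel", "food"],
}


def map_phase(user, tokens):
    """Map step: emit ((user, term), 1) for every token in a user's posts."""
    for term in tokens:
        yield (user, term), 1


def reduce_phase(pairs):
    """Reduce step: sort by key (a stand-in for the shuffle) and sum counts."""
    counts = {}
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        counts[key] = sum(value for _, value in group)
    return counts


def tfidf_vectors(posts):
    """Build one standard TF-IDF vector per user (not TF-DFI-DFO)."""
    pairs = [kv for user, toks in posts.items() for kv in map_phase(user, toks)]
    term_counts = reduce_phase(pairs)  # (user, term) -> raw count
    # Document frequency: number of users whose posts contain the term.
    doc_freq = Counter(term for (_, term) in term_counts)
    n_docs = len(posts)
    vectors = defaultdict(dict)
    for (user, term), count in term_counts.items():
        tf = count / len(posts[user])
        idf = math.log(n_docs / doc_freq[term])
        vectors[user][term] = tf * idf
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user, vectors, top_n=1):
    """Return the top_n other users ranked by similarity to `user`."""
    scores = [
        (cosine(vectors[user], vec), other)
        for other, vec in vectors.items()
        if other != user
    ]
    scores.sort(reverse=True)
    return [other for _, other in scores[:top_n]]


vectors = tfidf_vectors(POSTS)
print(recommend("alice", vectors))
```

With this toy data, alice and bob share the terms "hadoop" and "mapreduce", so bob ranks above carol as a recommendation for alice. In a real Hadoop job the map and reduce functions would run as separate distributed tasks, and the weighting step would use the thesis's TF-DFI-DFO formula in place of the standard TF-IDF shown here.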
[Degree-granting institution]: Xi'an Polytechnic University
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: TP391.3
