Design and Implementation of a Friend Recommendation System Based on Sina Weibo

Published: 2018-08-12 13:14

[Abstract]: With the rapid development of the Internet and mobile communication technology, more and more people use social platforms such as Sina Weibo to make friends and share content. The online interactions of hundreds of millions of users generate so much data that "information overload" sets in: people can spend more time searching for friends than communicating with them. To address this, this thesis designs and implements a friend recommendation system that recommends other users who are likely to become the target user's friends. A web crawler collects the profile information and Weibo posts of the target user's second-degree friends (friends of friends). The collected data are then analyzed, and friends are recommended by combining three factors: interest similarity, geographic similarity between users, and user influence. The thesis first introduces the research background and significance of the topic and reviews the state of research in China and abroad. Next, based on an analysis of the system's user requirements and functional requirements, it presents the high-level design, divides the system into functional modules, and designs the database. Each module is then designed and implemented in detail. The Weibo data acquisition module implements a Sina Weibo crawler that follows users' following relationships. The crawler performs a breadth-first search over friends to reach the target user's second-degree friends, parses web pages to extract their profile information and Weibo posts, and persists the data, thereby avoiding the various restrictions imposed on data collection through the public Weibo API.
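The crawl described above amounts to a breadth-first search cut off at depth 2. The sketch below shows only that traversal logic. It assumes a hypothetical `get_followees(uid)` callback that stands in for the crawler's page fetching and parsing (it is not a Weibo API call), and treating "friends" as followees is also an assumption. Users first reached at depth 2 are returned as the target's second-degree friends.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set


def second_degree_friends(
    target: str,
    # hypothetical stand-in for fetching and parsing a user's follow list
    get_followees: Callable[[str], Iterable[str]],
) -> Set[str]:
    """Return users exactly two hops from `target` in the follow graph."""
    depth: Dict[str, int] = {target: 0}
    queue = deque([target])
    while queue:
        uid = queue.popleft()
        if depth[uid] == 2:
            continue  # never expand second-degree users
        for friend in get_followees(uid):
            if friend not in depth:  # the first visit fixes the BFS depth
                depth[friend] = depth[uid] + 1
                queue.append(friend)
    return {uid for uid, d in depth.items() if d == 2}


# Toy follow graph (made-up data) in place of crawled pages.
graph = {"me": ["a", "b"], "a": ["c", "me"], "b": ["c", "d"], "c": ["e"]}
print(sorted(second_degree_friends("me", lambda u: graph.get(u, []))))  # ['c', 'd']
```

Because each user is recorded the first time it is reached, a direct friend who is also a friend of a friend stays at depth 1 and is not returned. Depth-2 users are never expanded, which bounds how many follow lists the crawler has to fetch.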
The friend recommendation module applies Ansj Chinese word segmentation and the TF-IDF algorithm to users' historical Weibo posts to extract textual feature words. It then classifies the feature words with a naive Bayes classifier to obtain each user's interest vector and computes the interest similarity between users as the cosine similarity of their interest vectors. Next, the distance between users is calculated from their stated location and check-in data, converted into a geographic similarity, and normalized with a normal distribution function. User influence is then measured by the number of followers, the number of Weibo posts published, and the number of reposts, comments, and likes those posts receive. Finally, the three factors are combined with different weights and supplemented by users' educational background and work experience to generate a friend recommendation list, from which friends are recommended to the user with the Top-N method. Experimental results show that combining multiple factors yields higher recommendation accuracy than any single factor alone.
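The following is a minimal sketch of the text side of the interest factor. It assumes each user's historical posts have already been segmented into tokens (the thesis uses Ansj for this, and plain token lists stand in for its output here). It uses one common TF-IDF variant, term frequency times log(N / df), to rank feature words, and cosine similarity to compare vectors. In the thesis the cosine is applied to interest vectors produced by a naive Bayes classifier over the feature words. That classifier is not reproduced here, and the category names and weights in the example are made up.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF weights per document; here one document = all of one user's segmented posts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    result = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        result.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return result


def top_feature_words(weights: Dict[str, float], k: int) -> List[str]:
    """The k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Tokens stand in for Ansj segmentation output (made-up data).
docs = [["football", "match", "football"], ["movie", "match"], ["movie", "music"]]
weights = tf_idf(docs)
print(top_feature_words(weights[0], 1))  # ['football']

# Hypothetical interest vectors over made-up categories (classifier output not shown).
alice = {"sports": 0.8, "film": 0.2}
bob = {"sports": 0.6, "film": 0.4}
print(round(cosine_similarity(alice, bob), 3))  # 0.942
```

Cosine similarity ignores vector length, so a prolific poster and an occasional poster with the same mix of interests still score as similar.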
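The geographic factor can be sketched as a great-circle (haversine) distance followed by a Gaussian kernel, exp(-d² / (2σ²)). The kernel maps distance 0 to similarity 1 and decays toward 0 as users move apart. Two choices here are illustrative assumptions, not parameters taken from the thesis: reducing each user to a single point (the mean of their check-in coordinates, falling back to the profile location) and the bandwidth `sigma_km = 100`.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
Point = Tuple[float, float]  # (latitude, longitude) in degrees


def representative_point(profile_location: Point, checkins: List[Point]) -> Point:
    """Mean check-in coordinate if any, else the profile location.

    Simplification: a plain average of degrees ignores wrap-around at ±180° longitude.
    """
    if checkins:
        lats, lons = zip(*checkins)
        return sum(lats) / len(lats), sum(lons) / len(lons)
    return profile_location


def haversine_km(p: Point, q: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlmb = math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the argument of asin above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def geo_similarity(distance_km: float, sigma_km: float = 100.0) -> float:
    """Gaussian kernel: 1.0 at distance 0, decaying toward 0; sigma_km is a hypothetical bandwidth."""
    return math.exp(-(distance_km ** 2) / (2 * sigma_km ** 2))


# Made-up users: one with check-ins, one with only a profile location.
user_a = representative_point((30.67, 104.06), [(30.66, 104.07), (30.70, 104.10)])
user_b = representative_point((29.56, 106.55), [])
d = haversine_km(user_a, user_b)
print(f"{d:.0f} km, similarity {geo_similarity(d):.3f}")
```

A Gaussian keeps the similarity bounded in (0, 1] regardless of how far apart users are, which makes it directly comparable with a cosine similarity in a weighted sum.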
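The final ranking step can be sketched as a weighted sum of the three factors plus a bonus for shared schools or employers, followed by a Top-N selection with `heapq.nlargest`. All of the following are hypothetical choices for illustration: the `Candidate` fields, the log-scaled influence formula, the weights (0.5, 0.3, 0.2), and the bonus of 0.1. The abstract states only which signals are combined, not how they are scaled or weighted. Interest and geographic similarities are taken as precomputed inputs in [0, 1].

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Candidate:
    uid: str
    interest_sim: float  # precomputed interest similarity, in [0, 1]
    geo_sim: float       # precomputed geographic similarity, in [0, 1]
    followers: int
    posts: int
    reposts: int
    comments: int
    likes: int
    schools: Set[str]
    employers: Set[str]


def raw_influence(c: Candidate) -> float:
    # Assumed scaling: log1p dampens heavy-tailed counts such as follower numbers.
    engagement = c.reposts + c.comments + c.likes
    return math.log1p(c.followers) + math.log1p(c.posts) + math.log1p(engagement)


def recommend(
    candidates: List[Candidate],
    my_schools: Set[str],
    my_employers: Set[str],
    n: int,
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # hypothetical weights
    background_bonus: float = 0.1,  # hypothetical bonus per shared school/employer set
) -> List[Tuple[str, float]]:
    """Score every candidate and return the Top-N (uid, score) pairs, best first."""
    w_interest, w_geo, w_infl = weights
    raw = [raw_influence(c) for c in candidates]
    max_raw = max(raw, default=0.0) or 1.0  # rescale influence into [0, 1]
    scored = []
    for c, r in zip(candidates, raw):
        score = w_interest * c.interest_sim + w_geo * c.geo_sim + w_infl * r / max_raw
        if c.schools & my_schools:
            score += background_bonus
        if c.employers & my_employers:
            score += background_bonus
        scored.append((c.uid, score))
    return heapq.nlargest(n, scored, key=lambda item: item[1])


# Made-up candidates: uid, interest, geo, followers, posts, reposts, comments, likes, schools, employers
cands = [
    Candidate("u1", 0.9, 0.2, 500, 120, 30, 10, 80, {"school_a"}, set()),
    Candidate("u2", 0.4, 0.9, 50000, 900, 4000, 800, 9000, set(), set()),
    Candidate("u3", 0.1, 0.1, 10, 5, 0, 0, 1, set(), set()),
]
top = recommend(cands, {"school_a"}, set(), n=2)
print([uid for uid, _ in top])  # ['u1', 'u2']
```

In this toy data u1 outranks u2 only because of the shared-school bonus; without it u2's higher geographic and influence scores win. That illustrates how profile background can reorder candidates whose factor scores are close.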
[Degree-granting institution]: Southwest Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.3
