Research on a Weibo User Influence Evaluation Algorithm Based on Multi-Objective Decision Making
Published: 2018-07-18 20:32
[Abstract]: As a form of social media, Weibo spreads information quickly, draws on a wide range of sources, and offers multiple perspectives. It has become a main channel for everyday information exchange and sharing and has attracted wide attention from researchers in China and abroad. Studying user influence matters for Weibo user recommendation, information diffusion, public opinion monitoring, and targeted marketing.
First, by analyzing how Weibo messages propagate, the thesis builds a Weibo network model and divides it into two networks: a user relationship network and a post propagation network. Then, based on the features of Sina Weibo, and in order to avoid the effect of "zombie followers" and to prevent users from maliciously inflating their influence by reposting or commenting on their own posts, it defines four indicators of user influence from the perspectives of these two networks: LeaderRank influence, average number of reposts per post, average number of comments per post, and average number of likes per post. On this basis, to avoid having to choose suitable weight parameters for the different indicators, it introduces the classical Skyline computation from multi-objective decision making, proposes the WeiboLeaderRank influence evaluation algorithm, and analyzes the characteristics of the algorithm.
To verify the algorithm, a Sina Weibo data collection system was designed and implemented using web crawler techniques, and a research data set covering 125,207 users was built. When the Weibo servers detect abnormal access requests, they redirect the requests or block the user, which severely slows collection. To address this, the system logs in with multiple accounts, starts one thread per account, and collects with multiple threads in parallel. Each thread requests data through anonymous proxy servers and dynamically changes the HTTP request headers. An anomaly detection module detects abnormal conditions promptly and takes the corresponding action, so that the crawler imitates normal user access behavior as closely as possible and collection efficiency improves.
Finally, experiments on the collected data set verify the effectiveness of the four influence indicators and compare WeiboLeaderRank with other commonly used user influence algorithms. The results show that WeiboLeaderRank gives better evaluation results than the other algorithms, that its computation time grows linearly with data volume, and that it can adapt to very large real-world Weibo environments with good real-time performance.
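The abstract names the building blocks of the method (LeaderRank on the user relationship network, three per-post averages, and a Skyline computation that avoids indicator weights) but does not give the details of WeiboLeaderRank itself. The sketch below is therefore only an illustration under stated assumptions, not the thesis's algorithm: it runs the standard LeaderRank iteration on a toy follow graph and then returns the users whose four-indicator vector is not dominated by any other user. The graph, the per-post averages, and all function names (`leader_rank`, `dominates`, `skyline`) are hypothetical.

```python
from typing import Dict, List, Tuple

GROUND = "__ground__"  # artificial ground node used by LeaderRank


def leader_rank(follows: Dict[str, List[str]], iters: int = 200,
                tol: float = 1e-12) -> Dict[str, float]:
    """Standard LeaderRank. follows[u] lists the users that u follows."""
    users = sorted(set(follows) | {v for vs in follows.values() for v in vs})
    # Every user links to the ground node, and the ground node links to every user.
    out = {u: list(follows.get(u, [])) + [GROUND] for u in users}
    out[GROUND] = list(users)
    score = {u: 1.0 for u in users}
    score[GROUND] = 0.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in out}
        for u, targets in out.items():
            share = score[u] / len(targets)  # split score evenly over out-links
            for v in targets:
                nxt[v] += share
        delta = max(abs(nxt[u] - score[u]) for u in out)
        score = nxt
        if delta < tol:
            break
    bonus = score[GROUND] / len(users)  # hand the ground score back evenly
    return {u: score[u] + bonus for u in users}


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """a dominates b if it is >= on every indicator and > on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skyline(points: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Users whose indicator vector is not dominated by any other user."""
    return sorted(u for u, p in points.items()
                  if not any(dominates(q, p)
                             for v, q in points.items() if v != u))


# Hypothetical toy data: who follows whom, and per-post averages
# (reposts, comments, likes). Real values would exclude self-reposts
# and self-comments, as the abstract describes.
follows = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
averages = {"a": (5.0, 2.0, 9.0), "b": (1.0, 1.0, 1.0),
            "c": (3.0, 3.0, 3.0), "d": (1.0, 0.0, 0.0)}

lr = leader_rank(follows)
indicators = {u: (lr[u],) + averages[u] for u in lr}
influential = skyline(indicators)
print(influential)
```

In this toy example no weights are chosen anywhere: a user is kept as long as no single other user is at least as good on all four indicators and strictly better on one. The naive pairwise comparison is quadratic in the number of users and is used here only for clarity.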
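[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP393.09
Article ID: 2132690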
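[Citing literature]
Related master's theses (top 1)
1 Zhao Qian (赵倩); Research on a Top-K Influential Node Discovery Algorithm Based on Community Structure [D]; Huazhong University of Science and Technology; 2015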
Article link: https://www.wllwen.com/guanlilunwen/yingxiaoguanlilunwen/2132690.html