Mining Weibo Opinion Leaders with Multi-Objective Optimization Algorithms in a Cloud Environment
Published: 2019-04-10 08:31
[Abstract]: In recent years, the spread of the Internet has fundamentally changed how people communicate. Weibo, an emerging medium for disseminating information online, has been embraced by a large number of Internet users. Its immediacy and viral, fission-like propagation let information spread widely in a very short time, which makes effective control of information very important: once someone deliberately spreads false information or speech that endangers public security, the consequences can be severe. Identifying and monitoring the opinion leaders who play a dominant role in online information dissemination is therefore of great significance.
Current methods for mining opinion leaders mainly include statistical methods, cluster analysis, and social network analysis (SNA). Each has its own characteristics, but none of them copes well with the massive volume of data generated by Weibo's 331 million users.
This thesis takes Weibo user attributes as its starting point and treats a user's multiple attribute features as a multi-objective optimization problem. It proposes applying Skyline queries, a class of methods for solving multi-objective optimization problems, to the mining of Weibo opinion leaders. To cope with the massive volume of Weibo data, it adopts MapReduce, the core programming framework of Hadoop, and implements two Skyline algorithms under that framework, block-nested-loops (BNL) and sort-filter-skyline (SFS), so that they perform better on massive data. It then builds an opinion leader evaluation model that assesses Weibo opinion leaders by two indicators, user influence and user participation, determines the weight of each attribute with the analytic hierarchy process (AHP), and finally gives a formula for computing opinion leaders.
In the mining experiments, a Hadoop cluster is set up and a Weibo crawler is designed to collect Weibo data. The data are processed with the parallelized SFS algorithm, and the results are fed into the opinion leader model for calculation. Finally, the opinion leaders mined in this thesis are compared with Sina Weibo's official list of popular users. The results show that the opinion leaders found by the proposed method are spread across a relatively wide range of fields, which to some extent avoids the dominance of entertainment figures in Sina Weibo's official user ranking. The approach can therefore serve as one way to handle massive, high-dimensional data, and it offers a possible option for mining Weibo opinion leaders.
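The Skyline step described in the abstract can be illustrated with a minimal single-machine sketch. It is not the thesis's implementation: the attribute vectors (followers, reposts, comments), the user ids, and the function names `dominates`, `sfs_skyline`, and `partitioned_skyline` are all hypothetical, and the map/reduce phases are only imitated with plain Python lists rather than run on Hadoop. The sketch assumes every attribute is "larger is better".

```python
from typing import List, Sequence, Tuple

# (user id, attribute vector); every attribute is assumed "larger is better".
User = Tuple[str, Tuple[float, ...]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every attribute and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def sfs_skyline(users: List[User]) -> List[User]:
    """Sort-filter-skyline: presort by a monotone score, then filter in one pass."""
    # If a dominates b then sum(a) > sum(b), so after sorting by descending sum
    # no later point can dominate an earlier one. Each point therefore only
    # needs to be checked against the points already kept in the window.
    ordered = sorted(users, key=lambda u: sum(u[1]), reverse=True)
    window: List[User] = []
    for user in ordered:
        if not any(dominates(kept[1], user[1]) for kept in window):
            window.append(user)
    return window


def partitioned_skyline(users: List[User], num_partitions: int) -> List[User]:
    """Imitate MapReduce: local skylines per partition, then a global merge."""
    # "Map" phase: each partition computes its own local skyline.
    partitions = [users[i::num_partitions] for i in range(num_partitions)]
    candidates = [u for part in partitions for u in sfs_skyline(part)]
    # "Reduce" phase: the global skyline is the skyline of the local skylines.
    return sfs_skyline(candidates)


# Hypothetical sample data: (followers, reposts, comments).
sample_users: List[User] = [
    ("a", (900, 10, 5)),
    ("b", (500, 80, 40)),
    ("c", (400, 70, 30)),   # dominated by b
    ("d", (100, 90, 60)),
    ("e", (90, 20, 10)),    # dominated by b and d
    ("f", (900, 5, 5)),     # dominated by a
]

print([uid for uid, _ in sfs_skyline(sample_users)])  # ['a', 'b', 'd']
print(sorted(uid for uid, _ in partitioned_skyline(sample_users, 2)))  # ['a', 'b', 'd']
```

The partitioned version returns the same set as the single pass because any user that is non-dominated globally is also non-dominated within its own partition, so it always survives the local phase. In this sketch the surviving users would then be the candidates passed to a weighted scoring model such as the AHP-based one the abstract mentions.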
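[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.092
Article ID: 2455648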
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2455648.html