Research on Community Discovery for Microblogs
Published: 2018-07-29 12:44
[Abstract]: With the rapid development of online social networks, microblogging has come to be recognized as a new channel of network information dissemination, following television, mail, newspapers, and radio, and it combines the characteristics of media and of social networks. With the rise of this new medium, more and more advertisers see opportunities on the platform. For users, a microblog is simply a place to obtain information and knowledge, interact with friends, and present themselves. For businesses, however, a platform with so many registered users is more than a place to share information: they can use it for viral marketing and word-of-mouth marketing of their products and services. Today both text and rich-media content can be transmitted, exchanged, and interacted with rapidly on microblog platforms. As a result, more and more social behavior is moving from offline to online, and the shift shows not only in social behavior itself; it is especially pronounced in the move from product promotion in offline communities to product promotion in online communities. How to transplant offline community advertising seamlessly into online communities is a problem advertisers increasingly want to solve. Advertisers could use the microblog platform for blanket advertising, but this would raise advertising costs and annoy the microblog users who are not interested in the ads. Those users can quickly spread their annoyance to other users by posting microblogs, which in turn lowers the product's standing. Different advertisements target the virtual communities to which different user groups belong. If the community of each user can be mined, and the communities that correspond to domain-specific advertisements can then be identified from the local clustering property of microblogs, the precision of ad placement can be improved. Advertisers can then target the users of the relevant virtual communities, reach more target users, and so raise the conversion rate of their advertising.
This thesis first introduces the theory and techniques related to community discovery. It then summarizes several classic families of community discovery algorithms (label-propagation-based, hierarchy-centered, clique-percolation-based, and mixture-probability-model-based) and analyzes the applicable scenarios, strengths, and weaknesses of each. Building on these algorithms, and in view of the distinctive features of microblogs as a reciprocal network, the thesis proposes a multi-label propagation community discovery algorithm based on parallel computation on a Spark cluster with the RDD (Resilient Distributed Dataset) data model. The algorithm draws on the label propagation algorithm (LPA) proposed by Raghavan et al. and addresses its two major defects: unstable results and low accuracy. To remove the instability, when a node updates its own community label and several candidate labels are available, the algorithm does not pick one arbitrarily; it applies a rule and selects every label that satisfies the rule. To address the low accuracy, the thesis introduces a microblog propagation coefficient. When a user node follows another user, the coefficient captures the degree to which the follower belongs to the followed user's community, so the communities of followed users are not all treated equally. The algorithm also draws on the COPRA algorithm proposed by Gregory et al. and can therefore discover overlapping community structure, but it differs from COPRA in two essential ways: (1) it does not limit the number of community labels a microblog user may hold, because users' interests are broad and change over time; (2) it first mines the communities of active microblog users from the content of their posts, then uses these already-assigned users as labeled nodes for community label propagation. Finally, the thesis applies the message-passing style of parallel computation on a Spark cluster, based on the RDD data model, to the proposed algorithm, which makes it feasible to handle large-scale graph computation. The proposed algorithm can therefore both efficiently discover the overlapping communities to which microblog users belong and handle large-scale graph computation problems.
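The label-update step described in the abstract can be illustrated with a small single-machine sketch. This is not the thesis's implementation: the abstract gives neither the formula for the microblog propagation coefficient nor the exact rule for keeping several labels, and it runs on Spark rather than in plain Python. In the sketch below, `follows`, `weights`, `seeds`, `rounds`, and `keep_ratio` are hypothetical names; `weights` stands in for the propagation coefficient, and the assumed rule is "keep every label whose weighted score reaches `keep_ratio` times the best score", so ties are all kept instead of being broken at random.

```python
from collections import defaultdict


def propagate_labels(follows, weights, seeds, rounds=10, keep_ratio=1.0):
    """Synchronous multi-label propagation over a follow graph (illustrative only).

    follows: dict mapping a user to the list of users they follow.
    weights: dict mapping (follower, followee) to a coefficient; a stand-in
             for the propagation coefficient, defaulting to 1.0 when absent.
    seeds:   dict mapping pre-labeled users to their set of community labels.
    """
    users = set(follows) | {v for vs in follows.values() for v in vs} | set(seeds)
    labels = {u: set(seeds.get(u, ())) for u in users}

    for _ in range(rounds):
        new_labels = {}
        for u in users:
            if u in seeds:
                # Seed users keep their content-derived labels.
                new_labels[u] = set(seeds[u])
                continue
            # Accumulate a weighted vote for each label held by followed users.
            score = defaultdict(float)
            for v in follows.get(u, ()):
                for lab in labels[v]:
                    score[lab] += weights.get((u, v), 1.0)
            if not score:
                # No labeled followee yet: leave the user unchanged this round.
                new_labels[u] = set(labels[u])
                continue
            best = max(score.values())
            # Assumed rule: keep all labels close enough to the best score,
            # with no cap on how many labels a user may hold.
            new_labels[u] = {lab for lab, s in score.items() if s >= keep_ratio * best}
        if new_labels == labels:
            break  # labels stopped changing
        labels = new_labels
    return labels


if __name__ == "__main__":
    follows = {"c": ["a", "b"], "d": ["c"], "e": ["a"]}
    weights = {("c", "a"): 0.5, ("c", "b"): 0.5}
    seeds = {"a": {"sports"}, "b": {"tech"}}
    print(propagate_labels(follows, weights, seeds))
```

Because every round reads only the previous round's labels and no random choice is made, repeated runs on the same input give the same result, which is the stability property the abstract aims for. In a Spark setting the inner loop would correspond to followees sending their labels as messages to their followers, but that mapping is only suggested here, not implemented.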
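[Degree-granting institution]: Zhejiang Gongshang University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092
Article ID: 2152712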
[References]
Related journal papers (top 1)
1 Wu Zhihao; Lin Youfang; Steve Gregory; Wan Huaiyu (School of Computer and Information Technology, Beijing Jiaotong University); Tian Shengfeng. Balanced Multi-Label Propagation for Overlapping Community Detection in Social Networks [J]. Journal of Computer Science & Technology, 2012, Issue 03.
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2152712.html