异构社交网络中社区发现算法研究
发布时间:2018-03-23 09:01
本文选题:社区发现 切入点:异构社交网络 出处:《中国矿业大学(北京)》2016年博士论文 论文类型:学位论文
【摘要】:社会网络中的社区发现作为数据挖掘研究领域的一个热点,近几年发展迅速,研究内容主要集中在通过对网络中存在的关系进行分析,得到社区划分的结果。随着Web2.0的兴起和社交网络的蓬勃发展,出现了多种新型的在线社交方式,单一的社会网络关系结构已经不足以应对解决现实世界的问题,所以学者们进一步提出了异构社交网络(Heterogeneous Social Networks)的概念。这是一个复杂的网络抽象结构,网络中通常包含多种关系和实体,这些不同的关系和实体组合形成了网络的多样化复杂结构。如何处理这些复杂的结构和获取社区结构信息,是对传统的社会网络社区发现的一个新的挑战。本文将针对划分异构网络过程中,多维度、多维复杂关系、多类型节点等特性所带来的网络数据重构与降维问题;传统研究仅仅局限于图链接关系,并未考虑语义信息,也就是主题对划分社区的帮助作用;再者传统划分算法需要预先知识和预先设定社区个数来得到划分结果,但真实世界社交网络中的社区个数往往是不可知的,尤其在大规模的社交网络中预先知识不可知等问题展开研究。主要包括异构社交网络通用分析框架,基于标签传播近似线性的社区发现算法,基于主题感知的社区发现算法等,从而设计出高效快速的异构网络社区发现算法。研究内容及创新点包括:1)提出了一种异构社交网络分析框架,针对异构网络进行数据重构,利用降维方法得到同构网络或者二分图,然后使用社区发现算法对同构网络或者二分图进行社区划分,从而将异构网络社区划分问题进行有效转化。2)将多维异构网络转化为同构网络后,提出了一种并行种子扩展算法PHSE,用来发现社交网络中的重叠社区结构。算法PHSE通过局部适应度函数优化和混合种子扩展策略来得到自然社区。相较于算法LFM,算法PHSE不仅在合成网络中有非常好的划分结果,同时在真实世界社交网络中也有非常好的划分结果。尤其,当合成网络的节点重叠度高达On=50%时,依旧可以准确的划分出重叠社区。3)提出了一种基于标签传播的社区发现算法iSLPA,同时支持有向图、无向图和二分图的社区划分,算法在迭代过程中采用标签混合更新模式,并且在真实社交网络数据集上表现出准确的社区划分结果。4)提出了一种基于并行计算框架Dpark的标签传播算法HLPA,针对不同的网络使用了不同的节点标签初始化策略,包括有向图、无向图和二分图,同时还使用了混合标签更新策略使得算法更加稳定,标签衰减策略使得算法可以避免划分出“monster”社区,也可以使得较小的社区得到充分成长。相较于之前的基于标签传播的算法,算法HLPA在划分benchmark基准真实社交网络时表现出了非常高的准确性,而且在划分大规模真实社交网络时表现得非常有竞争力,大大提高算法效率,针对300万节点、1.7亿条边的二分图划分社区只需要37.12分钟,而且通过分析验证了划分出的社区是有真实含义的社区结构。5)根据1)中提出的异构社交网络分析框架,提出了一种基于主题感知的异构网络社区发现算法,该算法通过对数据重构将多模网络转化为二模网络(用户-文档),采用算法LDA-light从异构网络转化得到的二模网络映射为带权重的二分图网络(用户-主题),采用新提出的带权重的二分图社区发现算法WLPA对二分图进行社区划分,最终将用户和主题两种不同的实体划分在同一社区内,即划分出的社区带有语义信息,从而可以更好地进行社区结构分析。本文提出的社区发现算法具有一般性,可以推广到许多同构或者异构社交网络和数据集,并且可以应用到更广泛的实际问题中。
[Abstract]:In a social network community discovery as the data mining is a hot research field, the rapid development in recent years, the study focused on the relationship existing in the network analysis, get the results of community division. With the rise of Web2.0 and social networks flourish, the emergence of a variety of new online socializing. The social network structure is not enough to cope with solve real-world problems, so scholars further proposed heterogeneous social network (Heterogeneous Social Networks) concept. It is a complex network abstract structure, the network usually contains a variety of relations and entities, and the relationship between these different entities combined to form a network diverse and complex structure. How to deal with the complicated structure and obtain the community structure information, is a new challenge to the traditional social network community discovery. This article will focus on the classification of heterogeneous networks in the process of multi dimension, multi dimension and complex relationship, and dimension of network data reconstruction caused by multiple types of nodes and other characteristics of the problem; the traditional study focused only on map links, did not consider the semantic information, also is the theme of community with the help of the division; in addition to the traditional division algorithm requires knowledge and advance set number of community division results get in advance, but the number of community real world social networks are often unknown, especially in large-scale social networks prior knowledge of unknown issues. The research mainly includes the analysis of the general framework of heterogeneous social networks, discovery algorithm approximate linear label propagation algorithm based on community, theme the perception of the community based on, to design a heterogeneous network community discovery algorithm is efficient and fast. The research contents and innovations include: 1) proposed a different The social network analysis framework for data reconstruction in heterogeneous networks, using dimensionality reduction method to get two points or homogeneous network graph, and then use the community discovery algorithm of homogeneous network community division or two figure, which will be the effective transformation of.2 community into the heterogeneous network) will be transformed into multidimensional heterogeneous network homogeneous network, is proposed an extended PHSE algorithm parallel seeds, used to find the overlapping community structure in social networks. The PHSE algorithm through local fitness function optimization and hybrid seed expansion strategy to get the natural community. Compared to the LFM algorithm, PHSE algorithm not only has very good classification results in the synthesis of the network, but also has very good classification results in the real world social networks. Especially, when the node network synthesis overlap is as high as On=50%, still can be accurately divided into overlapping community.3) put forward a ISLPA found the label propagation algorithm based on community, while supporting the directed graph, undirected graph and graph two community division algorithm using hybrid label update mode in the iterative process, and show the community classification results accurate.4 in the data set on the real social network) proposed a parallel HLPA label propagation algorithm framework based on Dpark, according to the different network using node label different initialization strategies, including directed graph, undirected graph and two points, while also using the hybrid tag update strategy which makes the algorithm more stable label attenuation strategy makes the algorithm can avoid the division of the "monster" community, also can make small the community has been fully grow. Compared to the previous label propagation algorithm based on HLPA algorithm in the division of benchmark benchmark real social network showed very high accuracy, but also in row Divided into large-scale real social networks is very competitive, greatly improve the efficiency of the algorithm for the 3 million node, two graph partitioning community 170 million edges need 37.12 minutes only, and verified through the analysis of the community is a community structure of.5) 1) according to the true meaning of the heterogeneous social network analysis framework proposed that presents a heterogeneous network community discovery algorithm based on perception of the subject, based on data reconstruction multimode networks into the network (the second mock exam user - document), use LDA-light algorithm to get the second mock exam network mapping transformation from heterogeneous network two network graph with weight (user topic), using the new with the weight of the two figure WLPA community discovery algorithm of community division two figure, will end users and theme two different entity in the same community, which is divided into the community with Semantic information can better analyze community structure. The community detection algorithm proposed in this paper is general, and can be extended to many isomorphic or heterogeneous social networks and data sets, and it can be applied to a wider range of practical problems.
【学位授予单位】:中国矿业大学(北京)
【学位级别】:博士
【学位授予年份】:2016
【分类号】:O157.5;TP393.09
【参考文献】
相关期刊论文 前5条
1 辛宇;杨静;谢志强;;基于标签传播的语义重叠社区发现算法[J];自动化学报;2014年10期
2 王莉;程学旗;;在线社会网络的动态社区发现及演化[J];计算机学报;2015年02期
3 张俊丽;常艳丽;师文;;标签传播算法理论及其应用研究综述[J];计算机应用研究;2013年01期
4 刘欣;Tsuyoshi Murata;;Detecting Communities in K-Partite K-Uniform (Hyper)Networks[J];Journal of Computer Science & Technology;2011年05期
5 杨博;刘大有;金弟;马海宾;;复杂网络聚类方法[J];软件学报;2009年01期
,本文编号:1652785
本文链接:https://www.wllwen.com/shoufeilunwen/jckxbs/1652785.html
教材专著