基于标签传播及适合度的社团聚类算法研究
发布时间:2018-04-02 12:10
本文选题:社会网络 切入点:标签传播 出处:《西南大学》2015年硕士论文
【摘要】:现实世界中的很多网络系统都可以抽象成社会网络,在这些网络中,节点表示个体,节点之间的边表示个体之间的相互联系。随着对社会网络研究的不断深入,人们发现网络具有社团结构的特性,在社团结构内部,节点之间的连接比较紧密,而社团与社团之间的节点连接较为稀疏。社团结构往往代表了网络中具有某一相同属性特征的节点集合,挖掘网络中这种社团结构对研究社会网络的演化过程、分析网络的拓扑结构以及了解网络潜在的功能都具有非常重要的意义。本文围绕着如何快速且有效地对网络中的社团结构进行聚类这一问题进行了相关研究。本文首先分析总结了社团聚类的相关背景与研究现状,研究了有关社会网络的重要理论,为后续研究奠定了理论基础。对基于局部信息的标签传播算法(LPA算法)进行了深入研究与分析,发现LPA算法在算法迭代过程中会出现“标签逆流”现象,同时初始化标签数目过多,标签更新条件单一、不全面,标签更新策略具有随机性等问题,针对这些问题给出了总体的解决方案。在具体实现这个解决方案的基础上,本文提出了一种带有适合度的标签传播算法,记为LPA-FA算法。LPA-FA算法首先将网络中所有的节点按照其度的大小进行排序,组成了一个有序的节点序列,在每次算法迭代过程中,都按照这个节点序列依次进行节点标签更新,避免了LPA算法由于采用随机节点序列而造成的“标签逆流”现象,提高了算法效率。在节点标签初始化过程中,在LPA-FA算法中设计了一种简单线性初始化方法,减少了网络中初始标签的个数,从而降低算法的运行时间。然后在节点标签传播过程中,当节点的更新标签出现不唯一时,LPA-FA算法引入了邻接子系统聚集系数、邻接边权重、节点属性相似度三个参数,进而通过Topsis方法得到三者的综合评价值标签适合度,最终选择使标签适合度最大的邻接子系统内的标签作为更新标签,不仅解决了LPA算法节点标签更新条件单一、不全面的缺陷,也克服了LPA算法的随机性,从而降低算法的时间开销,提高了聚类社团的质量。最后,将LPA-FA算法与LPA算法分别在Zachary空手道俱乐部网络和科学家合著网络两个数据集上进行了实验,并对实验结果与实验数据进行比对分析。最终,实验表明本文提出的LPA-FA算法无论在运行时间,还是聚类社团的质量上都要优于LPA算法,从而验证了LPA-FA算法的有效性。
[Abstract]:Many network systems in the real world can be abstracted into social networks in which nodes represent individuals and edges between nodes represent individual relationships.With the development of the research on social network, it is found that the network has the characteristics of community structure. Within the community structure, the connections between nodes are relatively close, and the connections between communities and communities are sparse.The community structure often represents a set of nodes with the same attribute in the network, mining this kind of community structure in the network to study the evolution process of the social network.It is very important to analyze the topological structure of the network and to understand the potential functions of the network.This paper focuses on how to quickly and effectively cluster the community structure in the network.Firstly, this paper analyzes and summarizes the relevant background and research status of community clustering, and studies the important theory of social network, which lays a theoretical foundation for further research.In this paper, the label propagation algorithm based on local information is deeply studied and analyzed. It is found that the LPA algorithm will appear the phenomenon of "label countercurrent" in the iterative process of the algorithm, at the same time, the number of initialized tags is too many, and the condition of tag updating is single.In view of these problems, the overall solution is given.Based on the implementation of this solution, this paper proposes a label propagation algorithm with degree of fitness, which is called LPA-FA algorithm. LPA-FA algorithm first sorts all nodes in the network according to their degree.An ordered node sequence is formed. In each iteration process, the node label is updated according to the node sequence in turn, which avoids the "label countercurrent" phenomenon caused by the adoption of random node sequences in the LPA algorithm.The algorithm efficiency is improved.In the process of node label initialization, a simple linear initialization method is designed in the LPA-FA algorithm, which reduces the number of initial tags in the network and thus reduces the running time of the algorithm.Then, in the process of node label propagation, the LPA-FA algorithm introduces three parameters, such as the clustering coefficient of adjacent subsystem, the weight of adjacent edge, and the similarity of node attribute, when the updated label of nodes is not unique.Then through the Topsis method to get the three comprehensive evaluation value label fitness, finally select the label in the adjacent subsystem with the greatest label fitness as the update label, not only solve the LPA algorithm node label update condition is single.The defect of incompleteness also overcomes the randomness of LPA algorithm, which reduces the time cost of the algorithm and improves the quality of clustering community.Finally, the LPA-FA algorithm and the LPA algorithm are experimented on the Zachary karate club network and the scientist coauthor network, respectively, and the experimental results are compared with the experimental data.Finally, the experimental results show that the proposed LPA-FA algorithm is superior to the LPA algorithm in terms of running time and clustering community quality, thus validating the effectiveness of the LPA-FA algorithm.
【学位授予单位】:西南大学
【学位级别】:硕士
【学位授予年份】:2015
【分类号】:O157.5;TP311.13
【参考文献】
相关期刊论文 前4条
1 李晓佳;张鹏;狄增如;樊瑛;;复杂网络中的社团结构[J];复杂系统与复杂性科学;2008年03期
2 李军利;赵红领;范明;;邮件社区划分和小世界网络[J];计算机应用;2008年S1期
3 沙爱晖;黄树成;李甜;;一种基于网络社团结构和模块化函数的聚类算法[J];计算机应用与软件;2014年04期
4 金弟;杨博;刘杰;刘大有;何东晓;;复杂网络簇结构探测——基于随机游走的蚁群算法[J];软件学报;2012年03期
相关硕士学位论文 前2条
1 汪大明;复杂网络社团模型与结构研究[D];国防科学技术大学;2010年
2 李明涛;结合话题的社会网络社团发现技术研究[D];解放军信息工程大学;2012年
,本文编号:1700323
本文链接:https://www.wllwen.com/kejilunwen/yysx/1700323.html