大型社交网络中社团挖掘算法的研究

发布时间：2018-08-02 10:41

【摘要】：挖掘社交网络中的社团结构对于揭示网络潜在规律和把握网络宏观特征具有重要意义。目前,已涌现出诸多社团挖掘算法,其中有些虽然具有复杂度近线性的优势,但是却在应用于大型网络方面仍然存在两种局限性,其一是需要预知网络社团数目,其二算法的低复杂度是以牺牲准确率来换取的。因此,为了使现有社团挖掘算法能够适用于大型网络,本文针对这两种局限性分别进行了相应地研究和改进。主要工作如下:(1)讨论常见社团挖掘算法在应用于大型网络时存在的问题,得出低时间复杂度的算法具有优势但仍然存在两种局限性的结论。因此,对社团数目估计方法和标号传播社团挖掘算法的国内外现状进行调研,并分析这两方面已有方法的优劣。(2)针对已有社团数目估计方法的准确率低、计算不高效和适用范围受限等问题,本文给出一种利用正则无回路矩阵的社团数目估计方法。该方法通过定义一种正则无回路矩阵来描述网络,并观察分析其特征值的分布规律,最后利用本征间隙的最大位置对网络社团数目进行估计。由两种经典网络生成模型合成的人工网络上验证该方法。(3)针对标号传播算法以牺牲算法准确率和稳定性为代价来换取低复杂度的问题,本文给出一种改进的标号传播算法。该算法在标号传播顺序上用新定义的复合权重对所有节点进行优先级排序,在标号传播过程中采用节点贡献度对候选标号进行筛选,最后用新定义的平衡节点过滤机制对算法的收敛条件进行优化。在两种标准大型社交网络数据集上验证该算法。实验结果表明:利用正则无回路矩阵的社团数目估计方法重点消除了度异质分布对社团数目估计的影响,从而提高了估计结果的准确率,且适用范围不受限;较之标号传播算法和其他两种大型网络社团挖掘算法,改进的标号传播算法不仅在性能和质量方面具有显著优势,而且社团挖掘的效率也有所提高。
[Abstract]:It is of great significance to excavate the community structure in social networks for revealing the potential laws of the network and grasping the macroscopic characteristics of the network. At present, many community mining algorithms have emerged, some of which have the advantage of near linear complexity, but there are still two limitations in the application of large networks. One is the need to predict the number of network communities. Second, the low complexity of the algorithm is achieved at the expense of accuracy. Therefore, in order to make the existing community mining algorithms applicable to large networks, the two limitations are studied and improved accordingly in this paper. The main work is as follows: (1) the problems of common community mining algorithms in large networks are discussed. It is concluded that low time complexity algorithms have advantages but still have two limitations. Therefore, the current situation of community number estimation methods and labeling propagation community mining algorithms at home and abroad are investigated, and the advantages and disadvantages of these two existing methods are analyzed. (2) the accuracy of existing community number estimation methods is low. In this paper a method of estimating the number of communities using regular loop-free matrix is presented in which the computation is not efficient and the scope of application is limited. In this method, a regular loop free matrix is defined to describe the network, and the distribution of eigenvalues is observed and analyzed. Finally, the number of network communities is estimated by using the maximum position of the eigenspace. The proposed method is verified on artificial networks composed of two classical network generation models. (3) an improved labeling propagation algorithm is presented to deal with the problem of low complexity at the expense of the accuracy and stability of the label propagation algorithm. In this algorithm, all nodes are prioritized by the newly defined composite weights in the label propagation sequence, and the candidate labels are filtered by the contribution degree of the nodes in the process of label propagation. Finally, the new balanced node filtering mechanism is used to optimize the convergence conditions of the algorithm. The algorithm is validated on two standard large social network datasets. The experimental results show that the influence of degree heterogeneity distribution on the estimation of community number is mainly eliminated by using the regular loop free matrix estimation method, thus the accuracy of the estimation results is improved, and the range of application is not limited. Compared with label propagation algorithm and other two large network community mining algorithms, the improved labeling propagation algorithm not only has significant advantages in performance and quality, but also improves the efficiency of community mining.
【学位授予单位】：西安理工大学
【学位级别】：硕士
【学位授予年份】：2017
【分类号】：TP311.13;TP393.09

【参考文献】