社交网络的非重叠社团划分算法研究

发布时间：2018-06-04 14:16

本文选题：复杂网络 + 社区发现　；参考：《重庆大学》2016年硕士论文

【摘要】：随着互联网的蓬勃发展,社交网络变得日益庞大和复杂。社交网络是指人与人之间关系而组成的复杂网络。复杂的社交网络存在着一个共同的特性——社团结构,即社团内部的节点联系紧凑,而社团之间的节点联系稀疏。在整个网络中,我们根据节点之间关联关系,寻找联系紧密小型子网络结构的过程称为社团划分或社区发现。网络中的类型社团根据所有节点是否所属于多个社团情况分为非重叠社团和重叠社团。社交网络的社团划分可以应用于多个场景:在同一个社团中推荐好友、对同一个社团中用户进行商品推荐等。由于社交网络具有复杂网络的一般特性,人们常常使用复杂网络对社交网络建立模型进行研究。获得复杂的社交网络中高效而又准确的社团划分,这是研究复杂网络中社团划分问题的出发点和落脚点。在此研究领域中,众多学者提出许多经典的算法:GN算法、谱平分算法、随机游走算法、标签传播算法、CPM算法和EAGLE算法等。然而,近年来,社交网络呈指数式发展,许多经典的算法面临了更多的挑战。本论文着力于无向网络的非重叠社团划分的研究问题,根据不同的社交网络特性提出了三种社团划分算法。(1)面对规模较小已知社团个数网络时,本论文针对非重叠社团划分问题,提出了ECFM算法(Easy Community Finding based on Matrix Algorithm)。该算法划分后的准确率较高,但是该算法存在着不足:需要指定网络中的社团个数,否则无法判断算法终止条件。(2)针对ECFM算法的缺陷,本论文利用遗传算法和聚类算法的思想,提出了GKNM算法(Genetic K-Means based on Normal Matrix Algorithm)。该算法较以前的经典算法在划分准确率上得到明显地提升。该算法由于采用了遗传算法的架构,该算法的运行时间耗费较长。同时,该算法选取了Normal矩阵作为聚类模型,不适用于规模较大的网络。(3)针对复杂的网络模型,本论文利用了LPA算法具有超低时间复杂度的特性,采用了多重标签传播模型,提出了CMLPA算法(Multiple Label Propagation based on Clique Algorithm)。该算法不仅在时间复杂度上与LPA算法一样近乎线性,而且在结果准确率上高于其它一些算法。在设计算法时,CMLPA算法遵循了Pregel计算框架,该算法是可以运行在Spark大数据框架下。
[Abstract]:With the vigorous development of the Internet, social networks have become increasingly large and complex. Social network is a complex network formed by the relationship between people. A common feature of complex social networks is the community structure, in which the nodes within the community are closely connected and the connections between the communities are sparse. In the whole network, the process of searching for a compact sub-network is called community division or community discovery according to the relationship between nodes. The types of societies in the network are divided into non-overlapping associations and overlapping societies according to whether all nodes belong to more than one community. Social network community division can be used in many scenarios: recommend friends in the same community, recommend goods to users in the same community, etc. Because social networks have the general characteristics of complex networks, people often use complex networks to model social networks. It is the starting point and end point of studying the problem of community division in complex social networks to obtain efficient and accurate community partition in complex social networks. In this field, many scholars have proposed many classical algorithms, such as: GN algorithm, spectral partition algorithm, random walk algorithm, label propagation algorithm, EAGLE algorithm and so on. However, with the exponential development of social networks in recent years, many classical algorithms face more challenges. In this paper, we focus on the study of the non-overlapping community partitioning of undirected networks. According to different characteristics of social networks, we propose three community partitioning algorithms. In this paper, ECFM algorithm is proposed to solve the problem of nonoverlapping community division. The accuracy of the algorithm after partition is high, but the algorithm has some shortcomings: we need to specify the number of communities in the network, otherwise we can not judge the termination condition of the algorithm. (2) aiming at the defects of the ECFM algorithm, this paper uses the idea of genetic algorithm and clustering algorithm. The genetic K-Means based on Normal Matrix algorithm (GKNM) is proposed. Compared with the previous classical algorithm, this algorithm can improve the partition accuracy obviously. Because of the structure of genetic algorithm, the running time of the algorithm is very long. At the same time, the algorithm selects Normal matrix as the clustering model, which is not suitable for large-scale network. Aiming at the complex network model, this paper makes use of the characteristic of LPA algorithm with ultra-low time complexity and adopts the multi-label propagation model. In this paper, CMLPA algorithm is proposed. The algorithm is not only as linear as the LPA algorithm in time complexity, but also more accurate than some other algorithms. When designing the algorithm, the CMLPA algorithm follows the Pregel computing framework, and the algorithm can run under the framework of Spark big data.
【学位授予单位】：重庆大学
【学位级别】：硕士
【学位授予年份】：2016
【分类号】：O157.5

【参考文献】