社会网络中的社区发现与节点评估算法研究

发布时间：2018-03-06 09:53

本文选题：社会网络　切入点：社区发现　出处：《吉林大学》2014年硕士论文　论文类型：学位论文

【摘要】：随着互联网的发展和普及，在线社会网络已经渗透到人们生活中的每个角落，拉近了人们彼此之间的距离，对社会网络的研究能够让我们了解社会网络的结构特征以及演化规律让其更好地为人类服务。在网络研究方面有两个热点问题：社区发现和节点的重要性评估，这两个问题的研究对于我们认识复杂社会网络的结构以及特征具有非常重要的意义。社会网络研究中的社区发现工作可以把大的网络分成粒度更小的社区，让我们发现内部个体联系紧密的团体，节点评估可以对网络中的节点以不同的角度进行重要性评估，发现重要节点。目前的社区发现算法大部分基于图形分割和层次聚类思想，虽然这些算法大部分情况下能够有效地对社区进行识别，但都必须指定社区的数量或社区的规模，显然这是不合理的。遗传算法作为一种搜索最优解的方法能够在没有先验信息的情况下自动识别社区的数量，高效准确的对社区进行发现，但是传统的种群的初始化方法仅仅考虑到邻接信息，没有充分考虑网络的拓扑结构，因此得到的种群的质量比较差，影响算法的收敛速度。节点的评估算法也存在许多，有的依据节点的局部特征，有的依据整个网络的拓扑结构，作为用在搜索引擎中评估网页重要程度的PageRank算法，在社会节点评估中也有很广泛的应用，但传统的PageRank算法在权值分配的时候都是均匀分配，这种分配方式在社会网络中是不合理的，因为社会网络反映的是用户与用户之间的关系，，这种关系是有亲疏之分的，不能同等对待。针对上面的分析，本文主要对社区发现的遗传算法和节点评估的PageRank算法存在的不足进行改进，主要的工作如下：首先，对用于社区发现的遗传算法的种群初始化方法进行了改进，根据社会网络的特性，给出了信息在网络中传播的特征定义，然后根据社会网络的自身特征和信息在网络中传播的特性提出了能够充分应用网络拓扑结构的初始化方法k-path方法,并且给出了基于k-path初始化的遗传算法的计算过程。然后，依据节点之间的亲密程度，提出了节点间的认可度概念，针对用于节点评估的PageRank算法权值均匀分配的不合理性问题，提出以节点间的认可度为依据来分配权值，最后给了改进的节点评估算法ARank算法。最后，在数据集上验证了改进的遗传算法和PageRank算法，实验结果表明改进的遗传算法在收敛速度要比传统的算法快，改进的PageRank算法对节点的评估比传统的评估方式得到结果合理。
[Abstract]:With the development and popularity of the Internet, the online social network has penetrated into every corner of people's lives, drawing closer the distance between people. The study of social networks allows us to understand the structural characteristics and evolutionary laws of social networks so that they can better serve humanity. There are two hot issues in network research: community discovery and the importance of nodes. The study of these two issues is of great significance for us to understand the structure and characteristics of complex social networks. Community discovery in social network studies can divide large networks into smaller ones. Let us find the group which is closely connected with each other. The node evaluation can evaluate the importance of the nodes in the network from different angles and find the important nodes. Most of the current community discovery algorithms are based on the idea of graph segmentation and hierarchical clustering. Although these algorithms can effectively identify communities in most cases, they must specify the number of communities or the size of communities. Obviously, this is not reasonable. Genetic algorithm, as a method of searching the optimal solution, can automatically identify the number of communities without prior information, and efficiently and accurately find the communities. However, the traditional initialization method only considers the adjacent information and does not fully consider the topology of the network, so the quality of the population is poor, which affects the convergence speed of the algorithm, and there are many evaluation algorithms for nodes. Some are based on the local characteristics of nodes, some are based on the topology of the entire network, as a PageRank algorithm used to evaluate the importance of web pages in search engines, and they are also widely used in social node evaluation. However, the traditional PageRank algorithm is distributed evenly when the weights are allocated, which is unreasonable in the social network, because the social network reflects the relationship between the user and the user. In view of the above analysis, this paper mainly improves the genetic algorithm found in the community and the PageRank algorithm based on node evaluation. The main work is as follows:. Firstly, the population initialization method of genetic algorithm for community discovery is improved. According to the characteristics of social network, the characteristic definition of information transmission in the network is given. Then, according to the characteristics of social network and the characteristics of information spreading in the network, an initialization method, k-path method, which can make full use of the network topology, is proposed, and the calculation process of genetic algorithm based on k-path initialization is given. Then, according to the degree of intimacy between nodes, the concept of recognition degree between nodes is proposed. In view of the unreasonable distribution of weight values of PageRank algorithm used for node evaluation, it is proposed that the weights should be assigned according to the recognition degree between nodes. Finally, an improved node evaluation algorithm, ARank algorithm, is presented. Finally, the improved genetic algorithm and PageRank algorithm are verified on the data set. The experimental results show that the improved genetic algorithm converges faster than the traditional algorithm. The result of the improved PageRank algorithm is more reasonable than that of the traditional method.
【学位授予单位】：吉林大学
【学位级别】：硕士
【学位授予年份】：2014
【分类号】：TP301.6

【参考文献】