重叠社区发现中的边聚类算法研究
发布时间:2018-06-20 16:57
本文选题:重叠社区发现 + 社区发现 ; 参考:《吉林大学》2016年博士论文
【摘要】:重叠社区发现问题是复杂网络分析中很热门的一个问题。重叠社区发现旨在揭示复杂网络中的重叠的社区结构。重叠的社区结构接近真实网络中存在的社区结构,因此,对于重叠社区发现问题的研究是有着实际意义的。传统的重叠社区发现研究都是以节点作为主要研究目标。近年来,一些工作开始以边作为研究目标,旨在通过对网络中存在的边社区进行发现,进而达到解决重叠社区发现问题的目的。我们笼统地称这类算法为边社区发现算法。与传统的重叠社区发现算法不同,边社区发现算法将社区看作是由边构成的。由于将边作为研究对象,边社区发现算法在解决重叠社区发现问题时,会具有独特的优势。例如,通常,边在真实网络中是属于单一社区的。边社区发现算法在对边社区进行发现工作时,会自然地形成节点的重叠社区结构;近来的一些边社区发现算法会使用层次聚类算法对边进行聚类,这样做能够同时考虑到相应的节点的层次关系和重叠关系。边社区发现算法虽然有着独特优势,但仍存在着发现的边社区质量低、边相似关系考虑不全和节点的重叠度过高的问题。为进一步利用相似性关系拓展传统重叠社区发现策略,本文从边相似关系角度开展了针对边相似关系模型的阶段性改进工作,从考虑扩展边之间的关系的基于线图相似度,到考虑具有共同邻居边之间的关系的拓展余弦边距离,再到考虑具有公共邻居边之间的极值情况的极值非相邻边的边相似度,进行了递推式改进。(1)利用线图模型来对边聚类问题建模。考虑到边与边之间的相似关系不仅存在于具有共同节点的边之间,还存在于具有共同邻居的边之间以及不相邻的边之间。提出了一种基于线图的边相似度计算方法;结合马尔科夫聚类算法和基于线图的边相似度计算方法,提出了基于线图的边聚类LCLG算法;在真实网络和生物网络上的实验结果验证了LCLG算法的有效性。(2)考虑到具有公共邻居的边之间的相似关系,结合余弦相似度,提出了一个拓展的余弦边距离计算方法。将快速密度峰值搜索聚类算法引入到边社区发现中,并提出了基于盒图的社区中心边的自动选取策略;结合拓展的余弦边距离计算方法、快速密度峰值搜索聚类算法及基于盒图的社区中心边自动选取策略,提出了边密度聚类LDC算法;在真实网络上的实验结果表明,LDC算法能够发现具有良好模块度和高覆盖率的重叠社区。(3)考虑到具有共同邻居的边之间的两种极值情况,提出了极值非相邻边的边相似度计算方法;使用拓展的模块度评价指标来改善层次聚类算法的划分结果,并提出了极值非相邻边的边聚类MLC算法;在真实网络上的实验结果表明,MLC算法能够发现具有良好模块度的重叠社区结构。本文提出的LCLG算法、LDC算法和MLC算法,能够对复杂网络中的边社区进行发现,进而达到解决重叠社区发现问题的目的,在理论和实际应用方面都具有一定意义。
[Abstract]:Overlapping community discovery is a hot issue in complex network analysis. Overlapping community discovery aims to reveal overlapping community structures in complex networks. Overlapping community structures are close to the community structure in the real network. Therefore, the study of overlapping community discovery is of practical significance. In recent years, some work has begun to use the edge as a research goal. In recent years, the aim of the research is to find the side community in the network to solve the problem of overlapping community discovery. In general, we call this kind of algorithm a side community discovery algorithm. The edge community discovery algorithm takes the community as a side. As the edge is used as the research object, the edge community discovery algorithm has a unique advantage when solving the problem in the overlapping community. For example, the edge is a single community in the real network. The edge community discovery algorithm is found in the border community. It will naturally form the overlapping community structure of nodes; recently, some edge community discovery algorithms use hierarchical clustering algorithms to cluster edges, which can take into account the hierarchy and overlapping relations of the corresponding nodes. Although the edge community finds that the algorithm has a unique advantage, there is still a low quality of the border community found. In order to further utilize the similarity relationship to extend the traditional overlapping community discovery strategy, this paper develops a phase improvement work on the edge similarity relation model from the angle of edge similarity. The extended cosine edge distance between the neighbors and the neighbourhood is extended, and then the edge similarity of the extremal non adjacent edges with the extremal conditions between the common neighbors is considered, and a recursive improvement is carried out. (1) the linear graph model is used to model the edge clustering problem. In addition, a line graph based edge similarity calculation method is proposed. A line graph based edge clustering LCLG algorithm is proposed with a line graph and a line graph based edge similarity calculation method. Experimental results on real network and biological network are tested. The effectiveness of the LCLG algorithm is proved. (2) considering the similarity relation between the edges of the public neighbours and the cosine similarity, a extended cosine edge distance calculation method is proposed. The fast density peak search clustering algorithm is introduced into the border community discovery, and the automatic selection strategy based on the community center edge of the box graph is proposed. The extended cosine edge distance calculation method, the fast density peak search clustering algorithm and the box graph based community center edge automatic selection strategy are proposed, and the edge density clustering LDC algorithm is proposed. The experimental results on real network show that the LDC algorithm can find overlapping communities with good modularity and high coverage. (3) considering the common neighbour. Two extreme value cases between the sides of the house are presented, and the edge similarity calculation method of extreme value non adjacent edges is proposed. Using the extended module degree evaluation index to improve the division results of the hierarchical clustering algorithm and the edge clustering MLC algorithm of extreme value non adjacent edges is proposed. The actual results on the real network show that the MLC algorithm can find a good model. Overlapping community structure blocks. The proposed LCLG algorithm, LDC algorithm and MLC algorithm to complex edges in the network community, and achieve the purpose of solving the problem that the overlapping community, have certain significance in theory and practical application.
【学位授予单位】:吉林大学
【学位级别】:博士
【学位授予年份】:2016
【分类号】:TP311.13
【参考文献】
相关期刊论文 前1条
1 潘磊;金杰;王崇骏;谢俊元;;社会网络中基于局部信息的边社区挖掘[J];电子学报;2012年11期
,本文编号:2044979
本文链接:https://www.wllwen.com/shoufeilunwen/xxkjbs/2044979.html