基于密度聚类的复杂网络社团结构探测算法与应用
本文选题:复杂网络 + 社团结构探测 ; 参考:《中央财经大学》2016年博士论文
【摘要】:复杂网络理论为各学科领域中的复杂系统研究提供了简单而有效的研究方法,在近十多年里得到了学术界的高度关注。不同领域中看似迥异的真实系统在不断变化发展中会自然形成共同的特征,比如节点度的幂律度分布特征、小世界特征和社团结构特征等。由于突破了传统还原论将对象孤立研究的局限性,复杂网络能够准确刻画真实系统中的关键特征,本文研究的出发点就是网络的社团结构特征。一方面复杂网络中的社团结构可以对应于真实系统中自然形成的子系统,比如蛋白质合作网络会形成对应不同器官的功能模块,科学家合作网络会形成不同学科的研究群体,社交网络会形成基于不同兴趣爱好的俱乐部等。另一方面学术界对于复杂网络社团结构的主流认识只停留在直观层面,即把社团结构看作以更高概率相互连接的节点的集合,同时现有的社团结构探测算法也存在很多问题亟待解决,因此复杂网络社团结构研究仍然是一个开放性的课题,研究其探测算法具有重要的理论意义和现实意义。类似于聚类算法,社团结构探测算法也属于无监督学习的范畴,其目标在于把网络中的节点赋值到不同的社团中,因此基于聚类方法的理论框架提出社团结构探测算法是一个有前景的思路。密度聚类相比传统聚类技术有天然的优势,比如DBSCAN相比K-means来说无需类簇个数作为先验参数,不会落入局部最优陷进,抗噪声等,然而DBSCAN的聚类结果对两个参数的取值非常敏感,因此密度聚类的参数选择问题一直是机器学习领域亟待解决的难题。2014年发表在Science的论文中提出了新的密度聚类算法,即快速密度峰值算法(FDP:Fast Density Peak),该算法在保留DBSCAN优点的同时巧妙的克服了其参数敏感问题,为无监督学习提供了新的研究思路。受到这样的启发,本文拟利用FDP算法探测复杂网络中的社团结构。然而本文通过理论分析和数值实验发现,由于网络节点分布在极端高维空间中,导致FDP算法无法识别社团核心节点,从而无法直接应用于社团结构探测任务。除此之外,本文进一步发现当网络结构变得模糊或社团数量过多时FDP算法的决策图机制很难通过人机交互的方式准确识别社团个数。基于以上两点事实,本文提出了基于流形学习框架的ISOFDP算法,该算法有效克服了网络数据维数灾难的问题从而使得社团核心点显著区别于其它成员节点,同时提出了改进的分割密度函数以作为自动识别网络社团个数的依据,最终ISOFDP算法相比经典算法取得了更好的社团结构探测结果。更进一步的,本文将ISOFDP算法框架进行推广,分别提出了能够探测重叠社团结构、有权有向网络社团结构和动态网络社团结构的算法,并取得了优于经典算法的实验结果。本论文的内容包括九章:第1章介绍了本论文的选题背景和研究意义,并说明了本论文的研究框架和创新点。第2章介绍了复杂网络的基本理论和相关概念,然后对几类社团结构探测算法进行了综述。第3章分析了密度聚类算法fdp在探测社团结构时的缺陷,研究了克服其缺陷的方法之后提出了社团结构探测算法isofdp。第4章基于l-isomap、lle、le和lpp四种局部流形学习算法提出了快速社团结构探测算法,克服了isofdp计算性能低下的缺陷。第5章首先定义了非核心节点的社团隶属度测度,根据节点隶属度和阈值η的对比情况识别出网络的重叠节点,以此提出能够探测重叠社团结构的isofdpov算法。第6章基于信号理论的方法计算网络节点的相似度测度,该相似度充分利用了网络结构中连边的权重和方向信息,以此为基础提出了有权有向网络的社团结构探测算法isofdpdw。第7章结合时间维度的信息利用3-阶张量对所有时间点上的网络序列建模得到邻接张量,然后利用非负张量分解模型把邻接张量分解为节点社团隶属度矩阵和社团动态行为矩阵,以此求得动态社团张量,基于其各纵向切片探测各时间点上的社团结构和社团变化规律,最终提出了动态网络社团结构探测算法tenfdp。第8章基于2006年6月16日至2014年7月9日中信行业指数收盘价日数据,计算指数收益率之间的相关系数并以此为连边得到中信行业指数的平面最大过滤图(pmfg)网络,利用isofdpdw算法探测该网络金融危机前中后三个阶段的社团结构,以揭示金融危机对该网络社团结构的影响。第9章对全文进行总结,并给出了本文现阶段提出算法的缺点和不足,同时对本文算法的进一步推广和改进工作进行了展望。本文的主要创新点包括以下几个方面:第一,提出了一种基于密度聚类的社团结构探测算法isofdp。该算法无需社团个数作为先验信息,且在准确度、参数敏感性和稳健性3个方面都优于经典算法。isofdp继承了密度聚类算法fdp的优点同时克服了其缺点。首先通过理论分析和数值实验发现网络节点具备高维数据的特点,即节点之间的距离高度同质化导致fdp无法识别社团核心节点,本文假设存在低维的本征维度可以保存原网络的社团结构特征,提出使用经典流形学习算法isomap推断出网络在其本征维度上的映射,克服同质化距离的问题。本文进一步发现当网络结构变得模糊或社团数量太多时,在fdp的决策图上通过人机交互的方法无法正确识别社团个数,本文提出了改进的分割密度函数以自动识别社团个数,避免人机交互的过程。第二,提出了一种新的重叠社团结构探测算法ISOFDPOV。首先利用ISOFDP算法对网络进行硬分割,可以得到社团核心节点以及每个社团的成员节点。本文定义了成员节点的社团隶属度测度,该隶属度测度可以衡量节点归属于某社团的强度,然后根据各成员节点关于各社团的隶属度与阈值η的对比判断成员节点的社团归属情况,社团归属不唯一的节点为重叠节点,从而找到网络中的重叠社团结构。ISOFDPOV在人工和真实网络上取得了相比经典算法更好的社团探测结果。第三,提出了有权有向网络上的社团结构探测算法ISOFDPDW。首先分析了结构相似度只能利用二值对称邻接矩阵信息描述节点之间相似度的缺陷,提出使用信号相似度。信号相似度可以充分利用有权有向网络中连边的权重和方向信息,客观描述节点之间的相似度。另外使用有权有向模块度函数作为算法的收敛目标。最后在有权无向、无权有向和有权有向网络上验证了ISOFDPDW算法的有效性。第四,提出了基于非负张量分解模型的动态网络社团结构探测算法TenFDP。该算法假设每个时间片上的社团结构不仅与当前网络结构有关,还与上一时刻的社团结构有关。因此提出利用3-阶张量模型对每个时间点上的网络结构进行建模,得到领接张量以更好刻画网络结构随时间的变化。利用非负张量分解模型把领接张量分解为节点社团隶属度矩阵、社团动态行为矩阵,以此求得动态社团张量,基于该张量的每个纵向切片探测各时间点上的社团结构和社团变化规律。在5种类型的动态网络上验证了TenFDP算法的有效性。
[Abstract]:Complex network theory provides a simple and effective research method for the research of complex systems in the field of various disciplines. In the last more than 10 years, the academic circle has received high attention. The seemingly disparate real systems in different fields will naturally form the common characteristics in the changing development, such as the distribution of the power law degree of the node degree, the small world Characteristics and community structure characteristics. Because of the limitation of the traditional reductionism, the complex network can accurately depict the key features of the real system. The starting point of this study is the community structure characteristics of the network. On the one hand, the social solidarity structure in the complex network can correspond to the natural formation of the real system. The subsystems, such as the protein cooperation network, will form functional modules for different organs. The cooperative network of scientists will form different subjects, and social networks will form clubs based on different interests. On the other hand, the mainstream understanding of the complex structure of the complex network society is only in the visual level, that is, the society. The group structure is regarded as a set of nodes connected by higher probability, and there are many problems to be solved urgently. Therefore, the study of complex network community structure is still an open issue. It is of great theoretical meaning and practical significance to study its detection algorithm. The method of construction and measurement is also a category of unsupervised learning. Its goal is to assign nodes in the network to different societies. Therefore, it is a promising idea to propose a community structure detection algorithm based on the theoretical framework of clustering method. Density clustering has a natural advantage over traditional clustering techniques, such as DBSCAN compared with K-means. The number of clusters is required as a priori parameters, which will not fall into the local optimal trap and resist noise. However, the clustering results of DBSCAN are very sensitive to the values of the two parameters. Therefore, the parameter selection problem of density clustering has been a difficult problem to be solved in the field of machine learning. A new density clustering algorithm is proposed in the paper published in Science in.2014. The fast density peak algorithm (FDP:Fast Density Peak), which overcomes its parameter sensitivity while preserving the advantages of DBSCAN, provides new research ideas for unsupervised learning. Inspired by this, this paper uses FDP algorithm to detect the community structure in complex networks. However, this paper is based on theoretical analysis and numerical value. It is found that because the network nodes are distributed in extremely high dimensional space, the FDP algorithm can not identify the core nodes of the community, and can not be applied directly to the community structure detection task. In addition, this paper further finds that the decision drawing mechanism of the FDP algorithm is difficult to pass through human-computer interaction when the network structure becomes blurred or the number of societies is too large. Based on the above two facts, the ISOFDP algorithm based on the manifold learning framework is proposed in this paper. The algorithm effectively overcomes the problem of the network data dimension disaster and makes the community core distinctions distinctions from other member nodes. At the same time, the modified segmentation density function is proposed as an automatic identification network. According to the basis of the number of associations, the final ISOFDP algorithm has obtained a better detection result of the community structure compared with the classical algorithm. Further, this paper extends the framework of the ISOFDP algorithm, and puts forward the algorithms that can detect the overlapping community structure, have the right to the network community structure and the dynamic network community structure, and get better than the classic calculation. The contents of this paper include nine chapters: the first chapter introduces the background and significance of this thesis, and explains the research framework and innovation of this paper. The second chapter introduces the basic theory and related concepts of complex networks, and then summarizes several kinds of community structure detection algorithms. The third chapter analyzes density clustering. The defect of algorithm FDP in detecting community structure, and after studying the method to overcome its defects, we put forward the association structure detection algorithm isofdp. fourth chapter based on four local manifold learning algorithms, l-isomap, LLE, le and LPP, put forward the fast community structure detection algorithm, overcome the defects of the low performance of isofdp computing. In the fifth chapter, the non kernel is first defined. The degree of membership degree of the heart node is measured, and the overlapping nodes of the network are identified according to the degree of the node membership and the threshold value. The isofdpov algorithm that can detect the overlapping community structure is proposed. The sixth chapter calculates the similarity measure of the network nodes based on the signal theory. The similarity degree makes full use of the right of the edge of the network structure. Based on the information of weight and direction, this paper proposes a community structure detection algorithm with weighted directed networks isofdpdw. seventh chapter combined with the time dimension information using the 3- order tensor to build the adjacency tensor for the network sequence modeling at all time points, and then decomposes the adjacent tensor into the membership degree matrix of the node community by using the nonnegative tensor decomposition model. The dynamic community behavior matrix is used to obtain the dynamic community tensor. Based on its longitudinal slices, the community structure and the law of community change are detected. Finally, the dynamic network community structure detection algorithm tenfdp. eighth chapter is based on the closing price day data of the CITIC industry index from June 16, 2006 to July 9, 2014, and calculates the index income. The correlation coefficient between the rate and the PMFG network of the CITIC industry index is used to detect the community structure of the three stages of the network before the financial crisis by isofdpdw algorithm to reveal the impact of the financial crisis on the structure of the network community. The ninth chapter summarizes the full text and gives the present stage of this paper. The shortcomings and shortcomings of the algorithm are proposed, and the further promotion and improvement of the algorithm are prospected. The main innovation points of this paper include the following aspects: first, a community structure detection algorithm based on density clustering isofdp. is proposed, which does not need the number of communities as a priori information, and is sensitive to the parameters and is sensitive to parameters. The 3 aspects of nature and robustness are superior to the classical algorithm.Isofdp, which inherit the advantages of the density clustering algorithm FDP and overcome their shortcomings. First, the characteristics of the network nodes with high dimensional data are found through theoretical analysis and numerical experiments, that is, the homogeneity of the distance between nodes leads to the non method recognition of the core nodes of the community by FDP. This paper assumes the existence of the existence of the core nodes of the community. The low dimensional eigendimensions can preserve the community structure characteristics of the original network, and propose the use of the classical manifold learning algorithm Isomap to deduce the mapping of the network on its eigendimensions and overcome the problem of homogeneity distance. This paper further finds that when the network structure becomes blurred or the number of societies is too large, the interaction of human-computer interaction on the FDP's decision map is found. The method can not correctly identify the number of community. In this paper, an improved segmentation density function is proposed to automatically identify the number of communities and avoid the process of human-computer interaction. Second, a new overlapping community structure detection algorithm ISOFDPOV. is proposed, first of all, the ISOFDP algorithm is used to cut the network hard, and the core nodes of the community and each community can be obtained. The membership degree of member nodes is defined in this paper. The membership degree measure can measure the strength of the node belonging to a community. Then, according to the membership degree of each member's Association and the threshold value, the membership node is judged by the comparison of the membership degree. The overlapping community structure in the network.ISOFDPOV has been obtained on the artificial and real network, which is better than the classical algorithm. Third, the community structure detection algorithm on the weighted directed network ISOFDPDW. is proposed. First, the similarity degree of the structure similarity can only be described by the two value pair called adjacency matrix information. The signal similarity is used. The signal similarity can make full use of the weight and direction information of the connected edges in the network, and objectively describe the similarity between the nodes. In addition, the weighted directed modularity function is used as the convergence target of the algorithm. Finally, the ISOFD is proved to be unweighted and unauthorized and entitled to the network. The effectiveness of the PDW algorithm. Fourth, a dynamic network community structure detection algorithm based on the non negative tensor decomposition model TenFDP. is proposed. The algorithm assumes that the community structure on each time slice is not only related to the current network structure, but also is related to the community structure at the last time. Therefore, it is proposed to use the 3- order tensor model to the network at each time point. The network structure is modeled and the collar tensor is obtained to better describe the change of network structure with time. By using the nonnegative tensor decomposition model, the collar tensor is decomposed into the membership degree matrix of the node community, the dynamic behavior matrix of the community is used to obtain the dynamic community tensor, and the community structure on each time point of the tensor is detected based on each longitudinal section of the tensor. The effectiveness of the TenFDP algorithm is verified on 5 types of dynamic networks.
【学位授予单位】:中央财经大学
【学位级别】:博士
【学位授予年份】:2016
【分类号】:O157.5
【相似文献】
相关期刊论文 前8条
1 张颖;陈其峰;曹小林;蔡灵仓;陈栋泉;卢铁城;;微喷射粒子团簇探测算法及其应用[J];计算物理;2006年04期
2 袁斌;李敬宏;许琰;王弘X;;各向异性光源的辐射探测算法[J];计算物理;2007年03期
3 管志强;陈钱;钱惟贤;胡永生;;一种背景自适应调整的弱点目标探测算法[J];光学学报;2007年12期
4 雷震,吴玲达,老松杨;一种新的基于背景色度的镜头边界探测算法[J];应用科学学报;2004年03期
5 杨晓云;梁鑫;岑敏仪;;适用于不规则DEM数据的粗差探测算法[J];自然科学进展;2007年04期
6 郭婕;刘军;王宝林;;基于机器视觉的早期火焰探测算法研究[J];西北大学学报(自然科学版);2008年04期
7 叶锡恩;毛科益;夏银水;;一种新的用于探测Pure Reed-Muller逻辑的算法[J];浙江大学学报(理学版);2007年03期
8 吴林林;;新一代天气雷达冰雹探测算法及在业务中的应用[J];气象;2006年01期
相关会议论文 前3条
1 唐文杰;骆志刚;陆斌;李聪;;一种基于高光谱压缩数据的亚像元级目标探测算法[A];中国通信学会第六届学术年会论文集(下)[C];2009年
2 陈南;;智能建筑中火灾信息探测算法分析及应用[A];中国仪器仪表学会测控技术在资源节约和环境保护中的应用学术会议论文集[C];2001年
3 张辉;李国辉;陈俊;;一种基于新闻要素建模的新事件探测方法[A];第七届和谐人机环境联合学术会议(HHME2011)论文集【oral】[C];2011年
相关重要报纸文章 前1条
1 摩托罗拉公司提供;信号探测算法 提高蓝牙性能 降低干扰[N];电子资讯时报;2002年
相关博士学位论文 前1条
1 游涛;基于密度聚类的复杂网络社团结构探测算法与应用[D];中央财经大学;2016年
相关硕士学位论文 前10条
1 陈晓燕;基于多级探测算法的人体意外跌倒检测装置的开发[D];新疆大学;2015年
2 蓝方宇;基于微摄动与步态特征的人体探测算法研究[D];电子科技大学;2014年
3 高孝杰;引入空间信息的高光谱目标探测算法研究[D];成都理工大学;2016年
4 王宾宾;基于混合系统的航空器中期冲突探测方法研究[D];中国民航大学;2014年
5 牛进保;位置社会网络中重叠群组探测算法研究及并行化实现[D];华中科技大学;2015年
6 沈晓敏;光学分子影像仿真平台中探测算法的研究与实现[D];西安电子科技大学;2012年
7 吕曾望;非授权局域网拓扑探测算法的研究与实现[D];国防科学技术大学;2004年
8 秦薇薇;基于红外视频的火灾探测算法研究[D];西安建筑科技大学;2012年
9 马磊;大规模网络社团探测算法应用[D];华东师范大学;2012年
10 彭罡;强光背景下小目标探测算法研究[D];国防科学技术大学;2007年
,本文编号:1820166
本文链接:https://www.wllwen.com/kejilunwen/yysx/1820166.html