基于图论的复杂网络社团挖掘与结构分析
发布时间:2017-12-26 20:35
本文关键词:基于图论的复杂网络社团挖掘与结构分析 出处:《西安电子科技大学》2016年博士论文 论文类型:学位论文
【摘要】:网络科学是当前研究现实世界事物之间关系的有力工具之一。将复杂系统建模为复杂网络是进一步分析和研究现实世界对象之间关系的前提,其中点表示复杂系统中的对象,边表示对象之间的关系。面对建模成的复杂网络,迫切的问题是探索网络中蕴含的信息,从中等尺度角度揭示网络的组成及其物理意义,即如何从复杂网络中提取元数据组。元数据组是一组具有真实物理意义的节点,可通过实验验证节点的各种子集是否具有相同或相似的功能提取元数据组。由于材料成本、人工成本、时间成本等客观限制,面对当前的海量数据这种枚举式的验证是不可能完成的任务。因此人们将目光转向计算的方法,具体的流程是先将要研究的复杂系统建模成复杂网络,对其中的元数据组进行建模如通常建模为社团或模块,并设计相应的社团挖掘算法。通过计算方法,可方便、快速、低成本地提取复杂系统中的元数据组。本质上计算方法包含两个核心科学问题:首先是如何定义社团,即根据网络中元数据组的特性怎样描述社团的问题,如传统地将社团定义为内部边连接紧密外部边稀疏的稠密子图;其次是如何设计出高效的社团挖掘算法。本文主要基于图论围绕社团挖掘的两个核心科学问题开展研究,提出了边反三角中心性并基于此设计了社团挖掘算法EACH,定义了两种新型社团,即Cograph社团和2-club社团,并设计了相应的挖掘算法EPCA和DIVANC,最后给出2-club社团在蛋白质相互作用网络中功能模块结构分析中的应用。主要内容详述如下:1.针对当前经典的社团挖掘算法主要基于内紧外松的社团设计,且较多的考虑如何设计衡量社团‘内紧’的相关指标而对‘外松’关注不够的问题。本文从路径外切的角度提出用来衡量社团外松程度的边反三角中心性,进一步基于此设计划分的社团挖掘算法EACH。边反三角中心性定义为该边参与形成的P4(由4个节点,3条边组成的简单路径)数目与该边参与的可能的P4数目之比。EACH迭代地删去边反三角中心性最高的边直到当前所有边的边反三角中心性全部为0,并基于孤立点处理策略将删边过程中的孤立点加入到当前合适的连通分支中,输出的连通分支即为社团。该算法简单高效,挖掘的社团紧凑,物理意义显著。2.以稠密子图为中尺度结构研究复杂网络中元数据组时忽略了元数据组内部的结构,然而其对理解该元数据组的功能形成机制及运作模式至关重要。为了探索元数据组各成员之间的内部拓扑结构,定义了具有图论特征的Cograph社团,其结构唯一地对应着Cotree。基于Cotree,可进一步清晰地揭示Cograph社团中规模和层次可变且结构等价的子结构,为用户提供从微观尺度到中观尺度连续观察社团内部结构构造的独特视角。本文提出边P4中心性并基于此设计了挖掘Cograph社团的划分算法EPCA。边P4中心性定义为该边参与形成P4的数目,由于P4是由4个节点3条边顺序相连组成的简单无环路径,所以若边P4中心性较大,这条边可能是连接社团之间的边。EPCA迭代地删去边P4中心性最高的边,删到当前网络中所有边的边P4中心性为0停时结束,当前连通分支全为Cograph,输出其作为Cograph社团。源于边P4中心性,EPCA计算成本低、无参数、不依赖任何外部的度量。3.面对当前复杂网络社团挖掘研究中遇到的挑战:内紧外松的稠密子图并不能刻画元数据组的本质特征,以及受蛋白质相互作用网络中的元数据组由稠密的相互作用三元组模体组成启发,本文定义了富含相互作用三元组的2-club社团刻画复杂网络中的元数据组。2-club社团是连通的2-club,而2-club是直径为2的子图。基于边小生境中心性设计了算法DIVANC挖掘2-club社团,进一步提出简单的两步重叠策略将DIVANC扩展成可挖掘重叠的2-club社团。构造边小生境中心性时考虑其参与的P4数目和其参与形成的三角形数目。边小生境中心性较大,该边可能是社团之间的边。DIVANC迭代地删去边小生境中心性最高的边,删到当前网络中所有边的边小生境中心性为0停时结束,当前连通分支全为2-club,输出非重叠2-club社团,若需要重叠2-club社团,执行两步重叠策略,得到重叠2-club社团。4.透彻地理解PPI网络中的功能模块仍然是当前系统生物学研究中的一个重要挑战。针对当前大多数关于模块的分析主要集中在如何从整个PPI网络中挖掘具有生物意义的功能模块,而较少关注其内部结构,没有给出其内部结构与功能的关联分析的问题,本文基于图论分析功能模块的内部结构,揭示功能模块的相关特征情况,并进一步推断其在整个PPI网络中的功能,给出了理解PPI网络中功能模块组织结构和特性分布的新视角。
[Abstract]:Network science is one of the powerful tools to study the relationship between things in the real world. Modeling complex systems as complex networks is a prerequisite for further analysis and study of relationships between real-world objects, where points represent objects in complex systems, and edges represent relations between objects. Faced with the complex network modeled, the urgent problem is to explore the information contained in the network, and reveal the composition and physical meaning of the network from a moderate scale. That is to say, how to extract metadata from complex networks. Metadata group is a set of nodes that have real physical meaning. It can verify whether all kinds of subset of nodes have the same or similar functions to extract metadata group by experiment. Because of the objective limitations of material cost, labor cost and time and cost, it is impossible to complete the enumerated verification in the face of the current mass data. Therefore, people turn their attention to the calculation method. The specific process is to first model the complex system to be complex network, and model the metadata group, such as usually modeling as a community or module, and design the corresponding community mining algorithm. Through the calculation method, the metadata group in the complex system can be extracted conveniently, fast and low. The essence of calculation method consists of two core scientific problems: the first is how to define the community, according to the characteristics of the network in data set to describe associations, such as the traditional community is defined as the internal side is tightly connected with the external side of the sparse dense sub graph; second is how to design an efficient algorithm for mining association. In this paper, two key scientific problems in graph theory on Community Mining Based on the research, put forward and reverse triangle edge center is designed based on this association mining algorithm EACH, defines two new clubs, namely the Cograph community and the 2-club community, and designed the corresponding mining algorithm EPCA and DIVANC application are presented in the 2-club community the function in the protein interaction network module in structural analysis. The main contents are as follows: 1., aiming at the current classical community mining algorithm, mainly based on tight and loose community design, and much consideration is given to how to design related indicators to measure the "tightness" of the society. From the angle of path cutting, this paper puts forward the edge anti triangle centrality which is used to measure the degree of community loosening, and further is based on the association mining algorithm EACH of this design. The side inverse triangle centrality is defined as the ratio of the number of P4 (a simple path composed of 4 nodes, 3 sides) to the possible number of P4 that the side participates in. EACH iteratively cuts out the top edge of the reverse triangle center, until the edges of all the edges are all 0, and based on the outlier processing strategy, we add the outliers in the edge deletion process to the suitable connected branches, and the output connected components are the communities. The algorithm is simple and efficient, the mining community is compact and the physical meaning is significant. 2., the dense subgraph as a mesoscale structure is used to study the meta data set in complex network. It ignores the internal structure of metadata group. However, it is very important for understanding the function mechanism and operation mode of the metadata group. In order to explore the internal topology between the members of the metadata group, the Cograph community with graph theory features is defined, whose structure is uniquely corresponding to the Cotree. Based on Cotree, we can further reveal clearly the variable and equivalent sub structure of Cograph community in terms of scale and hierarchy, and provide users with a unique perspective to continuously observe the internal structure of communities from microscale to mesoscale. In this paper, the edge P4 centrality is proposed and based on this, the partition algorithm for mining Cograph community is designed, EPCA. The edge P4 centrality is defined as the number of P4 that the side participates in, because P4 is a simple acyclic path consisting of 4 nodes and 3 edges sequentially, so if the center of the edge P4 is large, the edge may connect the edges between the communities. EPCA iteratively eliminates the edges with the highest P4 edge, and deletes the edges of all the edges in the current network. The P4 centrality is 0 at the end of the stopping time, and the current connected branches are all Cograph, and output them as Cograph communities. Due to the edge P4 centrality, EPCA has low computing cost, no parameter, and no external measurement. 3. in the face of the current complex network community mining research challenges: the essential characteristics of the tight loose dense subgraph can not describe the metadata group, and metadata group in protein interaction networks by the interaction of dense three tuple module body inspiration, this paper in the interaction of three tuple meaning 2-club Association describe the set of metadata in complex networks. The 2-club community is a connected 2-club, and 2-club is a subgraph with a diameter of 2. Based on the niche niche design, we design algorithm DIVANC to mine 2-club communities, and further propose a simple two step overlap strategy to extend DIVANC to 2-club communities that can be mined. The number of P4 and the number of triangles involved in the formation of the niche centrality are considered. The side niche is centrality, which may be the edge of the community. DIVANC iteratively deleting edges of the highest niche center, to delete all the edges in the network side of the center of the end of the 0 niche to stop, the connectedness of all 2-club, the output of non overlapping 2-club community, if need to overlap the 2-club community, the implementation of the two step strategy to overlap, overlapping the 2-club community. 4. a thorough understanding of the functional modules in the PPI network is still an important challenge in the current research of system biology. For most of the current on the module of the analysis focused on how to dig out from the entire PPI network function module has a biological significance, and less concerned about its internal structure, correlation analysis has given its internal structure and function of the problem, this paper analyze the internal structure of graph theory based on function module, function module is revealed
【学位授予单位】:西安电子科技大学
【学位级别】:博士
【学位授予年份】:2016
【分类号】:TP311.13;O157.5
【相似文献】
相关期刊论文 前10条
1 钟柯;肖昱;许s,
本文编号:1338770
本文链接:https://www.wllwen.com/shoufeilunwen/jckxbs/1338770.html
教材专著