异构图中的Top-K兴趣子图匹配算法研究
本文选题:异构图 + 静态Top-K子图匹配 ; 参考:《辽宁大学》2017年硕士论文
【摘要】:图是一种用于描述实体间复杂关联关系的通用数据结构,而异构图是一种带标签的特殊图结构,在多种不同实体组成的复杂数据建模中被广泛应用,例如信息网络、生物网络等实际应用领域。在异构数据图中如何进行有效的子图匹配成为近些年的研究热点。子图匹配是用户给定查询图,从大数据图中找到与查询图的结构相同且节点标签相同的所有子图。但在实际应中,人们往往关注兴趣度比较高的匹配结果,因此引出更具针对性的Top-K兴趣子图匹配问题。Top-K兴趣子图匹配主要分为两个部分,一是根据查询图在数据图中找到所有的匹配子图;二是将所有的匹配子图根据兴趣度排名获得兴趣度最大的K个兴趣子图。本文针对异构图中的Top-K兴趣子图匹配问题进行研究发现,目前已有方法存在若干问题,如仅涉及静态图上中针对无权查询图的匹配问题,没有考虑用户的个性化需求,即对有权查询图的匹配处理;在实际应用中,随着时间推移、实际应用语义的改变,图将发生拓扑结构的变化,即图的动态演进,对图的动态演进过程中的Top-K兴趣子图匹配研究关注很少。因此,为解决上述问题,本文考虑用户的个性化需求,提出一种高效的针对有权查询图的静态Top-K兴趣子图匹配方法;同时,针对图的动态演进过程中Top-K兴趣子图匹配问题展开研究,提出一种有效的基于增量的动态Top-K兴趣子图匹配方法。主要内容如下:(1)提出一种高效的静态Top-K兴趣子图匹配方法(ERWM)。首先,根据异构图的特性,提出一种类邻接表的储存结构存储异构数据图;其次,建立两种新异的索引:节点的拓扑结构特性索引(NTFI)、边特性索引(EFI),充分利用它们提供的节点和边的信息,在查询检索阶段过滤掉不合格(不匹配)的节点和边,获得相对较少的候选集,避免了匹配阶段进行不必要的节点、边连通性检测,从而减少了算法的冗余匹配步骤;最后,提出查询图边的标签设定方法,利用其确定查询边进行子图匹配验证的顺序,同时引入RWM算法的边匹配边排序的思想,从而提高图的子图匹配效率,减少冗余的子图匹配步骤。(2)提出一种有效的动态Top-K兴趣子图匹配方法(DRWM)。该方法在ERWM算法基础上进行动态扩展。首先,利用滑动窗口模型处理动态异构图的局部动态变化数据;其次,利用滑动窗口的思想,实现基于图的增量变化动态更新的Top-K兴趣子图匹配。(3)在模拟数据集和真实数据上,将本文提出的方法与现有的RAM和RWM方法进行对比,分别对算法索引构建、Top-K子图匹配的实验结果进行比较和分析,验证了本文提出的Top-K兴趣子图匹配方法在模拟和现实网络上,都具有较好的查询匹配能力,Top-K兴趣子图匹配的效率得到了极大的提高。
[Abstract]:A graph is a general data structure used to describe complex relationships between entities, while a different composition is a special graph structure with labels, which is widely used in complex data modeling of many different entities, such as information networks. Biological networks and other practical applications. How to perform effective subgraph matching in heterogeneous data graph has become a hot topic in recent years. Subgraph matching is a user given query graph, from big data graph to find the same structure as the query graph and the same node label of all subgraphs. However, in practice, people tend to pay attention to the matching results with high interest degree, so the more targeted subgraph matching problem of Top-K interest. Top-K interest subgraph matching is divided into two parts. One is to find all the matching subgraphs in the data graph according to the query graph, and the other is to get K interest subgraphs with the highest interest degree according to the interest rank of all the matching subgraphs. In this paper, we study the problem of Top-K subgraph matching in heterogeneous graphs, and find that there are some problems in the existing methods, such as the matching problem of unauthorized query graphs in static graphs, and not considering the personalized needs of users. In practical application, the topology of graph will change with the change of semantic of actual application, that is, the dynamic evolution of graph. Little attention has been paid to subgraph matching of Top-K interest in the dynamic evolution of graphs. Therefore, in order to solve the above problems, this paper proposes an efficient matching method of static Top-K interest subgraph for the authorized query graph, considering the personalized requirements of the user. At the same time, Based on the research of subgraph matching of Top-K interest in the process of dynamic evolution of graphs, an effective method of dynamic subgraph matching of Top-K interest based on increment is proposed. The main contents are as follows: 1) an efficient static Top-K interest subgraph matching method is proposed. Firstly, according to the characteristics of heterogeneous graphs, a storage structure of a class of adjacent tables is proposed to store heterogeneous data graphs. Two kinds of new and different indexes are established: the topological structure characteristic index of nodes and the edge characteristic index of EFI, which make full use of the information of nodes and edges provided by them, and filter out the unqualified nodes and edges in the query and retrieval stage. Obtain relatively few candidate sets, avoid unnecessary nodes in the matching stage, edge connectivity detection, thereby reducing the redundant matching steps of the algorithm. Finally, a label setting method of query graph edge is proposed. Using it to determine the order of subgraph matching verification by query edge, and introducing the idea of edge matching edge sorting of RWM algorithm to improve the matching efficiency of subgraph. This paper presents an effective subgraph matching method for dynamic Top-K. The method is dynamically extended on the basis of ERWM algorithm. Firstly, the sliding window model is used to deal with the local dynamic change data of the dynamic isomerous graph. Secondly, using the idea of sliding window, the Top-K interest subgraph matching. 3) based on the incremental change of graph, is realized in the simulation data set and the real data. The proposed method is compared with the existing RAM and RWM methods, and the experimental results of constructing a Top-K subgraph matching algorithm are compared and analyzed respectively. It is verified that the proposed method of Top-K interest subgraph matching is in the simulated and realistic network. Both have better query matching ability and the efficiency of Top-K interest subgraph matching has been greatly improved.
【学位授予单位】:辽宁大学
【学位级别】:硕士
【学位授予年份】:2017
【分类号】:O157.5
【参考文献】
相关期刊论文 前10条
1 杨艳;纪安娜;金虎;;大规模数据图上的个性化子图匹配算法[J];计算机研究与发展;2015年S1期
2 罗清华;焉晓贞;彭宇;彭喜元;;基于滑动窗口模式匹配的动态距离估计方法[J];仪器仪表学报;2015年03期
3 王楠;王斌;李晓华;杨晓春;;支持动态图数据的子图查询方法[J];计算机科学与探索;2014年02期
4 王爽;王国仁;;基于滑动窗口的Top-K概率频繁项查询算法研究[J];计算机研究与发展;2012年10期
5 汤克明;戴彩艳;陈];;一种基于滑动窗口的不确定数据流Top-K查询算法[J];南京大学学报(自然科学版);2012年03期
6 张忠平;王浩;薛伟;夏炎;;动态滑动窗口的数据流聚类方法[J];计算机工程与应用;2011年07期
7 张硕;高宏;李建中;邹兆年;;不确定图数据库中高效查询处理[J];计算机学报;2009年10期
8 周傲英;金澈清;王国仁;李建中;;不确定性数据管理技术研究综述[J];计算机学报;2009年01期
9 宋宝燕;张衡;于洋;奚丽娜;王大玲;;基于滑动窗口的支持泛在应用的流聚类挖掘算法[J];小型微型计算机系统;2008年12期
10 常建龙;曹锋;周傲英+;;基于滑动窗口的进化数据流聚类[J];软件学报;2007年04期
相关博士学位论文 前1条
1 苗又山;大规模动态演化图的存储与分析系统研究[D];中国科学技术大学;2015年
相关硕士学位论文 前1条
1 单晓欢;大规模演化图的分割技术研究[D];辽宁大学;2014年
,本文编号:1783993
本文链接:https://www.wllwen.com/kejilunwen/yysx/1783993.html