基于DFSTree邻域索引的大图匹配算法研究

发布时间：2018-11-26 12:21

【摘要】：图结构模型常被用于互联网,生物,计算机视觉和化学等领域中描述复杂的数据对象及对象之间的关联关系。图搜索在图计算研究领域扮演着十分重要的角色,图搜索可根据其应用场景的不同分为两类:子图查询与子图匹配,其中子图查询是在图数据库中找出所有与查询图匹配的超图集合并返回;子图匹配则是从图数据库中找到所有与查询图同构的子图集并返回。无论是图搜索中的图查询亦或是图匹配,均需要借助于子图同构算法来对查询图与返回结果中的图进行同构判定。众所周知子图同构问题是NP完全问题,在业界中图同构问题是通过搜索剪枝的快速启发算法来实现。基于搜索的子图同构算法在时间开销过大,所以若对数据库中的所有图逐一扫描并判断同构关系是不现实的。针对这一问题科研人员提出并运用的图索引技术,通过为图构建索引来将一部分不满足查询匹配条件的图过滤掉。通过这种方式将需要执行同构检测算法的图集规模尽可能的减少,以此来从全局提升图匹配查询算法的执行效率。科研人员也基于图索引算法提出了现今最为常用的,基于过滤-验证的图匹配查询的算法框架。传统基于挖掘与非挖掘图索引算法受限于图规模大小,当图规模变的十分庞大时便失去研究价值。在本文中,我们首先提出了一种基于树型结构的邻域索引,并通过该索引用来提升图搜索算法的执行效率。同时将邻域索引作为通用的模型,对基于邻域索引的图匹配框架进行抽象并总结出其算法框架的代价估算函数。不仅如此,我们还参照于使用十分广泛的VF2子图同构检测算法框架设计出了一种以树型索引项为单位的图匹配算法,并通过该算法来获取被查询图中所有与查询图满足子图同构的映射关系集合。在文章的最后,通过实验对比数据来验证我们所提出的基于邻域模式的DFSTree索引和基于该索引的图重构匹配算法要优于当今主流的基于邻域模式的SPath索引算法。在实验阶段我们分别在索引构建时间与空间,索引过滤候选集大小,以及基于索引的图匹配查询算法在返回匹配结果的平均响应时间与算法返回的匹配结果来进行对比分析。通过实验结果数据来验证得到,基于树型结构的DFSTree邻域索引虽然在构建时间与空间上稍逊色于当今主流基于最短路径的邻域索引SPath算法,但是在查询响应时间与过滤后对候选集中呈负相关的候选元素的过滤强度而言都优于SPath算法。
[Abstract]:Graph structure models are often used to describe complex data objects and their relationships in the fields of Internet, biology, computer vision and chemistry. Graph search plays a very important role in the field of graph computing. Graph search can be divided into two categories according to its application scenarios: subgraph query and subgraph matching. The subgraph query is to find out all the hypergraph sets matching the query graph in the graph database and return them. Subgraph matching is to find all the subsets isomorphic to the query graph from the graph database and return. Whether it is graph query or graph matching in graph search, it is necessary to use subgraph isomorphism algorithm to determine the isomorphism between the query graph and the graph in the returned result. It is well known that the subgraph isomorphism problem is a NP complete problem. In the industry, the graph isomorphism problem is realized by the fast heuristic algorithm of searching pruning. The time cost of the search based subgraph isomorphism algorithm is too large, so it is not realistic to scan all the graphs in the database one by one and judge the isomorphism relationship. Aiming at this problem, the graph index technology proposed and applied by researchers, by constructing indexes for graphs, filters out some graphs that do not meet the query matching conditions. In this way, the scale of graph set that needs to perform isomorphism detection algorithm will be reduced as much as possible, so as to improve the execution efficiency of graph matching query algorithm globally. Based on graph indexing algorithm, researchers also proposed the most commonly used, filter-based graph matching query algorithm framework. The traditional algorithms based on mining and non-mining graph index are limited by the size of the graph and lose the research value when the scale of the graph becomes very large. In this paper, we first propose a tree structure based neighborhood index, which is used to improve the execution efficiency of graph search algorithm. At the same time, the neighborhood index is used as a general model to abstract the graph matching framework based on neighborhood index and sum up the cost estimation function of its algorithm framework. Not only that, we also design a graph matching algorithm based on tree index terms according to the widely used framework of VF2 subgraph isomorphism detection algorithm. The algorithm is used to obtain all the mapping relation sets of the queried graph which are isomorphic to the subgraph of the query graph. At the end of the paper, we compare the experimental data to verify that our proposed DFSTree index based on neighborhood schema and graph reconstruction matching algorithm based on this index are better than the current mainstream SPath index algorithm based on neighborhood schema. In the experiment stage, we compare and analyze the time and space of index construction, the size of index filter candidate set, the average response time of graph matching query algorithm based on index and the matching result returned by the algorithm. The experimental results show that the DFSTree neighborhood index based on the tree structure is slightly inferior to the current SPath algorithm based on the shortest path in the construction time and space. However, the filtering intensity of candidate elements with negative correlation between query response time and filtering time is better than that of SPath algorithm.
【学位授予单位】：辽宁大学
【学位级别】：硕士
【学位授予年份】：2017
【分类号】：O157.5

【参考文献】