基于Spark的子图匹配算法研究与实现
发布时间:2018-01-20 02:20
本文关键词: Spark 子图匹配 图挖掘 并行算法 出处:《北京交通大学》2017年硕士论文 论文类型:学位论文
【摘要】:图作为一种由顶点和边构成的数据结构,能够简洁有力的表达事物之间的联系。随着大数据时代的到来,数据的规模以前所未有的速度增长着,Facebook、Twitter、微博等社交媒体每天都产生大量的社交图数据。如何处理如此大规模的图数据成为目前研究的热点。其中,子图匹配问题又是图数据处理领域中最为重要的问题,图的搜索,模式匹配等算法都需要子图匹配算法的支持。子图匹配的数学基础是图论中的经典问题子图同构,一个著名的NP完全问题。目前,虽然有一些学者给出了一些方法来实现子图匹配,但是这些方法只能处理小规模的图数据,在应对如今大规模的数据时,其匹配效率与可扩展性都有很大不足。另外,多数子图匹配算法应用于无向图,在有向图的应用上存在着不适用或效率低下的问题。针对以上问题。本文依托近些年来兴起的大数据平台,利用其提供强大的存储与计算能力,研究并实现了以大数据处理平台Spark作为处理引擎,应用GraphX组件处理超大规模图数据的子图匹配算法。该算法以HDFS为存储平台,有效解决了集群负载均衡问题;计算过程分为分布式过滤阶段与分布式验证阶段。分布式过滤阶段充分考虑Spark平台特性与GraphX以顶点为分割的图分割策略,提出顶点签名数据结构,通过并行比对顶点签名的方式实现对数据图快速高效过滤。其中,顶点签名中表达了自身与邻域信息,邻域中又区分父邻域与子邻域,提升了对有向图的过滤效果。分布式验证阶段利用Spark平台分布式计算优势,提出候选子图匹配区域概念,通过对查询图中心点的计算,在数据图中得到多个与查询图规模相当的候选子图匹配区域,将经过过滤的超大规模图数据进一步进行分割,在更小规模候选子图匹配区域中执行高效子图匹配操作。最后,通过实验表明,本文分布式子图过匹配算法具有很好的匹配效率与可扩展能力,在与目前优秀子图匹配算法VF2的对比实验中,本文算法具有很好的执行效率优势。
[Abstract]:As a data structure composed of vertices and edges, graphs can express the relationship between things simply and forcefully. With the arrival of big data era, the scale of data is growing at an unprecedented rate. Social media like Facebook Twitter and Weibo generate a lot of social graph data every day. The subgraph matching problem is also the most important problem in the field of graph data processing, graph search. The mathematical foundation of subgraph matching is the classical subgraph isomorphism in graph theory, a famous NP-complete problem. Although some scholars have given some methods to achieve subgraph matching, these methods can only deal with small-scale graph data, when dealing with today's large-scale data. In addition, most subgraph matching algorithms are applied to undirected graph. In the application of directed graphs there are problems of inapplicability or inefficiency. In view of the above problems this paper relies on the big data platform which has emerged in recent years and uses it to provide powerful storage and computing power. A subgraph matching algorithm using big data processing platform Spark as processing engine and using GraphX component to process very large scale graph data is studied and implemented. The algorithm uses HDFS as storage platform. It effectively solves the problem of cluster load balancing. The computing process is divided into distributed filtering phase and distributed verification phase. The distributed filtering phase takes full account of the characteristics of Spark platform and GraphX segmentation strategy based on vertex. The data structure of vertex signature is proposed, and the data graph is filtered quickly and efficiently by parallel vertex signature, in which the information of itself and neighborhood is expressed in the vertex signature, and the parent neighborhood and the sub-neighborhood are distinguished in the neighborhood. The filtering effect of directed graph is improved. In the stage of distributed verification, the concept of candidate subgraph matching region is put forward by using the advantage of distributed computing on Spark platform, and the center point of the query graph is calculated. A number of candidate subgraph matching regions are obtained in the data graph, and the filtered super-large scale graph data is further segmented. An efficient subgraph matching operation is performed in a smaller candidate subgraph matching region. Finally, experiments show that the distributed subgraph over-matching algorithm has good matching efficiency and extensibility. In comparison with the current excellent subgraph matching algorithm VF2, the proposed algorithm has a good performance efficiency.
【学位授予单位】:北京交通大学
【学位级别】:硕士
【学位授予年份】:2017
【分类号】:O157.5
【参考文献】
相关期刊论文 前3条
1 Bin Cui;Hong Mei;Beng Chin Ooi;;Big data: the driver for innovation in databases[J];National Science Review;2014年01期
2 潘巍;李战怀;伍赛;陈群;;基于消息传递机制的MapReduce图算法研究[J];计算机学报;2011年10期
3 徐凯旋;鲁道夫;;扩展子图同构问题的优化算法[J];计算机工程;2011年19期
相关硕士学位论文 前3条
1 李燕;基于单邻域的子图查询算法研究[D];燕山大学;2016年
2 焦晓翠;基于V-index的子图查询算法的研究与实现[D];东北大学;2012年
3 黄博;图数据库中多子图匹配查询算法研究[D];复旦大学;2012年
,本文编号:1446453
本文链接:https://www.wllwen.com/shoufeilunwen/benkebiyelunwen/1446453.html