An Efficient Mechanism for Large-Scale Graph Data Processing in the Spark Environment
Published: 2018-11-28 13:52
[Abstract]: To address the low efficiency and data storage structure problems of existing graph processing and graph management frameworks, a processing mechanism suited to large-scale graph data is proposed. First, the strengths and weaknesses of several current graph processing models and graph storage frameworks are analyzed. Second, based on an analysis of the characteristics of distributed computing, a graph data processing framework is designed around three aspects: a partitioning algorithm suited to large-scale graphs, optimized data extraction, and a mechanism that combines the cache, the computing layer, and the persistence layer. Finally, experiments built on the PageRank and SSSP algorithms compare the framework's performance with that of the MapReduce framework and of a Spark framework that uses HDFS as its persistence layer. The experiments show that the proposed framework is 90 times faster than the MapReduce framework and 2 times faster than the Spark framework with HDFS as the persistence layer, which makes it suitable for applications that require efficient graph data processing.
[Author affiliation]: School of Information, Yunnan University
[Funding]: National Natural Science Foundation of China (61170222)
[CLC number]: TP311.13
Article ID: 2363025
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2363025.html