Research on Parallelization of Spatial Analysis Algorithms for Large-Scale Raster Data Based on MapReduce
Topic: MapReduce + raster data; Source: Jiangxi University of Science and Technology, Master's thesis, 2013
【Abstract】: With the rapid development of Earth observation technology, the volume of raster data has grown sharply, and traditional GIS systems built on a single-node architecture can no longer meet the demands of large-scale raster data processing and analysis. Improving the efficiency of spatial analysis algorithms for large-scale raster data in distributed parallel environments has therefore become a major research focus in the geosciences. Raster spatial analysis involves large data volumes and heavy computation, which makes it a typical data-intensive workload. Many parallel computing models have been proposed; compared with the traditional MPI parallel programming model, the MapReduce model of the open-source Hadoop framework is better suited to data-intensive computing while still offering high performance. This thesis therefore combines the MapReduce programming model with typical raster spatial analysis algorithms, with the main aim of overcoming the low computational efficiency of large-scale raster data processing.
From the perspective of parallelizing large-scale raster data, the thesis analyzes data partitioning, parallel data import, and result fusion, and on that basis designs parallel algorithms for raster spatial analysis. The main work is as follows. First, given the characteristics of large-scale raster data, an efficient data organization model is built on the Hadoop Distributed File System (HDFS), and a raster re-blocking mechanism is proposed to solve the data-boundary problem that neighborhood algorithms face in raster processing. Second, to overcome the slow speed of traditional serial data reading, a MapReduce-based parallel construction of raster pyramids is designed, enabling parallel import of large-scale raster data. Third, using the MapReduce programming model, parallel algorithms are designed for computing basic terrain factors and extracting terrain features, improving the efficiency of large-scale raster spatial analysis. Finally, comparison experiments against the corresponding serial algorithms verify the efficiency of the parallel algorithms. The results show that the MapReduce-based parallel algorithms perform well, and that their efficiency improves as the number of data nodes and the data volume increase.
In summary, the MapReduce-based parallel algorithms for raster spatial analysis designed in this thesis effectively improve the computational efficiency of large-scale raster data processing.
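The abstract combines three ingredients for neighborhood-type terrain analysis: partitioning the raster into blocks, handling the data boundary between blocks, and fusing the per-block results. As an illustration only, the sketch below shows one common way to do this, not the thesis's own implementation. Its assumptions: each tile carries a one-cell overlap ("halo"), slope is computed with Horn's 3×3 kernel, raster edges are handled by clamping, the MapReduce shuffle is simulated in one process with a dictionary, and in-memory lists stand in for HDFS blocks. The function names (`split_with_halo`, `map_tile`, `reduce_row`, `parallel_slope`, `serial_slope`) are hypothetical.

```python
"""Minimal single-process sketch of tiled, MapReduce-style slope computation.

Assumptions (illustrative, not from the thesis): Horn's 3x3 slope kernel,
a one-cell halo (overlap) around each tile, edge cells handled by clamping,
and an in-memory dict standing in for the MapReduce shuffle.
"""
import math
from collections import defaultdict


def split_with_halo(dem, tile, halo=1):
    """Re-block the DEM into square tiles, each padded by `halo` cells of overlap.

    A 3x3 kernel needs halo >= 1 so border cells can see the neighbouring tile.
    """
    rows, cols = len(dem), len(dem[0])
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
            # Extend the core window by the halo, clipped to the raster extent.
            hr0, hc0 = max(r0 - halo, 0), max(c0 - halo, 0)
            hr1, hc1 = min(r1 + halo, rows), min(c1 + halo, cols)
            block = [row[hc0:hc1] for row in dem[hr0:hr1]]
            yield (r0, c0, r1, c1), (hr0, hc0, block)


def horn_slope(grid, i, j, cellsize):
    """Slope in degrees at cell (i, j) of `grid`, using Horn's 3x3 kernel.

    Neighbours that fall outside `grid` are clamped to the nearest edge cell.
    """
    nr, nc = len(grid), len(grid[0])

    def z(di, dj):
        return grid[min(max(i + di, 0), nr - 1)][min(max(j + dj, 0), nc - 1)]

    dzdx = ((z(-1, 1) + 2 * z(0, 1) + z(1, 1))
            - (z(-1, -1) + 2 * z(0, -1) + z(1, -1))) / (8 * cellsize)
    dzdy = ((z(1, -1) + 2 * z(1, 0) + z(1, 1))
            - (z(-1, -1) + 2 * z(-1, 0) + z(-1, 1))) / (8 * cellsize)
    return math.degrees(math.atan(math.hypot(dzdx, dzdy)))


def map_tile(key, value, cellsize):
    """Map: compute slope for the core (non-halo) cells of one tile."""
    (r0, c0, r1, c1), (hr0, hc0, block) = key, value
    for r in range(r0, r1):
        for c in range(c0, c1):
            # Emit (global row, (global col, slope)) so rows can be reassembled.
            yield r, (c, horn_slope(block, r - hr0, c - hc0, cellsize))


def reduce_row(row, cells):
    """Reduce: fuse the slope values of one output row in column order."""
    return row, [value for _, value in sorted(cells)]


def parallel_slope(dem, tile, cellsize=1.0):
    """Run the map, shuffle and reduce phases sequentially in one process."""
    shuffled = defaultdict(list)
    for key, value in split_with_halo(dem, tile):
        for row, cell in map_tile(key, value, cellsize):
            shuffled[row].append(cell)
    fused = dict(reduce_row(r, cells) for r, cells in shuffled.items())
    return [fused[r] for r in range(len(dem))]


def serial_slope(dem, cellsize=1.0):
    """Reference: one pass over the whole DEM with the same kernel."""
    return [[horn_slope(dem, i, j, cellsize) for j in range(len(dem[0]))]
            for i in range(len(dem))]


if __name__ == "__main__":
    dem = [[(r * 7 + c * 3) % 11 for c in range(10)] for r in range(9)]
    # With a one-cell halo, the tiled result matches the serial pass exactly.
    assert parallel_slope(dem, tile=4) == serial_slope(dem)
    print("tiled and serial slope grids match")
```

Because every core cell of a tile finds its full 3×3 window inside the halo-padded block, the tiled result is identical to the serial pass for any tile size. With `halo=0`, cells on tile borders would instead clamp to their own tile and would generally get different values, which is the kind of data-boundary error that a re-blocking step is meant to prevent.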
【Degree-granting institution】: Jiangxi University of Science and Technology
【Degree】: Master's
【Year awarded】: 2013
【Classification number】: P208