
Research on Distributed Spatial Join Queries Based on MapReduce

Published: 2018-03-29 05:29

  Topic: HDFS  Focus: MapReduce  Source: Jiangxi University of Science and Technology, master's thesis, 2013


[Abstract]: In recent years, as informatization has accelerated, technologies for acquiring geospatial information have advanced rapidly. At the same time, the volume of geospatial data keeps growing, and it has become one of the major sources of massive data. The spatial join is a common and very time-consuming complex spatial query operation. When processing large-scale spatial datasets in particular, neither traditional single-machine systems nor MPI cluster systems can meet its time and space requirements, so designing efficient distributed spatial join algorithms for cloud computing environments has become an active research topic in both academia and industry.

This thesis makes a first attempt at a distributed QR-tree index structure for cloud computing environments and runs MapReduce-based spatial join queries on top of that index. The main work is as follows:

(1) A distributed QR-tree index structure that supports large-scale datasets in a cloud computing environment is proposed, and its construction is described in detail. Construction has two steps: first, the spatial dataset is partitioned with a quadtree-based spatial partitioning scheme and the partitions are stored in a distributed fashion in HDFS data blocks; then, an R-tree index is built in parallel inside the data block of each resulting sub-region.

(2) Building on the distributed QR-tree index, the index structure is combined with the distributed parallel computing framework MapReduce to design and implement QRSJ-MR, a MapReduce-based spatial join query algorithm. In addition, a real-time caching mechanism is used to optimize the concurrent index access that arises in the algorithm.

(3) A Hadoop cluster is set up to test the efficiency of the MapReduce-based distributed spatial join algorithm QRSJ-MR. For both spatial overlap join queries and spatial containment join queries, performance comparison experiments are carried out against a non-indexed MapReduce spatial join algorithm and an R-tree-indexed MapReduce spatial join algorithm.

The experimental results show that, compared with the non-indexed MapReduce spatial join algorithm and the R-tree-indexed MapReduce spatial join algorithm, QRSJ-MR achieves higher execution efficiency on both spatial overlap join queries and spatial containment join queries.
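
The quadtree partitioning followed by a join inside each partition can be sketched in plain Python. This is a minimal single-process illustration under stated assumptions, not the thesis's QRSJ-MR implementation: all function names are hypothetical, the map and reduce phases are simulated with a dictionary rather than Hadoop, rectangles are assumed to lie inside the given region, and a nested loop stands in for the per-partition R-tree lookup.

```python
from collections import defaultdict
from itertools import product

# A rectangle is a tuple (xmin, ymin, xmax, ymax).

def intersects(a, b):
    """True if two axis-aligned rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def quadtree_cells(region, rects, capacity=2, max_depth=4):
    """Return quadtree leaf cells; a cell is split while it overlaps
    more than `capacity` rectangles and depth remains."""
    inside = [r for r in rects if intersects(region, r)]
    if len(inside) <= capacity or max_depth == 0:
        return [region]
    xmin, ymin, xmax, ymax = region
    xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
    quadrants = [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
                 (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]
    cells = []
    for quadrant in quadrants:
        cells.extend(quadtree_cells(quadrant, inside, capacity, max_depth - 1))
    return cells

def map_phase(cells, tagged_rects):
    """Group rectangles by every cell they overlap (simulated shuffle).
    A rectangle spanning several cells is replicated to each of them."""
    groups = defaultdict(lambda: {"R": [], "S": []})
    for tag, rect in tagged_rects:
        for index, cell in enumerate(cells):
            if intersects(cell, rect):
                groups[index][tag].append(rect)
    return groups

def reduce_phase(groups):
    """Join R and S inside each cell. The nested loop is a stand-in for
    an R-tree probe; the set drops pairs reported by several cells."""
    result = set()
    for parts in groups.values():
        for r, s in product(parts["R"], parts["S"]):
            if intersects(r, s):
                result.add((r, s))
    return result

def spatial_overlap_join(region, R, S, capacity=2):
    """Overlap join of rectangle lists R and S over `region`."""
    cells = quadtree_cells(region, R + S, capacity)
    tagged = [("R", r) for r in R] + [("S", s) for s in S]
    return reduce_phase(map_phase(cells, tagged))
```

Calling `spatial_overlap_join((0, 0, 10, 10), R, S)` with two lists of rectangle tuples returns the set of overlapping `(r, s)` pairs. Replicating boundary-spanning rectangles into every cell they touch keeps each cell's join independent, which is what lets the cells be processed in parallel; the price is the duplicate elimination done here with a set.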
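[Degree-granting institution]: Jiangxi University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: P208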

