Research on Distributed Storage of Spatiotemporal Data
Published: 2018-07-21 19:12
[Abstract]: Spatiotemporal data is multidimensional data with a highly complex structure and both spatial and temporal characteristics. It records the spatial state of objects and their changes over time in detail, and can correctly represent an object's past, present, and future states. As technology advances rapidly, the variety of data-collection devices keeps growing and data volumes are increasing quickly, which makes data storage and management difficult. The quality of the design and implementation of the spatiotemporal data storage management module determines the capability of the whole data management system, and therefore also affects the running efficiency of the upper-layer applications built on it. Distributed frameworks are attractive for this problem because they offer efficient parallel computing, large storage capacity, high scalability, and high stability. Building on previous work, this thesis explores the distributed storage of spatiotemporal data. It starts from spatiotemporal data and distributed-systems theory, studies the relevant technologies and principles, and proposes a spatiotemporal index based on the R-tree. It then uses HBase, from the open-source cloud platform Hadoop, as the database, uses the computing capability of MapReduce to manage the spatiotemporal data, and finally verifies the index performance through experiments. The main research contents are as follows: (1) The advantages and disadvantages of classical spatiotemporal data models and spatiotemporal indexes are analyzed in depth, and the characteristics of distributed platforms and related technologies are briefly analyzed, providing theoretical and technical support for the thesis. (2) The core components of the open-source cloud platform Hadoop are analyzed systematically: the MapReduce parallel computing framework, the HDFS distributed file system, and the data model of HBase, a column-oriented key-value database built on HDFS. Because spatiotemporal data sets are very large, HBase big tables are proposed for storing and managing them. Combining the characteristics of spatiotemporal data and HBase, the thesis describes in detail the table-creation process and how to design row keys and define column families. (3) Drawing on existing spatiotemporal data indexes, a spatiotemporal data index built on the R-tree is proposed. The index stores past data and current data separately and manages start and end times in their respective trees, improving query efficiency by improving the utilization of the trees. Comparative experiments then test the insertion and query efficiency of the proposed index. (4) Finally, experimental data is generated with a GPS simulator and then stored in HBase for management.
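The idea in item (3), keeping closed (past) records apart from open-ended (current) ones, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract gives no data structures, so the names `Entry`, `SplitIndex`, `update`, and `query` are hypothetical, and plain in-memory containers scanned linearly stand in for the two R-trees.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Axis-aligned bounding rectangle: (min_x, min_y, max_x, max_y).
Rect = Tuple[float, float, float, float]


@dataclass
class Entry:
    obj_id: str
    rect: Rect
    start: int                  # time at which this position became valid
    end: Optional[int] = None   # None means the record is still current


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class SplitIndex:
    """Hypothetical two-part index: current records and past records.

    Assumption: a dict and a list replace the two R-trees, so lookups
    here are linear scans rather than tree traversals.
    """

    def __init__(self) -> None:
        self.current: Dict[str, Entry] = {}   # open-ended records only
        self.past: List[Entry] = []           # records with a known end time

    def update(self, obj_id: str, rect: Rect, t: int) -> None:
        """Record a new position; the previous one is closed and archived."""
        old = self.current.get(obj_id)
        if old is not None:
            old.end = t
            self.past.append(old)
        self.current[obj_id] = Entry(obj_id, rect, t)

    def query(self, rect: Rect, t: int) -> List[str]:
        """Return ids of objects whose position at time t overlaps rect."""
        hits = [e.obj_id for e in self.current.values()
                if e.start <= t and intersects(e.rect, rect)]
        hits += [e.obj_id for e in self.past
                 if e.start <= t < e.end and intersects(e.rect, rect)]
        return sorted(hits)


idx = SplitIndex()
idx.update("car1", (0, 0, 1, 1), 10)
idx.update("car1", (5, 5, 6, 6), 20)      # closes car1's first record at t=20
idx.update("car2", (0.5, 0.5, 2, 2), 15)
print(idx.query((0, 0, 1, 1), 12))        # served by the past store
print(idx.query((0, 0, 1, 1), 25))        # served by the current store
```

Under this split, current records never need an end-time check and past records are never modified after archiving, which is one plausible reading of how separate stores could keep each structure compact.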
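[Degree-granting institution]: Jiangxi University of Science and Technology (江西理工大学)
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: P208
Article ID: 2136559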
Article link: https://www.wllwen.com/kejilunwen/dizhicehuilunwen/2136559.html