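Research on HDFS-Based Distributed Storage Technology for Massive Remote Sensing Image Data
Published: 2018-07-01 10:51
Topics: remote sensing data + distributed file system; Source: University of Chinese Academy of Sciences (School of Engineering Management and Information Technology), master's thesis, 2013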
[Abstract]: With the rapid development of global Earth observation technology, the volume of remote sensing image data is growing exponentially. Over the same period, China has launched a series of foundational special programs and research projects, such as the High-Resolution Earth Observation System. These projects have produced large quantities of high-resolution remote sensing imagery, and traditional remote sensing data storage and management technology increasingly struggles with storage at the TB to PB scale. This has drawn attention to, and prompted research on, a range of problems in storing very large volumes of remote sensing data. How to store, access, and manage massive remote sensing data quickly and efficiently will be an important research topic over the next several years.

This thesis studies techniques for fast, efficient storage and management of massive remote sensing image data. It selects HDFS, the Hadoop distributed file system, as the storage platform, compares it with other mainstream remote sensing image storage schemes, and builds on HDFS by introducing additional mechanisms tailored to remote sensing imagery so that HDFS can be applied to massive remote sensing data storage. The main research contents are:

(a) Traditional remote sensing image storage techniques are analyzed, and the shortcomings of commonly used traditional storage in the face of rapidly growing data volume and data diversity are discussed. After comparing current mainstream distributed file systems, HDFS is chosen for the study of remote sensing data storage.

(b) The traditional remote sensing image storage method, the image quadtree, is introduced. The traditional quadtree algorithm consumes substantial computing resources, which makes real-time performance and efficiency hard to guarantee. The thesis therefore proposes a fast quadtree construction algorithm based on MapReduce, the programming model at the core of the Hadoop platform, which uses the computing resources of the grid nodes to build the quadtree quickly. It also proposes a quadtree construction method and construction strategy for HDFS.

(c) A remote sensing spatial data storage model based on the HBase database is designed so that it can be used with the HDFS distributed file system. Because HDFS has only a single metadata node, the NameNode, which may threaten system stability, the thesis borrows a mechanism from current mainstream application systems and uses dual-machine hot standby to ensure fault tolerance. The Nagios management plug-in is introduced to monitor performance information of the grid nodes in the distributed file system, which helps ensure system stability.

(d) To provide efficient services over massive data, and with reference to OGC standards, a set of data service interfaces is designed on top of HDFS. The interfaces can also report data information and system status information in a timely manner.

(e) Experiments are designed along the lines of the research described above, and they verify that the improved strategies and methods are effective.

The results show that managing remote sensing image data centrally on the HDFS distributed file system, together with the high-performance quadtree construction algorithm and data storage model designed for HDFS, can address the storage and management of ever-growing, very large-scale remote sensing data. The optimizations and improvements that target the problems HDFS exhibits when storing and managing data also perform better than the original system, which indicates that they are effective.
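The abstract does not describe the thesis's quadtree construction algorithm in detail, so the following is only a generic illustration of the idea in item (b), not the author's method. It assumes square tiles addressed by base-4 quadkeys (one digit per level, a common convention that the abstract does not mention), and it simulates the map and reduce phases in a single Python process. In a real Hadoop job the grouping by parent key would be performed by the shuffle, and each reduce call could run on a different node. The names `quadkey`, `map_tile`, `reduce_children`, and `build_pyramid` are hypothetical.

```python
from collections import defaultdict


def quadkey(level, x, y):
    """Encode a tile address as a base-4 string, one digit per level."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


def map_tile(key, tile):
    """Map step: emit the tile under its parent's key.

    The last quadkey digit says which quadrant of the parent it fills.
    """
    return key[:-1], (int(key[-1]), tile)


def downsample(tile):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    n = len(tile) // 2
    return [
        [
            (tile[2 * r][2 * c] + tile[2 * r][2 * c + 1]
             + tile[2 * r + 1][2 * c] + tile[2 * r + 1][2 * c + 1]) / 4
            for c in range(n)
        ]
        for r in range(n)
    ]


def reduce_children(children, size):
    """Reduce step: mosaic up to four children, then downsample.

    Quadrant digits: 0 = NW, 1 = NE, 2 = SW, 3 = SE.
    Missing children are left as zeros (treated as no-data here).
    """
    mosaic = [[0.0] * (2 * size) for _ in range(2 * size)]
    for digit, tile in children:
        row0 = size if digit & 2 else 0
        col0 = size if digit & 1 else 0
        for r in range(size):
            for c in range(size):
                mosaic[row0 + r][col0 + c] = tile[r][c]
    return downsample(mosaic)


def build_pyramid(leaves, size):
    """Build every quadtree level from the deepest-level tiles.

    leaves maps quadkey -> size x size tile (list of rows).
    Returns a dict holding the leaves plus all ancestor tiles;
    the root tile is stored under the empty key "".
    """
    pyramid = dict(leaves)
    current = leaves
    while current and "" not in current:
        groups = defaultdict(list)  # stands in for the shuffle phase
        for key, tile in current.items():
            parent, child = map_tile(key, tile)
            groups[parent].append(child)
        current = {
            parent: reduce_children(children, size)
            for parent, children in groups.items()
        }
        pyramid.update(current)
    return pyramid


if __name__ == "__main__":
    # Four constant-valued 2x2 leaf tiles at level 1.
    leaves = {
        quadkey(1, x, y): [[float(4 * (x + 2 * y))] * 2 for _ in range(2)]
        for y in range(2) for x in range(2)
    }
    print(build_pyramid(leaves, 2)[""])
```

Because each parent depends only on its own four children, the reduce calls within a level are independent of one another, which is the property that lets a cluster build a level in parallel. The levels themselves are still processed one after another in this sketch.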
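[Degree-granting institution]: University of Chinese Academy of Sciences (School of Engineering Management and Information Technology)
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP333; TP751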
Article ID: 2087539
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2087539.html