Research on the Redundancy Mechanism for Massive Remote Sensing Image Storage Based on HDFS
Published: 2018-08-01 11:46
[Abstract]: Massive remote sensing image data is stored almost exclusively in distributed storage systems. High-resolution data storage systems in particular must provide data redundancy to ensure the security, completeness, and high availability of the data. Traditional distributed file storage systems currently use three data redundancy techniques: full replication, disk arrays, and erasure coding. Full replication and disk arrays improve redundancy at the cost of additional storage space; erasure coding avoids the excessive space consumption but increases the system's I/O load. To address these shortcomings, this thesis combines full replication with erasure coding. Building on the open-source HDFS (Hadoop Distributed File System), the improved redundancy mechanism replaces HDFS's original one in order to resolve the conflict between storage space and I/O load, so that the system gains redundancy while maintaining I/O speed and greatly reducing its storage space requirements. The thesis focuses on data redundancy mechanisms suited to high-resolution remote sensing images and proposes an improved redundancy strategy. The main work and contributions are as follows.
1. Building on a study of storage management techniques and data redundancy mechanisms for massive remote sensing image data, the thesis examines the HDFS distributed file system and its redundancy mechanism, with a detailed analysis of the replication and erasure coding techniques suited to massive remote sensing image storage.
2. Based on the replication and erasure coding mechanisms, it proposes an improved "replication + coding" redundancy strategy for HDFS, and specifies the file read and write workflows as well as a scheme for managing the coded blocks that encoding produces in the system.
3. Experiments on the improved HDFS system verify the feasibility of the proposed scheme; the results show that the system greatly reduces its storage space requirements while maintaining I/O speed. The improved HDFS system has been successfully applied in the massive remote sensing image data storage system of the Gaofen (high-resolution) major special project (ERSI-DBMS).
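The trade-off between storage space and I/O load described in the abstract can be illustrated with a rough cost model. The sketch below is not the thesis's implementation. The 3-copy replication default, the stripe layout of k data blocks plus m parity blocks, the idea that some fraction of the data stays replicated while the rest is encoded, and all names (`Scheme`, `replication`, `erasure_coded`, `hybrid_overhead`) are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Cost summary for one redundancy scheme (illustrative model only)."""
    name: str
    data_blocks: int     # original data blocks per stripe
    stored_blocks: int   # blocks actually written per stripe (data + redundancy)
    repair_reads: int    # surviving blocks read to rebuild one lost block

    @property
    def overhead(self) -> float:
        """Raw bytes stored per byte of user data."""
        return self.stored_blocks / self.data_blocks


def replication(copies: int = 3) -> Scheme:
    # Every block is stored `copies` times; a lost copy is rebuilt
    # by reading one surviving copy.
    return Scheme(f"{copies}x replication", 1, copies, 1)


def erasure_coded(k: int, m: int) -> Scheme:
    # k data blocks plus m parity blocks per stripe. With a conventional
    # MDS code such as Reed-Solomon, rebuilding one lost block requires
    # reading k surviving blocks.
    return Scheme(f"erasure code ({k}+{m})", k, k + m, k)


def hybrid_overhead(replicated_fraction: float, rep: Scheme, ec: Scheme) -> float:
    # ASSUMPTION (not taken from the thesis): a fraction of the data stays
    # replicated and the remainder is erasure-coded; the result is the
    # average storage overhead across both parts.
    if not 0.0 <= replicated_fraction <= 1.0:
        raise ValueError("replicated_fraction must be between 0 and 1")
    return (replicated_fraction * rep.overhead
            + (1.0 - replicated_fraction) * ec.overhead)


if __name__ == "__main__":
    rep, ec = replication(3), erasure_coded(6, 3)
    for s in (rep, ec):
        print(f"{s.name}: overhead {s.overhead:.2f}x, repair reads {s.repair_reads}")
    print(f"hybrid (20% replicated): overhead {hybrid_overhead(0.2, rep, ec):.2f}x")
```

With these illustrative parameters, 3-copy replication stores 3 bytes per byte of data while a 6+3 code stores 1.5, but the coded stripe needs six block reads to rebuild a single lost block. That extra read traffic is the I/O burden the abstract attributes to erasure coding, and a mixed layout sits between the two extremes on both axes.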
[Degree-granting institution]: Henan University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP333; TP751
Article ID: 2157476
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2157476.html