Research on Optimization Techniques for Regenerating-Code-Based Fault-Tolerant Distributed Storage

Published: 2018-04-08 16:47

Topic: distributed storage　Entry point: HDFS　Source: Nanjing University, master's thesis, 2016

[Abstract]: With the arrival of the big data era, large-scale data storage has become one of the key technologies of big data. Distributed storage systems are mostly deployed on inexpensive commodity machines, so node failure has become the norm. How to build a reliable storage mechanism for massive data has therefore become a hot research topic. Traditional fault-tolerance strategies based on multiple replicas suffer from excessive storage overhead and poor fault tolerance, which makes them a bottleneck for system scalability. In recent years, industry has begun to adopt erasure coding as the fault-tolerance mechanism of storage systems, but erasure coding consumes excessive bandwidth during data repair. Academia has therefore turned to regenerating-code storage strategies based on network coding. Regenerating codes can achieve optimal bandwidth overhead during repair, but problems such as their heavy computational overhead have kept them from being widely adopted. In addition, most storage systems use only a single, fixed coding method as their fault-tolerance strategy and ignore the differences among the stored files, which leaves room for further performance optimization. To address these problems, this thesis aims to build a distributed storage system with low redundancy, high availability, and high reliability, and uses Cumulus, an HDFS-based coded storage system, as the platform for studying regenerating-code-based fault-tolerant distributed storage and its performance optimization mechanisms. The main work covers two aspects. (1) To address the shortcomings of existing coding methods, and taking into account multiple factors such as storage efficiency, access latency, repair bandwidth, and computational complexity, a distributed fault-tolerant storage scheme based on simple regenerating codes is proposed. On this basis, the degraded-read repair mechanism of simple regenerating codes is further optimized, and a fault-tolerant storage strategy based on simple regenerating codes is implemented in the Cumulus system. Experimental results show that simple regenerating codes effectively reduce repair overhead at the cost of a small increase in storage overhead. (2) To address the impact that the lifecycle and access-frequency characteristics of files in a storage system have on file access performance, and combining file state with system state, an adaptive coding mechanism based on dynamic file attributes is proposed. The thesis designs and implements an adaptive coding model based on simple regenerating codes. Experimental results show that the adaptive coding mechanism based on dynamic file attributes can effectively improve the overall storage efficiency of a distributed storage system and reduce repair cost.
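The trade-off the abstract describes (replication costs storage, erasure coding costs repair bandwidth, simple regenerating codes sit in between) can be illustrated with a small calculation. The sketch below is not taken from the thesis: it assumes the (n, k, f) simple regenerating code construction as commonly described in the literature, in which each node stores f MDS-coded chunks plus one XOR parity chunk, and it considers only a single node failure. The parameter values (14, 10, 2) and all function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def replication(copies):
    """r-way replication: store `copies` full copies; repair reads one surviving copy."""
    return {
        "storage_overhead": Fraction(copies),
        "repair_read_per_lost_byte": Fraction(1),
    }


def reed_solomon(n, k):
    """(n, k) MDS erasure code: rebuilding one lost block reads k surviving blocks."""
    return {
        "storage_overhead": Fraction(n, k),
        "repair_read_per_lost_byte": Fraction(k),
    }


def simple_regenerating(n, k, f):
    """Assumed (n, k, f) simple regenerating code.

    Each node holds f MDS-coded chunks plus 1 XOR parity chunk, so storage
    grows by a factor (f + 1) / f over the underlying (n, k) code. Each lost
    chunk is rebuilt by XOR-ing f other chunks of its parity group.
    """
    return {
        "storage_overhead": Fraction(f + 1, f) * Fraction(n, k),
        "repair_read_per_lost_byte": Fraction(f),
    }


if __name__ == "__main__":
    # Illustrative parameters only; the abstract does not state the ones used.
    schemes = {
        "3-way replication": replication(3),
        "RS(14, 10)": reed_solomon(14, 10),
        "SRC(14, 10, f=2)": simple_regenerating(14, 10, 2),
    }
    for name, cost in schemes.items():
        print(
            f"{name:18s} storage x{float(cost['storage_overhead']):.2f}  "
            f"repair reads x{float(cost['repair_read_per_lost_byte']):.0f}"
        )
```

Under these assumed parameters the regenerating-code variant stores more than the plain erasure code but less than three replicas, while reading far less data to repair a lost node. That matches the qualitative claim in the abstract, though the actual figures measured in the thesis are not given here.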
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP333
