分布式存储系统上的RS纠删码研究与应用
发布时间:2018-11-11 10:32
【摘要】:随着计算机、智能设备的普及,以及互联网技术的快速发展。各类数据呈几何数量级的激增,给存储系统带来了更多的挑战。数据的存储安全已经成为当前存储系统迫切需要解决的问题。目前,分布式存储是人们用来应对大数据量存储的有效手段。在分布式存储系统上存在两种保障数据安全的方式:多副本技术和纠删码技术。多副本技术简单且易于实现,只需简单的对一份数据进行多次备份,并分开存放即可,以三副本策略最为常见。为了获得更稳定的数据安全保障,多副本技术只能通过增加副本数量来实现,同时带来的问题是存储成本在成倍的增加。为了解决多副本技术存储成本过高的问题,人们将通信系统中用于解决信息传递过程中数据丢失问题的纠删码技术引入到存储系统上来,纠删码技术能很好地解决存储成本过高的问题,同时能保证和多副本相同甚至更高的数据安全保障能力。在纠删码技术解决了多副本技术存储成本过高的问题的同时,却出现了新的问题:在数据丢失后恢复数据时的所消耗的系统资源和I/O数量大幅增加。正是出于这一目的,本文从RS纠删码入手,分析RS纠删码的编码方程和特点,并结合阵列码、LDPC码的优点,提出一种基于RS纠删码改进后的编码——LRC。给出LRC的定义和对LRC进行容错性分析;建立马尔科夫模型对其可靠性进行分析;还对其编码方程的构造矩阵和编码参数变化进行对比性分析。为了能够对LRC的编码思想进行应用,以开源分布式存储系统HDFS为平台,对HDFS存储系统的系统架构、数据放置策略进行分析和理解的基础上,从数据放置策略、数据重构过程和通信校验机制三个方面,提出在HDFS上实现LRC的设计思路。最后,通过三组对比实验得出:存储成本相似的情况下,LRC在解码时间上比RS编码节省近一半的时间成本;改变编码参数时,LRC的编解码性能不会大幅变化,且能提供更多的参数组合选择;在编码矩阵上,基于柯西矩阵的编码方程和基于范德蒙矩阵的编码方程具有相似的性能表现。
[Abstract]:With the popularity of computers, intelligent devices, and the rapid development of Internet technology. All kinds of data are in geometric order of magnitude, which brings more challenges to storage system. Data storage security has become an urgent problem to be solved in current storage system. At present, distributed storage is an effective way for people to deal with large amount of data storage. There are two ways to ensure data security in distributed storage system: multiple replica technology and erasure code technology. Multi-replica technology is simple and easy to implement. It only needs to backup one data several times and store it separately. The three-copy strategy is the most common. In order to obtain more stable data security, multi-replica technology can only be achieved by increasing the number of replicas, and the problem is that the storage cost is increasing exponentially. In order to solve the problem of high storage cost of multi-copy technology, erasure code technology, which is used to solve the problem of data loss in communication system, is introduced to the storage system. Erasure code technology can solve the problem of high storage cost and guarantee the same or higher data security ability as many copies. While erasure code technology solves the problem that the storage cost of multi-copy technology is too high, a new problem arises: the consumption of system resources and the number of I / O in recovering data after data loss are increased greatly. For this purpose, this paper begins with RS erasure codes, analyzes the coding equations and characteristics of RS erasure codes, and combines the advantages of array codes and LDPC codes, and proposes an improved LRC. code based on RS erasure codes. The definition of LRC and the fault tolerance analysis of LRC are given, the reliability of LRC is analyzed by Markov model, and the construction matrix of coding equation and the variation of coding parameters are analyzed comparatively. In order to apply the coding idea of LRC, based on the open source distributed storage system (HDFS), the system architecture and data placement strategy of HDFS storage system are analyzed and understood. Data reconfiguration process and communication verification mechanism are discussed in this paper. The design idea of implementing LRC on HDFS is put forward. Finally, through three groups of comparative experiments, it is concluded that under the condition of similar storage cost, LRC saves nearly half the time cost of RS coding in decoding time; When the coding parameters are changed, the encoding and decoding performance of LRC will not change significantly, and it can provide more choice of parameters. In the coding matrix, the coding equation based on Cauchy matrix and the coding equation based on Van der Mon matrix have similar performance.
【学位授予单位】:成都理工大学
【学位级别】:硕士
【学位授予年份】:2017
【分类号】:TP333
本文编号:2324572
[Abstract]:With the popularity of computers, intelligent devices, and the rapid development of Internet technology. All kinds of data are in geometric order of magnitude, which brings more challenges to storage system. Data storage security has become an urgent problem to be solved in current storage system. At present, distributed storage is an effective way for people to deal with large amount of data storage. There are two ways to ensure data security in distributed storage system: multiple replica technology and erasure code technology. Multi-replica technology is simple and easy to implement. It only needs to backup one data several times and store it separately. The three-copy strategy is the most common. In order to obtain more stable data security, multi-replica technology can only be achieved by increasing the number of replicas, and the problem is that the storage cost is increasing exponentially. In order to solve the problem of high storage cost of multi-copy technology, erasure code technology, which is used to solve the problem of data loss in communication system, is introduced to the storage system. Erasure code technology can solve the problem of high storage cost and guarantee the same or higher data security ability as many copies. While erasure code technology solves the problem that the storage cost of multi-copy technology is too high, a new problem arises: the consumption of system resources and the number of I / O in recovering data after data loss are increased greatly. For this purpose, this paper begins with RS erasure codes, analyzes the coding equations and characteristics of RS erasure codes, and combines the advantages of array codes and LDPC codes, and proposes an improved LRC. code based on RS erasure codes. The definition of LRC and the fault tolerance analysis of LRC are given, the reliability of LRC is analyzed by Markov model, and the construction matrix of coding equation and the variation of coding parameters are analyzed comparatively. In order to apply the coding idea of LRC, based on the open source distributed storage system (HDFS), the system architecture and data placement strategy of HDFS storage system are analyzed and understood. Data reconfiguration process and communication verification mechanism are discussed in this paper. The design idea of implementing LRC on HDFS is put forward. Finally, through three groups of comparative experiments, it is concluded that under the condition of similar storage cost, LRC saves nearly half the time cost of RS coding in decoding time; When the coding parameters are changed, the encoding and decoding performance of LRC will not change significantly, and it can provide more choice of parameters. In the coding matrix, the coding equation based on Cauchy matrix and the coding equation based on Van der Mon matrix have similar performance.
【学位授予单位】:成都理工大学
【学位级别】:硕士
【学位授予年份】:2017
【分类号】:TP333
【参考文献】
相关期刊论文 前1条
1 罗象宏;舒继武;;存储系统中的纠删码研究综述[J];计算机研究与发展;2012年01期
相关博士学位论文 前3条
1 谢平;RAID-6编码布局及重构优化研究[D];华中科技大学;2015年
2 刘卫平;网络存储中的数据容错与容灾技术研究[D];西北工业大学;2006年
3 万武南;分布式安全存储系统纠删码技术的研究[D];中国科学院研究生院(成都计算机应用研究所);2006年
相关硕士学位论文 前6条
1 梁先海;纠删码存储集群的数据重构优化技术研究[D];华中科技大学;2015年
2 王敬轩;分布式文件系统存储效率优化研究[D];华中科技大学;2013年
3 杨明;基于LDPC码的分布式容灾系统及其性能研究[D];哈尔滨工程大学;2012年
4 张世乐;面向大数据块的快速多容错编码研究[D];复旦大学;2010年
5 金奎;基于分布式存储系统的数据安全传输的设计与实现[D];哈尔滨工业大学;2009年
6 姜英豪;基于RS和Chord的分布式存储系统的设计[D];哈尔滨工业大学;2008年
,本文编号:2324572
本文链接:https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2324572.html