Analysis and Optimization of RAID-6 Dual-Parity Stripe Writes Based on Erasure Codes

Published: 2018-06-09 03:19

Topic: RAID + erasure codes; source: Beijing Institute of Technology, 2015 doctoral dissertation

[Abstract]: In modern RAID systems, reliability and performance are the two most important concerns. Applying erasure codes to storage media improves reliability. Over the years, researchers have tried various schemes, including single parity, mirroring, and maximum distance separable (MDS) codes, for fault tolerance and disaster preparedness. Recent theory suggests, however, that as new technologies are adopted and new theory develops, these techniques cope poorly with new problems. Researchers are now turning to erasure-coding schemes applied when data is written in stripes to improve storage reliability. The result of this work is an improvement in the efficiency and probability of recovering erroneous data in RAID storage systems.

The first part of the thesis analyzes how RAID6 works, how parity is written, and how data and parity are updated. It also introduces a new scheme that uses double erasure codes to solve the problem of recovering erroneous data. While implementing this scheme, we found that many existing methods do not handle large-scale data storage systems well; for example, they can handle only a single disk failure. The proposed method overcomes these drawbacks to some extent by using double erasure codes to recover data on the basis of the parity disks.

The second part studies the data distribution mechanism of individual disks in RAID, namely data striping. Striping divides data into blocks and distributes those blocks evenly across different regions of different disks in the storage system. The stripe block size is an important parameter that strongly affects the read and write performance of each disk. With advances in disk technology and the emergence of I/O optimization techniques, the new effects of these changes on striping need further analysis, which the thesis provides.

The third part describes recovery methods for data errors in RAID6 systems. RAID6 offers high fault tolerance, but its recovery efficiency is poor: each parity set must be computed independently, which particularly hurts write efficiency. When a piece of data in the storage system goes bad, the whole fault-tolerant array is heavily affected. We therefore design several methods to eliminate the influence of erroneous data during data reconstruction. Finally, we use erasure codes to analyze and optimize blocks in RAID6 and use XOR to recover failed or crashed data, reducing the array resources consumed and thus the adverse effects of reconstruction. At this RAID level, we use two parity devices and parity techniques to construct an extended array that tolerates the failure of any one or two storage devices. In this process, both the old and the new values are used in computing the parity data. Updating the storage system takes six operations (read the data, write the data, and read and write the two parity disks), which makes it possible to cope with data disasters.

The main contributions of the thesis are: 1. It analyzes double erasure codes and verifies their effectiveness for data fault tolerance in RAID. 2. It improves the performance of small writes in RAID with two parity disks. 3. It uses XOR to optimize the scheme for recovering data in RAID with erasure codes.
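The six-operation small write mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's own construction: it assumes the common P+Q layout, where P is the XOR of the data blocks and Q is a weighted sum over GF(2^8) with generator 2 and polynomial 0x11D. All function names are hypothetical.

```python
# Sketch of a RAID-6 small write (read-modify-write).
# Assumption: P+Q parity over GF(2^8), polynomial 0x11D, generator 2.
# The abstract does not specify the code used in the thesis.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(base: int, exp: int) -> int:
    """Raise a byte to a non-negative integer power in GF(2^8)."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result

def full_stripe_parity(blocks):
    """Compute P and Q from scratch for equal-length data blocks."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)  # weight of data disk i in Q
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def small_write(index, old_data, new_data, old_p, old_q):
    """Update P and Q using only the old data and the old parity.

    I/O cost: read old data, old P, old Q; write new data, new P, new Q.
    """
    coeff = gf_pow(2, index)
    new_p, new_q = bytearray(old_p), bytearray(old_q)
    for j, (old, new) in enumerate(zip(old_data, new_data)):
        delta = old ^ new  # both old and new values enter the parity update
        new_p[j] ^= delta
        new_q[j] ^= gf_mul(coeff, delta)
    return bytes(new_p), bytes(new_q)

def recover_one(blocks, missing, p):
    """Rebuild one lost data block by XORing P with the surviving blocks."""
    out = bytearray(p)
    for i, block in enumerate(blocks):
        if i != missing:
            for j, byte in enumerate(block):
                out[j] ^= byte
    return bytes(out)
```

Because the update touches only the changed block and the two parity blocks, the other data disks in the stripe are never read, which is why small-write cost is dominated by these six I/Os. Recovering two lost blocks would additionally need Q and GF(2^8) division, which this sketch omits.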
[Degree-granting institution]: Beijing Institute of Technology
[Degree level]: Doctoral
[Year degree conferred]: 2015
[Classification number]: TP333.35
