Research on Optimizing Storage Efficiency in Distributed File Systems
Published: 2019-05-17 17:57
[Abstract]: With information on today's Internet growing explosively, the demands placed on data storage keep rising. Distributed file system clusters inside large Internet companies are expanding rapidly, and storage costs are rising with them. To ensure data reliability, existing distributed file systems usually tolerate faults through full data redundancy, most commonly three-replica backup. Three-replica backup, however, consumes a great deal of space and yields low storage efficiency, so optimizing the storage efficiency of distributed file systems has become an increasingly active research topic.
Erasure coding achieves fault tolerance by encoding a file to generate parity data, and it can greatly improve the storage efficiency of a distributed file system. However, because erasure coding must encode and decode files and keeps only a single copy of each file in the system, it degrades system performance. To address this problem, the thesis proposes a hybrid storage scheme based on distinguishing hot data from cold data, which improves the storage efficiency of the distributed file system while minimizing the performance cost of erasure coding.
The proposed scheme combines the advantages of three-replica backup and erasure coding. Newly written data is stored with three-replica backup to preserve system performance. When the data becomes cold, the file is encoded and converted and the number of replicas is reduced, which saves storage space. For encoding, the system borrows the idea of striping from RAID (Redundant Array of Independent Disks) and uses one stripe as the encoding unit.
Test results show that the scheme improves storage efficiency by about 25% relative to three-replica backup and reduces the performance loss caused by erasure coding from 50% to within 5%, so its performance is essentially the same as that of three-replica backup. The tests indicate that the storage efficiency optimization scheme based on distinguishing hot and cold data does combine the strengths of several approaches and strikes a balance among storage cost, data reliability, and system performance.
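The hot-to-cold conversion described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not name the erasure code, the stripe width, the replica count kept for cold data, or the rule that marks data as cold. The sketch therefore assumes a single XOR parity block per stripe as a stand-in for a real erasure code, a single remaining copy after conversion, and an idle-time threshold; the names `StripedFile`, `demote_if_cold`, `recover_block`, and `cold_after` are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional

REPLICAS_HOT = 3   # three-replica storage for newly written data (from the abstract)
REPLICAS_COLD = 1  # assumption: cold data keeps one copy plus parity


def xor_parity(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte; a stand-in for a real erasure code."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


@dataclass
class StripedFile:
    blocks: List[bytes]              # one stripe of equal-length data blocks
    last_access: float               # seconds on any consistent clock
    replicas: int = REPLICAS_HOT
    parity: Optional[bytes] = None   # set once the stripe has been encoded

    def stored_bytes(self) -> int:
        """Bytes actually occupied: all replicas plus the parity block, if any."""
        data = sum(len(b) for b in self.blocks) * self.replicas
        return data + (len(self.parity) if self.parity else 0)

    def storage_efficiency(self) -> float:
        """Useful bytes divided by bytes occupied."""
        return sum(len(b) for b in self.blocks) / self.stored_bytes()


def demote_if_cold(f: StripedFile, now: float, cold_after: float) -> bool:
    """Encode the stripe and drop extra replicas once the file has gone cold."""
    if f.parity is not None or now - f.last_access < cold_after:
        return False  # already encoded, or still hot
    f.parity = xor_parity(f.blocks)
    f.replicas = REPLICAS_COLD
    return True


def recover_block(f: StripedFile, lost: int) -> bytes:
    """Rebuild one lost block from the parity and the surviving blocks."""
    if f.parity is None:
        raise ValueError("stripe has not been encoded")
    survivors = [b for i, b in enumerate(f.blocks) if i != lost]
    return xor_parity(survivors + [f.parity])


if __name__ == "__main__":
    f = StripedFile(blocks=[b"AAAA", b"BBBB", b"CCCC", b"DDDD"], last_access=0.0)
    print(f.storage_efficiency())            # one third while hot
    demote_if_cold(f, now=3600.0, cold_after=600.0)
    print(f.storage_efficiency())            # 16 useful bytes out of 20 stored
    print(recover_block(f, 2) == b"CCCC")    # a single lost block is rebuilt
```

With four data blocks and one parity block, the cold layout stores 20 bytes for 16 bytes of data, against 48 bytes under three replicas. These figures only illustrate the trade-off and are not meant to reproduce the 25% improvement reported in the abstract, which depends on parameters the abstract does not give.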
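[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP333
Article ID: 2479298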
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2479298.html