Dynamic Stripe Construction for Asynchronous Encoding in Distributed Storage Systems
Published: 2018-11-08 16:09
[Abstract]: To reduce redundant storage overhead while preserving data access performance, distributed storage systems commonly adopt asynchronous encoding. Newly written data is first stored with multiple replicas; once accesses to it decline, the system converts it to erasure-coded storage in the background. Because distributed systems usually place data blocks randomly, blocks with contiguous logical addresses are typically scattered across all nodes in the system. As a result, the encoding process must download data blocks across racks when it encodes, and after encoding completes, blocks must be redistributed across racks to guarantee data reliability. This approach both lowers the efficiency of asynchronous encoding and degrades the performance of foreground tasks in the system.
To improve the efficiency of asynchronous encoding and reduce its impact on foreground tasks, this thesis proposes a new way of constructing encoding stripes, which we call Dynamic Stripe Construction (DSC). DSC builds encoding stripes from the current placement information of the data blocks in the system. The data blocks placed in the same stripe must satisfy two properties: (1) each of these blocks has a replica stored in one common rack, so that encoding triggers no cross-rack block downloads; (2) these blocks have replicas spread across other, mutually independent racks, so that no cross-rack block redistribution is needed after encoding completes. To build stripes efficiently within the very large selection space, we design a data structure for managing block placement information and, based on it, propose a dynamic stripe construction algorithm with linear time complexity. The algorithm can be applied in a hot-pluggable manner to distributed clusters using any data placement scheme and any erasure code configuration.
To validate DSC, we implement it on HDFS. In experiments on a real cluster, DSC significantly improves the efficiency of asynchronous encoding (by up to 81% in our experiments) and reduces its impact on foreground tasks. During system integration, we first examine data locality and load balancing across nodes in asynchronous encoding, and then design inter-file encoding and iterative encoding to optimize asynchronous encoding for small-file and appended-file scenarios. To adapt to the constantly changing data access load in distributed clusters, we also propose a new data block management architecture that combines dynamic replication with erasure coding. This architecture lets us manage the system's data blocks dynamically, minimizing storage overhead while improving data reliability and access performance.
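The two stripe properties stated in the abstract can be illustrated with a small sketch. This is not the thesis's linear-time algorithm or its data structure, neither of which is described here; it is a simple greedy grouping under assumptions. The function name `build_stripes`, the `placements` mapping (block id to the set of racks holding a replica), the per-rack index `by_rack`, and the parameter `k` (data blocks per stripe) are all hypothetical names introduced for illustration. Parity block placement is not modeled.

```python
from collections import defaultdict


def build_stripes(placements, k):
    """Greedily group blocks into stripes of k data blocks (illustrative only).

    placements: dict mapping block id -> set of rack ids holding a replica.
    Returns a list of (core_rack, [(block, keep_rack), ...]) tuples.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # Hypothetical index: rack id -> blocks that have a replica in that rack.
    by_rack = defaultdict(list)
    for block in sorted(placements):
        for rack in placements[block]:
            by_rack[rack].append(block)

    assigned = set()
    stripes = []
    for core in sorted(by_rack):
        # Property (1): every candidate has a replica in the core rack,
        # so an encoder placed there reads all stripe members locally.
        candidates = [b for b in by_rack[core] if b not in assigned]
        while True:
            members = []    # (block, rack whose replica is kept after encoding)
            used = {core}   # racks already claimed by this stripe
            for block in candidates:
                # Property (2): the block needs a replica in a rack that
                # no other member of this stripe is using.
                free = sorted(placements[block] - used)
                if free:
                    members.append((block, free[0]))
                    used.add(free[0])
                    if len(members) == k:
                        break
            if len(members) < k:
                break       # no further full stripe for this core rack
            stripes.append((core, members))
            assigned.update(b for b, _ in members)
            candidates = [b for b in candidates if b not in assigned]
    return stripes
```

For example, with `placements = {"b1": {"r1", "r2"}, "b2": {"r1", "r3"}}` and `k = 2`, the sketch groups `b1` and `b2` into one stripe with core rack `r1`, keeping the replica of `b1` in `r2` and the replica of `b2` in `r3`. Because the grouping is greedy and rescans candidates, it may leave blocks unassigned that a better search would place, and it does not achieve the linear time complexity claimed for the thesis's algorithm.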
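[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP333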
Article ID: 2318997
Article link: https://www.wllwen.com/shoufeilunwen/xixikjs/2318997.html