Research on Capacity Expansion Mechanisms for Fault-Tolerant Distributed Storage Systems
[Abstract]: Today's large-scale distributed storage systems rely on redundancy to keep data available. Redundancy is generated either by replication or by erasure coding. For the same fault tolerance, erasure coding needs far less storage overhead than replication, so a growing number of storage systems adopt it. At the same time, rapid data growth and users' rising demands for capacity and performance often leave existing storage systems short of storage capacity and bandwidth. When application requirements exceed system capabilities, storage resources must be added and part of the data migrated to the new devices to relieve the pressure, a process known as storage-system expansion (scaling). Studying expansion mechanisms for erasure-coded distributed storage systems is therefore of considerable significance for cloud storage and for data storage in data centers. This thesis studies expansion from three perspectives: the system I/O overhead of the expansion process, the contention between user I/O requests and migration I/O during online expansion, and user access performance after expansion. The main research contents and contributions are as follows. (1) Expansion of Cauchy Reed-Solomon (CRS) coded systems. As storage systems demand ever higher fault tolerance, expanding CRS-coded systems becomes increasingly important. CRS coding is mainly used in distributed storage systems built from many storage nodes interconnected over the Internet (e.g., Cleversafe, OceanStore). Expansion requires migrating part of the data to the new storage devices while the parity is updated.
The storage I/O and network bandwidth overhead caused by data migration and parity updates directly affects system performance during expansion. This thesis studies the expansion of CRS-based distributed storage systems in three steps: first, designing the coding matrix after expansion; second, designing the data migration scheme for the expansion process; and third, further optimizing data migration by recovering part of the data through parity decoding. Together these steps form a three-stage optimized expansion algorithm for CRS systems. Theoretical analysis shows that, compared with the basic expansion algorithm, the three-stage algorithm effectively reduces system I/O and network bandwidth during CRS expansion. The algorithm was deployed in a real distributed file system and compared with the basic expansion algorithm, verifying its effectiveness and practicality under both single-threaded and multi-threaded architectures. (2) Online expansion in real storage systems. Most upper-layer user applications require the system to provide 24/7 online service. When a storage system is expanded online, user I/O requests and migration I/O requests therefore compete with each other, which inevitably affects the response times of both. However, existing expansion algorithms rarely consider user I/O requests in their design, so user and migration response times degrade during online expansion.
This thesis designs Popularity-based Online Scaling (POS), an online expansion optimization mechanism applicable to many expansion algorithms. POS exploits two characteristics of user access in real systems, data popularity and data locality: it divides the original storage space into multiple regions and records the popularity of each region (mainly using access frequency as the metric), thereby reducing the impact of user accesses on migration performance. POS can be regarded as a plug-in that applies orthogonally to a wide range of expansion algorithms and improves their online expansion performance. POS was implemented in a real disk simulator and compared extensively with the existing RAID-0 expansion algorithm FastScale; the experiments show that, compared with the traditional expansion algorithm, POS significantly improves both user and migration response times during online expansion. (3) Read/write performance after expansion. Storage-system expansion must account both for the performance of the expansion process and for the performance of user reads and writes after expansion completes. On the one hand, the greater the system I/O overhead during expansion, the longer the expansion time window and the greater the impact on migration and user response times during expansion; on the other hand, once expansion is complete the system must serve normal user reads and writes, so post-expansion user access performance is equally important. However, existing expansion algorithms focus mainly on minimizing the amount of data migrated during expansion and do not consider optimizing user read/write performance after expansion.
Because expansion changes the system's data layout, it directly affects normal user access performance after expansion. This thesis therefore designs the data migration method with post-expansion access performance in mind. It proposes a new RAID-0 expansion algorithm, PostScale, which achieves minimal data migration during expansion and, under that constraint, maximizes the dispersion of consecutive data blocks across disks after expansion. This design shortens the expansion time window and lets user read and write requests after expansion exploit the full concurrent access capability of the storage system. Simulation results show that PostScale combines the advantages of the two traditional RAID-0 expansion algorithms, round-robin and FastScale: it greatly shortens the expansion time window compared with round-robin, and effectively improves user read/write response times after expansion compared with FastScale. PostScale can be further extended to the expansion of RAID-5 systems and of Reed-Solomon-coded distributed storage systems, improving user access performance after expansion.
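For contribution (1), the abstract names the expanded coding matrix and the parity update but gives no formulas. The sketch below illustrates one generic property behind parity updates when a stripe widens: with a Cauchy matrix whose entries are 1/(x_i + y_j) over GF(2^8), appending new y_j values adds columns without changing the existing ones, so by linearity the parity can be updated from the old parity and the newly placed blocks alone. The choices x_i = i and y_j = m + j, the polynomial 0x11D, one-byte symbols, and all function names are assumptions for illustration. Real CRS codes expand each entry into a bit matrix and use only XOR, and this is not the thesis's three-stage algorithm.

```python
# Hypothetical sketch: widening one Cauchy Reed-Solomon stripe from k to k + n
# data blocks. Symbols are single GF(2^8) bytes; a real CRS code would expand
# each matrix entry into a w x w bit matrix and use XOR only.

PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a common primitive polynomial


def gf_mul(a: int, b: int) -> int:
    """Carry-less multiply of two GF(2^8) elements, reduced by PRIM."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero element: a^254, because a^255 == 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def cauchy_matrix(m: int, k: int) -> list[list[int]]:
    """m x k Cauchy matrix 1 / (x_i + y_j) with assumed x_i = i, y_j = m + j."""
    assert m + k <= 256, "x and y values must be distinct field elements"
    return [[gf_inv(i ^ (m + j)) for j in range(k)] for i in range(m)]


def encode(matrix: list[list[int]], data: list[int]) -> list[int]:
    """Full re-encode: every parity symbol reads every data symbol."""
    parity = []
    for row in matrix:
        p = 0
        for coeff, d in zip(row, data):
            p ^= gf_mul(coeff, d)
        parity.append(p)
    return parity


def widen_parity(parity: list[int], matrix: list[list[int]],
                 k_old: int, new_data: list[int]) -> list[int]:
    """Delta update: old parity plus the contribution of the appended columns."""
    updated = list(parity)
    for i, row in enumerate(matrix):
        for offset, d in enumerate(new_data):
            updated[i] ^= gf_mul(row[k_old + offset], d)
    return updated


if __name__ == "__main__":
    m, k, n = 2, 4, 2                       # 2 parity, 4 -> 6 data blocks
    old_data, moved_in = [10, 20, 30, 40], [50, 60]
    old_parity = encode(cauchy_matrix(m, k), old_data)
    wide = cauchy_matrix(m, k + n)
    # Appending y values leaves the first k columns untouched.
    assert [row[:k] for row in wide] == cauchy_matrix(m, k)
    fast = widen_parity(old_parity, wide, k, moved_in)
    assert fast == encode(wide, old_data + moved_in)
    print("delta update matches full re-encode:", fast)
```

The delta update reads m old parity symbols and n new symbols per stripe instead of all k + n data symbols, which is the kind of I/O saving an expansion algorithm can exploit; stripes that also lose blocks would need a corresponding subtraction term.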
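For contribution (2), the abstract says POS divides the original storage space into regions and tracks each region's popularity, mainly by access frequency. A minimal sketch of that bookkeeping follows. The class name `RegionHeat`, the fixed region size, and the `hot_first` flag are hypothetical; the abstract does not say how POS turns the ranking into a migration schedule, so the flag only marks where such a policy would plug in.

```python
# Hypothetical sketch of region-level popularity tracking in the spirit of POS.
# Region size and the ordering policy are assumptions, not details from the thesis.
from collections import Counter


class RegionHeat:
    """Counts user accesses per fixed-size region of the original address space."""

    def __init__(self, total_blocks: int, region_blocks: int) -> None:
        if total_blocks <= 0 or region_blocks <= 0:
            raise ValueError("sizes must be positive")
        self.total_blocks = total_blocks
        self.region_blocks = region_blocks
        self.num_regions = -(-total_blocks // region_blocks)  # ceiling division
        self.hits = Counter()  # region index -> access count (missing keys read as 0)

    def record(self, block: int, length: int = 1) -> None:
        """Add one hit to every region overlapped by blocks [block, block + length)."""
        if block < 0 or length < 1 or block + length > self.total_blocks:
            raise ValueError("request outside the original address space")
        first = block // self.region_blocks
        last = (block + length - 1) // self.region_blocks
        for region in range(first, last + 1):
            self.hits[region] += 1

    def migration_order(self, hot_first: bool = True) -> list[int]:
        """Region indices sorted by hit count, ties broken by region index."""
        sign = -1 if hot_first else 1
        return sorted(range(self.num_regions), key=lambda r: (sign * self.hits[r], r))


if __name__ == "__main__":
    heat = RegionHeat(total_blocks=1000, region_blocks=100)
    for blk in [5, 7, 950, 960, 970, 420]:
        heat.record(blk)
    heat.record(195, length=10)  # spans regions 1 and 2
    print(heat.migration_order())
```

With 1,000 blocks in 100-block regions, the accesses above give region 9 three hits and region 0 two, so the hot-first order is [9, 0, 1, 2, 4, 3, 5, 6, 7, 8]. A plain counter never ages; whether and how POS decays old popularity is not stated in the abstract.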
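For contribution (3), the trade-off between round-robin and minimal-migration RAID-0 scaling can be made concrete. The sketch counts migrated blocks and, as a rough proxy for post-expansion concurrency, the fewest distinct disks touched by any run of m + n consecutive logical blocks. `naive_minimal_layout` is a hypothetical scheme written only for this illustration; it is neither FastScale nor PostScale, and it merely shows that minimal migration by itself does not keep consecutive blocks spread across disks. The lower bound follows because a balanced layout must place n/(m + n) of all blocks on the new disks.

```python
# Hypothetical comparison of two ways to scale a RAID-0 array from m to m + n
# disks. "naive_minimal_layout" is an illustrative scheme invented for this
# sketch; it is neither FastScale nor PostScale.

Layout = list[tuple[int, int]]  # logical block -> (disk, row)


def round_robin_layout(num_blocks: int, disks: int) -> Layout:
    """Plain RAID-0 striping: block b sits on disk b % disks, row b // disks."""
    return [(b % disks, b // disks) for b in range(num_blocks)]


def naive_minimal_layout(num_blocks: int, m: int, n: int) -> Layout:
    """Move only the last n rows of every (m + n)-row segment to the new disks.

    The block at segment row m + r on old disk d goes to new disk m + r; all
    other blocks keep their original (disk, row), so exactly n / (m + n) of the
    blocks move when num_blocks is a multiple of m * (m + n).
    """
    layout = []
    for b in range(num_blocks):
        disk, row = b % m, b // m
        seg, r = divmod(row, m + n)
        if r < m:
            layout.append((disk, row))
        else:
            layout.append((m + (r - m), seg * (m + n) + disk))
    return layout


def moved(before: Layout, after: Layout) -> int:
    """Number of blocks whose (disk, row) changed."""
    return sum(1 for x, y in zip(before, after) if x != y)


def min_window_spread(layout: Layout, width: int) -> int:
    """Fewest distinct disks touched by any run of `width` consecutive blocks."""
    disks = [d for d, _ in layout]
    return min(len(set(disks[i:i + width]))
               for i in range(len(disks) - width + 1))


if __name__ == "__main__":
    m, n = 4, 2
    blocks = m * (m + n) * 10                  # ten full segments, 240 blocks
    old = round_robin_layout(blocks, m)
    rr = round_robin_layout(blocks, m + n)
    nm = naive_minimal_layout(blocks, m, n)
    print("lower bound on moves:", blocks * n // (m + n))
    print("round-robin   moved:", moved(old, rr), "spread:", min_window_spread(rr, m + n))
    print("naive minimal moved:", moved(old, nm), "spread:", min_window_spread(nm, m + n))
```

Scaling 240 blocks from 4 to 6 disks, round-robin moves 236 blocks (only the first 4 stay put) but every 6-block run spans all 6 disks, while the naive scheme moves only 80 blocks, the lower bound, yet some 6-block runs touch just 2 disks. The abstract describes PostScale as keeping the minimal migration count while maximizing the dispersion of consecutive blocks under that constraint.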
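[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Doctorate
[Year degree conferred]: 2016
[Classification number]: TP333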
[Author]: Wu Si (吴思)