Design and Optimization of a Data Consistency Protocol for Distributed Storage Systems
Published: 2019-01-20 19:27
[Abstract]: With the rise of big data and cloud computing, distributed storage technology has become increasingly valuable and important. Compared with traditional centralized storage systems, distributed storage systems offer low cost, easy scalability, and high availability. However, the distributed architecture also poses many challenges for data replication and synchronization. First, because distributed storage systems are built on asynchronous communication and data nodes can fail unpredictably, achieving data consistency is very difficult. Second, according to the CAP theorem, strong consistency, availability, and partition tolerance constrain one another, so weighing the other factors while still guaranteeing data consistency is a highly challenging problem. To address these problems, this thesis draws on the characteristics of distributed storage systems and existing data consistency theory to implement a distributed data consistency module based on the Paxos protocol. It then optimizes and improves the consistency process to propose a data consistency protocol design with a simplified flow, high availability, and synchronous reads and writes. The main research contents and results are as follows: (1) Based on the classic Paxos protocol, a data consistency module for distributed storage systems is implemented. The module can receive data operation requests sent concurrently by multiple clients and generate a single sequence of data operations that every data node in the system can obtain. By executing the operations in that sequence in order, the data nodes achieve eventual consistency. (2) In Paxos, proposers preempting one another's access to acceptors increases the number of messages exchanged and therefore the protocol latency. To address this, the running process of Paxos is optimized by extending the valid range of proposal numbers, which raises the queries per second (QPS) of the data consistency module. (3) The Paxos-based data consistency module achieves only eventual consistency and cannot provide synchronous reads and writes. To address this, one data node in the distributed storage system is elected as the sole node that accepts client requests, which gives the module synchronous read and write capability. Finally, the proposed data consistency protocol is evaluated experimentally, and measurements of several key metrics demonstrate the correctness and effectiveness of the proposed methods.
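To make the proposer preemption described in point (2) concrete, here is a minimal in-memory sketch of classic single-decree Paxos. It is not the thesis's implementation: the names `Acceptor` and `propose` are hypothetical, all acceptors are assumed reachable, and there is no networking, persistence, or retry logic. It shows only how a proposal with a lower number is rejected once acceptors have promised a higher one, which is what forces the extra message rounds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """One Paxos acceptor; state lives in memory for illustration only."""
    promised: int = -1                # highest proposal number promised so far
    accepted_n: int = -1              # number of the accepted proposal, if any
    accepted_v: Optional[str] = None  # value of the accepted proposal, if any

    def prepare(self, n: int) -> Tuple[bool, int, Optional[str]]:
        # Phase 1b: promise to ignore proposals numbered below n.
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, self.accepted_n, self.accepted_v

    def accept(self, n: int, v: str) -> bool:
        # Phase 2b: accept unless a higher-numbered prepare was promised.
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_v = v
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one Paxos round; return the chosen value, or None if preempted."""
    majority = len(acceptors) // 2 + 1

    # Phase 1a: send prepare(n) and keep only the positive promises.
    replies = [a.prepare(n) for a in acceptors]
    promises = [(an, av) for ok, an, av in replies if ok]
    if len(promises) < majority:
        return None  # a higher-numbered proposer got there first

    # If some acceptor already accepted a value, adopt the one with the
    # highest proposal number instead of our own value.
    _, prev_v = max(promises, key=lambda p: p[0])
    if prev_v is not None:
        value = prev_v

    # Phase 2a: ask the acceptors to accept (n, value).
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


# Three acceptors, three proposal attempts.
acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, n=1, value="put x=1")   # chosen
second = propose(acceptors, n=2, value="put x=2")  # must adopt the chosen value
stale = propose(acceptors, n=1, value="put x=3")   # preempted: number too low
```

In this run `first` and `second` both resolve to the same operation, while `stale` is rejected and would have to retry with a larger proposal number. Running one such instance per log position is a common way to build the shared operation sequence described in point (1), though the abstract does not say how the thesis structures its log.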
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP333
Article ID: 2412329
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2412329.html