Research on Data Replication in Cloud Storage Systems

Published: 2018-04-12 23:37

Topic: cloud computing + cloud storage systems; Source: Nanjing University of Posts and Telecommunications, master's thesis, 2017

[Abstract]: With the development of cloud computing and cloud storage, the volume of data generated inside these systems has grown explosively. For storing and processing massive data, cloud computing and cloud storage have gradually become the mainstream Internet technologies for data sharing and processing. Cloud storage systems commonly use data replication to improve data availability, provide fault tolerance, and strengthen disaster recovery, and replica management and replica placement are among the key technologies of cloud storage. These techniques bring benefits but also costs: keeping multiple replicas inevitably increases storage overhead, and replica maintenance consumes system resources. This thesis designs a data replication scheme and proposes a load-balancing-based replica placement method. (1) The replication scheme covers deciding when to create a replica, determining the number of replicas, and maintaining replica consistency. It computes a checksum for each block to verify the files to be synchronized and to locate the data that differs between them, so synchronization only needs to transfer the differing data between two replicas. (2) For replica placement, the proposed algorithm first balances load across all data centers in the system, then uses the distance over which service requests are forwarded to decide whether a new replica node is needed and to select a suitable placement node. Node selection evaluates each data center's performance parameters together to derive a capital value, and the data center with the highest capital value becomes the replica placement node. Simulation results show that the proposed replica placement method is feasible. Finally, the thesis builds business scenarios on the Hadoop platform, using the distributed computing framework MapReduce and the distributed file system HDFS, to validate the replication scheme and the replica placement method separately; the results show that both are feasible.
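The block-based check-value comparison described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the block size, the choice of SHA-256, and the function names `block_checksums`, `diff_blocks`, and `apply_diff` are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size, kept tiny so the example is readable


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def diff_blocks(source: bytes, replica: bytes,
                block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Map block index -> source block for every block whose checksum differs."""
    src_sums = block_checksums(source, block_size)
    rep_sums = block_checksums(replica, block_size)
    changed = {}
    for idx, checksum in enumerate(src_sums):
        # A block is out of date if the replica lacks it or its checksum differs.
        if idx >= len(rep_sums) or rep_sums[idx] != checksum:
            start = idx * block_size
            changed[idx] = source[start:start + block_size]
    return changed


def apply_diff(replica: bytes, changed: dict[int, bytes], source_len: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Patch the replica with the changed blocks and fit it to the source length."""
    # Truncate or zero-pad the replica so it has exactly source_len bytes.
    buf = bytearray(replica[:source_len].ljust(source_len, b"\0"))
    for idx, block in changed.items():
        start = idx * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)


source = b"aaaabbbbccccdd"
replica = b"aaaaXbbbcccc"
changed = diff_blocks(source, replica)
synced = apply_diff(replica, changed, len(source))
```

Only the blocks in `changed` would need to cross the network. Note that this sketch compares blocks at fixed offsets, so a single inserted byte shifts every later block and marks them all as changed; whether the thesis handles that case is not stated in the abstract.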
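[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP333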

[References]