Data Replication in Scalable Database Management Systems

Published: 2017-12-28 05:32

Keywords: data replication in scalable database management systems; Source: East China Normal University, master's thesis, 2017; Type: degree thesis

[Abstract]: As the Internet continues to grow and data volumes keep increasing, the ability of database systems to scale storage and computation horizontally is becoming increasingly important. Distributed database systems have therefore attracted wide attention from industry and academia for their good scalability. Among them, distributed systems built on log-structured storage (Log-Structured Storage) have become a new trend, and this read/write-separated architecture has been adopted in distributed database systems such as OceanBase, Alibaba's open-source relational database management system. Data export is one of the common techniques for data replication. It is widely used in enterprise applications to improve system availability and scalability and to ensure data reliability. In a distributed database with a read/write-separated architecture, data is split into static data and dynamic data, and the static data is stored across different physical nodes, so data replication becomes an operation that is both time-consuming and wasteful of system resources. This thesis analyzes the problems of data replication under the read/write-separated distributed database architecture and proposes effective solutions. The main contributions are as follows:
1. A static data export method that takes load balancing into account is designed and implemented. First, exploiting the architecture of the distributed database, concurrent query requests are issued directly to the different physical nodes, which reduces the number of network transfers and shortens response time. Second, a producer-consumer model is used to speed up writing data to disk and to resolve the problem of heavy memory usage. Finally, taking advantage of the multiple replicas of the data, query requests are distributed evenly across the nodes, which balances the load on each node and also improves overall export performance.
2. A dynamic data capture method based on log parsing is designed and implemented. On the one hand, log synchronization and log pulling are implemented to guarantee data correctness. On the other hand, frequent operations on the same tuple are condensed during log parsing, which avoids redundant operations and reduces the cost of applying updates.
3. Test datasets are generated with the YCSB benchmark, and several groups of experiments are designed to verify the feasibility and efficiency of the proposed data export method. The method is implemented on the open-source database CEDAR. The experimental results show that it effectively reduces response time and system resource usage.
Test results in CEDAR show that the proposed data replication method greatly improves the efficiency of data export. The method can also inform data replication in scalable database management systems of the same type and serves as a reference for subsequent work on data replication in such systems.
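The second contribution, condensing frequent operations on the same tuple during log parsing, can be illustrated with a minimal Python sketch. This is not CEDAR's implementation and is not taken from the thesis. The `LogEntry` record, the operation names, and the `replace` net operation are hypothetical. The sketch assumes a well-formed log (for example, no update after a delete) and that changes to different tuples can be applied in any order.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Iterable, List, Optional


@dataclass
class LogEntry:
    """One parsed log record (hypothetical layout, for illustration only)."""
    op: str                      # "insert", "update", "delete" or "replace"
    key: Hashable                # primary key of the affected tuple
    values: Dict[str, object] = field(default_factory=dict)


def _merge(prev: Optional[LogEntry], cur: LogEntry) -> Optional[LogEntry]:
    """Fold `cur` into `prev`, the pending net change for the same key."""
    if prev is None:
        # First operation seen for this key (or earlier ones cancelled out).
        return LogEntry(cur.op, cur.key, dict(cur.values))
    if cur.op == "delete":
        # A row created inside this batch and deleted again leaves no trace.
        return None if prev.op == "insert" else LogEntry("delete", cur.key)
    if cur.op == "insert":
        # Delete followed by insert: the old row is fully overwritten.
        return LogEntry("replace", cur.key, dict(cur.values))
    # cur.op == "update": keep the earlier operation kind, later columns win.
    return LogEntry(prev.op, cur.key, {**prev.values, **cur.values})


def compact(entries: Iterable[LogEntry]) -> List[LogEntry]:
    """Reduce a batch of log entries to at most one net change per tuple."""
    net: Dict[Hashable, Optional[LogEntry]] = {}
    for entry in entries:
        net[entry.key] = _merge(net.get(entry.key), entry)
    # Keys whose operations cancelled out are stored as None and dropped here.
    return [entry for entry in net.values() if entry is not None]


if __name__ == "__main__":
    batch = [
        LogEntry("insert", "k1", {"a": 1}),
        LogEntry("update", "k1", {"a": 2, "b": 3}),
        LogEntry("update", "k2", {"x": 1}),
        LogEntry("update", "k2", {"x": 5}),
        LogEntry("insert", "k3", {"a": 1}),
        LogEntry("delete", "k3"),
    ]
    for entry in compact(batch):
        print(entry)
```

In this sketch six log records shrink to two net changes, so the downstream target applies fewer updates. A real system would also have to respect transaction boundaries and log sequence numbers, which the sketch leaves out.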
[Degree-granting institution]: East China Normal University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP311.13
