Research on the Bigtable Distributed Storage System

Published: 2018-05-12 22:46

Topic: Bigtable + batch insertion; Source: master's thesis, Xidian University, 2014

[Abstract]: Cloud storage is a core component of cloud computing and has enormous market potential. Bigtable is the distributed data storage system that Google developed for its massive volumes of internal structured data, and it is the third of Google's key cloud computing technologies. With its high reliability, high performance, and scalability, Bigtable serves as a blueprint for distributed storage systems in the cloud era. Nevertheless, many aspects of Bigtable still need improvement.
This thesis first describes the implementation of Bigtable in detail. Building on a close study of the Bigtable architecture and a step-by-step analysis of its implementation, it identifies problems in six areas: batch insertion, master server failure recovery, the SSTable compaction mechanism, tablet server restart, inefficient access along the time axis, and read latency. After analyzing the cause of each problem, the thesis proposes a possible solution, and for each solution it uses simulation or performance analysis to show the advantage of the new scheme over the original Bigtable behavior.
For batch insertion, where Bigtable's conventional method causes tablets to split frequently, the thesis proposes a batch insertion scheme with an estimation period. For master server restart, whose procedure is complex and consumes considerable resources and time, it applies checkpointing to master recovery and proposes a checkpoint-based master recovery mechanism. For the original compaction mechanism, which is highly sensitive to the choice of compaction period, it proposes a quantity-based compaction scheme. For tablet server restart, where a restarted server cannot make effective use of the SSTable files on its local disk and therefore incurs unnecessary network communication and data transfer, it proposes a new tablet server initialization scheme. For inefficient access along the time axis, it proposes an SSTable design with a time index. For read latency, it proposes a scheme in which the tablet server returns data to the client while it is still reading that data from GFS.
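The abstract does not spell out how the quantity-based compaction scheme works, so the following is only a minimal sketch of one possible reading: a merging compaction is triggered when the number of SSTables in a tablet reaches a threshold, rather than on a fixed period. The names `Tablet`, `flush`, `merge_sstables`, and `MAX_SSTABLES`, the threshold value, and the choice to count SSTables are all assumptions made for illustration and are not taken from the thesis. The sketch also keeps only the newest version of each row key, which is a simplification of Bigtable's multi-version cells.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical threshold: the abstract does not say which quantity is counted
# or what limit applies. Here we count SSTables per tablet.
MAX_SSTABLES = 4

# An SSTable is modeled as a sorted list of (row_key, timestamp, value) entries.
SSTable = List[Tuple[str, int, str]]


def merge_sstables(tables: List[SSTable]) -> SSTable:
    """Merge sorted SSTables, keeping only the newest value per row key."""
    newest: Dict[str, Tuple[int, str]] = {}
    # heapq.merge walks the already-sorted inputs in global sorted order.
    for key, ts, value in heapq.merge(*tables):
        if key not in newest or ts > newest[key][0]:
            newest[key] = (ts, value)
    return [(key, ts, value) for key, (ts, value) in sorted(newest.items())]


class Tablet:
    """Toy tablet that compacts when its SSTable count reaches a threshold."""

    def __init__(self, max_sstables: int = MAX_SSTABLES) -> None:
        self.max_sstables = max_sstables
        self.sstables: List[SSTable] = []
        self.compactions = 0

    def flush(self, memtable: Dict[str, Tuple[int, str]]) -> None:
        """Write a memtable out as a new SSTable, then check the count trigger."""
        table = sorted((key, ts, value) for key, (ts, value) in memtable.items())
        self.sstables.append(table)
        # Count-based trigger: no timer or period is involved.
        if len(self.sstables) >= self.max_sstables:
            self.sstables = [merge_sstables(self.sstables)]
            self.compactions += 1
```

With a trigger of this kind, compaction frequency follows the write load directly: a tablet that rarely flushes is rarely compacted, and a busy tablet never accumulates more than the threshold number of SSTables, so no period has to be tuned.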
[Degree-Granting Institution]: Xidian University
[Degree Level]: Master's
[Year Conferred]: 2014
[Classification Number]: TP333
