Research on Optimizing the LSM-Tree-Based Persistent Cache Mechanism

Published: 2018-09-10 10:35

[Abstract]: The rapid growth of the Internet industry has driven explosive growth in data volume. Traditional relational databases cannot meet the needs of applications at massive data scale, which gave rise to key-value (KV) storage. Alongside it, NoSQL databases have been widely adopted; because they process massive data quickly, they are commonly deployed as a front-end database ahead of a traditional relational database. However, these memory-based databases are weak in data persistence and risk losing data when the system crashes. As Internet technology has matured, such databases have begun to explore ways of persisting their cached data. This thesis studies persistence in KV-based NoSQL databases, focusing on the LSM-Tree persistence mechanism used by LevelDB, the database open-sourced by Google. It first examines the LSM-Tree persistent cache model and analyzes its caching mechanism in depth. It then identifies the weakness of that mechanism, namely that sustained writes are fast but random reads are comparatively slow, and introduces an improved LSM-Tree model. LevelDB, Google's single-node persistent storage system that applies this mechanism, is studied in detail with emphasis on its key techniques and operating principles, and its source code is modified in C++. Finally, the Yahoo YCSB workload generator framework is used to compare LevelDB's performance before and after the improvement. The results show that, with write throughput maintained, random-read throughput doubles and random-read latency drops noticeably.
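[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: TP311.13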

[References]