Research on Optimizing the LSM-Tree-Based Persistent Cache Mechanism
Published: 2018-09-10 10:35
[Abstract]: The rapid growth of the Internet industry has driven explosive growth in data volume. Traditional relational databases cannot serve applications at this scale, which gave rise to key-value (KV) storage. NoSQL databases built on this mechanism have been widely adopted and, because they process massive data quickly, are commonly deployed as a front-end database ahead of a traditional relational database. However, such memory-based databases offer weak data persistence and risk losing data when the system crashes. As Internet technology has matured, these databases have begun to explore ways of persisting their cached data. This thesis studies persistence in KV-based NoSQL databases, focusing on the LSM-Tree persistence mechanism used by LevelDB, Google's open-source database. It first analyzes the caching mechanism of the LSM-Tree persistent cache model. It then identifies the mechanism's weakness, namely that sustained writes are fast but random reads are slow, and introduces an improved LSM-Tree model. Next, it examines in detail LevelDB, Google's single-node persistent storage system that applies this mechanism, with emphasis on its key techniques and operating principles, and modifies the LevelDB source code in C++. Finally, the Yahoo YCSB workload generator framework is used to compare LevelDB's performance before and after the modification. The results show that random-read throughput doubles while write throughput is maintained, and random-read latency drops noticeably.
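The write-fast, read-slow trade-off named in the abstract can be illustrated with a minimal sketch. It is a toy model, not LevelDB's code and not the improved model proposed in the thesis: the class `ToyLsm`, its methods, and the `probes` counter are hypothetical names introduced here, and in-memory sorted vectors stand in for on-disk sorted files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy LSM-Tree for illustration only (hypothetical; not LevelDB's implementation).
class ToyLsm {
public:
    explicit ToyLsm(std::size_t memtable_limit) : limit_(memtable_limit) {
        assert(memtable_limit > 0);
    }

    // A write touches only the in-memory sorted table. When the table is
    // full it is frozen into an immutable sorted run, which stands in for
    // a sorted file written sequentially to disk.
    void Put(const std::string& key, const std::string& value) {
        memtable_[key] = value;
        if (memtable_.size() >= limit_) {
            runs_.emplace_back(memtable_.begin(), memtable_.end());
            memtable_.clear();
        }
    }

    // A read checks the memtable first, then each run from newest to
    // oldest. If probes is non-null it receives the number of structures
    // searched, which grows with the number of runs.
    bool Get(const std::string& key, std::string* value,
             std::size_t* probes = nullptr) const {
        std::size_t searched = 1;  // the memtable
        bool found = false;
        std::map<std::string, std::string>::const_iterator it = memtable_.find(key);
        if (it != memtable_.end()) {
            *value = it->second;
            found = true;
        }
        for (std::vector<Run>::const_reverse_iterator run = runs_.rbegin();
             !found && run != runs_.rend(); ++run) {
            ++searched;
            Run::const_iterator pos = std::lower_bound(
                run->begin(), run->end(), key,
                [](const Entry& kv, const std::string& k) { return kv.first < k; });
            if (pos != run->end() && pos->first == key) {
                *value = pos->second;
                found = true;
            }
        }
        if (probes != nullptr) *probes = searched;
        return found;
    }

    std::size_t RunCount() const { return runs_.size(); }

private:
    typedef std::pair<std::string, std::string> Entry;
    typedef std::vector<Entry> Run;

    std::size_t limit_;
    std::map<std::string, std::string> memtable_;
    std::vector<Run> runs_;  // oldest first, newest last
};
```

In this sketch a `Put` never searches older data, whereas a `Get` for an old or missing key has to search the memtable and every run. Real systems reduce that cost with compaction, indexes, and filters, none of which are modeled here.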
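[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: TP311.13
Article ID: 2234193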
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2234193.html