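Research and Design of a Distributed Cache Strategy for HBase
Published: 2018-01-16 21:34
Keywords: Research and Design of a Distributed Cache Strategy for HBase. Source: Beijing Jiaotong University, 2017 master's thesis. Document type: degree thesis
More related articles: HBase; partitioning; read/write performance; consistent hashing algorithm; cache replacement strategy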
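[Abstract (from the Chinese original)]: With the rapid development of the Internet, the value of big data has received growing attention. As the infrastructure underlying big data research and applications, big data storage systems are particularly important, and HBase is a typical non-relational database among them. HBase still suffers from problems such as unbalanced partitioning and a single cache replacement strategy, which constrain cluster read and write performance. This thesis studies these problems with the aim of optimizing HBase read and write performance. The work was supported by the National Natural Science Foundation of China (No. 61172072, 61271308), the Beijing Natural Science Foundation (No. 4112045), and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20100009110002). The main work is as follows. (1) Write cache: without partitioning, the existing HBase can hardly exploit the advantages of a distributed system. Even with pre-partitioning, there is neither a pre-partitioning method suited to every data table nor a scheme that adaptively adjusts system load. To address this, the thesis designs a two-stage partitioning method. In the pre-partitioning stage, the RowKey is redesigned using the hashing effect of MD5. In the adaptive partitioning stage, a RegionServer performance evaluation strategy is designed, and adaptive partitioning is carried out according to it. The evaluation strategy combines the analytic hierarchy process (AHP) with TOPSIS; the thesis also uses and improves the consistent hashing algorithm and designs a new data structure to implement the improved algorithm. (2) Read cache: the LRU cache replacement strategy of the existing BlockCache is very coarse. Although it divides the cache into several layers, all layers use the same policy: replacement is decided solely by the order of each item's last update time. The thesis refines the replacement strategy of each layer: data hotness is taken into account in the Single layer, Block size is weighed in the Multi layer, and the threshold parameter for moving from the Single layer to the Multi layer is redefined, which reduces the probability of Full GC. In addition, to address slower queries on closely related data such as continuous data, a second-level cache based on the idea of community detection is designed to compensate. (3) Continuous, random, and concentrated data sets are prepared to simulate different experimental scenarios. The HBase system designed in the thesis is deployed on homogeneous and heterogeneous clusters, its read and write performance is tested, and the results are compared with those of the original HBase. The experiments show that the proposed scheme improves the read and write performance of the original HBase to a certain extent, and that the improved HBase suits the vast majority of data table types, with good applicability and stability.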
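The pre-partitioning stage in the abstract redesigns the RowKey with MD5, but the abstract does not give the exact key layout. The following is a minimal sketch of one common way to do this, not the thesis's scheme: prefix each key with the first few hex characters of its MD5 digest and choose split points that divide the hex prefix space evenly. The names `PREFIX_LEN`, `salted_row_key`, `split_points`, and `region_index` are hypothetical, and no HBase API is called.

```python
import bisect
import hashlib
from typing import List

PREFIX_LEN = 4  # hypothetical: number of MD5 hex characters used as the key prefix


def salted_row_key(original_key: str, prefix_len: int = PREFIX_LEN) -> str:
    """Prefix the key with part of its MD5 digest so sequential keys scatter."""
    digest = hashlib.md5(original_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{original_key}"


def split_points(num_regions: int, prefix_len: int = PREFIX_LEN) -> List[str]:
    """Return num_regions - 1 split keys that divide the hex prefix space evenly."""
    space = 16 ** prefix_len
    return [
        format(i * space // num_regions, f"0{prefix_len}x")
        for i in range(1, num_regions)
    ]


def region_index(row_key: str, splits: List[str]) -> int:
    """Index of the region [start, end) that would hold row_key."""
    return bisect.bisect_right(splits, row_key)


if __name__ == "__main__":
    splits = split_points(4)
    counts = [0] * 4
    for i in range(10000):
        counts[region_index(salted_row_key(f"user-{i:06d}"), splits)] += 1
    print(splits, counts)
```

Because the prefix is derived from the key itself, a reader can recompute it for point lookups. The cost is that range scans over the original key order are no longer contiguous, which is presumably part of what the thesis's second-level cache for continuous data is meant to offset.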
[Abstract]: With the rapid development of the Internet, the value of big data has attracted more and more attention. Big data storage systems are the infrastructure of big data research and application, and HBase is a typical non-relational database of this kind. HBase still has problems such as partition imbalance and a single cache replacement strategy, which restrict the read and write performance of a cluster. This thesis studies these problems in order to optimize the read and write performance of HBase. The research was supported by the National Natural Science Foundation of China (No. 61172072, 61271308), the Beijing Natural Science Foundation (No. 4112045), and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20100009110002). The main work is as follows. (1) Write cache: without partitioning, it is difficult for the existing HBase to take advantage of a distributed system. Even when pre-partitioning is used, there is no pre-partitioning method that applies to every data table and no scheme that adjusts the system load adaptively. To solve these problems, a two-stage partitioning method is designed. In the pre-partitioning stage, the RowKey is redesigned using the hash effect of MD5. In the adaptive partitioning stage, a RegionServer performance evaluation strategy is designed, and adaptive partitioning is performed according to it. The evaluation strategy combines AHP with TOPSIS. The consistent hashing algorithm is used and improved, and a new data structure is designed to implement the improved algorithm.
(2) Read cache: the LRU cache replacement strategy of the existing BlockCache is rough. It divides the cache into multiple layers, but all layers use the same caching policy, and cache replacement is based only on the last update time of the data. This thesis further designs the cache replacement strategy of each layer: data hotspots are considered in the Single layer, Block size is weighed in the Multi layer, and the threshold parameter for entering the Multi layer from the Single layer is re-specified, which reduces the probability of Full GC. In addition, to address the reduced query speed for closely related data such as continuous data, a second-level cache is designed using the idea of community detection. (3) Continuous, random, and concentrated data are prepared to simulate different experimental scenarios. The HBase system designed in this thesis is applied to homogeneous and heterogeneous clusters, and its read and write performance is tested, compared with, and analyzed against the original HBase. The experiments show that the proposed scheme improves the read and write performance of the original HBase to a certain extent, and that the improved HBase is suitable for most kinds of data tables and has good applicability and stability.
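The layered replacement idea in item (2) can be pictured with a toy two-tier cache. This is a sketch under stated assumptions, not the thesis's algorithm and not HBase's BlockCache implementation: blocks enter a single-access tier, move to a multi-access tier after `promote_after` cache hits, and eviction drains the single tier before touching the multi tier. The class `TieredLruCache` and the parameter `promote_after` are hypothetical, and the hotness and Block-size weighting described in the abstract are not reproduced because the abstract does not specify them.

```python
from collections import OrderedDict


class TieredLruCache:
    """Toy two-tier cache: new blocks start in `single`, hot blocks move to `multi`."""

    def __init__(self, capacity_bytes: int, promote_after: int = 2):
        self.capacity_bytes = capacity_bytes
        self.promote_after = promote_after  # hypothetical promotion threshold (hits)
        self.single = OrderedDict()  # key -> (size, hits), least recently used first
        self.multi = OrderedDict()   # key -> (size, hits), least recently used first
        self.used_bytes = 0

    def get(self, key) -> bool:
        """Return True on a cache hit and update recency and tier."""
        if key in self.multi:
            self.multi.move_to_end(key)
            return True
        if key in self.single:
            size, hits = self.single.pop(key)
            hits += 1
            target = self.multi if hits >= self.promote_after else self.single
            target[key] = (size, hits)  # re-inserted at the most recent end
            return True
        return False

    def put(self, key, size: int) -> None:
        """Cache a block after a miss; blocks already cached are left alone."""
        if key in self.single or key in self.multi:
            return
        self.single[key] = (size, 0)
        self.used_bytes += size
        self._evict()

    def _evict(self) -> None:
        """Evict least recently used blocks, single tier first, until within capacity."""
        while self.used_bytes > self.capacity_bytes:
            tier = self.single if self.single else self.multi
            if not tier:
                break
            _, (size, _) = tier.popitem(last=False)
            self.used_bytes -= size


if __name__ == "__main__":
    cache = TieredLruCache(capacity_bytes=100, promote_after=2)
    cache.put("a", 40)
    cache.put("b", 40)
    cache.get("a")
    cache.get("a")      # second hit moves "a" to the multi tier
    cache.put("c", 40)  # over capacity: "b" is evicted from the single tier
    print(list(cache.single), list(cache.multi), cache.used_bytes)
```

Raising `promote_after` keeps blocks that are read only once or twice out of the protected tier, which is one plausible reading of why the thesis revisits the Single-to-Multi threshold.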
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP333
Article ID: 1434930
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1434930.html