
Optimizing the MapReduce Data Access Method for HBase

Published: 2018-11-01 16:28
[Abstract]: With the rapid development of information technology, the volume of data on the Internet is growing quickly and its types are increasingly varied; the world has shifted to a data-centered paradigm, the era of "big data". Traditional data processing relies mainly on database management systems, which, when faced with big data, are hard to scale in storage and slow to query, and are increasingly unable to meet the demand for efficient data processing. More and more enterprises are turning to the open-source Hadoop cloud platform and using HBase to store and manage data. Reads from HBase can be parallelized with the MapReduce framework, which makes processing considerably faster than with traditional database management. Even so, under this framework the speed of reading data from HBase still cannot keep up with the speed of processing it, mainly because HBase's MapReduce data access method cannot fully guarantee data locality.

This thesis first introduces background on big data, including big data storage and processing technologies, outlines the classification, characteristics, and main platforms of cloud computing, and focuses on three key technologies of Hadoop, currently the most widely used cloud platform: HDFS, MapReduce, and HBase. This provides the theoretical basis for analyzing and improving the MapReduce process of HBase.

Next, through an in-depth analysis of the task allocation flow, the data splitting process, and the workflow of the data reading interface (Scan) in HBase's MapReduce framework, the thesis identifies the bottlenecks of MapReduce computation on HBase: 1) tasks cannot be made local; 2) data within a Region is read serially; 3) data must go through a merge step to assemble each record. To address these problems, the thesis proposes an improved method that takes HBase's physical storage unit, the Block, rather than the original logical storage unit, the Region, as the basic unit of task allocation; redesigns the method for reading data splits; and adopts the local-task-first MapReduce scheduling strategy proposed by Hua Zhongjie (华中杰).

Finally, comparative experiments show that the improved interface eliminates the extra processing performed by the Scan interface and strengthens data locality, reducing the time spent accessing data to 1/10 of that of the original interface and thereby effectively improving efficiency.
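The abstract does not give the details of the block-level splitting or of the scheduling strategy, so the following is only a minimal sketch of the general idea, under stated assumptions: each Block becomes its own unit of map work, tagged with the hosts that store it, and a scheduler first places work on a host that holds the data before falling back to a remote host. The names `Split`, `assign_local_first`, and the `node1` to `node3` hosts are hypothetical and are not taken from the thesis or from the HBase or Hadoop APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """One unit of map work plus the hosts that store its data (hypothetical model)."""
    name: str
    hosts: frozenset


def assign_local_first(splits, free_slots):
    """Assign splits to hosts, preferring hosts that store the split's data.

    free_slots maps host -> number of free map slots.
    Returns (assignment, local_count), where assignment maps split name -> host.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    local_count = 0
    deferred = []

    # Pass 1: place every split that can run on a host holding its data.
    for split in splits:
        candidates = [h for h in sorted(split.hosts) if slots.get(h, 0) > 0]
        if not candidates:
            deferred.append(split)
            continue
        # Among local hosts, pick the one with the most free slots.
        host = max(candidates, key=lambda h: slots[h])
        slots[host] -= 1
        assignment[split.name] = host
        local_count += 1

    # Pass 2: remaining splits run remotely on any host with a free slot.
    for split in deferred:
        free = [h for h in sorted(slots) if slots[h] > 0]
        if not free:
            break  # no capacity left; these splits would wait for a later wave
        host = max(free, key=lambda h: slots[h])
        slots[host] -= 1
        assignment[split.name] = host

    return assignment, local_count


# Assumed toy cluster: three hosts with two map slots each.
cluster_slots = {"node1": 2, "node2": 2, "node3": 2}

# Assumed layout: one Region whose three Blocks are replicated on different hosts.
block_splits = [
    Split("region-1/block-0", frozenset({"node1", "node2"})),
    Split("region-1/block-1", frozenset({"node2", "node3"})),
    Split("region-1/block-2", frozenset({"node1", "node3"})),
]

# Region as the unit: a single task on node1 reads all three blocks in sequence.
region_host = "node1"
region_local_blocks = sum(region_host in b.hosts for b in block_splits)

# Block as the unit: three tasks, each placed where its block is stored.
block_assignment, block_local_count = assign_local_first(block_splits, cluster_slots)
```

In this toy layout the single Region-level task has to fetch one of its three blocks from another host, while the Block-level splits can all be placed on hosts that store them and can run in parallel. Real clusters add constraints that this sketch ignores, such as records that span block boundaries and data that has not yet been flushed to disk.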
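[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP311.13;TP333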


Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2304468.html

