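Extension and Performance Tuning of the Hadoop Framework
Published: 2018-01-14 08:30
Keywords: Extension and Performance Tuning of the Hadoop Framework. Source: Xi'an University of Architecture and Technology, master's thesis, 2012. Document type: degree thesis
Related topics: cloud computing, grid computing, LSF, Hadoop, MapReduce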
[Abstract]: Cloud computing emerged as a new concept and became a widely discussed topic in 2007, and it developed rapidly in the years that followed. In terms of computing model, cloud computing has much in common with distributed computing and grid computing; a closer look at its origins shows that it grew out of both. Distributed computing and grid computing were formerly used mainly in scientific research, but with the rapid growth of the Internet their ideas gradually evolved into a computing model better suited to commercial use: cloud computing.

The thesis first introduces the background of cloud computing and grid computing and analyzes the differences between them. It then analyzes in detail the key technologies of the core components of the Hadoop cloud computing platform: MapReduce, HDFS (Hadoop Distributed File System), and HBase [1]. Next it describes the architecture of the LSF (Load Sharing Facility) system, which consists of two parts, LSF Base and LSF Batch, and examines LSF's job execution flow and system load balancing in depth.

After studying Hadoop in depth, the thesis identifies three major shortcomings for enterprise applications: a single point of failure, a single scheduling algorithm, and poor compatibility with heterogeneous platforms [2]. To address them, the thesis integrates Hadoop with LSF to form a new system, LSH (LoadShare Hadoop). The integration has two main points of connection. First, LSF's job control mechanisms, LIM (Load Information Manager), RES (Remote Execution Server), and SBD (sbatchd, a daemon process), are inserted between the HDFS layer and the MapReduce layer of Hadoop. Second, the LSF master node and the HDFS NameNode share information through an open interface. The integrated LSH system effectively guards against Hadoop's single point of failure, and it also addresses Hadoop's reliance on a single scheduling algorithm and its poor compatibility with heterogeneous platforms.

Finally, the thesis designs separate experiments for the integrated LSH system and native Hadoop to compare the two in their handling of a single point of failure, their performance on differing jobs, and their adaptability to heterogeneous platforms. The results indicate that LSH fully compensates for the shortcomings of native Hadoop and is suitable for enterprise-level applications.
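[Degree-granting institution]: Xi'an University of Architecture and Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP338.8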
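[References]
Related journal articles (top 3)
1 Li Xin; Zhang Peng; Improvement and Implementation of the Fair Scheduling Algorithm for Hadoop Clusters [J]; Computer Knowledge and Technology; 2012, No. 01
2 Li Chenghua; Zhang Xinfang; Jin Hai; Xiang Wen; MapReduce: A New Programming Model for Distributed Parallel Computing [J]; Computer Engineering & Science; 2011, No. 03
3 Sun Guangzhong; Xiao Feng; Xiong Xi; Research on Scheduling and Fault-Tolerance Mechanisms of the MapReduce Model [J]; Microelectronics & Computer; 2007, No. 09
Related master's theses (top 6)
1 Chen Yanjin; Research on and Improvement of Job Scheduling Algorithms for the MapReduce Model on the Hadoop Platform [D]; South China University of Technology; 2011
2 Xu Wenqiang; Research on a Cloud Storage System Based on HDFS [D]; Shanghai Jiao Tong University; 2011
3 Zhang Wenfeng; Principles and Design of a Distributed Computing Platform Based on the MapReduce Model [D]; Huazhong University of Science and Technology; 2010
4 Du Zhiyuan; Research on Educational Resource Sharing Based on OGSA [D]; Xidian University; 2007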
5 Xia Yi; Research on and Improvement of Job Scheduling Algorithms on the Hadoop Platform [D]; South China University of Technology; 2010
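6 Zhang Mimi; Performance Analysis and Optimization of the MapReduce Model as Implemented in Hadoop [D]; University of Electronic Science and Technology of China; 2010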
Article ID: 1422823
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1422823.html