
Research on Load Balancing in the Hadoop Cloud Computing Platform

Published: 2018-04-16 14:32

  Topic: Big Data; Source: Hebei University of Engineering, master's thesis, 2014


[Abstract]: Load balancing is very important in Hadoop cluster systems. A sound load-balancing strategy improves cluster performance and also improves the user experience. The task scheduling policy and scheduling method of a Hadoop cluster strongly affect how load is distributed, yet the current Hadoop scheduling methods do not take load balancing into account. This thesis studies cluster load mainly from the perspective of task scheduling: considering load balancing during task scheduling is more worthwhile than adjusting only after the cluster load has become severely unbalanced.

The thesis describes in detail cloud computing, the MapReduce distributed computing framework, and Hadoop, the open-source Java implementation of HDFS and MapReduce. It focuses on the job execution process in Hadoop and on the scheduling algorithms commonly used in Hadoop today: FIFO, Capacity Scheduler, and Fair Scheduler. Making full use of the Heartbeat information that a TaskTracker sends when it requests new tasks, and starting from task scheduling, it proposes a dynamic-feedback load-balancing scheduling method (Dynamic Feedback Load Balance, DFLB). The method collects relevant information during job execution and feeds it back to the JobTracker, uses that information when assigning new tasks, and collects execution information again while the new tasks run, forming a closed collect-feedback-use-collect loop. The thesis gives a mathematical definition of the cluster's load-balance state, which provides a basis for judging whether a task assignment made during scheduling is reasonable. In addition, to account for fairness among jobs, a dynamic job priority is introduced into the scheduling process.

After the dynamic-feedback load-balancing workflow was analyzed, a Hadoop cluster was built to validate the approach, and the experimental results were compared and analyzed. The results show that the DFLB scheduling method brings the cluster load to a balanced state and that the average job response time improves somewhat over Hadoop's built-in scheduling methods. This reflects the effect of load balancing on resource utilization and job parallelism, and represents meaningful progress in load-balancing research for the Hadoop cloud computing platform.
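The abstract does not give the thesis's formulas, so the following is only a minimal Python sketch of what a collect-feedback-use loop of this kind might look like. Everything in it is an assumption made for illustration: the `Heartbeat` fields, the equal-weight CPU/memory load score, the use of the population standard deviation as the imbalance measure, the `tolerance` threshold, and the linear aging rule for dynamic priority. None of these is taken from the thesis or from Hadoop's actual scheduler API.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Heartbeat:
    """Load report a worker sends when asking for work (fields are assumptions)."""
    node: str
    cpu: float        # CPU utilization in [0, 1]
    mem: float        # memory utilization in [0, 1]
    free_slots: int   # task slots currently idle


@dataclass
class Job:
    name: str
    base_priority: float
    submitted_at: float   # submission time in seconds
    pending_tasks: int


@dataclass
class FeedbackScheduler:
    tolerance: float = 0.1    # how far above the cluster mean a node may be
    aging_rate: float = 0.01  # priority gained per second of waiting
    loads: dict = field(default_factory=dict)  # latest load score per node

    @staticmethod
    def node_load(hb):
        # Hypothetical load score: equal-weight mix of CPU and memory.
        return 0.5 * hb.cpu + 0.5 * hb.mem

    def imbalance(self):
        # Hypothetical balance metric: population std. dev. of node loads.
        return pstdev(self.loads.values()) if len(self.loads) > 1 else 0.0

    def dynamic_priority(self, job, now):
        # Hypothetical aging rule: waiting jobs gain priority linearly.
        return job.base_priority + self.aging_rate * (now - job.submitted_at)

    def on_heartbeat(self, hb, jobs, now):
        """Record the reported load, then decide whether to hand out a task."""
        self.loads[hb.node] = self.node_load(hb)   # collect and feed back
        if hb.free_slots == 0:
            return None
        if self.loads[hb.node] > mean(self.loads.values()) + self.tolerance:
            return None                            # node is above the allowed load
        runnable = [j for j in jobs if j.pending_tasks > 0]
        if not runnable:
            return None
        job = max(runnable, key=lambda j: self.dynamic_priority(j, now))
        job.pending_tasks -= 1                     # use the feedback to assign
        return job.name
```

In this sketch a lightly loaded node that reports a free slot receives a task from the job with the highest aged priority, while a node whose score exceeds the cluster mean by more than `tolerance` is skipped until a later heartbeat shows it has recovered.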
[Degree-granting institution]: Hebei University of Engineering
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.09



