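Network and Scheduling Optimization for Heterogeneous MapReduce Clusters
Published: 2018-03-30 17:30
Topic: MapReduce; Focus: OpenFlow; Source: Shanghai Jiao Tong University, master's thesis, 2014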
[Abstract]: MapReduce scales well for processing large data sets and has therefore become a very popular programming model in cloud computing. However, MapReduce performs poorly on heterogeneous clusters. The cause is the load-balancing mechanism of Hadoop MapReduce: backup tasks generate excessive network traffic that competes with Shuffle traffic for bandwidth. Based on the OpenFlow protocol, this thesis proposes OFScheduler+, a dynamic optimization scheme for heterogeneous MapReduce clusters that reduces this bandwidth contention. The scheme aims to reduce bandwidth competition and to improve link load balance and bandwidth utilization. It also improves the task-assignment step of the MapReduce task scheduling algorithm so that network conditions are taken into account when tasks are assigned. OFScheduler+ consists of the following four parts:
(1) A tagging mechanism that marks different traffic types by modifying the ToS value in the IP header.
(2) A dynamic flow scheduling algorithm, optimized for the characteristics of the network underlying MapReduce, that improves the network utilization of the cluster.
(3) A flow rate control mechanism that enables or disables the MapReduce load-balancing mechanism according to the current network state of the cluster.
(4) Network-aware task assignment: the JobTracker queries the OpenFlow controller for the current network state and incorporates network factors into the task-assignment scheme of the MapReduce scheduling algorithm.
To evaluate the proposed scheme, we implemented a MapReduce simulator and a real OpenFlow testbed. The simulation results show that, in a heterogeneous cluster with a multipath topology, OFScheduler+ improves link bandwidth utilization and improves performance by 26-64% for most MapReduce jobs, with larger gains for data-intensive jobs. The experimental results on the testbed show that OFScheduler+ can be deployed in a real environment and achieves good results.
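The ToS-based traffic tagging described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the traffic classes, the ToS values (0x20, 0x40, 0x00), and the helper names `tag_socket` and `classify` are all assumptions chosen for illustration. The only real API used is the standard `socket.setsockopt` call with the `IP_TOS` option.

```python
import socket
from enum import Enum


class TrafficClass(Enum):
    """Hypothetical traffic classes; the thesis does not list its classes here."""
    SHUFFLE = "shuffle"
    BACKUP = "backup"   # traffic caused by backup (speculative) tasks
    OTHER = "other"


# Assumed ToS values. The two low-order (ECN) bits are left at zero so the
# value is not altered by ECN handling on the path.
TOS_BY_CLASS = {
    TrafficClass.SHUFFLE: 0x20,
    TrafficClass.BACKUP: 0x40,
    TrafficClass.OTHER: 0x00,
}
CLASS_BY_TOS = {tos: cls for cls, tos in TOS_BY_CLASS.items()}


def tag_socket(sock: socket.socket, traffic_class: TrafficClass) -> int:
    """Set the IP ToS byte for packets sent on `sock`; return the value used."""
    tos = TOS_BY_CLASS[traffic_class]
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return tos


def classify(tos_byte: int) -> TrafficClass:
    """Map an observed ToS byte back to a class, ignoring the ECN bits."""
    return CLASS_BY_TOS.get(tos_byte & 0xFC, TrafficClass.OTHER)
```

In such a setup a sender would call `tag_socket(sock, TrafficClass.BACKUP)` before transmitting, and a controller-side component could apply `classify` to the ToS byte of a flow to decide how to treat it. Unknown ToS values fall back to `TrafficClass.OTHER`.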
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: TP311.13; TP393.01
Article ID: 1686902
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1686902.html