Research and Improvement of Job Scheduling Algorithms Based on Hadoop Clusters

Published: 2018-03-08 16:40

  Topic: Hadoop cluster  Focus: job scheduling  Source: Shenyang University of Technology, 2017 master's thesis  Type: degree thesis


[Abstract]: With the arrival of the big data era, cloud computing has drawn close attention from industry and researchers alike. Hadoop is an open-source cloud computing platform developed by the Apache organization. It consists mainly of two parts: the HDFS distributed file system, which forms its base, and the MapReduce computing framework, which forms its core and is responsible for data processing. Within MapReduce, job scheduling plays the key role of allocating system resources. The scheduling algorithms that ship with Hadoop each have their own shortcomings, so it is worthwhile to study those shortcomings and make targeted improvements. Scheduler performance is an important factor in overall system performance; in a Hadoop cluster, the main performance indicators are data locality and the average job completion time. The essence of a locality-aware scheduling algorithm is to improve the data locality of the cluster, which reduces network transfer overhead and avoids congestion. To improve data locality, this thesis proposes a locality scheduling algorithm that defines separate node selection conditions for Map tasks and Reduce tasks. The algorithm works on the data splits stored in HDFS and, as far as possible, runs each task on the node that holds its data. Under this algorithm, Map tasks finish at different times, so once the Early Shuffle mechanism is started, Reduce tasks sit idle while they wait, which increases the average job completion time. To address this problem, the thesis proposes a new scheduling strategy that preserves data locality and integrates preemption. A Reduce task that is waiting is suspended and its resources are released to other Map tasks; once the Map tasks have completed to a certain extent, the Reduce task is rescheduled. This keeps the data locality of the algorithm while lowering the average job completion time.
Finally, the thesis describes the implementation of the new scheduling algorithm on a Hadoop cluster and compares the locality scheduling strategy with and without integrated preemption to observe the change in performance. Experiments in a cluster environment show that the proposed algorithm raises the average degree of local data completion across nodes by 17%, and that integrating the preemptive strategy lowers the average completion time by 14.12%. The approach thus improves data locality, reduces network transfer, and shortens the average job completion time.
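The two ideas in the abstract, preferring node-local Map tasks and holding back a waiting Reduce task until enough Map tasks have finished, can be illustrated with a minimal Python sketch. This is not the thesis's implementation and does not use Hadoop's scheduler API. The names `MapTask`, `pick_map_task`, `reduce_may_run`, and `schedule_slot` are hypothetical, the "first local task wins" rule stands in for the node selection conditions that the abstract does not spell out, and the resume threshold of 0.8 is an assumed value, not one reported in the thesis.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class MapTask:
    task_id: str
    # Nodes assumed to hold a replica of this task's input split.
    replica_nodes: FrozenSet[str]


def pick_map_task(free_node: str,
                  pending: List[MapTask]) -> Tuple[Optional[MapTask], bool]:
    """Pick a pending map task for free_node, preferring node-local input."""
    for task in pending:
        if free_node in task.replica_nodes:
            return task, True  # the input split is stored on this node
    if pending:
        return pending[0], False  # no local candidate: accept a remote read
    return None, False


def reduce_may_run(maps_done: int, maps_total: int,
                   resume_threshold: float = 0.8) -> bool:
    """Hypothetical resume rule: reduce runs once enough maps have finished."""
    if maps_total == 0:
        return True
    return maps_done / maps_total >= resume_threshold


def schedule_slot(free_node: str, pending: List[MapTask],
                  maps_done: int, maps_total: int,
                  resume_threshold: float = 0.8) -> Tuple[str, Optional[str], bool]:
    """Decide what a freed slot on free_node should run next.

    While map progress is below the threshold and map tasks are pending,
    the slot goes to a map task (the reduce task stays suspended).
    Otherwise the slot is handed back to the reduce task.
    """
    if pending and not reduce_may_run(maps_done, maps_total, resume_threshold):
        task, is_local = pick_map_task(free_node, pending)
        return ("map", task.task_id, is_local)
    return ("reduce", None, False)
```

For example, with pending tasks `m1` (replica on `n2`) and `m2` (replicas on `n1` and `n3`), a slot freed on `n1` while 3 of 10 Map tasks are done goes to `m2` as a node-local Map task; once 8 of 10 Map tasks are done, the same slot goes back to the Reduce task. A real scheduler would also have to save and restore the suspended Reduce task's state, which this sketch leaves out.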
[Degree-granting institution]: Shenyang University of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP301.6
