Research and Improvement of MapReduce Scheduling Mechanisms in Cloud Computing

Published: 2018-06-22 03:41

Topic: cloud computing + MapReduce; source: Jilin University, 2013 master's thesis

[Abstract]: Since it was proposed in 2006, cloud computing has changed how people think about networks and is now widely used across the Internet. Leading IT companies are deploying and researching cloud computing and applying it to their core services. Cloud computing is a research hotspot in next-generation Internet technology, so research on it has real practical significance. One of its key technologies is MapReduce, the parallel data programming model proposed by Google. Hadoop is an open-source implementation modeled on Google's MapReduce and is the most widely used open-source cloud computing platform in the world, so studying and optimizing the MapReduce model on Hadoop is of considerable value. This thesis first introduces the concepts and background of cloud computing, then presents and analyzes the key technologies of the Hadoop platform. After studying the mechanism of the MapReduce model in Hadoop, it proposes an improved scheme to address the shortcomings identified: a dynamic adaptive scheduling algorithm named Adaptive Capacity Algorithm Based on Priority (ACBP). At run time the algorithm dynamically adjusts the configured number of executing jobs according to actual operating conditions, giving an adaptive task scheduling mechanism. The thesis also studies Hadoop's default speculative execution mechanism and improves on its shortcomings, so that lagging tasks are identified more accurately and unnecessary consumption of computing resources is avoided. When a backup task is launched, node assignment needs to take each node's system load and remaining computing capacity into account, so that backup tasks are placed sensibly and ineffective backup tasks are not started, thereby effectively improving the system overall.
Finally, experiments verify the algorithm's performance under the expected conditions and compare it with the first-in-first-out (FIFO), fair, and capacity scheduling algorithms. The results show that the proposed algorithm outperforms FIFO scheduling but has certain limitations relative to the fair and capacity scheduling algorithms; in specific environments, however, it performs as expected. The experiments achieved their intended purpose.
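The abstract names two ideas without giving their rules: identifying lagging tasks more accurately, and choosing a backup node by its load and remaining capacity. The Python sketch below is only an illustration of how such checks might look, not the thesis's ACBP algorithm. Every name and threshold in it (`TaskAttempt`, `Node`, `slow_factor`, `max_load`) is a hypothetical assumption, and the remaining-time estimate is a generic progress-rate heuristic rather than anything taken from the source.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class TaskAttempt:          # hypothetical record, not a Hadoop class
    task_id: str
    progress: float         # fraction complete, 0.0 to 1.0
    elapsed_s: float        # seconds since the attempt started


@dataclass
class Node:                 # hypothetical record, not a Hadoop class
    name: str
    load: float             # current load, 0.0 (idle) to 1.0 (saturated)
    free_slots: int         # task slots not in use


def remaining_time(task: TaskAttempt) -> float:
    """Estimate seconds to completion from the observed progress rate."""
    if task.progress <= 0.0 or task.elapsed_s <= 0.0:
        return math.inf     # not enough information to estimate
    rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / rate


def find_stragglers(tasks: List[TaskAttempt], slow_factor: float = 1.5) -> List[TaskAttempt]:
    """Flag tasks whose estimated remaining time exceeds slow_factor x the mean.

    slow_factor is an assumed tuning knob; tasks without an estimate are skipped.
    """
    estimates = {t.task_id: remaining_time(t) for t in tasks}
    finite = [e for e in estimates.values() if math.isfinite(e)]
    if not finite:
        return []
    threshold = slow_factor * mean(finite)
    return [
        t for t in tasks
        if math.isfinite(estimates[t.task_id]) and estimates[t.task_id] > threshold
    ]


def pick_backup_node(nodes: List[Node], max_load: float = 0.8) -> Optional[Node]:
    """Pick the least-loaded node that has a free slot and is below max_load.

    Returns None when no node qualifies, so the caller can skip the backup
    task instead of launching one that is unlikely to help.
    """
    candidates = [n for n in nodes if n.free_slots > 0 and n.load < max_load]
    if not candidates:
        return None
    # Prefer lower load; break ties in favour of more free slots.
    return min(candidates, key=lambda n: (n.load, -n.free_slots))
```

A scheduler built along these lines would call `find_stragglers` periodically and launch a backup only when `pick_backup_node` returns a node; returning `None` reflects the abstract's goal of not starting ineffective backup tasks.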
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP338
