
Research and Implementation of a Locality-First Job Scheduling Strategy for MapReduce on Massive Data

Published: 2018-01-13 12:29

  Source: National University of Defense Technology, master's thesis, 2012. Document type: degree thesis.

  Keywords: load balancing; data locality; MapReduce; cloud computing


[Abstract]: Over recent decades, information networks have grown steadily in both technology and scale, and applications that handle massive data keep multiplying. An ordinary computing cluster built by a single enterprise can no longer meet the challenges that ever-growing data volumes pose for effective management and efficient computation. Industry therefore proposed pushing computation into the cloud, which is the idea behind cloud computing. Enterprises and research institutions now widely accept the concept of cloud computing, and it has produced many results in areas such as reliability and availability.

MapReduce is one of the most important of these results for distributed computing over massive data. Its core functionality is implemented in the Hadoop distributed computing system. Because Hadoop is open source, it has become an important base platform for research on MapReduce distributed computing, and the work in this thesis is built on it.

Job scheduling in the MapReduce distributed computing model strongly affects system performance, reliability, and related properties. When multiple jobs run at once, existing job scheduling algorithms achieve poor data locality. To address this, the thesis proposes a locality-first job scheduling algorithm. The method takes a new approach to the conflict between data locality and system load balancing. It guarantees data locality and uses job-level scheduling to optimize load balancing. This reduces I/O overhead during computation, which increases system throughput and shortens the execution time of individual jobs.

The thesis designs and implements the locality-first job scheduling algorithm in the MapReduce programming model, with HDFS as the distributed storage system. It validates the algorithm experimentally in a simulation environment. The results show that when data locality is fully achieved, system throughput improves markedly and the average execution time of a single job drops substantially.
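The abstract describes the scheduling policy only at a high level. As a reading aid, here is a minimal Python sketch of one plausible form of locality-first scheduling with job-level load balancing. It is not the thesis's algorithm.

It assumes a simplified model loosely based on Hadoop 1.x:
- each node offers a fixed number of task slots;
- each task's input block has replicas on a known set of nodes, as in HDFS.

The names `Task`, `Job`, `assign_local_task`, `schedule`, and `slots_per_node` are hypothetical.

A node with a free slot only ever receives a task whose input is stored locally. When the first job in the queue has no local data on that node, the sketch moves on to a later job instead of launching a remote task. The least-loaded nodes are served first. This job-skipping step is close in spirit to delay scheduling as used in Hadoop's Fair Scheduler.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    job_id: str
    task_id: int
    # Nodes holding a replica of this task's input block (HDFS-style placement, assumed).
    replicas: frozenset[str]


@dataclass
class Job:
    job_id: str
    pending: list[Task] = field(default_factory=list)  # tasks not yet assigned


def assign_local_task(node: str, jobs: list[Job]) -> Task | None:
    """Pop and return a pending task whose input is stored on `node`, else None.

    Jobs are scanned in queue order, but a job with no local data on this node
    is skipped rather than forcing a remote (non-local) task onto the slot.
    """
    for job in jobs:
        for i, task in enumerate(job.pending):
            if node in task.replicas:
                return job.pending.pop(i)
    return None  # strict locality: leave the slot idle


def schedule(nodes: list[str], slots_per_node: int,
             jobs: list[Job]) -> dict[str, list[tuple[str, int]]]:
    """Fill free slots with data-local tasks only, for a single scheduling wave."""
    assignments: dict[str, list[tuple[str, int]]] = {n: [] for n in nodes}
    progress = True
    while progress:
        progress = False
        # Each pass visits nodes from least to most loaded, so work spreads evenly.
        for node in sorted(nodes, key=lambda n: len(assignments[n])):
            if len(assignments[node]) >= slots_per_node:
                continue
            task = assign_local_task(node, jobs)
            if task is not None:
                assignments[node].append((task.job_id, task.task_id))
                progress = True
    return assignments


if __name__ == "__main__":
    jobs = [
        Job("A", [Task("A", 0, frozenset({"n1"})), Task("A", 1, frozenset({"n1"}))]),
        Job("B", [Task("B", 0, frozenset({"n2"})), Task("B", 1, frozenset({"n3"}))]),
    ]
    print(schedule(["n1", "n2", "n3"], slots_per_node=1, jobs=jobs))
    # {'n1': [('A', 0)], 'n2': [('B', 0)], 'n3': [('B', 1)]}
```

In the usage example, node `n2` skips job `A`, whose remaining data lives only on `n1`, and runs a local task of job `B`. Every slot therefore runs a local task. Task 1 of job `A` waits for a later wave rather than reading its input remotely. A real scheduler would also need a fallback for tasks whose replica nodes stay busy; this sketch deliberately omits one.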

[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year awarded]: 2012
[CLC classification]: TP333; TP311.13
