若干MapReduce排序问题的算法研究
发布时间:2019-07-07 20:59
【摘要】:MapReduce是Google提出的一种编程模型,一个处理和生成大数据集的相关实现。本文主要研究了Map Reduce平行机排序问题,对极小化最大完工时间(makespan),论文研究了m台同类机离线和两台机在线排序模型。对极大化最小完工时间,文中研究了同型机离线排序模型。全文共分五章。第一章简要地介绍了排序问题的基本知识以及MapReduce的相关知识。第二章主要研究m台同类机离线MapReduce排序问题,在这里map任务可以平行加工然而reduce任务不能平行加工。目标是极小化最大完工时间,分别考虑reduce任务可中断和reduce任务不可中断两种情况。对可中断的reduce任务,我们设计了一个近似比为2-(?)的算法,其中s1≥s2≥…≥2sm,是机器σi的加工速度;对不可中断的reduce任务,给出了一个近似比为2+(?)的算法。第三章主要研究两台同类机在线MapReduce排序问题,目标是极小化最大完工时间。首先对reduce任务可中断和reduce任务不可中断两种情况,分别给出了问题的下界。对可中断的reduce任务,给出了竞争比为(?)的最优算法,其中s是两台机器的速度之比;对不中断的reduce任务,我们证明类似LS算法是最优的且它的竞争比为(?)和(?)。第四章主要研究离线MapReduce同型机排序问题,目标是极大化最小完工时间。对可中断的reduce任务,给出了3台机的最优算法;对不可中断的reduce任务,考虑2台机、3台机、任意m台机的情况,分别给出了最坏情况界为4/3、3/2、m的近似算法且都是紧的。最后,我们对相关问题的后续研究作了讨论。第五章总结全文并提出相关问题进一步的研究方向。
[Abstract]:MapReduce is a programming model proposed by Google, a related implementation of processing and generating large data sets. In this paper, the scheduling problem of Map Reduce parallel machines is studied, and the offline and two online sorting models of m similar machines are studied for minimizing the maximum completion time (makespan),. For maximizing the minimum completion time, the offline sorting model of the same machine is studied in this paper. The full text is divided into five chapters. The first chapter briefly introduces the basic knowledge of scheduling problem and the related knowledge of MapReduce. In the second chapter, we mainly study the off-line MapReduce scheduling problem of m similar machines, where map tasks can be processed in parallel, but reduce tasks can not be processed in parallel. The goal is to minimize the maximum completion time, considering the interruptible reduce task and the uninterruptible reduce task respectively. For interruptible reduce tasks, we design an approximate ratio of 2-(?) Where S1 鈮,
本文编号:2511438
[Abstract]:MapReduce is a programming model proposed by Google, a related implementation of processing and generating large data sets. In this paper, the scheduling problem of Map Reduce parallel machines is studied, and the offline and two online sorting models of m similar machines are studied for minimizing the maximum completion time (makespan),. For maximizing the minimum completion time, the offline sorting model of the same machine is studied in this paper. The full text is divided into five chapters. The first chapter briefly introduces the basic knowledge of scheduling problem and the related knowledge of MapReduce. In the second chapter, we mainly study the off-line MapReduce scheduling problem of m similar machines, where map tasks can be processed in parallel, but reduce tasks can not be processed in parallel. The goal is to minimize the maximum completion time, considering the interruptible reduce task and the uninterruptible reduce task respectively. For interruptible reduce tasks, we design an approximate ratio of 2-(?) Where S1 鈮,
本文编号:2511438
本文链接:https://www.wllwen.com/shoufeilunwen/benkebiyelunwen/2511438.html