Research on Failure-Aware Workflow Scheduling Algorithms in Cloud Environments

Published: 2018-05-27 22:11

Topic: failure awareness + workflow; Source: Guangxi Normal University, master's thesis, 2017

[Abstract]: In recent years, the rapid development of cloud computing has led to large-scale cloud data centers being built around the world. As attention to cloud computing has grown, its functionality and complexity have been widely studied. In cloud environments, the "cloud workflow" was introduced to reduce service cost as far as possible while still meeting quality-of-service requirements, so that both users and service providers can maximize their benefit. Combining workflow technology with cloud computing has two advantages. On the one hand, complex application requirements can be abstracted and decomposed according to business logic, and tasks and resources can be reorganized and flexibly configured, which improves service quality. On the other hand, it enables automatic task scheduling, task monitoring, and the optimization and management of resource allocation, which can greatly improve task execution efficiency, raise the quality of cloud services, and reduce the cost of executing tasks.
Researchers have studied cloud workflow scheduling from several angles, including minimizing makespan, minimizing execution cost, and maximizing the task completion rate. Although many cloud workflow scheduling methods exist, none of them builds a scheduling model around the failure of cloud resources in order to avoid or reduce the impact of failure events on workflow scheduling results. Yet resource failure is unavoidable in cloud environments. It directly causes degraded system performance, premature program termination, and even data loss, so that more tasks miss their deadlines and the violation rate rises. This seriously harms the reliability and stability of cloud computing and greatly lowers Quality of Service (QoS). Moreover, the tasks of a workflow are subject to temporal constraints and data dependencies. Once a resource node fails during workflow execution, not only must the affected task be re-executed, but the entire workflow may have to be re-executed, which seriously reduces the efficiency of cloud computing and wastes large amounts of computing resources.
Based on the state of research in China and abroad and on development trends in failure prediction for cloud computing, and considering the characteristics of cloud scheduling optimization, this thesis first proposes a failure-aware workflow scheduling model. The model introduces a failure prediction mechanism and a task rescheduling strategy into the scheduling process, with the objective of maximizing the task completion rate while meeting deadline requirements. During scheduling, a sub-deadline is generated for every task in the workflow. According to the resource failure prediction model, when the selected resource node is predicted to fail before the task's sub-deadline, the task is migrated in advance to another node that can complete it successfully, which avoids the impact of the failure on task execution. During migration, critical-path tasks are assigned as far as possible to the same high-performance virtual machine, which reduces the data transfer overhead between tasks, shortens the makespan, and raises the task completion rate. The thesis then describes the function of each module in the failure-aware workflow scheduling model in detail, defines the failure prediction mechanism, the workflow model, and the resource model on that basis, and finally implements the model as a failure-aware workflow scheduling algorithm (BFGA).
BFGA is an improved genetic algorithm. It uses a novel triple (3-tuple) encoding. During population initialization, individuals are generated by combining random generation with algorithms already proven effective, in order to preserve population diversity. Crossover and mutation methods suited to the characteristics of workflows are designed, and after crossover and mutation an adjustment operator applies local fine-tuning to part of the results, which avoids local optima and speeds up convergence.
The proposed model and algorithm are evaluated by simulation on the CloudSim cloud computing simulation platform, using different types of workflow applications and varying the simulation environment parameters. A comparison with the GA algorithm verifies the effectiveness of the algorithm. The experiments show that BFGA converges faster than a general GA because it uses the triple encoding, initializes the population with several methods of generating individuals (which enriches population diversity), and adds the adjustment operator during evolution. BFGA is then compared with GA, First-fit, Pessimistic Best-fit, and an ordinary algorithm that does not consider failures, along three dimensions: failure prediction accuracy, number of workflow tasks, and ratio of failed nodes. The experiments show that when failure prediction accuracy exceeds 50%, BFGA achieves a higher job completion rate and a higher utilization of unreliable nodes than the other algorithms. When failure prediction accuracy is 75% and the number of workflow tasks exceeds 600, the task completion rate of all five algorithms declines, but that of BFGA declines more slowly and remains above the other four throughout. The experiments demonstrate that BFGA can reduce the impact of resource failure on workflow task scheduling and addresses the failure-aware workflow scheduling problem well.
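The migration rule described in the abstract (move a task in advance when its node is predicted to fail before the task's sub-deadline) can be illustrated with a minimal Python sketch. This is not the thesis's BFGA implementation, which the abstract does not give. The `Node` and `Task` classes, the `speed` and `free_at` fields, the `reschedule` function, and the earliest-finish tie-break are all hypothetical choices made for illustration, and sub-deadlines and predicted failure times are assumed to be supplied as inputs.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    speed: float              # work units per time unit (hypothetical model)
    predicted_failure: float  # predicted failure time; math.inf if none predicted
    free_at: float = 0.0      # time at which the node becomes idle


@dataclass
class Task:
    name: str
    work: float          # amount of work (hypothetical unit)
    sub_deadline: float  # assumed to be computed elsewhere


def finish_time(task: Task, node: Node, now: float) -> float:
    """Estimated completion time of `task` if it starts on `node` as soon as possible."""
    return max(now, node.free_at) + task.work / node.speed


def reschedule(task: Task, current: Node, nodes: List[Node],
               now: float = 0.0) -> Optional[Node]:
    """Keep `current` unless it is predicted to fail before the sub-deadline.

    Otherwise return the other node that finishes the task earliest while
    meeting the sub-deadline and not failing before completion, or None
    if no such node exists. (Assumption: the initial scheduler already
    checked that `current` can meet the sub-deadline.)
    """
    if current.predicted_failure >= task.sub_deadline:
        return current
    best, best_finish = None, math.inf
    for node in nodes:
        if node is current:
            continue
        done = finish_time(task, node, now)
        meets_deadline = done <= task.sub_deadline
        survives = node.predicted_failure >= done
        if meets_deadline and survives and done < best_finish:
            best, best_finish = node, done
    return best
```

A caller would invoke `reschedule(task, current_node, all_nodes)` whenever the failure predictor updates its estimates, and treat a `None` result as a task that cannot meet its sub-deadline on any available node.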
[Degree-granting institution]: Guangxi Normal University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP301.6
