Research on Key Technologies of Resource Scheduling Based on Cloud Computing Platforms

Published: 2018-05-18 08:42

Topics: reliability evaluation + task scheduling; Source: Beijing University of Posts and Telecommunications, master's thesis, 2014

[Abstract]: As demand for computing power has grown, advances in high-performance computing have given rise to the concept of cloud computing. As a new service and computing model, cloud computing makes computing power and storage resources available on demand, and user services and applications share the underlying resources over the network. Unlike traditional computing models, cloud computing uses virtualization to manage, schedule, and apply hardware resources. The heterogeneity of software and hardware components, together with the complex relationships among them, makes scheduling the underlying resources harder, so resource scheduling has become an important area of cloud computing research. Resource scheduling in cloud computing mainly covers two aspects: reliable storage of data resources and effective assignment of user tasks. On the reliability side, most current cloud computing technologies use data replication to build stable, reliable distributed file systems on clusters of inexpensive servers, and how to evaluate the reliability of cloud services has become an important part of cloud computing reliability analysis. On the effectiveness side, assigning tasks sensibly can improve overall system performance. This thesis therefore analyzes cloud computing resource scheduling in depth and, on that basis, presents theoretical contributions and experimental validation. The contributions are as follows:
(1) Building on traditional reliability analysis methods, the thesis proposes two low-complexity reliability evaluation algorithms, which evaluate cloud service reliability when failures are mutually independent and when failures are correlated, respectively. The former uses a bounding method to simplify the computation, reducing its difficulty while preserving the accuracy of the reliability estimate. The latter uses Bayesian networks and Markov theory to model failure correlation and proposes a simple reliability calculation algorithm.
(2) The thesis proposes a cloud computing task scheduling strategy based on a chaotic ant colony algorithm to address task allocation in heterogeneous environments. To guarantee quality of service (QoS) for users, it analyzes task completion time, reliability, and several other factors, and on that basis builds a constrained multi-objective scheduling model. The chaotic ant colony algorithm is used to solve the scheduling problem, and experimental results show that it effectively improves users' quality of service and outperforms other swarm intelligence algorithms.
(3) Hadoop, a mainstream cloud computing technology, is built on the distributed file system HDFS and processes tasks with the MapReduce programming model. To understand concretely how cloud computing operates and how tasks are scheduled, the thesis examines Hadoop's key technologies in depth, in particular analyzing and describing commonly used scheduling mechanisms such as the Capacity Scheduler and the Fair Scheduler.
(4) A Hadoop cloud computing platform was built in the laboratory on HP servers and used to analyze and compile statistics on China Unicom users' internet access data, implementing user classification, traffic prediction, traffic flow-direction analysis, and web page keyword extraction. This work gave a thorough understanding of Hadoop's processing flow and task scheduling mechanism.
(5) The thesis proposes a resource-aware Hadoop task scheduling mechanism, which monitors the underlying resources to obtain each node's resource usage and thereby provides a reference for task scheduling. In addition, for job scheduling, it proposes a job selection strategy based on remaining-time prediction: jobs are ranked by their estimated remaining running time, and jobs with shorter remaining time are scheduled first, which can improve the timeliness of the system's data processing to some extent.
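
The remaining-time-based job selection idea mentioned in the abstract could be sketched roughly as follows. This is only an illustration, not the thesis's actual method: the abstract does not say how remaining time is estimated, so the linear extrapolation from progress and elapsed time, the `Job` class, and the function names `estimate_remaining` and `order_jobs` are all assumptions made here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    progress: float   # fraction of work completed, in [0, 1]
    elapsed_s: float  # seconds the job has been running


def estimate_remaining(job: Job) -> float:
    """Assumed estimator: the job keeps its average progress rate so far."""
    if job.progress <= 0.0:
        return float("inf")  # no progress yet, so no basis for an estimate
    return job.elapsed_s * (1.0 - job.progress) / job.progress


def order_jobs(jobs: List[Job]) -> List[Job]:
    """Return jobs sorted so the shortest estimated remaining time comes first."""
    return sorted(jobs, key=estimate_remaining)


# Hypothetical jobs, named after the analysis tasks listed in the abstract.
jobs = [
    Job("traffic_forecast", progress=0.25, elapsed_s=100.0),  # about 300 s left
    Job("user_classify", progress=0.80, elapsed_s=200.0),     # about 50 s left
    Job("keyword_extract", progress=0.0, elapsed_s=5.0),      # unknown, goes last
]
ordered_names = [job.name for job in order_jobs(jobs)]
print(ordered_names)
```

Jobs with no measurable progress are given an infinite estimate so that they sort last. A real scheduler would also need a policy to keep such jobs from starving, which this sketch does not address.
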
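[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.09;TP18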

[References]