Research on Job Scheduling Methods on the Storm Platform
Published: 2018-05-19 02:31
Topics: stream computing + Storm; Source: Nanjing University of Posts and Telecommunications, 2017 master's thesis
[Abstract]: With the rapid growth of Internet business data, the way people process and use data has changed fundamentally, and stream computing emerged to meet the growing demand for real-time data processing. Storm is an open-source platform for processing real-time data; it handles streaming data quickly and reliably. The scheduler is one of Storm's core components and directly affects the performance and resource utilization of a Storm cluster, so studying and improving it matters for the platform's development. The main work of this thesis is as follows.

First, the thesis introduces the background of stream computing and the Storm platform, along with the state of research in China and abroad, focusing on the overall architecture of the distributed open-source streaming platform Storm and the core techniques behind its job processing. It then analyzes in depth the three schedulers that Storm provides: the default scheduler, the even scheduler, and the isolation scheduler. An example compares how the tasks of the same submitted job are assigned under each of the three schedulers, and from this the characteristics, applicable scenarios, and shortcomings of each scheduler are summarized. Evaluation metrics for Storm task-scheduling performance are also defined, and experiments are used to analyze the problems in the default scheduler's task assignment.

Next, the thesis addresses the fact that, when assigning tasks, the default scheduler ignores inter-node and inter-process communication, the structure of the job, the actual resource requirements of tasks, and the resource state of the cluster's worker nodes. To this end it proposes a resource-aware task scheduling algorithm based on the ant colony algorithm (RSBA). During scheduling, the algorithm represents the dynamic changes in worker-node resources as the pheromone that guides ant movement, and each foraging ant carries a label describing its own resource requirements. Task scheduling thus resembles ant foraging, and the approach improves on Storm's original default scheduling strategy.

Finally, RSBA is validated experimentally. The results show that the algorithm has strong learning ability and can find the node that best matches the resources the current task requires, thereby allocating resources reasonably. Compared with Storm's default scheduling algorithm, RSBA improves the efficiency of task scheduling, effectively reduces the average job processing time, and raises the throughput of the Storm cluster; it also benefits cluster load balancing and optimizes Storm cluster performance.
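The ant-foraging analogy can be made concrete with a small sketch. This is not the thesis's RSBA implementation: the abstract gives no formulas, so the pheromone definition (resources left free after placement), the roulette-wheel selection rule, and all names (`Node`, `Task`, `pheromone`, `pick_node`, `schedule`, `round_robin`) are assumptions chosen for illustration. The `round_robin` function is a simplified stand-in for a resource-unaware assignment, not Storm's actual default scheduler code.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node with its currently free resources (units are arbitrary)."""
    name: str
    cpu_free: float
    mem_free: float
    tasks: list = field(default_factory=list)


@dataclass
class Task:
    """A task; cpu and mem play the role of the ant's resource requirement label."""
    name: str
    cpu: float
    mem: float


def round_robin(nodes, tasks):
    """Resource-unaware baseline: cycle through the nodes regardless of demand."""
    return {t.name: nodes[i % len(nodes)].name for i, t in enumerate(tasks)}


def pheromone(node, task):
    """Assumed pheromone: resources that would remain free after placing the task.

    Returns 0.0 when the task does not fit, so that node can never be chosen.
    The small constant keeps an exact fit selectable.
    """
    if node.cpu_free < task.cpu or node.mem_free < task.mem:
        return 0.0
    return (node.cpu_free - task.cpu) + (node.mem_free - task.mem) + 1e-9


def pick_node(nodes, task, rng):
    """Roulette-wheel choice: probability proportional to pheromone strength."""
    weights = [pheromone(n, task) for n in nodes]
    if sum(weights) == 0:
        return None
    return rng.choices(nodes, weights=weights, k=1)[0]


def schedule(nodes, tasks, seed=0):
    """Place each task in turn, then update the chosen node's free resources.

    Updating the free resources changes the pheromone seen by later tasks,
    which mirrors the idea of tracking dynamic node resource state.
    """
    rng = random.Random(seed)
    placement = {}
    for task in tasks:
        node = pick_node(nodes, task, rng)
        if node is None:
            raise RuntimeError(f"no node can host {task.name}")
        node.cpu_free -= task.cpu
        node.mem_free -= task.mem
        node.tasks.append(task.name)
        placement[task.name] = node.name
    return placement


if __name__ == "__main__":
    cluster = [Node("A", 4.0, 8.0), Node("B", 1.0, 2.0)]
    job = [Task("big", 3.0, 6.0), Task("small", 0.5, 1.0)]
    print(schedule(cluster, job))
```

In this sketch a task is never placed on a node that lacks the resources for it, whereas the round-robin baseline would happily send a large task to a small node. A full ant colony algorithm would also iterate over many ants and include pheromone evaporation and reinforcement, which are omitted here for brevity.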
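[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP18; TP301.6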
Article ID: 1908336
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1908336.html