Research on Resource-Aware Task Scheduling in the Storm Environment

Published: 2018-07-27 15:58
[Abstract]: As data is created ever faster in big data applications, large volumes of data must be processed promptly and in real time. Apache Storm is a stream processing system that offers real-time, distributed, scalable, and highly reliable data processing, and it has attracted considerable attention in both academia and industry. In a complex event stream processing engine, data takes the form of real-time event streams that must be analyzed quickly. This model is used mainly in big data settings, where continuously generated data streams are consumed and processed, and the results serve as input for generating further event streams. To assess whether a resource allocation strategy succeeds, three performance metrics are used to check how well it adapts to resource fluctuations during scheduling: processing latency, resource throughput, and user satisfaction. The components involved in scheduling, defined as basic computing components, are aggregated into a single topology for execution. Real-time data streams with varying arrival rates, together with constantly changing operating conditions, pose new challenges for data processing. Improving scheduling efficiency is therefore the main problem addressed in this thesis, and it is also key to finding an optimized placement for Storm across the active physical nodes. However, like many other big data processing systems, Storm has no intelligent scheduling mechanism. Its current default round-robin scheduling mechanism does not adequately consider resource requirements and availability, so resources end up either underutilized or overutilized. Designing elastic solutions that can cope with sudden fluctuations in the input data stream is a recent hot research area. Traditional scheduling schemes rely largely on measuring a set of performance metrics and comparing them with a set of predetermined thresholds to make scheduling decisions. Such schemes lack adaptability to real-time changes in the amount of available resources.
This thesis proposes a resource-adaptive scheduler for the Storm framework based on CPU, memory, and network bandwidth. It allocates resources more effectively and improves performance, and it takes into account both the data transfer rate between Storm tasks and load balancing, assigning heavily communicating task pairs to the same group of compute nodes. Compared with Storm's default scheduling, the proposed algorithm yields a significant improvement: it distributes all tasks across the cluster and schedules them by sensing changes in CPU, memory, and network bandwidth. By analyzing the characteristics and performance of Storm's default task scheduling strategy, this thesis designs and implements a resource-aware stream data processing system based on Storm. Compared with default Storm scheduling, the improved Storm scheduling has the following desirable features: (1) based on runtime state, it uses effective resource-aware scheduling to dynamically assign or reassign tasks and thereby speed up data processing, minimizing inter-node and inter-process resource overhead while ensuring that no worker node is overloaded; (2) it can consolidate resources on worker nodes and so exercise fine-grained control, allowing the improved Storm to achieve better performance with fewer worker nodes; (3) it allows the scheduling algorithm to be managed modularly in code and allows scheduling parameters to be tuned; (4) it is transparent to Storm users, and Storm applications can be ported to the platform running the improved Storm scheduler.
Building on three benchmark stream data processing applications, SOL, RollingSort, and WordCount, this thesis adds monitoring code that senses CPU, memory, and network bandwidth and stores the monitoring information in a database. The scheduler, following the improved algorithm, reads this data from the database and replaces the default scheduling policy, and statistical tables of topology node throughput and inter-node latency are generated automatically for performance evaluation. Repeated experiments show that, compared with Storm's default scheduler, the improved Storm performs better on SOL, RollingSort, and WordCount.
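The contrast the abstract draws between round-robin placement and resource-aware placement can be illustrated with a small sketch. This is not the thesis's algorithm, which the abstract does not spell out; it is a minimal greedy heuristic under stated assumptions. The `Node` class, the function names, the four-task topology, and all demand and traffic figures are hypothetical, and network bandwidth is simplified to a per-pair traffic rate rather than modeled as a node capacity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A worker node with remaining capacity (hypothetical units)."""
    name: str
    cpu: float   # free CPU, e.g. percent of the machine
    mem: float   # free memory in MB
    tasks: List[str] = field(default_factory=list)

    def fits(self, cpu: float, mem: float) -> bool:
        return self.cpu >= cpu and self.mem >= mem

    def place(self, task: str, cpu: float, mem: float) -> None:
        self.cpu -= cpu
        self.mem -= mem
        self.tasks.append(task)


def round_robin(tasks: List[str], node_names: List[str]) -> Dict[str, str]:
    """Baseline: cycle through nodes, ignoring demand and capacity."""
    return {t: node_names[i % len(node_names)] for i, t in enumerate(tasks)}


def resource_aware(demands: Dict[str, Tuple[float, float]],
                   traffic: Dict[Tuple[str, str], float],
                   nodes: List[Node]) -> Dict[str, str]:
    """Greedy sketch: visit task pairs from highest to lowest traffic and
    co-locate each pair when capacity allows, never overloading a node."""
    placement: Dict[str, str] = {}
    by_name = {n.name: n for n in nodes}

    def assign(task: str, preferred: Optional[str]) -> None:
        if task in placement:
            return
        cpu, mem = demands[task]
        candidates = [by_name[preferred]] if preferred else []
        # Fall back to the node with the most free CPU (simple load balancing).
        candidates += sorted(nodes, key=lambda n: n.cpu, reverse=True)
        for node in candidates:
            if node.fits(cpu, mem):
                node.place(task, cpu, mem)
                placement[task] = node.name
                return
        raise RuntimeError(f"no node has capacity for task {task!r}")

    for (a, b), _rate in sorted(traffic.items(), key=lambda kv: kv[1], reverse=True):
        assign(a, placement.get(b))
        assign(b, placement.get(a))
    for task in demands:  # tasks that appear in no traffic pair
        assign(task, None)
    return placement


def inter_node_traffic(placement: Dict[str, str],
                       traffic: Dict[Tuple[str, str], float]) -> float:
    """Total traffic rate that has to cross node boundaries."""
    return sum(r for (a, b), r in traffic.items() if placement[a] != placement[b])


# Hypothetical WordCount-like topology: task -> (CPU, memory MB) demand.
demands = {"spout": (20, 256), "split": (40, 512), "count": (40, 512), "sink": (10, 128)}
# Hypothetical tuple rates between communicating task pairs.
traffic = {("spout", "split"): 100, ("split", "count"): 80, ("count", "sink"): 5}
nodes = [Node("n1", cpu=100, mem=1024), Node("n2", cpu=100, mem=1024)]

rr = round_robin(list(demands), [n.name for n in nodes])
ra = resource_aware(demands, traffic, nodes)
print("round-robin   :", rr, inter_node_traffic(rr, traffic))
print("resource-aware:", ra, inter_node_traffic(ra, traffic))
```

With these made-up numbers, round-robin alternates tasks across the two nodes, so every communicating pair is split. The greedy sketch keeps the heaviest pair on one node and moves the next task only when memory on that node runs out. A real scheduler would also need live measurements and rescheduling at runtime, which this sketch omits.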
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP301.6



