Job Scheduling and Optimization for a Massive Network Traffic Analysis Platform
Published: 2018-10-25 15:34
[Abstract]: In recent years, advances in network transmission technology and increases in link bandwidth have driven a surge in network traffic, and the resulting massive volume of traffic data poses many storage and computation problems for network traffic analysis platforms. Thanks to its good fault tolerance and simple concurrent programming model, Hadoop has gradually become the platform of choice for big data processing and is widely used for massive network traffic analysis. As traffic data keeps growing, simply upgrading and expanding a Hadoop cluster consumes substantial manpower and resources and may not yield a linear improvement in cluster performance. Job scheduling and optimization for massive network traffic analysis platforms is therefore particularly important. This thesis first introduces the architecture of a Hadoop-based massive network traffic analysis platform. It then compares Oozie with other job scheduling methods to explain why Oozie was chosen as the platform's job scheduling tool, and shows how Oozie is used to schedule jobs. Next, after summarizing the typical types of Hadoop jobs on the platform, it proposes a separate optimization scheme for each type and verifies the effect of each optimization in turn. Finally, the thesis studies the use of sampling in network traffic analysis jobs: it first investigates the factors that influence the relative error that sampling introduces into traffic analysis, then proposes an optimized sampling strategy for specific application scenarios, and also points out the limitations of sampling in network traffic analysis applications.
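[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP393.0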
Article ID: 2294116
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2294116.html