Performance Study of Distributed Big Data Processing Models in High-Speed Network Traffic Environments

Published: 2018-05-04 12:38

Topics: big data + distributed computing; Source: master's thesis, Beijing University of Posts and Telecommunications, 2016

[Abstract]: With the rapid development of Internet and communication technologies, networks have become closely tied to everyday life. The Internet applications that enrich and simplify people's lives also continuously generate network traffic carrying large volumes of user data, and this traffic contains highly valuable behavioral information. How to analyze and process such data efficiently in a high-speed network traffic environment has become a focus of both academia and industry. Because existing analysis of the performance of distributed big data processing models remains scarce and superficial, further in-depth study of their performance through simulation modeling and data analysis is needed. This thesis first introduces the characteristics of high-speed network traffic environments and the technical challenges of processing massive data in them, and then briefly describes the technical approaches relevant to big data problems. Next, it analyzes in depth the technical implementation of Hadoop, the big data processing model most widely adopted in industry, with a detailed discussion of the main factors that affect its performance. It then proposes a Petri-net-based method for simulation modeling of Hadoop and implements a simulation tool that predicts Hadoop's performance. Comparing the simulation results with measurements of Hadoop in a real environment demonstrates the accuracy, efficiency, and extensibility of the simulation tool. Finally, the thesis analyzes the motivation behind and the design philosophy of Spark, an emerging big data processing model, and analyzes the performance of Spark and Hadoop using performance data measured in a high-speed network traffic environment.
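[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.13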
