Research and Implementation of Massive Network Traffic Log Processing Technology Based on Hadoop
Published: 2018-09-19 20:28
[Abstract]: With the rapid growth of networks and the arrival of the big data era, the need to process massive volumes of network traffic data has emerged. Meeting that need, analyzing network traffic effectively and in depth, and supervising it well requires efficient collection of network traffic logs on the backbone network, followed by efficient analysis and processing of those logs. Multi-dimensional statistical analysis of traffic logs gives a detailed view of how the network is operating and being used, so that policies can be adjusted to improve network quality. In-depth mining of traffic logs reveals users' browsing characteristics and preferences and clarifies their needs, so that better service can raise user satisfaction. This thesis therefore studies processing techniques for network traffic logs and implements HAMANT, a Hadoop-based analysis system for massive network traffic logs (the name is an acronym formed from the initial letters of key English words).
The thesis first introduces the background and significance of the topic and the current state of log processing technology, and outlines the key related technologies, including big data, DPI, Hadoop, HBase, and data mining. It then analyzes the requirements and functions of massive traffic log processing in light of the application scenario, presents the overall architecture of the HAMANT log analysis system, and gives the detailed design of its modules: log collection, log preprocessing, log storage, statistical log analysis, log mining analysis, and report presentation. Finally, it reports performance tests of the system and demonstrates its results on the massive traffic of the backbone network of a key university, showing that the system processes massive traffic logs effectively and has a degree of scalability.
The thesis explores network traffic log technology in some depth and designs the Hadoop-based HAMANT log analysis system. The system adds a DPI protocol identification engine to log collection, making collection both rich and efficient. Log storage and processing are distributed, with support for automatic backup and fault tolerance, which overcomes the slow computation, insufficient storage, and heavy server load of traditional single-machine log processing. A clustering algorithm from data mining is implemented in distributed form and integrated into the system, enabling deep analysis of massive traffic logs and uncovering the browsing behavior preferences hidden behind large numbers of network users. Finally, system performance tests and an experimental analysis of a practical application are presented.
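The abstract says a clustering algorithm was implemented in distributed form, but it names neither the algorithm nor the features used. As a minimal sketch, assuming k-means and a hypothetical per-user feature vector (for example, the share of a user's traffic in each protocol category), one iteration can be written as a map step and a reduce step, which is the general shape a Hadoop job would take. All function names and sample values below are illustrative and are not taken from the thesis.

```python
from collections import defaultdict
from math import dist  # available in Python 3.8+


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One k-means pass expressed as map -> shuffle -> reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_assign(p, centroids)
        groups[key].append(value)  # "shuffle": group values by key
    # Keep the old centroid if no point was assigned to it.
    return [reduce_mean(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]


def kmeans(points, centroids, max_iter=20):
    """Repeat iterations until the centroids stop moving."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical per-user features: (video share, web share) of traffic.
users = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
print(kmeans(users, [(1.0, 0.0), (0.0, 1.0)]))
```

In a real cluster the map and reduce functions would run on separate nodes and the grouping would be done by the framework's shuffle phase; this sketch runs them in a single process so that only the data flow is visible.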
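[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.06; TP311.13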
Article ID: 2251252
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2251252.html