Research on Hadoop-Based Application-Layer Protocol Identification Technology
Published: 2018-11-11 22:09
[Abstract]: The rapid growth of the Internet has produced a steady stream of new application-layer protocols, making networks more complex, more diverse, and harder to manage. Attack and intrusion techniques are multiplying as well, and malicious network attacks seriously harm network services and information security. Application-layer protocol identification addresses the need to identify network traffic in real time and to extract its features. For reasons of security and flexibility, many new application-layer protocols no longer transmit data over fixed port numbers and instead favor dynamic ports. Many also lack uniform standards and specifications, so fixed port numbers no longer yield a simple, fast, uniform classification rule. Port-based classification does not work for applications that use dynamic ports. Payload-based classification raises user privacy concerns and is costly in time. Regular-expression signatures are mainly derived by manually analyzing a protocol's specification documents, and with today's explosion of data, manual protocol analysis for feature extraction is increasingly difficult. To address these difficulties in identification and feature extraction, this thesis proposes a Hadoop-based application-layer protocol identification system. The system uses Hadoop, which processes massive data sets in parallel, to identify application-layer packets and to extract their feature strings, achieving accurate extraction and identification of application-layer packet features.
The main contents of this thesis are as follows. First, it studies existing application-layer protocol identification techniques and the architecture and working mechanisms of Hadoop and HBase. Second, it studies the Apriori algorithm and adapts it to Hadoop, yielding MapReduceApriori, a Hadoop-based algorithm for extracting application-layer protocol feature strings. The improved algorithm eases feature extraction for protocols whose specifications are not public, and reduces the growing burden of manual feature extraction as new protocols proliferate. Finally, it designs and implements a Hadoop-based application-layer protocol identification system. Experiments show that the system identifies application-layer protocols more efficiently and accurately, and extracts the feature strings of unidentified protocols fairly accurately.
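The abstract does not spell out how MapReduceApriori works, so the following is only a minimal sketch of the general idea it names: Apriori-style, level-wise mining of frequent byte substrings, organized as a map step and a reduce step. It runs in a single process with no Hadoop dependency. The function names (`map_phase`, `reduce_phase`, `frequent_substrings`), the parameters (`min_support`, `max_len`), and the choice of contiguous substrings as "feature strings" are all assumptions made for illustration, not details taken from the thesis.

```python
from collections import Counter
from itertools import chain
from typing import Optional


def map_phase(payload: bytes, k: int, prev_frequent: Optional[set]):
    """Emit (substring, 1) once per payload for each distinct length-k substring.

    Apriori pruning (assumed form): a length-k candidate is kept only if both
    its length-(k-1) prefix and suffix were frequent in the previous pass.
    """
    seen = set()
    for i in range(len(payload) - k + 1):
        cand = payload[i:i + k]
        if prev_frequent is not None and (
            cand[:-1] not in prev_frequent or cand[1:] not in prev_frequent
        ):
            continue  # a sub-pattern is infrequent, so this candidate cannot be frequent
        seen.add(cand)
    return [(cand, 1) for cand in seen]


def reduce_phase(pairs, min_support: int) -> dict:
    """Sum the counts per substring and keep those meeting the support threshold."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return {key: n for key, n in counts.items() if n >= min_support}


def frequent_substrings(payloads, min_support: int, max_len: int) -> dict:
    """Run one map/reduce round per substring length, as Apriori does per itemset size."""
    prev = None  # no pruning on the first pass
    result = {}
    for k in range(1, max_len + 1):
        pairs = chain.from_iterable(map_phase(p, k, prev) for p in payloads)
        frequent = reduce_phase(pairs, min_support)
        if not frequent:
            break  # no frequent pattern of this length, so none can exist at longer lengths
        result.update(frequent)
        prev = set(frequent)
    return result


if __name__ == "__main__":
    # Hypothetical payloads standing in for captured application-layer packets.
    sample = [b"GET /a", b"GET /b", b"POST /c"]
    for pattern, support in sorted(frequent_substrings(sample, 2, 4).items()):
        print(pattern, support)
```

In a real Hadoop job each level would be a separate MapReduce round, with the previous round's frequent set distributed to the mappers. The longest surviving substrings are the natural candidates for protocol signatures.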
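[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.04
Article ID: 2326281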
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2326281.html