Research on DPI-Based Network Service Traffic Identification Technology
Published: 2018-12-05 18:01
[Abstract]: The Internet is developing rapidly, new network services keep emerging, and network traffic is growing exponentially. Fine-grained identification of network service traffic is widely used for network planning and management, congestion relief, and attack prevention, and it has become a strong complement to security technologies such as firewalls. High-speed networks place higher demands on traffic identification, and distributed computing frameworks, with their capacity for processing large-scale data, are better suited to handling high-speed network traffic and keeping the network running smoothly. Applying distributed computing frameworks to network service traffic identification has therefore become a new research focus. This thesis presents the theory of network traffic identification in detail and analyzes in depth the most common techniques: port-based identification, DFI, and DPI. Starting from the requirements of traffic identification, it focuses on the KMP, BM, WM, and AC algorithms used in DPI, compares their principles and operation flows, and proposes an improved pattern matching algorithm, the BMF algorithm, which matches text strings faster. As the Internet grows, traditional network architectures can no longer meet the needs of new network services, and traditional relational data storage and computation can no longer keep up with the expected growth in traffic volume, so identifying large-scale traffic with a distributed computing framework is an inevitable trend. Based on the characteristics of the Hadoop cloud computing platform, the thesis designs the operation flow of the MapReduceBoyer-MooreFast algorithm, built on DPI and the MapReduce module, and applies DPI to the Hadoop platform. Finally, a Hadoop experimental cluster is built and captured data is used for comparative experiments; the results show that the method identifies network service traffic effectively.
The main work of the thesis is as follows:
(1) An improved pattern matching algorithm, the BMF algorithm, is proposed. The BM algorithm uses the good suffix rule and the bad character rule to build two jump tables that indicate how far to shift to the right. Building on this, the thesis optimizes the matching approach. It discards the good suffix rule and the construction of the linked-list data that rule requires, which simplifies the operation flow and reduces space complexity. It relies mainly on the bad character rule and improves the character matching method so that the maximum rightward shift increases and the number of shifts decreases. The experimental results show that the BMF algorithm improves the running efficiency of pattern matching to some extent without reducing matching accuracy.
(2) A DPI-based traffic identification scheme on the Hadoop platform is designed. Network traffic is first captured with the packet capture tool Wireshark and the packet features of the traffic are extracted. DPI is then combined with the MapReduce programming framework to exploit Hadoop's strength in processing large-scale traffic data, and the operation flow of the MapReduceBoyer-MooreFast algorithm is designed around the characteristics of that framework. Finally, the experimental environment is built and DPI-based traffic identification is implemented on the Hadoop cloud computing platform. The experimental results show that running DPI on the Hadoop platform improves the efficiency of traffic identification while preserving identification accuracy.
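The bad-character idea and the map/reduce flow described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of BMF, so the code below is not the thesis's BMF algorithm or its MapReduceBoyer-MooreFast flow. It substitutes the well-known Boyer-Moore-Horspool variant, which also drops the good suffix rule and shifts using a bad character table only. The names `bad_char_table`, `find_all`, `map_payload`, and `reduce_counts`, and the two entries in `SIGNATURES`, are assumptions made for illustration, and the map and reduce steps run in a single process rather than on a Hadoop cluster.

```python
from collections import Counter


def bad_char_table(pattern: bytes) -> dict[int, int]:
    """Shift per byte, from its last occurrence in pattern[:-1]."""
    m = len(pattern)
    # Later occurrences overwrite earlier ones, so the smallest safe shift wins.
    return {b: m - 1 - i for i, b in enumerate(pattern[:-1])}


def find_all(text: bytes, pattern: bytes) -> list[int]:
    """Return every start offset of pattern in text (Horspool-style scan)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shift = bad_char_table(pattern)
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the entry for the text byte aligned with the pattern's
        # last byte; bytes absent from the table allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return hits


# Hypothetical signature set: illustrative placeholders, not from the thesis.
SIGNATURES = {
    "http": b"HTTP/1.1",
    "bittorrent": b"BitTorrent protocol",
}


def map_payload(payload: bytes):
    """Map step: emit (service, 1) for each signature found in a payload."""
    for service, signature in SIGNATURES.items():
        if find_all(payload, signature):
            yield service, 1


def reduce_counts(pairs) -> Counter:
    """Reduce step: sum the counts emitted for each service."""
    totals = Counter()
    for service, count in pairs:
        totals[service] += count
    return totals


# Assumed sample payloads, standing in for packets captured with Wireshark.
payloads = [
    b"GET / HTTP/1.1\r\nHost: example.org\r\n",
    b"\x13BitTorrent protocol",
    b"HTTP/1.1 200 OK\r\n",
]
pairs = (pair for payload in payloads for pair in map_payload(payload))
print(reduce_counts(pairs))
```

Keeping only the bad character table means preprocessing needs one small dictionary per signature, which is the kind of simplification the abstract attributes to dropping the good suffix rule. On a real cluster the map and reduce functions would be expressed as Hadoop MapReduce tasks over stored capture data rather than as in-process Python generators.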
[Degree-granting institution]: Qufu Normal University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP393.06
Article ID: 2365302
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2365302.html