An Improved Regular Expression Algorithm Based on the DPI System

Published: 2018-02-27 03:24

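Keywords: DPI; matching algorithm; pattern matching; regular expression; automaton; guess-group-verify algorithm　Source: Jiangxi University of Science and Technology, 2014 master's thesis　Document type: degree thesis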

[Abstract]: With the development of technology and the spread of networking, the Internet plays a very important role in both work and daily life: shopping on Taobao, handling company documents, and storing personal data all depend on it. Like a double-edged sword, however, the Internet also raises security problems that cannot be ignored, and preventing the leakage of and tampering with information and confidential files has become an urgent research topic. DPI (deep packet inspection) has therefore become an effective and widely adopted way to address Internet security problems.

Study and analysis of DPI technology show that the matching algorithms in existing DPI systems have two shortcomings: (1) if the DPI system uses a pattern matching algorithm, then when network traffic is complex and variable, matching becomes slow and only a single matching mode is available, which cannot cope with today's complex and changing traffic; (2) if the DPI system uses a regular expression algorithm, converting the expressions into automata consumes too much memory and a large share of system resources. To address these problems, this thesis proposes an improved regular expression algorithm based on the DPI system.

First, the working principle of DPI is studied in depth, and a DPI system model is built to identify and block a variety of application protocols on the network. The results confirm that, in practical use, a DPI system can greatly improve the ability to guard against network information leakage and can effectively identify and monitor many kinds of network applications. DPI also has broad uses in network security, including antivirus, intrusion prevention, URL filtering, content filtering, file filtering, application behavior control, and mail filtering.

Second, the recognition algorithms used by the network-flow matching engine, the core of DPI, are analyzed. By studying and comparing pattern matching algorithms and regular expression algorithms, the thesis summarizes the shortcomings of earlier algorithms and proposes an improved regular expression algorithm based on the DPI system: the guess-group-verify algorithm. The algorithm first searches for feature sub-blocks that occur with high probability, groups these sub-blocks, and converts each group into a DFA. It then performs guess matching on the incoming network traffic; traffic that completes a DFA match is passed to an NFA for full verification.

Finally, experiments verify the correctness and effectiveness of the proposed guess-group-verify algorithm. A comparison with the Hybrid-FA algorithm and the guess-verify algorithm shows that the proposed algorithm effectively reduces the number of DFA states generated during conversion, lowers memory usage and resource occupancy, and has advantages for protocol identification in network flows.
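The guess-then-verify flow described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. The signature table, the hand-picked literal sub-blocks, the fixed group size, and the names `SIGNATURES`, `build_groups`, and `classify` are all assumptions made for illustration, and the thesis's selection of sub-blocks by occurrence probability is not reproduced. Python's `re` module is a backtracking engine, so it only stands in for both the per-group DFA (guess stage) and the NFA (verification stage).

```python
import re

# Hypothetical signatures: (protocol name, literal sub-block, full regular expression).
# Each literal must appear in every payload the full expression can match,
# otherwise the guess stage could wrongly skip a real match.
SIGNATURES = [
    ("http", b"HTTP/1.", rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]\r\n"),
    ("ssh", b"SSH-", rb"^SSH-\d\.\d+-\S+"),
    ("smtp", b"ESMTP", rb"^220 \S+ ESMTP"),
    ("ftp", b"FTP", rb"^220[ -].*FTP"),
]


def build_groups(signatures, group_size):
    """Split the signatures into groups and compile one cheap prefilter per group."""
    groups = []
    for start in range(0, len(signatures), group_size):
        chunk = signatures[start:start + group_size]
        # Guess stage: one alternation of escaped literals per group.
        # This stands in for the small per-group DFA of the abstract.
        prefilter = re.compile(b"|".join(re.escape(lit) for _, lit, _ in chunk))
        # Verification stage: the full expressions, standing in for the NFA.
        verifiers = [(name, re.compile(rx)) for name, _, rx in chunk]
        groups.append((prefilter, verifiers))
    return groups


def classify(payload, groups):
    """Return the names of all signatures whose full expression matches the payload."""
    matched = []
    for prefilter, verifiers in groups:
        if prefilter.search(payload) is None:
            continue  # guess failed: skip every full expression in this group
        for name, full in verifiers:
            if full.search(payload):
                matched.append(name)
    return matched


if __name__ == "__main__":
    groups = build_groups(SIGNATURES, group_size=2)
    print(classify(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n", groups))
    print(classify(b"220 mail.example.com ESMTP Postfix\r\n", groups))
```

Keeping the guess stage to short literals is what keeps each group's automaton small in this sketch; the more expensive full expressions run only on traffic that has already passed the guess.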
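[Degree-granting institution]: Jiangxi University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.08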
