DFA压缩和Snort规则解析的算法研究

发布时间：2018-09-11 11:47

【摘要】：随着网络技术的发展，针对网络传输的重要信息的攻击变得更加隐秘和复杂。从攻击数据包的头部到恶意代码、入侵指令等攻击可能隐藏在数据包的内容中。深包检测技术（DPI）不仅对数据包的头部进行检测，还对数据包的内容进行分析。DPI通过使用一组给定的安全规则与网络中捕获的数据包内容进行比较，从而发现数据包内容中携带的攻击特征。早期DPI定义的安全规则中，用精确字符串描述入侵行为的特征，这些字符串组成规则过滤集。然而，为躲避检测，各种恶意代码和入侵指令不断刻意隐藏自己的特征，，导致精确字符串已经不能充分描述这些特征。正则表达式因表达能力灵活成为描述入侵行为特征的新一代工具。将正则表达式编译为确定性有穷自动机（DFA）可以实现正则表达式与数据包的匹配。DFA自身具有高效的匹配速率，但占用存储空间较大。如果用软件实现匹配算法，会面临匹配耗时过长的问题；如果用硬件实现匹配算法，则会面临存储空间较小的问题，通常硬件可用空间大小为数百KB到2MB左右。因此，构建时空高效的正则表达式匹配过滤算法是目前DPI研究的热点之一。本文重点研究了DFA的存储压缩问题，实验中所需的正则表达式来源于Snort2.8规则集中的一些关键选项。本文也简要介绍了实验中对Snort2.8规则集中关键选项的解析处理方法。主要工作为以下四个方面： (1)对于同一正则语言，可以存在多个识别此语言的DFA，但任何正则语言都有一个唯一的（不计同构）状态数目最少的DFA。将任意DFA转化为等价的状态数最少的DFA的算法称为DFA最小化算法。本文分析了DFA最小化算法实现的理论依据，描述并实现了一个DFA最小化的经典算法。 (2)正则表达式在编译为DFA跳转表数据后，存在数据量大和数据冗余的问题。通过对DFA跳转函数表进行分析，发现了数据量大与其字符集和状态数之间的关系。分别提出了字符集压缩算法和基于层次聚类的状态压缩算法。 (3)介绍了Snort2.8规则集中规则的构成，说明了从文本规则到存入内存数据结构的Snort2.8规则集解析处理过程，给出了一种抽取Snort规则中关键选项的算法。 (4)通过Snort2.8规则集进行实验，评估了两种压缩算法结合使用后的DFA压缩效率。与原始DFA算法和D2FA相比，存储空间开销分别减少了98.7%和63.2%。
[Abstract]:With the development of network technology, the attacks against important information transmitted by network become more secret and complex. Attacks ranging from the header of attack packets to malicious code, intrusion instructions and so on may be hidden in the contents of packets. Deep-packet detection technology (DPI) not only detects the header of data packets, but also analyzes the contents of packets. DPI compares the contents of packets captured in the network by using a set of given security rules. Thus, the attack characteristics carried in the packet content are found. In earlier security rules defined by DPI, exact strings were used to describe the characteristics of intrusion behavior, and these strings formed a rule filter set. However, in order to avoid detection, all kinds of malicious code and intrusion instructions have been deliberately hiding their own characteristics, resulting in accurate string can no longer fully describe these characteristics. Regular expression is a new generation of tools for describing intrusion behavior because of its flexible expression ability. Compiling regular expressions into deterministic finite automata (DFA) can realize matching between regular expressions and packets. If the matching algorithm is implemented by software, it will be faced with the problem of too long matching time; if the matching algorithm is implemented with hardware, it will face the problem of small storage space, usually the size of available hardware space is about hundreds of KB to 2MB. Therefore, the construction of spatiotemporal efficient regular expression matching filtering algorithm is one of the hotspots of DPI research. This paper focuses on the storage compression problem of DFA. The required regular expressions in the experiment come from some key options in the Snort2.8 rule set. This paper also briefly introduces the analytical method of key options in Snort2.8 rule set. The main work is as follows: (1) for the same regular language, there can be more than one DFA, that recognizes the language, but any regular language has a unique (excluding isomorphism) state with the least number of DFA. The algorithm of transforming arbitrary DFA into equivalent DFA with the least number of states is called DFA minimization algorithm. In this paper, the theoretical basis of DFA minimization algorithm is analyzed, and a classical DFA minimization algorithm is described and implemented. (2) after compiling to DFA jump table data, the regular expression has the problems of large amount of data and redundant data. By analyzing the DFA jump function table, the relationship between the large amount of data and its character set and state number is found. The character set compression algorithm and the state compression algorithm based on hierarchical clustering are put forward respectively. (3) the composition of Snort2.8 rule set rules is introduced, and the parsing process of Snort2.8 rule set from text rules to memory data structure is explained. An algorithm for extracting key options from Snort rules is presented. (4) the DFA compression efficiency of two compression algorithms is evaluated by Snort2.8 rule set experiments. Compared with the original DFA algorithm and D2FA, the storage space overhead is reduced by 98.7% and 63.2% respectively.
【学位授予单位】：西安电子科技大学
【学位级别】：硕士
【学位授予年份】：2014
【分类号】：TP393.08

【参考文献】