基于位分割的K步长多模式匹配算法的研究

发布时间：2018-03-03 21:36

本文选题：模式匹配　切入点：入侵检测　出处：《杭州电子科技大学》2014年硕士论文　论文类型：学位论文

【摘要】：随着网络技术的高速发展和互联网的日益开放，网络应用日趋普及。与此同时，网络带来的攻击行为也引发了人们对网络安全问题的关注。作为维护网络安全的重要手段，入侵检测系统得到了广泛的应用。而模式匹配是入侵检测系统中最重要的一种检测技术，其创新性和有效性将成为提高入侵检测系统性能和效率的关键。论文对入侵检测系统进行了一个简单的概述，介绍了入侵检测的一般过程和分类，并阐述了模式匹配在入侵检测中的应用。此外，论文还对几种经典的模式匹配算法作了详细的介绍，其中包括BF、KMP、BM等单模式匹配算法和AC、AC_BM等多模式匹配算法。论文介绍了Bloom Filter的原理及应用。Bloom Filter是一种基于多个哈希函数映射压缩空间的数据结构，通过寻找一种优化的哈希查找算法可以提高Bloom Filter的性能。针对现有哈希算法中链地址法处理冲突时存在查找效率低的问题，论文设计了一种改进的哈希表查询算法。经分析和实际测试表明，该算法在不增加消耗的同时降低了冲突时执行查询的查找长度，从而缩短了查询响应时间。针对经典AC算法构建的状态机内存利用率低且需要频繁访问外存的问题，论文设计了一种改进的K步长状态机。设计算法主要由以下四个模块组成：改进的AC算法、文本转换、映射机制和匹配查询。其中，改进的AC算法对原AC状态机中各个状态的输出进行了重新定义，映射机制则负责将改进的AC状态机中相应的状态信息映射到K步长状态机中。通过这种映射机制，最终生成的K步长状态机中只包含跳转和输出信息，没有失效函数。而且在状态存储时，改进的K步长状态机在原有链式存储的基础上，只保留出现过的子串输入，对于链表中未出现的子串，则借助查询算法进行处理。因此，和原来的K步长状态机相比，改进的K步长状态机占据更高的内存优势。尽管改进的K步长状态机解决了一些问题，，但并没有达到最优的内存资源利用率。为此，在它的基础上，论文提出了一种基于位分割的K步长多模式匹配方法，即从位的角度出发，将原先的一个状态机分割成八个小状态机，每个状态机可以同时读取K个输入字符的K个比特位，当所有子状态机都输出匹配信号时才确定匹配。该算法有两个优点：首先，子状态机的每个状态最多只有2K种输入选择，这使得内存更加紧凑；其次，几个子状态机可以独立并行工作，加快了模式查询的速度。同时，为了避免一些不必要的查询，待匹配的字符可以先进入Bloom Filter引擎过滤出可疑字符，由于Bloom Filter存在假阳性误判，过滤后的字符要进行精确匹配。在文中，精确匹配由位分割状态机来完成。二者的结合从整体上提高了匹配查询的效率。
[Abstract]:With the rapid development of network technology and the increasing opening of the Internet, network applications are becoming more and more popular. At the same time, the attacks brought by the network also arouse people's attention to network security issues. Intrusion detection system (IDS) has been widely used, and pattern matching is the most important detection technology in IDS. Its innovation and effectiveness will be the key to improve the performance and efficiency of IDS. This paper gives a brief overview of intrusion detection system, introduces the general process and classification of intrusion detection, and expounds the application of pattern matching in intrusion detection. Several classical pattern matching algorithms are also introduced in this paper, including single pattern matching algorithm such as BFN KMPBM and multi-pattern matching algorithm such as ACK / AC\ + BM\\\). This paper introduces the principle and application of Bloom Filter. Bloom Filter is a data structure based on multiple hash function maps. The performance of Bloom Filter can be improved by looking for an optimized hash lookup algorithm. An improved hashing table query algorithm is designed in this paper. The analysis and practical test show that the algorithm can reduce the query search length and shorten the query response time without increasing the consumption. Aiming at the problem of low memory utilization and frequent access to external memory of the state machine constructed by classical AC algorithm, an improved K-step state machine is designed in this paper. The algorithm is composed of the following four modules: the improved AC algorithm. Text conversion, mapping mechanism and matching query. Among them, the improved AC algorithm redefines the output of each state in the original AC state machine. The mapping mechanism is responsible for mapping the corresponding state information from the improved AC state machine to the K step state machine. Through this mapping mechanism, only jump and output information are included in the resulting K step state machine. There is no failure function. Moreover, in state storage, the improved K-step state machine only retains the existing sub-string input on the basis of the original chain storage, and the query algorithm is used to deal with the sub-string that does not appear in the linked list. Compared with the original K step state machine, the improved K step state machine occupies a higher memory advantage. Although the improved K-step state machine solves some problems, it does not achieve the optimal utilization of memory resources. Based on this, a K-step multi-pattern matching method based on bit segmentation is proposed in this paper. That is, from the point of view of bit, the original state machine is divided into eight small state machines. Each state machine can read K bits of K input characters at the same time. The algorithm has two advantages: first, there are at most 2K input choices for each state of the substate machine, which makes the memory more compact. Several sub-state machines can work independently and in parallel, which speeds up the speed of pattern query. At the same time, in order to avoid some unnecessary queries, the characters to be matched can be filtered into the Bloom Filter engine to filter out suspicious characters, because Bloom Filter has false positive misjudgment. In this paper, the exact matching is accomplished by bit segmentation state machine. The combination of the two methods improves the efficiency of matching query.
【学位授予单位】：杭州电子科技大学
【学位级别】：硕士
【学位授予年份】：2014
【分类号】：TP393.08

【参考文献】