多维报文分类算法研究

发布时间：2019-03-26 13:34

【摘要】：随着社会信息化程度的逐渐提高,互联网已广泛渗透到社会生活的各个领域。报文分类作为互联网的支撑技术之一,在网络测量、防火墙访问列表控制、负载均衡和网络入侵检测系统等诸多领域发挥着重要的作用。互联网规模的急剧增长给报文分类技术带来了严峻的挑战,为了解决当前报文分类算法吞吐率不足、内存使用较大和更新性能难以满足网络需求的问题,本文依托国家863计划课题“面向三网融合的统一安全管控网络”,提出了三种报文分类算法并设计了一种多维报文分类子系统。主要研究内容和创新点如下:1、传统的报文分类算法存在较多规则冗余,导致时间性能无法满足网络需求。针对此问题,提出一种基于动态点切分的报文分类算法(GroupCuts)。首先在分析规则集特征的基础上,通过聚类具有相似空间交叉关系的规则,划分规则集为若干子集;然后在每个子集中动态的选取规则投影点完成空间分解;最后建立多决策树查找结构。仿真结果表明,在保证算法的空间性能前提下,GroupCuts算法的内存访问较代表算法减少了约61%。2、当前报文分类算法在处理大规模复杂规则集时,空间性能不够理想,为了提高算法的空间性能,提出一种采用混合切分法的报文分类算法(HIC,Hybrid Intelligent Cuttings)。该算法首先按照IP地址前缀长度将规则集分组;然后在每个分组中根据当前切分域的特点,分别对IP域和端口域采用比特位切分法和精确投影点切分法实现空间分解;最后构建混合切分结构的决策树。仿真结果表明,HIC算法具有较好的规则集适应性,其内存使用比代表算法减少了约74%。3、针对当前报文分类算法增量更新性能不足问题,提出一种基于前缀划分的报文分类算法(PreCuts)。该算法根据规则IP域的特点,依次采用三种启发式方法建立具有三层查找结构的决策树。在第一层中,按照规则IP地址的最高Byte分组规则。在第二层中,将具有相同IP前缀长度的规则划分到同一个分组。第三层决策树采用前缀比特位划分法,选取相应的划分比特将规则集划分为不同的子集。PreCuts算法所采用的启发式方法不会引入规则复制,算法查找结构中不存在冗余规则,因此增量更新不会降低算法的时间和空间性能。仿真结果表明,与代表算法相比,Pre Cuts不仅在时间和空间性能上有所提升,并且增量更新性能提升了50%以上。4、针对当前报文分类系统对网络中大容量、复杂规则集适应性不足的问题,结合本单位三网融合业务安全管控的要求,设计了一种基于FPGA的多维报文分类子系统。该系统采用二维流水线架构,提高了系统的运算性能,使其可以处理网络中大容量、复杂规则集。测试结果表明,该系统能够达到60Gbps的报文分类速率,很好的满足了当前三网融合网络管控的需求。
[Abstract]:With the gradual improvement of the degree of social informatization, the Internet has been widely infiltrated into various fields of social life. As one of the supporting technologies of Internet, packet classification plays an important role in many fields such as network measurement, firewall access list control, load balancing and network intrusion detection system. The rapid growth of Internet has brought serious challenges to packet classification technology. In order to solve the problems of insufficient throughput of current packet classification algorithms, large memory usage and update performance, it is difficult to meet the needs of the network. Based on the national 863 project "Unified Security Control Network for three Networks Convergence", this paper proposes three kinds of packet classification algorithms and designs a multi-dimensional message classification subsystem. The main research contents and innovations are as follows: 1. The traditional packet classification algorithm has more rules redundancy which leads to the time performance can not meet the needs of the network. In order to solve this problem, a message classification algorithm (GroupCuts). Based on dynamic point segmentation is proposed. On the basis of analyzing the features of the rule set, the rule set is divided into several subsets by clustering the rules with similar spatial crossover relations, and then the spatial decomposition is accomplished by dynamically selecting the projection points of the rules in each subset. Finally, a multi-decision tree search structure is established. The simulation results show that under the premise of guaranteeing the spatial performance of the algorithm, the memory access of the GroupCuts algorithm is about 61% less than that of the representative algorithm, and the spatial performance of the current packet classification algorithm is not satisfactory when dealing with large-scale complex rule sets. In order to improve the spatial performance of the algorithm, a message classification algorithm based on mixed segmentation (HIC,Hybrid Intelligent Cuttings).) is proposed. Firstly, the rule set is grouped according to the length of the IP address prefix, and then in each packet according to the characteristics of the current segmentation domain, the bit splitting method and the precise projection point segmentation method are used to realize the spatial decomposition of the IP domain and the port domain respectively. Finally, the decision tree with mixed segmentation structure is constructed. The simulation results show that the HIC algorithm has good rule set adaptability, and its memory usage is about 74% less than that of the representative algorithm, and the performance of incremental update of the current packet classification algorithm is insufficient. A message classification algorithm (PreCuts). Based on prefix partition is proposed. According to the characteristics of regular IP domain, three heuristic methods are used to build a decision tree with a three-layer search structure. In the first layer, the highest Byte grouping rule is based on a regular IP address. In layer 2, rules with the same IP prefix length are divided into the same group. The third layer decision tree adopts the prefix bit partition method, and selects the corresponding partition bits to divide the rule set into different subsets. The heuristic method used in PreCuts algorithm will not introduce rule replication, and there are no redundant rules in the search structure of the algorithm. Therefore, incremental updating does not degrade the temporal and spatial performance of the algorithm. The simulation results show that compared with the representative algorithm, Pre Cuts not only improves the performance of time and space, but also improves the performance of incremental update by more than 50%. 4. For the current packet classification system, it has a large capacity in the network. Due to the lack of adaptability of complex rule set, a multi-dimensional message classification subsystem based on FPGA is designed according to the requirement of security management and control of the three networks fusion business. The two-dimensional pipeline architecture is adopted in this system, which improves the performance of the system and enables it to deal with large capacity and complex rule sets in the network. The test results show that the system can achieve the packet classification rate of 60Gbps and meet the requirements of the current three-network convergence network management and control.
【学位授予单位】：解放军信息工程大学
【学位级别】：硕士
【学位授予年份】：2014
【分类号】：TP393.08

【参考文献】