Research on Intrusion Detection Technology Based on Mutual Information and KNN

Published: 2019-04-22 12:14

[Abstract]: With the rapid development of network technology, network security problems of every kind are becoming more serious, and security protection measures are becoming correspondingly more important. Intrusion detection is a dynamic, prevention-oriented security measure; it has long been an active research topic in information security and holds an important place in the field. To address the low learning efficiency of the traditional KNN algorithm, this thesis proposes a fast KNN (F-KNN) algorithm that makes three main improvements.

First, the training set is pruned. The large amount of duplicate data in the training set is deleted, which reduces the computation needed during learning and so improves learning efficiency.

Second, an index model is built. One training sample is chosen at random as the reference point, the distance from every other training sample to it is computed, and the samples are sorted by this distance in ascending order to form an ordered linear list. Samples taken at equal intervals from the ordered list form an index table. The index table and the ordered list are then used together to find the k nearest neighbors of a test sample quickly, narrowing the search range and improving learning efficiency.

Third, a cache is added. Before a test sample is classified, it is first compared with the cached, already classified test samples. If an identical sample is found, that sample's class label is assigned to the new test sample directly; otherwise the test sample goes through the normal classification procedure. This also improves learning efficiency.

The KDD CUP99 data set is used for the experiments. The data set is first preprocessed; a mutual-information-based feature reduction algorithm is then applied for feature selection; finally, the F-KNN algorithm performs anomaly detection on the feature-reduced data set. The experimental results show that F-KNN greatly improves classification learning efficiency without reducing classification accuracy.
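The abstract names a mutual-information-based feature reduction step but does not give its details. The sketch below is a minimal illustration under stated assumptions: every feature is already discrete (continuous KDD CUP99 columns such as `src_bytes` would have to be binned first), and features are ranked only by their relevance I(feature; class), keeping the top `m`. The thesis's own algorithm may differ, for example by also penalizing redundancy between selected features. The function names `mutual_information` and `select_features` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def select_features(rows, labels, m):
    """Return the sorted column indices of the m features with the highest I(feature; label).

    Assumption: a plain relevance ranking; no redundancy term between features.
    """
    n_features = len(rows[0])
    scores = [
        (mutual_information([row[j] for row in rows], labels), j)
        for j in range(n_features)
    ]
    # Highest score first; ties broken by the lower column index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:m])


if __name__ == "__main__":
    # Hypothetical toy records: (flag_bit, protocol, binned_bytes); only column 0 tracks the label.
    rows = [(0, "tcp", 5), (1, "tcp", 3), (0, "tcp", 3), (1, "tcp", 5)]
    labels = ["normal", "attack", "normal", "attack"]
    print(select_features(rows, labels, 1))  # [0]
```

In the toy data, column 0 determines the label exactly (1 bit of mutual information), the constant protocol column carries none, and the third column is independent of the label, so only column 0 survives when `m = 1`.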
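The three F-KNN improvements can be sketched together in one small classifier. This is an illustration under assumptions, not the thesis's implementation: the class name `FKNN` and its parameters `k`, `step` and `seed` are hypothetical; distances are Euclidean; the index table is taken to be every `step`-th key of the ordered list and is used to narrow a binary search; and the neighbor search relies on the triangle inequality |d(q,r) - d(x,r)| <= d(q,x), where r is the reference point, to stop scanning once no remaining sample can beat the current k-th nearest distance. The abstract does not state which pruning rule the author uses.

```python
import bisect
import heapq
import math
import random
from collections import Counter


class FKNN:
    """Sketch of KNN with deduplication, a reference-point index and a result cache."""

    def __init__(self, k=3, step=4, seed=0):
        self.k = k
        self.step = step              # spacing of the sparse index table (assumed)
        self.rng = random.Random(seed)
        self.cache = {}               # test-sample tuple -> predicted label

    def fit(self, X, y):
        # 1) Delete duplicate (features, label) records, keeping first occurrences.
        unique = list(dict.fromkeys((tuple(x), label) for x, label in zip(X, y)))
        # 2) Choose a random reference point and sort samples by distance to it.
        self.ref = self.rng.choice(unique)[0]
        keyed = sorted(
            ((math.dist(x, self.ref), x, label) for x, label in unique),
            key=lambda t: t[0],
        )
        self.keys = [d for d, _, _ in keyed]      # the ordered linear list
        self.points = [x for _, x, _ in keyed]
        self.labels = [label for _, _, label in keyed]
        self.index = self.keys[:: self.step]      # every step-th key: the index table
        self.cache.clear()
        return self

    def _locate(self, dq):
        # Coarse search in the index table, then a fine search inside one block.
        i = bisect.bisect_left(self.index, dq)
        lo = max(0, (i - 1) * self.step)
        hi = i * self.step if i < len(self.index) else len(self.keys)
        return bisect.bisect_left(self.keys, dq, lo, hi)

    def _neighbors(self, q):
        dq = math.dist(q, self.ref)
        p = self._locate(dq)
        left, right, n = p - 1, p, len(self.keys)
        heap = []  # max-heap of (-distance, position) for the best k so far
        while left >= 0 or right < n:
            # |d(q,r) - d(x,r)| bounds d(q,x) from below; expand the side with the smaller bound.
            gap_l = dq - self.keys[left] if left >= 0 else math.inf
            gap_r = self.keys[right] - dq if right < n else math.inf
            if gap_l <= gap_r:
                j, gap, left = left, gap_l, left - 1
            else:
                j, gap, right = right, gap_r, right + 1
            if len(heap) == self.k and gap > -heap[0][0]:
                break  # no remaining sample can be closer than the current k-th neighbor
            d = math.dist(q, self.points[j])
            if len(heap) < self.k:
                heapq.heappush(heap, (-d, j))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, j))
        return [j for _, j in heap]

    def predict_one(self, q):
        q = tuple(q)
        # 3) Cache: an identical, already classified test sample reuses its label.
        if q in self.cache:
            return self.cache[q]
        votes = Counter(self.labels[j] for j in self._neighbors(q))
        label = votes.most_common(1)[0][0]
        self.cache[q] = label
        return label

    def predict(self, Q):
        return [self.predict_one(q) for q in Q]


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (0, 0), (10, 10), (10, 11), (11, 10), (10, 10)]
    y = ["normal"] * 4 + ["attack"] * 4
    model = FKNN(k=3, step=2).fit(X, y)
    print(len(model.points))                          # 6 after deduplication
    print(model.predict([(0.5, 0.5), (10.2, 10.3)]))  # ['normal', 'attack']
```

Because the ordered list is scanned outward from the query's position, the saving depends on how well distance to a single reference point separates the data; the pruning never changes which neighbors are found, only how many distances are computed. The cache helps only when identical test records recur.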
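[Degree-granting institution]: East China University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number (CLC)]: TP393.08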
