Research on Intrusion Detection Technology Based on Mutual Information and KNN
Published: 2019-04-22 12:14
[Abstract]: The rapid development of network technology has made network security problems increasingly serious and the corresponding protective measures increasingly important. Intrusion detection is a dynamic, prevention-oriented security measure; it has long been a central research topic in information security.

To address the low learning efficiency of the traditional KNN algorithm, this thesis proposes a fast KNN (F-KNN) algorithm. It makes three improvements:

First, prune the training sample set. The large number of duplicate records in the training set is deleted, which reduces the computation required during learning.

Second, build an index model. One training sample is chosen at random as a reference point, and the distance from every other training sample to that reference point is computed. Sorting the samples by this distance in ascending order gives an ordered linear table, and taking entries from it at equal intervals gives an index table. The index table and the ordered linear table together are used to find the k nearest neighbors of a test sample quickly, which narrows the search range.

Third, add a cache. Each test sample to be classified is first compared with the cached, already classified test samples. If an identical sample is found, its class label is assigned to the test sample directly; otherwise the sample is classified in the usual way.

The experiments use the KDD CUP99 data set. The data set is first preprocessed, then a mutual-information-based feature reduction algorithm performs feature selection, and finally F-KNN performs anomaly detection on the reduced data set. The experimental results show that F-KNN greatly improves classification efficiency without reducing classification accuracy.
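The three improvements described in the abstract can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the thesis's implementation: the class name `FastKNN`, the parameters `k` and `step`, the use of Euclidean distance, the choice of the first row as the reference point (the abstract says random), and the triangle-inequality stopping rule are all assumptions made for this example.

```python
import bisect
import heapq
import math
from collections import Counter


class FastKNN:
    """Illustrative pivot-indexed KNN (hypothetical; not the thesis's code)."""

    def __init__(self, samples, labels, k=3, step=4):
        # Improvement 1: drop duplicate training rows (first label wins).
        unique = {}
        for x, y in zip(samples, labels):
            unique.setdefault(tuple(x), y)
        # Improvement 2: order rows by distance to a reference point.
        # Assumption: the first row is used instead of a random one.
        self.pivot = next(iter(unique))
        rows = sorted((math.dist(x, self.pivot), x, y) for x, y in unique.items())
        self.dists = [d for d, _, _ in rows]    # ordered linear table (keys)
        self.points = [x for _, x, _ in rows]
        self.labels = [y for _, _, y in rows]
        self.step = step
        self.index = self.dists[::step]         # index table: every step-th key
        self.k = k
        self.cache = {}                         # Improvement 3: label cache

    def _locate(self, dq):
        # Pick a block through the index table, then scan only that block.
        block = max(bisect.bisect_right(self.index, dq) - 1, 0)
        pos = block * self.step
        end = min(pos + self.step, len(self.dists))
        while pos < end and self.dists[pos] < dq:
            pos += 1
        return pos

    def _neighbors(self, q):
        dq = math.dist(q, self.pivot)
        right = self._locate(dq)
        left = right - 1
        n = len(self.dists)
        best = []  # max-heap of (-distance, position), at most k entries
        while left >= 0 or right < n:
            gap_l = dq - self.dists[left] if left >= 0 else math.inf
            gap_r = self.dists[right] - dq if right < n else math.inf
            # Triangle inequality: d(q, x) >= |d(x, pivot) - d(q, pivot)|,
            # so once both gaps reach the k-th best distance we can stop.
            if len(best) == self.k and min(gap_l, gap_r) >= -best[0][0]:
                break
            if gap_l <= gap_r:
                i, left = left, left - 1
            else:
                i, right = right, right + 1
            d = math.dist(q, self.points[i])
            if len(best) < self.k:
                heapq.heappush(best, (-d, i))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, i))
        return [i for _, i in best]

    def predict(self, q):
        key = tuple(q)
        if key not in self.cache:  # classify only on a cache miss
            votes = Counter(self.labels[i] for i in self._neighbors(key))
            self.cache[key] = votes.most_common(1)[0][0]
        return self.cache[key]


# Toy data (made up for illustration; not KDD CUP99 records).
train = [(0, 0), (0, 1), (1, 0), (0, 0), (5, 5), (5, 6), (6, 5), (5, 5)]
labels = ["normal"] * 4 + ["attack"] * 4
model = FastKNN(train, labels, k=3)
label_a = model.predict((0.2, 0.1))   # "normal"
label_b = model.predict((5.5, 5.2))   # "attack"
```

In this sketch the search expands outward from the query's position in the ordered table, so it returns the same neighbor distances as a brute-force scan while usually touching far fewer samples. How much it saves depends on how well distance to a single reference point separates the data, which the abstract does not quantify.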
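[Degree-granting institution]: East China University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.08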
Article ID: 2462823
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2462823.html