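A Comprehensive Data Sample Processing Method for Network Intrusion Detection
Published: 2018-05-20 13:00
Topics: intrusion detection + imbalanced data; Source: Zhejiang University of Technology, master's thesis, 2014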
[Abstract]: Intrusion detection is an effective and important active security defense technology and has long been an active research topic. The composition and quality of the training data directly determine the effectiveness, accuracy, and scalability of the classification model, and therefore the performance of the whole intrusion detection system. Training data obtained by monitoring a network is massive, imbalanced, and noisy, which challenges both the real-time performance and the accuracy of an intrusion detection system. Efficient, comprehensive preprocessing of the samples is therefore necessary before an intrusion detection classification model is built.
The network environment places particular demands on preprocessing. Because network samples are generated continuously, a known class distribution ratio cannot be applied directly to the imbalance handling used in data mining. The sheer number of samples complicates compression itself, and class imbalance within the samples greatly reduces the accuracy of compression. Preprocessing of network data must therefore combine imbalance handling with compression.
This thesis preprocesses the samples in two ways. (1) The data set is partitioned using the K-S statistic, which does not depend on the class distribution ratio, lowering the degree of imbalance in each data subset and reducing the influence of class imbalance on the classification rules. Experimental results show that this method improves the accuracy and efficiency of imbalanced data classification. (2) The Affinity Propagation clustering algorithm is improved: samples close to a cluster center are associated with it directly, which reduces the number of samples to be clustered and lowers time and space consumption. The model is then adjusted continually according to the association results to refine the clustering. Experiments show that this method effectively reduces the time and space cost of the clustering algorithm while maintaining good data compression results.
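As an illustration of point (1), the sketch below shows one way a K-S statistic can drive a split, assuming "K-S" refers to the two-sample Kolmogorov-Smirnov statistic. The function names `ks_split` and `partition`, the single-feature threshold search, and the toy data are assumptions made for this example; the abstract does not describe the thesis's actual procedure. Each empirical CDF is normalized by its own class size, which is why the statistic does not depend on the class ratio.

```python
from bisect import bisect_right

def ks_split(pos_values, neg_values):
    """Return (threshold, ks), where ks = max over t of |F_pos(t) - F_neg(t)|."""
    pos = sorted(pos_values)
    neg = sorted(neg_values)
    best_t, best_ks = None, -1.0
    for t in sorted(set(pos) | set(neg)):
        # Empirical CDFs: fraction of each class with value <= t.
        # Each is divided by its own class size, so class priors cancel out.
        f_pos = bisect_right(pos, t) / len(pos)
        f_neg = bisect_right(neg, t) / len(neg)
        gap = abs(f_pos - f_neg)
        if gap > best_ks:
            best_t, best_ks = t, gap
    return best_t, best_ks

def partition(samples, labels, feature, threshold):
    """Split (sample, label) pairs into the subsets on each side of the threshold."""
    pairs = list(zip(samples, labels))
    left = [(x, y) for x, y in pairs if x[feature] <= threshold]
    right = [(x, y) for x, y in pairs if x[feature] > threshold]
    return left, right

# Hypothetical toy data: 2 attack samples (label 1) among 8 normal ones (label 0).
samples = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9], [1.0]]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
pos = [x[0] for x, y in zip(samples, labels) if y == 1]
neg = [x[0] for x, y in zip(samples, labels) if y == 0]

threshold, ks = ks_split(pos, neg)
left, right = partition(samples, labels, 0, threshold)
print(threshold, ks, len(left), len(right))
```

On this toy data the right-hand subset holds both attack samples and only one normal sample, so it is far less imbalanced than the original 2-to-8 set.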
Finally, the imbalanced data handling and sample compression methods are combined into a preprocessing algorithm that is independent of the classification learner, and a lightweight network intrusion detection model is built on it. To verify the model, experiments are run on the KDD99 data set with different classification methods to test its applicability. The results show that, under three classifiers, the proposed model improves both the detection time and the accuracy of intrusion detection. The model can also preprocess large volumes of data with good time and space performance, and a classification method can be chosen according to actual needs, so the model is usable in practice.
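The sample-compression half of this preprocessing stage rests on associating samples that lie close to a cluster center directly with that center. A minimal sketch of that single step follows. It does not implement Affinity Propagation itself, and the function name `associate_near_centers`, the Euclidean distance, and the fixed `radius` cutoff are all assumptions for illustration; the abstract does not say how closeness is measured or how the model is later adjusted.

```python
import math

def associate_near_centers(samples, centers, radius):
    """Attach samples within `radius` of their nearest center to that center.

    Returns (counts, remaining): counts[i] is the number of samples absorbed
    by centers[i]; remaining holds the samples still left for full clustering.
    """
    counts = [0] * len(centers)
    remaining = []
    for x in samples:
        dists = [math.dist(x, c) for c in centers]
        nearest = min(range(len(centers)), key=dists.__getitem__)
        if dists[nearest] <= radius:
            counts[nearest] += 1   # absorbed: no need to cluster this sample
        else:
            remaining.append(x)    # too far from every center: cluster it later
    return counts, remaining

# Hypothetical cluster centers and incoming samples.
centers = [(0.0, 0.0), (10.0, 10.0)]
samples = [(0.1, 0.0), (0.0, 0.2), (9.9, 10.0), (5.0, 5.0)]

counts, remaining = associate_near_centers(samples, centers, radius=0.5)
print(counts, remaining)
```

Only the samples in `remaining` would need to go through a clustering algorithm such as Affinity Propagation, which is where the time and space savings described above would come from.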
[Degree-granting institution]: Zhejiang University of Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.08