Research on Intrusion Feature Selection Algorithms Based on ReliefF
Published: 2018-07-17 06:02
[Abstract]: The Internet has given people a good platform for communicating and learning from one another and has brought them closer together, but it has also brought a large number of threats, with attackers using a variety of means to compromise the security of information systems. Effectively protecting the integrity, availability, and confidentiality of information has become a challenge shared by every industry. Intrusion detection, as an active defense technology, has made important contributions to securing networked information systems. As society enters the big-data era, however, data has shifted from single or low-dimensional attributes to high-dimensional ones. It is large in scale, high in dimension, noisy, and complex, which gives rise to the "curse of dimensionality". This poses a severe challenge to the computing capacity and real-time requirements of traditional intrusion detection systems.
Anomaly detection is an important intrusion detection technique that has attracted attention because it can detect unknown attacks. The support vector machine (SVM), a typical and effective anomaly detection algorithm, classifies well, but high-dimensional samples and noisy data seriously degrade its classification efficiency. To optimize the detection model and reduce its complexity, this thesis focuses on the ReliefF feature selection algorithm. It introduces the basic concepts, typical steps, and application status of feature selection, and briefly reviews the research status and basic concepts of intrusion detection, including intrusion detection models, the basic workflow, and a classification of intrusion detection approaches. Based on the shortcomings of current intrusion detection, it also outlines directions for future development.
The thesis studies the ReliefF feature selection algorithm and its application to intrusion detection, and proposes an application method and a mapping relationship for intrusion features. Drawing on the high similarity among network records in intrusion detection data sets, it further proposes Re-ReliefF, an improved version of traditional ReliefF for intrusion detection. The improvement mainly optimizes the feature weight calculation in light of the characteristics of intrusion detection data.
To achieve better time complexity and detection performance, feature selection is applied to preprocess the data fed to the SVM classifier. The experimental results show that Re-ReliefF improves on ReliefF in every aspect of performance, and that data sets processed with Re-ReliefF have little effect on SVM classification performance (that is, detection rate) while saving a substantial amount of detection time.
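For orientation, the feature weighting that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard ReliefF weight update, not the thesis's Re-ReliefF, whose modified weight calculation is not specified in this abstract. The function name `relieff`, its parameters, the min-max scaling, and the Manhattan distance are all assumptions made for the sketch.

```python
import numpy as np

def relieff(X, y, n_neighbors=3, n_iterations=None, seed=0):
    """Return one standard ReliefF weight per feature (higher = more relevant).

    Hypothetical helper for illustration only; not the thesis's Re-ReliefF.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape

    # Scale each feature to [0, 1] so per-feature differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0          # constant features: avoid division by zero
    Xs = (X - lo) / span

    # Class priors, used to weight the contribution of each "miss" class.
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n_samples))

    rng = np.random.default_rng(seed)
    m = n_samples if n_iterations is None else n_iterations
    picks = rng.choice(n_samples, size=m, replace=(m > n_samples))

    weights = np.zeros(n_features)
    for i in picks:
        diffs = np.abs(Xs - Xs[i])   # per-feature difference to every sample
        dist = diffs.sum(axis=1)     # Manhattan distance (an assumption)
        for c in classes:
            idx = np.flatnonzero(y == c)
            idx = idx[idx != i]      # never use the sample as its own neighbor
            k = min(n_neighbors, idx.size)
            if k == 0:
                continue
            nearest = idx[np.argsort(dist[idx])[:k]]
            contrib = diffs[nearest].mean(axis=0)
            if c == y[i]:
                # Nearest hits: differing from same-class neighbors lowers the weight.
                weights -= contrib / m
            else:
                # Nearest misses: differing from other classes raises the weight.
                weights += prior[c] / (1.0 - prior[y[i]]) * contrib / m
    return weights
```

In a pipeline like the one the abstract describes, the features with the highest weights would be kept and only those columns passed to the SVM classifier. How many features to keep, and how the weights are thresholded, are choices this sketch leaves open.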
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.08
Article ID: 2129329
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2129329.html