Research on Anomaly Classification of Imbalanced Data in Network Security Situational Awareness

Published: 2018-05-03 14:23

Topics: security situational awareness; imbalanced data. Source: master's thesis, Tianjin University of Technology, 2014.

[Abstract]: Network security has become a very serious problem. Detecting and preventing network attacks effectively and in a timely manner is therefore of great importance, yet existing network security technologies can no longer meet the needs of network management. Network security situational awareness built on fusion technology is therefore set to become the direction in which network management develops. Network security situational awareness applies data fusion: it fuses the alarm information produced by different security detection tools to analyze the current security state of the network, and from that state predicts the attacks the network is likely to face next. Anomaly classification of imbalanced network data is the most important stage of network security situational awareness, because it supplies the key security information on which situation assessment and decisions rely. The techniques involved include data mining, data fusion, and visualization. This thesis mainly applies data mining techniques to study anomaly classification of imbalanced data within network security situational awareness; the data are network traffic statistics aggregated by time and by host. Classifying anomalies in such imbalanced network data both efficiently and accurately is a serious challenge for network security. To address it, this thesis carries out the following work based on the characteristics of network data: 1) After analyzing traditional network data anomaly classification models in light of the characteristics of the data, it improves the data preprocessing stage to address two problems of anomaly classification systems. The first is attribute redundancy and attribute weighting: rough set theory is used to assign a weight to each attribute and to reduce the attribute set.
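The attribute weighting and reduction step described above can be illustrated with the standard rough-set dependency degree. This is a sketch under stated assumptions, not the thesis's method: the abstract does not give its weighting formula, so the code assumes the common significance measure sig(a) = γ_C(D) − γ_{C−{a}}(D), normalized so the weights sum to 1, together with a greedy backward-elimination reduct. It expects attributes that are already discretized, and the function names and toy table are hypothetical.

```python
from collections import defaultdict


def positive_region_size(rows, labels, attrs):
    """Count objects whose equivalence class on `attrs` carries a single decision label."""
    blocks = defaultdict(list)
    for row, label in zip(rows, labels):
        blocks[tuple(row[a] for a in attrs)].append(label)
    return sum(len(ls) for ls in blocks.values() if len(set(ls)) == 1)


def dependency(rows, labels, attrs):
    """Dependency degree gamma_attrs(D) = |POS_attrs(D)| / |U|."""
    return positive_region_size(rows, labels, attrs) / len(rows)


def attribute_weights(rows, labels, attrs):
    """Assumed weighting: significance of each attribute, normalized to sum to 1."""
    full = dependency(rows, labels, attrs)
    sig = {a: full - dependency(rows, labels, [b for b in attrs if b != a]) for a in attrs}
    total = sum(sig.values())
    if total == 0:  # no single attribute is indispensable: fall back to equal weights
        return {a: 1 / len(attrs) for a in attrs}
    return {a: s / total for a, s in sig.items()}


def reduce_attributes(rows, labels, attrs):
    """Greedy backward elimination: drop an attribute if gamma stays unchanged without it."""
    full = dependency(rows, labels, attrs)
    kept = list(attrs)
    for a in list(attrs):
        trial = [b for b in kept if b != a]
        if trial and dependency(rows, labels, trial) == full:
            kept = trial
    return kept


# Hypothetical discretized traffic table: three condition attributes, one decision label.
rows = [(0, 0, 0), (0, 1, 0), (1, 1, 1), (1, 0, 1), (0, 0, 1)]
labels = ["normal", "normal", "attack", "attack", "normal"]
```

On this toy table attribute 0 alone determines the label, so it receives all of the weight and the reduct keeps only `[0]`. Greedy elimination depends on the order in which attributes are tried and is not guaranteed to find a minimal reduct.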
The second problem is the discretization of continuous data required by rough set theory: an adaptive discretization algorithm suited to the characteristics of the data is proposed, which determines the discretization intervals from the distribution of attribute values. Experiments show that, compared with other algorithms, it improves anomaly classification accuracy while reducing both the number of cut points and the number of remaining condition attributes, which lowers the dimensionality of the space and makes anomaly classification more efficient. 2) In the anomaly classification stage, this thesis proposes solutions to two problems: classifying new types of anomalies, and imbalanced data. As time passes and technology advances, new anomaly classes keep appearing in the network; to handle them, the anomaly model is updated in real time. In addition, specific anomalous behaviors are rare compared with normal behavior, so the data distribution is imbalanced and classifying specific anomalies is inefficient. To address this, a single classifier first separates normal data from anomalous data, and only when the comparatively few anomalous records appear is a fast nearest-neighbor classifier used to assign them to specific anomaly classes. The single classifier therefore does the work most of the time, which greatly reduces the workload and improves efficiency. 3) Based on these methods, simulation experiments were run on the classic KDD99 dataset and compared against other corresponding algorithms. The experimental results demonstrate the efficiency and accuracy of the proposed algorithms.
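The two-stage scheme and the real-time model update described above can be sketched as follows. Everything here is a hypothetical illustration: the abstract does not specify the first-stage classifier or how its nearest-neighbor classifier is accelerated, so the sketch assumes a nearest-centroid normal/anomalous split for stage 1 and a brute-force 1-nearest-neighbor search over labelled anomaly samples for stage 2. Because a nearest-neighbor model needs no retraining, the model update reduces to appending newly labelled anomaly samples through `add_anomaly`, a name chosen for this sketch.

```python
import math


class TwoStageClassifier:
    """Hypothetical two-stage classifier for imbalanced traffic records.

    Stage 1: nearest-centroid split into normal vs. anomalous (runs on every record).
    Stage 2: 1-nearest-neighbor over labelled anomaly samples (runs only on anomalies).
    """

    def __init__(self, normal_samples, anomaly_samples):
        # normal_samples: list of feature tuples
        # anomaly_samples: list of (feature_tuple, anomaly_class) pairs
        self.normal_centroid = self._centroid(normal_samples)
        self.anomalies = [(tuple(x), label) for x, label in anomaly_samples]
        self.anomaly_centroid = self._centroid([x for x, _ in self.anomalies])
        self.stage2_calls = 0  # how many records needed the nearest-neighbor step

    @staticmethod
    def _centroid(vectors):
        n = len(vectors)
        return tuple(sum(column) / n for column in zip(*vectors))

    def add_anomaly(self, features, anomaly_class):
        """Model update: a newly labelled (possibly new-class) anomaly joins the reference set."""
        self.anomalies.append((tuple(features), anomaly_class))
        self.anomaly_centroid = self._centroid([x for x, _ in self.anomalies])

    def classify(self, features):
        # Stage 1: cheap normal/anomalous decision by distance to the two centroids.
        if math.dist(features, self.normal_centroid) <= math.dist(features, self.anomaly_centroid):
            return "normal"
        # Stage 2: only the minority of anomalous records reach the nearest-neighbor search.
        self.stage2_calls += 1
        _, label = min(self.anomalies, key=lambda sample: math.dist(features, sample[0]))
        return label
```

For example, with normal records clustered near the origin and hypothetical anomaly samples `((5.0, 5.0), "dos")`, `((5.5, 4.5), "dos")` and `((9.0, 1.0), "probe")`, the record `(0.1, 0.1)` stops at stage 1 as `"normal"`, while `(5.2, 4.9)` is routed to stage 2 and labelled `"dos"`. The `stage2_calls` counter makes the efficiency argument visible: on imbalanced traffic most records never reach the nearest-neighbor step. Using one centroid for all anomaly classes is a crude simplification, and a spatial index such as a k-d tree would be one way to speed up stage 2 on larger anomaly sets.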
【Degree-granting institution】: Tianjin University of Technology
【Degree level】: Master's
【Year of degree】: 2014
【Classification number】: TP393.08
