Network Intrusion Detection Based on a Distributed Stream Database System

Published: 2018-05-04 04:46

Topics: feature selection + SVM; source: University of Electronic Science and Technology of China, master's thesis, 2015

[Abstract]: With the rapid growth of the Internet, security has become an increasingly important concern. Traditional network security targets individual and enterprise users, and relies mainly on host intrusion detection, antivirus software, and firewalls. These measures, however, usually do not reduce abnormal traffic in large-scale communication networks (i.e., backbone networks). To reduce abnormal traffic at its source, and to reduce or eliminate the attacks that users suffer, large-scale communication networks and their routing and switching devices must be able to detect and identify abnormal traffic. Abnormal traffic is usually assessed in two ways: (a) deciding whether abnormal traffic is present, which is called traffic monitoring; and (b) deciding which type of anomaly it is, which is called traffic identification. By detection granularity, current traffic monitoring falls into three categories: packet-based, flow-based, and traffic-based.

This thesis proposes a finer-grained feature selection algorithm that aggregates IP flows over a dynamic session window, and combines it with an SVM to detect DoS attacks. To support the computation of the selected features, the thesis also extends Spark Streaming so that SQL queries can be run on streams. The work draws on a survey of existing feature selection algorithms, the internals of Spark Streaming and Hive, the choice of SVM kernel function, and the current state of traffic detection.

First, traditional entropy-based feature selection computes entropy over records from different source IPs aggregated together. The drawback is that when an anomaly occurs, further analysis is still needed to determine the attack source and the attacked target. To address this, the thesis aggregates flow records by sessionkey(srcIP,desIP,srcPort,desPort) and uses the information entropy of the resulting network flow data as training features. Second, existing research shows that detection performance is positively correlated with the ratio of abnormal traffic to total traffic: when abnormal traffic makes up only a small share, detection is generally poor. The session window approach proposed here addresses this problem. Third, large volumes of data arrive in bursts, yet support from the underlying computing model is lacking and existing anomaly detection algorithms are not efficient enough. The thesis therefore extends Spark Streaming to support SQL operations on streams, including continuous queries and window operations.

Finally, the proposed feature selection algorithm is tested and compared with the traditional ID3 and C4.5 algorithms. The most direct criterion for judging a feature selection result is the similarity between the selected feature subset and the optimal feature subset, but in practice there is no standard for determining the optimal subset. The thesis therefore validates the algorithm indirectly, using the AUC that a One-class SVM classifier achieves on the selected feature subset as the measure of feature selection quality. The experiments also vary the proportion of abnormal traffic in each window's total traffic to show that the session-window-based feature selection algorithm remains stable across different proportions. The results further indicate that the SQL extension to Spark Streaming works well and meets the computational requirements.
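The session-key aggregation and entropy feature described in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the `FlowRecord` type, the `n_bytes` attribute whose entropy is computed, the gap-based rule for closing a session window, and the 5-second default gap are all hypothetical, since the abstract does not say which attributes feed the entropy or how the dynamic window is sized. The fields `src_ip`, `dst_ip`, `src_port`, `dst_port` stand in for srcIP, desIP, srcPort, desPort.

```python
import math
from collections import Counter, defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical flow record; field names are assumptions for illustration."""
    ts: float       # arrival time in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    n_bytes: int    # assumed per-record attribute used for the entropy


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def session_windows(records, gap=5.0):
    """Group records by session key; a silence longer than `gap` seconds
    closes the current window and starts a new one (assumed window rule)."""
    windows = defaultdict(list)  # session key -> list of windows
    for r in sorted(records, key=lambda rec: rec.ts):
        key = (r.src_ip, r.dst_ip, r.src_port, r.dst_port)
        wins = windows[key]
        if wins and r.ts - wins[-1][-1].ts <= gap:
            wins[-1].append(r)   # still inside the open window
        else:
            wins.append([r])     # first record, or gap exceeded
    return windows


def entropy_features(records, gap=5.0):
    """Return one (key, window_start, entropy_of_n_bytes, record_count)
    row per session window."""
    rows = []
    for key, wins in session_windows(records, gap).items():
        for w in wins:
            rows.append((key, w[0].ts, shannon_entropy(r.n_bytes for r in w), len(w)))
    return rows
```

Because every row carries its session key, a window flagged as anomalous already names the source and destination involved, which is the drawback of pooled per-source entropy that the abstract points out. Rows like these would then serve as feature vectors for the SVM stage.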
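[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year conferred]: 2015
[Classification number]: TP311.13; TP393.08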
