Research on DDoS Attack Detection Techniques Based on Machine Learning and Statistical Analysis

Published: 2018-06-03 05:36

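Topic: multivariate dimensionality reduction analysis + random forest; Source: Beijing University of Posts and Telecommunications, 2017 doctoral dissertation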

[Abstract]: With the rapid development of computer and communication technology, and with the rise of cloud computing, the Internet of Things, the mobile Internet, and big data in the current "Internet Plus" era, distributed denial of service (DDoS) attacks have become one of the leading causes of instability in networked information environments. At the same time, as botnets have proliferated in recent years, the damage caused by DDoS attacks has grown increasingly severe. Because DDoS attacks are highly destructive and each major incident affects a wide area, DDoS attack detection has remained an important research topic in information and network security. However, existing work still has shortcomings: 1) some methods secure the detection rate (DR) and similar metrics at the cost of detection time and high resource consumption; 2) some cannot balance DR, accuracy (Accuracy), precision (Precision), and false positive rate (FPR). In view of this, this dissertation applies theories and techniques from machine learning, data mining, and statistical analysis, extracting and analyzing the attribute features of the fields in attack traffic according to the characteristics of DDoS attacks, in order to detect high-volume DDoS attacks on the Internet in real time, efficiently, and accurately.

The main contributions and innovations of this dissertation are as follows:

(1) To address the poor performance of high-volume attack detection in the big data era, especially real-time DDoS attack detection, we build on multivariate statistical analysis and correlation analysis from statistics and on principal component analysis (PCA) from machine learning, and design a real-time attack detection (RTAD) method based on a multivariate dimensionality reduction analysis (MDRA) algorithm. The method reduces the dimensionality of the network traffic attribute feature fields and removes the correlation among them, with the aim of detecting high-volume DDoS attacks on the Internet in real time. After data preprocessing and experimental validation, we conclude that the RTAD method outperforms the attack detection method based on the multivariate correlation analysis (MCA) algorithm on both Precision and true negative rate (TNR), and that it also has clear advantages in CPU computing time and memory consumption.

(2) Traditional centralized and quasi-distributed DDoS detection methods cannot achieve collaborative detection, scale poorly, and are difficult to deploy. To address these problems, this dissertation studies a random forest distributed detection (Random Forest Distribution Detection, RFDD) model for DDoS attacks based on combined classifiers. The core of the model is ensemble learning, which is widely used in machine learning, and specifically the random forest method of combining classifiers. The model combines the random forest algorithm with a distributed parallel computing framework, and reduces noise and removes correlation in the different attribute fields of attack traffic in order to detect attacks accurately. The RFDD model scales well and can adapt to the dynamic adjustment and deployment of anomaly monitoring in a network environment. Experiments show that the RFDD model outperforms the AdaBoost method in DR, Accuracy, Precision, and FPR, and that it remains stable on all four metrics across different thresholds.

(3) Existing DDoS attack detection models based on homogeneous classifiers have weak generalization ability and stability. To address this, this dissertation studies a heterogeneous multi-classifier ensemble learning (HMEL) detection model based on singular value decomposition (SVD) and the Rotation Forest ensemble strategy. The model consists of three modules: a dataset preprocessing module, a heterogeneous multi-classifier detection module, and a classification result acquisition module. The HMEL detection model removes redundancy and correlation among the different attribute fields of network traffic. Theoretical analysis indicates that the model has stronger generalization ability and broader applicability. In experiments comparing it with homogeneous classification detectors built from well-known machine learning algorithms (random forest, k-NN, and Bagging), each with and without SVD processing, the HMEL detection model comes close to random forest and Bagging in TNR, Accuracy, and Precision and outperforms k-NN on all three; moreover, the TNR, Accuracy, and Precision of k-NN are unstable as the threshold varies. The model therefore offers both strong detection capability and good stability.

In summary, grounded in the theories and methods of machine learning and statistical analysis, and guided by three principles for network traffic attribute features ("removing redundancy", "reducing noise", and "eliminating correlation"), this dissertation explores real-time, distributed, and accurate DDoS attack detection, as well as detection with a heterogeneous ensemble classification model that has strong generalization ability and stability. It obtains experimental results with significant advantages and contributes work of value for further research on the related theories and methods and for future applications in different scenarios.
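The abstract compares methods on DR, Accuracy, Precision, FPR, and TNR without defining them. The sketch below assumes the standard confusion-matrix definitions with attack traffic as the positive class (label 1) and normal traffic as the negative class (label 0); the dissertation's exact definitions are not given here. The function name `detection_metrics`, the label encoding, and the convention of returning 0.0 for an empty denominator are illustrative assumptions, not taken from the source.

```python
from typing import Dict, Sequence


def _safe_div(num: int, den: int) -> float:
    # Assumed convention: report 0.0 when the denominator is empty.
    return num / den if den else 0.0


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute detection metrics from binary labels (1 = attack, 0 = normal)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    return {
        "DR": _safe_div(tp, tp + fn),          # detection rate (true positive rate)
        "Accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
        "Precision": _safe_div(tp, tp + fp),
        "FPR": _safe_div(fp, fp + tn),         # false positive rate
        "TNR": _safe_div(tn, tn + fp),         # true negative rate = 1 - FPR
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the dissertation.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    for name, value in detection_metrics(truth, predicted).items():
        print(f"{name}: {value:.2f}")
```

Under these definitions TNR and FPR always sum to 1 whenever any normal traffic is present, so a method that improves TNR necessarily lowers FPR; DR and Precision, by contrast, can move in opposite directions as the decision threshold changes, which is why the abstract reports them separately.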
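[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Doctoral
[Year degree granted]: 2017
[Classification number]: TP393.08;TP181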

[References]