Malicious Traffic Monitoring and Analysis System Based on the Spark Platform

Published: 2018-05-07 07:37

Topic: NetFlow + Spark; Source: Beijing Jiaotong University, 2016 master's thesis

[Abstract]: Many models have been proposed for DDoS monitoring, but each targets a particular domain and scenario and therefore generalizes poorly. Meanwhile, mature big data platforms such as Hadoop and Spark have emerged, yet they do not directly support malicious traffic monitoring. Among existing NetFlow tools and research, some tools address malicious traffic monitoring, but they do not support DDoS monitoring. Combining the results of detection research with the strengths of a big data platform may therefore defend against DDoS more effectively. The contributions of this thesis are threefold. First, it proposes a model for a Spark-based network malicious traffic monitoring system that focuses on DDoS attacks, and sets out the principles the system follows. It also proposes a feature selection method based on linear fitting and uses these features to improve machine-learning-based detection algorithms. Second, it builds a Spark-based network malicious traffic monitoring platform comprising a Hadoop platform and a Spark platform. Third, it implements four improved algorithms on both the Spark and the Hadoop platforms, compares them experimentally, and selects and explains the best-performing one. When selecting features, machine learning algorithms often consider each attribute independently, which fails to capture correlations between attributes. We therefore propose fitting a linear relationship between request traffic and service traffic and using the residual as a feature; for completeness, the average number of packets and the average packet size are also used as features, so that the feature set models normal traffic more closely. Based on these new features, the thesis improves four machine learning algorithms: k-means, decision tree, Bayesian learning, and P枷. The algorithms were developed using the machine learning interfaces provided by Spark; for k-means, several methods of determining the cluster centers were tried. The thesis analyzes the suitability of machine learning algorithms for malicious traffic detection and the practicality of the Spark platform for running them. Combining existing malicious traffic detection methods with the above algorithms, it designs and implements a malicious traffic detection platform that can detect worms, Trojan horses, botnets, DDoS attacks, and other threats fairly comprehensively. Users can choose among different modes according to their timeliness requirements. Finally, using the linear-fitting-based feature extraction method, the improved algorithms are compared and analyzed experimentally.
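The central feature idea in the abstract (fit service traffic against request traffic, use the residual as a feature, and add average packet count and average packet size) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The per-window aggregation, the `WindowStats` fields, and the helper names `fit_line` and `features` are hypothetical. Traffic volume is assumed to be measured in bytes. The fit is assumed to be ordinary least squares over windows known to be normal, and "average number of packets" is interpreted as packets per flow.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WindowStats:
    """Aggregated NetFlow counters for one time window (hypothetical schema)."""
    request_bytes: float  # traffic sent towards the monitored service
    service_bytes: float  # traffic returned by the service
    flows: int            # number of flow records in the window
    packets: int          # total packets across those flows
    total_bytes: int      # total bytes across those flows


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x on baseline (normal) windows."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def features(w: WindowStats, a: float, b: float) -> Tuple[float, float, float]:
    """Feature vector: (fit residual, average packets per flow, average packet size)."""
    # Residual: observed service traffic minus the traffic the fitted line predicts.
    residual = w.service_bytes - (a + b * w.request_bytes)
    avg_packets = w.packets / w.flows if w.flows else 0.0
    avg_pkt_size = w.total_bytes / w.packets if w.packets else 0.0
    return residual, avg_packets, avg_pkt_size


if __name__ == "__main__":
    # Hypothetical baseline where service traffic is about 2 * request + 100 bytes.
    a, b = fit_line([100, 200, 300, 400], [300, 500, 700, 900])
    # Hypothetical flood window: many tiny requests, little response traffic.
    flood = WindowStats(request_bytes=5000, service_bytes=1200,
                        flows=1000, packets=1000, total_bytes=60000)
    print(features(flood, a, b))  # prints (-8900.0, 1.0, 60.0)
```

Under these assumptions, a flood that sends many small requests without eliciting proportional responses produces a large negative residual together with few packets per flow and small packets, while normal windows stay close to zero residual. The resulting vectors would then be fed to the classifiers or to k-means. On Spark this would typically mean assembling them into a vector column (for example with `pyspark.ml.feature.VectorAssembler`) before training.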
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP393.06
