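Research on Network Service Traffic Identification Methods Based on Semi-Supervised Learning
Published: 2018-03-21 13:02
Topic: semi-supervised learning; Focus: feature selection; Source: Southeast University, 2017 master's thesis; Document type: degree thesis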
[Abstract]: This thesis was carried out under the State Grid Corporation of China science and technology project "Research and Application of Key Technologies for Traffic Prediction and Intelligent Pipes in Power Information and Communication Networks". Its research goal is service-oriented traffic identification and awareness, and its main subject is network service traffic identification based on semi-supervised learning. Because network traffic has many feature attributes, the thesis studies feature selection algorithms and proposes an information-measure-based semi-supervised feature selection algorithm (IMSFS, Information Measure based Semi-supervised Feature Selection). Because the input parameters of DBSCAN (Density-Based Spatial Clustering of Applications with Noise) are hard to determine and the dataset contains a large amount of duplicate data, it proposes an improved DBSCAN algorithm. It then combines IMSFS with the improved DBSCAN algorithm to obtain a semi-supervised DBSCAN (SDBSCAN, Semi-supervised DBSCAN) traffic identification method. Validation on the Moore dataset shows that the proposed SDBSCAN method achieves high classification accuracy on traffic sets that contain only a small number of labeled samples.
The thesis has five chapters. Chapter 1 outlines the research background and objectives, analyzes the development trends of traffic identification and the key problems still to be solved, and gives the chapter arrangement. Chapter 2 gives an overview of traffic identification and reviews the assumptions that semi-supervised learning relies on and the categories of semi-supervised learning, laying the groundwork for the later chapters. Chapter 3 addresses the large number of feature attributes in network traffic: it studies feature selection algorithms, proposes IMSFS, and validates it on the Moore dataset. Chapter 4 addresses the difficulty of choosing DBSCAN input parameters and the large amount of duplicate data: it proposes the improved DBSCAN algorithm, combines it with IMSFS into the semi-supervised DBSCAN traffic identification method, and validates that method on the Moore dataset. Chapter 5 summarizes the work and points out directions for further research.
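The abstract does not describe how the improved DBSCAN or SDBSCAN work internally, so the following is only an illustrative sketch of the general idea and not the author's algorithm. It assumes one plausible reading of the two problems named above: duplicate flow records are collapsed into unique points whose counts still contribute to density, and the few labeled samples are used to name clusters by majority vote. The function names, the `eps` and `min_pts` values, the toy feature vectors, and the class names "WWW" and "MAIL" are all hypothetical.

```python
from collections import Counter, deque
from math import dist


def dedupe(points):
    """Collapse identical feature vectors into unique points with counts."""
    counts = Counter(map(tuple, points))
    unique = list(counts)
    return unique, [counts[p] for p in unique]


def weighted_dbscan(unique, weights, eps, min_pts):
    """Standard DBSCAN in which each unique point counts `weight` times."""
    n = len(unique)
    neighbors = [[j for j in range(n) if dist(unique[i], unique[j]) <= eps]
                 for i in range(n)]
    is_core = [sum(weights[j] for j in neighbors[i]) >= min_pts
               for i in range(n)]
    labels = [-1] * n  # -1 marks noise or not-yet-assigned points
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return labels


def name_clusters(unique, cluster_ids, labeled):
    """Map each cluster to the majority class among its labeled samples.

    Assumes every labeled point also appears in `unique`.
    """
    index = {p: i for i, p in enumerate(unique)}
    votes = {}
    for point, cls in labeled:
        cid = cluster_ids[index[tuple(point)]]
        if cid != -1:
            votes.setdefault(cid, Counter())[cls] += 1
    return {cid: c.most_common(1)[0][0] for cid, c in votes.items()}


# Toy data: two dense groups with repeated records and one isolated flow.
flows = ([(1.0, 1.0)] * 3 + [(1.1, 1.0)] + [(5.0, 5.0)] * 2
         + [(5.1, 5.1), (9.0, 0.0)])
unique, weights = dedupe(flows)
cluster_ids = weighted_dbscan(unique, weights, eps=0.5, min_pts=3)
names = name_clusters(unique, cluster_ids,
                      [((1.0, 1.0), "WWW"), ((5.0, 5.0), "MAIL")])
print(cluster_ids, names)
```

Collapsing duplicates first keeps the neighbor search proportional to the number of distinct records, while the weights preserve the density that the repeated records would have contributed. How the thesis actually chooses its parameters and handles duplicates may differ from this sketch.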
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TM73