Research on Peer-to-Peer Network Traffic Identification Technology

Published: 2018-05-01 23:26

Topics: peer-to-peer networks + traffic identification; Source: Qufu Normal University, master's thesis, 2014

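[Abstract]: The resource-sharing model unique to peer-to-peer networks has driven rapid growth in P2P traffic. P2P technology is now used across Internet services, chiefly file sharing, streaming media, distributed computing, and online gaming. P2P traffic has been shown to consume most of the available bandwidth and even to cause network congestion. Moreover, because P2P applications are widespread and hard to detect, malicious traffic generated by illegitimate nodes further consumes bandwidth and has even led to denial-of-service attacks. Accurate and efficient identification of P2P traffic is therefore a key prerequisite for monitoring and controlling it, and is important for Internet security. This thesis analyzes several classes of P2P traffic identification methods in detail: port-based identification checks port numbers; deep packet inspection matches payload signatures; behavior-based identification recognizes P2P applications from extracted traffic features; and machine-learning and probabilistic-statistical methods learn a classifier from samples and use it to identify P2P traffic precisely. Building on these methods, the thesis studies behavior-based identification in depth and proposes two new traffic-behavior analysis methods that improve identification accuracy. Based on an in-depth analysis of machine-learning and probabilistic-statistical methods, it also proposes and implements, in a cloud computing environment, a solution to the problem of processing large data sets on a single machine. The main work is as follows:
(1) Because P2P software commonly uses dynamic ports and payload encryption, identification methods based on transport-layer ports or deep packet inspection are limited. Analysis of P2P traffic reveals two properties: first, P2P nodes are dual-role, i.e., a node can upload and download data at the same time; second, the ratio of the variances of packet inter-arrival times in the forward and reverse flows of P2P traffic consistently fluctuates within a certain range. Based on these properties, a P2P traffic identification method using node and traffic behavior features is proposed and applied to network traffic monitoring. Experiments show that the method can identify new applications and encrypted traffic, with a flow identification rate of 93% and a byte identification rate of 95.5%.
(2) Because of memory limits, single-machine P2P traffic identification methods can only process small data sets, and the attributes used by naive Bayes-based identification are all selected manually, which limits the identification rate and makes the results less objective. To address these problems, a naive Bayes classification algorithm for the cloud computing environment is proposed and an attribute reduction algorithm for the cloud computing environment is improved; combining the two achieves fine-grained identification of encrypted P2P traffic. Experimental results show that the method processes large network-traffic data sets efficiently, achieves a high P2P identification rate, and yields objective results.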
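The two behavioral properties in contribution (1) can be illustrated with a minimal Python sketch. It is not the author's implementation: the `Packet` record, the rule that a flow's forward direction is the direction of its first packet, and the thresholds `MIN_BYTES`, `RATIO_LOW`, and `RATIO_HIGH` are all hypothetical, since the abstract does not give the actual interval bounds. The last function computes flow and byte identification rates under one common definition (the share of true P2P flows, and of their bytes, that were labelled P2P), which is also an assumption.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import Iterable, List, Tuple

@dataclass
class Packet:
    """Hypothetical packet record; field names are illustrative assumptions."""
    ts: float     # arrival time in seconds
    src: str      # sender IP address
    dst: str      # receiver IP address
    length: int   # payload length in bytes

# Placeholder thresholds, NOT values taken from the thesis.
MIN_BYTES = 1024                  # minimum bytes each way to count as uploading/downloading
RATIO_LOW, RATIO_HIGH = 0.5, 2.0  # assumed bounds on the forward/reverse variance ratio

def iat_variance(timestamps: List[float]) -> float:
    """Population variance of inter-arrival times; 0.0 when fewer than two gaps exist."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return pvariance(gaps) if len(gaps) >= 2 else 0.0

def variance_ratio(flow: List[Packet]) -> float:
    """Forward/reverse IAT variance ratio; forward = direction of the first packet.
    Assumes the packets are sorted by timestamp."""
    initiator = flow[0].src
    v_fwd = iat_variance([p.ts for p in flow if p.src == initiator])
    v_rev = iat_variance([p.ts for p in flow if p.src != initiator])
    return v_fwd / v_rev if v_rev > 0 else float("inf")

def is_dual_role_host(host: str, window: List[Packet]) -> bool:
    """True if, within one observation window, the host both uploads and downloads data."""
    up = sum(p.length for p in window if p.src == host)
    down = sum(p.length for p in window if p.dst == host)
    return up >= MIN_BYTES and down >= MIN_BYTES

def looks_like_p2p(host: str, window: List[Packet], flow: List[Packet]) -> bool:
    """Combine the node-level check and the flow-level ratio check."""
    return is_dual_role_host(host, window) and RATIO_LOW <= variance_ratio(flow) <= RATIO_HIGH

def identification_rates(records: Iterable[Tuple[int, bool, bool]]) -> Tuple[float, float]:
    """records: (flow_bytes, truly_p2p, predicted_p2p).
    Returns (flow rate, byte rate) over the flows that are truly P2P."""
    p2p = [(b, pred) for b, truth, pred in records if truth]
    if not p2p:
        return 0.0, 0.0
    flow_rate = sum(1 for _, pred in p2p if pred) / len(p2p)
    total = sum(b for b, _ in p2p)
    byte_rate = sum(b for b, pred in p2p if pred) / total if total else 0.0
    return flow_rate, byte_rate
```

Either check alone is weak evidence (an ordinary host can upload and download at once), so the sketch requires both; in practice the ratio bounds would have to be calibrated on labelled traces.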
[English Abstract]: The resource-sharing model of peer-to-peer networks has made P2P traffic grow rapidly. P2P technology has been applied across Internet services, including file sharing, streaming media, distributed computing, and online gaming. Evidence shows that P2P traffic occupies most of the bandwidth and has even caused network congestion. Moreover, because P2P applications are widespread and hard to detect, malicious traffic generated by many illegitimate nodes has increased bandwidth consumption, and denial-of-service attacks have even occurred. Accurate and efficient identification of P2P traffic is therefore a key issue in monitoring and controlling it, and is of great significance for Internet security. This thesis analyzes several kinds of P2P traffic identification methods in detail: port-based identification, which checks port numbers; deep packet inspection, which matches payload signatures; behavior-based identification, which recognizes P2P applications from extracted traffic features; and machine-learning and probabilistic-statistical identification, which learns a classifier from samples and uses it to identify P2P traffic accurately. Building on these methods, the behavior-based approach is studied in depth, and two new traffic-behavior analysis methods are proposed that improve identification accuracy. Based on an in-depth analysis of machine-learning and probabilistic-statistical methods, a solution to the problem of processing large data sets on a single computer is proposed and implemented in a cloud computing environment.
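To make the contrast between the first two method families concrete, here is a minimal Python sketch of port-based and payload-signature identification. The port table (BitTorrent 6881-6889, Gnutella 6346, eDonkey 4662) lists commonly cited defaults and is illustrative, not exhaustive; the payload prefixes are the plaintext BitTorrent handshake (a length byte of 19 followed by `BitTorrent protocol`) and the Gnutella `GNUTELLA CONNECT/` handshake. The function names are hypothetical.

```python
from typing import Optional

# Illustrative default ports often cited for P2P protocols (assumed table, not exhaustive).
P2P_PORTS = {
    **{p: "BitTorrent" for p in range(6881, 6890)},
    6346: "Gnutella",
    4662: "eDonkey",
}

# Payload prefixes taken from plaintext protocol handshakes.
SIGNATURES = [
    (b"\x13BitTorrent protocol", "BitTorrent"),  # length byte 19 + protocol string
    (b"GNUTELLA CONNECT/", "Gnutella"),          # Gnutella connection handshake
]

def classify_by_port(src_port: int, dst_port: int) -> Optional[str]:
    """Port-based identification: misses peers that use random or dynamic ports."""
    return P2P_PORTS.get(dst_port) or P2P_PORTS.get(src_port)

def classify_by_payload(payload: bytes) -> Optional[str]:
    """Signature matching in the style of DPI: misses encrypted payloads."""
    for prefix, app in SIGNATURES:
        if payload.startswith(prefix):
            return app
    return None
```

Both functions return `None` for a peer that picks a random port and encrypts its stream, which is the limitation that motivates the behavior-feature approach.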
The main work is as follows:
(1) Because P2P software generally uses dynamic ports and payload encryption, P2P traffic identification methods based on transport-layer ports and deep packet inspection are limited. Analysis of P2P traffic shows two characteristics: first, P2P nodes can upload and download data at the same time; second, the ratio of the variances of packet inter-arrival times in the forward and reverse flows always fluctuates within a certain range. A P2P traffic identification method based on node and traffic behavior is proposed and applied to network traffic monitoring. Experimental results show that this method can recognize new applications and encrypted traffic, with a flow identification rate of 93% and a byte identification rate of 95.5%.
(2) Because of memory limitations, P2P traffic identification methods in a single-computer environment can only handle small-scale data sets, and the attributes used by naive Bayes-based identification methods are all selected manually, so recognition rates are limited and lack objectivity. To address these problems, a naive Bayes classification algorithm for the cloud computing environment is proposed, and the attribute reduction algorithm in the cloud computing environment is improved. Combining the two algorithms achieves fine-grained identification of encrypted P2P traffic. Experimental results show that the method can efficiently process big-data network traffic, achieves a high P2P traffic recognition rate, and produces objective results.
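The counting step that makes naive Bayes easy to distribute can be sketched in plain Python, with in-process map and reduce functions standing in for a cloud framework such as Hadoop MapReduce. This is an illustration under assumptions, not the thesis's algorithm: features are assumed to be already discretized, Laplace (add-one) smoothing is assumed, the key layout and function names are hypothetical, and the improved attribute reduction step is not reproduced.

```python
import math
from collections import Counter
from itertools import chain
from typing import Hashable, Iterable, List, Sequence, Tuple

Sample = Tuple[Sequence[Hashable], str]  # (discretized feature vector, class label)

def map_sample(sample: Sample) -> Iterable[Tuple[tuple, int]]:
    """Map phase: emit a count for the class and one per (class, attribute, value)."""
    features, label = sample
    yield ("class", label), 1
    for i, v in enumerate(features):
        yield ("attr", label, i, v), 1

def reduce_counts(pairs: Iterable[Tuple[tuple, int]]) -> Counter:
    """Reduce phase: sum the counts for each key (shuffle + reducers in a real cluster)."""
    totals: Counter = Counter()
    for key, n in pairs:
        totals[key] += n
    return totals

def train(partitions: List[List[Sample]]) -> Counter:
    """Each partition stands in for the data split handled by one mapper."""
    mapped = chain.from_iterable(map_sample(s) for part in partitions for s in part)
    return reduce_counts(mapped)

def predict(counts: Counter, features: Sequence[Hashable], n_values: Sequence[int]) -> str:
    """Return argmax_c log P(c) + sum_i log P(x_i | c) with add-one smoothing.
    n_values[i] is the number of distinct values attribute i can take."""
    classes = {k[1]: n for k, n in counts.items() if k[0] == "class"}
    total = sum(classes.values())
    best, best_score = "", -math.inf
    for c, n_c in classes.items():
        score = math.log(n_c / total)
        for i, v in enumerate(features):
            score += math.log((counts[("attr", c, i, v)] + 1) / (n_c + n_values[i]))
        if score > best_score:
            best, best_score = c, score
    return best
```

Because the model is only a table of counts, each mapper can process its own data split and the reducers only add integers, so the training data never has to fit in one machine's memory; only the count table does.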
[Degree-granting institution]: Qufu Normal University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.02
