Research on Counting-Based Frequent Item Mining Algorithms in Network Traffic Measurement
Published: 2018-03-12 20:51
Topic: data streams; Focus: frequent items; Source: Yanshan University, master's thesis, 2014; Document type: degree thesis
[Abstract]: Large-flow (heavy-hitter) identification is the process of querying frequent items in the data stream produced by network traffic measurement, and in recent years it has become a research hotspot in data stream mining. Network traffic data streams are high-speed, uncertain, and time-varying. Large-flow identification demands high accuracy in its query results and also places limits on the time and space overhead of the algorithm. As a result, current frequent item mining algorithms for data streams still suffer from low identification efficiency and poor real-time processing, and their performance needs to be improved.

First, targeting the calculation formula and the removal strategy of the PLC (Probabilistic Lossy Counting) algorithm, and with the aim of further reducing time and space overhead, a counting-based, space-optimized large-flow identification algorithm, DPLC (Delta Probabilistic Lossy Counting), is proposed. The algorithm introduces two concepts, degree of variation and steady-state value, to judge whether the computed window error value has reached a steady state. Once the steady state is reached, the formula no longer has to be evaluated, which lowers the time overhead. In addition, a stricter error value is used when deleting at window boundaries, and the algorithm distinguishes the cases in which the computed window error value should replace a table entry's estimated value, which effectively lowers the storage overhead.

Second, the LC (Lossy Counting) algorithm has high storage overhead and low identification accuracy, while the PLC algorithm has higher identification accuracy but may produce false negatives. To further improve performance, a large-flow identification algorithm based on nonlinear lossy counting, NLC (Nonlinear Lossy Counting), is proposed. It uses nonlinear functions as the statistical value and the deletion value respectively, and by effectively controlling the storage overhead it reduces false positives in the output and avoids false negatives. Notably, whether the parameter values of the nonlinear function are chosen appropriately affects the performance of NLC. The experimental analysis summarizes how these parameter values vary with the packet distribution and the support. Nonlinear functions with different parameter values change the storage overhead of NLC, and in turn the false positive rate and false negative rate of the identification results.

Finally, the performance of the algorithms is evaluated experimentally on synthetic datasets and on real network traffic datasets. The results show the advantages of the proposed algorithms in identification accuracy, time overhead, and storage overhead. All experiments were developed on the MyEclipse platform and implemented in Java.
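For orientation, the LC baseline named in the abstract can be sketched as follows. This is the classic Lossy Counting scheme only, not DPLC or NLC, whose formulas the abstract does not give. The thesis experiments were written in Java; Python is used here purely for brevity. The class name `LossyCounter`, its methods, and the sample stream are hypothetical and chosen for illustration.

```python
import math
import random
from collections import Counter


class LossyCounter:
    """Classic Lossy Counting over a stream of hashable items (illustrative sketch)."""

    def __init__(self, epsilon):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket (window) width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, delta]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # current bucket id
        entry = self.entries.get(item)
        if entry is not None:
            entry[0] += 1
        else:
            # delta bounds how many occurrences may have been missed earlier
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # bucket boundary: drop entries that cannot be frequent
            self.entries = {
                key: value
                for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }

    def heavy_hitters(self, support):
        """Return items whose estimated count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {
            key: value[0]
            for key, value in self.entries.items()
            if value[0] >= threshold
        }


# Hypothetical sample stream: two large flows plus 200 single-packet flows.
stream = ["a"] * 500 + ["b"] * 300 + [f"x{i}" for i in range(200)]
random.Random(0).shuffle(stream)

counter = LossyCounter(epsilon=0.01)
for flow_id in stream:
    counter.add(flow_id)

true_counts = Counter(stream)
reported = counter.heavy_hitters(support=0.1)
for flow_id, estimate in sorted(reported.items()):
    print(flow_id, estimate, true_counts[flow_id])
```

With these assumed parameters only the flows `a` and `b` are reported. Each estimate never exceeds the true count and falls short of it by at most epsilon times the stream length, which is the undercounting that gives LC its false positives but no false negatives.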
[Degree-granting institution]: Yanshan University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.06
Article ID: 1603253
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1603253.html