
A Local-Distance Anomaly Detection Method Based on Density-Biased Sampling

Published: 2018-08-03 17:12
[Abstract]: Anomaly detection is an important research area in data mining. Existing anomaly detection methods based on distance or nearest-neighbor concepts take too long to run on massive, high-dimensional data. Many improved methods raise computational efficiency but detect anomalies poorly. To address this, a local-distance anomaly detection algorithm based on density-biased sampling is proposed. First, a density-biased probability sampling method draws a sample from the data set to be examined. Then a local anomaly detection method based on local distance computes a local anomaly coefficient over the sampled set; the resulting coefficient serves both as the local anomaly coefficient of the sampled data and as an approximate global anomaly coefficient of the full data set. Finally, the local anomaly coefficients of the data points are sorted: the larger a point's coefficient, the more likely the point is an anomaly. Experimental results show that, compared with existing algorithms, the proposed algorithm achieves higher detection precision with less running time, performs well on data of various dimensionalities and scales, and scales well.
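The pipeline described in the abstract (sample with a density bias, score points by local distance against the sample, then rank) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption: the grid-based density estimate, the exponent `e`, the function names `density_biased_sample` and `local_distance_scores`, and the use of an LDOF-style ratio (mean distance to the k nearest sampled points divided by the mean pairwise distance among those neighbors) as a stand-in for the paper's local anomaly coefficient.

```python
import numpy as np

def density_biased_sample(X, n_samples, bins=10, e=0.5, seed=None):
    """Return indices drawn with probability proportional to (cell count)^(-e).

    Assumption: density is estimated by counting points per cell of a
    uniform grid; e > 0 makes points in sparse cells more likely to be kept.
    """
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    # Grid-cell coordinates of every point, clipped to the last cell.
    cells = np.minimum(((X - lo) / span * bins).astype(int), bins - 1)
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    weights = counts[inverse.ravel()].astype(float) ** (-e)
    probs = weights / weights.sum()
    size = min(n_samples, len(X))
    return rng.choice(len(X), size=size, replace=False, p=probs)

def local_distance_scores(X, sample_idx, k=5):
    """LDOF-style score of every point in X, using only sampled points
    as candidate neighbors (hypothetical stand-in for the paper's
    local anomaly coefficient). Requires 2 <= k < len(sample_idx)."""
    S = X[sample_idx]
    # Distances from each point to each sampled point: shape (n, m).
    d = np.linalg.norm(X[:, None, :] - S[None, :, :], axis=2)
    # A sampled point must not count as its own neighbor.
    d[sample_idx, np.arange(len(sample_idx))] = np.inf
    nn = np.argsort(d, axis=1)[:, :k]
    d_knn = np.take_along_axis(d, nn, axis=1).mean(axis=1)
    # Mean pairwise distance among each point's k neighbors.
    N = S[nn]
    inner = np.linalg.norm(N[:, :, None, :] - N[:, None, :, :], axis=3)
    d_inner = inner.sum(axis=(1, 2)) / (k * (k - 1))
    return d_knn / np.maximum(d_inner, 1e-12)

# Usage on synthetic data: one dense cluster plus a single distant point.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(size=(500, 2)), [[20.0, 20.0]]])
idx = density_biased_sample(X, n_samples=100, seed=1)
scores = local_distance_scores(X, idx, k=5)
ranking = np.argsort(-scores)  # largest score first = most anomalous
```

Because each point is compared only against the sample, the distance matrix has shape (n, m) rather than (n, n), which is where a sampling-based approach of this kind would save time. The accuracy and speed figures claimed in the abstract are the paper's own and are not reproduced by this sketch.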
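[Author affiliation]: University of Chinese Academy of Sciences; Key Laboratory of Space-Based Integrated Information Systems (Institute of Software, Chinese Academy of Sciences)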
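[Funding]: National Natural Science Foundation of China (U1435220); National High Technology Research and Development Program of China (863 Program) (2012AA011206)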
[Classification number]: TP311.13



