An Imbalanced Data Processing Algorithm Based on Boundary Mixed Sampling
Published: 2018-08-06 21:25
[Abstract]: To address the poor classification performance on imbalanced data, a new imbalanced data processing method based on boundary mixed sampling (BMS) is proposed. First, a "coefficient of variation" is introduced to divide the samples into a boundary region and a non-boundary region. Then the minority-class samples in the boundary region are oversampled and the majority-class samples in the non-boundary region are randomly undersampled, with the goal of making the training data approximately balanced. On 7 public datasets, BMS improves classification performance by about 5% on average compared with 3 other popular imbalanced data processing methods. The authors therefore conclude that the method can be widely applied to the processing and classification of imbalanced data.
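The abstract does not specify how the coefficient of variation defines the boundary region, how many samples are added or removed, or which oversampling technique is used. The Python sketch below fills those gaps with explicitly hypothetical choices. It therefore illustrates the general idea of boundary-based hybrid sampling and is not the authors' BMS algorithm. In this sketch, a sample counts as a boundary sample when the coefficient of variation (population standard deviation divided by mean) of the class codes in its k-nearest-neighbour neighbourhood exceeds a threshold. Boundary minority samples are then oversampled by SMOTE-style interpolation, and non-boundary majority samples are randomly removed. The names `find_boundary`, `bms_resample`, `k`, `cv_threshold` and `seed` are assumptions. So is the rule of closing half the class gap by undersampling.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV = population standard deviation / mean; assumes mean > 0."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean


def k_nearest(X: Sequence[Point], i: int, k: int, candidates: Sequence[int]) -> List[int]:
    """Indices of the k candidates closest to X[i] (Euclidean), excluding i."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def find_boundary(X, y, k=5, cv_threshold=0.0):
    """Hypothetical boundary rule (not taken from the paper).

    Labels are 0 (majority) and 1 (minority). They are shifted to 1/2 so the
    mean is positive. The CV of the codes of sample i and its k nearest
    neighbours is 0 for a pure neighbourhood and > 0 for a mixed one.
    """
    everyone = range(len(X))
    flags = []
    for i in everyone:
        codes = [y[j] + 1 for j in [i] + k_nearest(X, i, k, everyone)]
        flags.append(coefficient_of_variation(codes) > cv_threshold)
    return flags


def bms_resample(X, y, k=5, cv_threshold=0.0, seed=0):
    """Boundary-based hybrid sampling sketch; returns (X_new, y_new)."""
    rng = random.Random(seed)
    minority = [i for i, c in enumerate(y) if c == 1]
    majority = [i for i, c in enumerate(y) if c == 0]
    gap = len(majority) - len(minority)
    if not minority or gap <= 0:
        return list(X), list(y)  # nothing to balance, or nothing to oversample

    boundary = find_boundary(X, y, k, cv_threshold)
    # Assumption: fall back to all minority samples if none lie on the boundary.
    seeds = [i for i in minority if boundary[i]] or minority
    removable = [i for i in majority if not boundary[i]]

    # Assumption: close half the gap by undersampling (capped by what is
    # removable) and the rest by oversampling, so both classes end up equal.
    n_remove = min(gap // 2, len(removable))
    n_add = gap - n_remove

    # Random undersampling of non-boundary majority samples.
    dropped = set(rng.sample(removable, n_remove))
    X_new = [x for i, x in enumerate(X) if i not in dropped]
    y_new = [c for i, c in enumerate(y) if i not in dropped]

    # SMOTE-style interpolation between a boundary minority seed and one of
    # its k nearest minority neighbours.
    for _ in range(n_add):
        i = rng.choice(seeds)
        j = rng.choice(k_nearest(X, i, k, minority) or [i])
        w = rng.random()
        X_new.append(tuple(a + w * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(1)
    return X_new, y_new
```

Consider ten majority points on a line and two minority points near its right end. In that case `bms_resample(X, y, k=3, seed=1)` removes four majority points that lie away from the minority region. It also synthesizes four minority points between the two existing ones, which leaves six samples in each class. Splitting the gap between undersampling and oversampling limits both the amount of majority data discarded and the number of synthetic points generated. The paper may balance the two differently.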
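[Author affiliations]: School of Information Science and Technology, Northwest University; School of Economics and Management, Northwest University; School of Mathematics, Northwest University
[Funding]: Special Natural Science Project of the Scientific Research Program of the Shaanxi Provincial Department of Education (15JK1738); Natural Science Foundation of Shaanxi Province (2014JQ8367)
[Classification number]: TP311.13
Article number: 2169036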