
Research on Active Extreme Learning Machine Algorithms for Imbalanced Data Distributions

Published: 2018-11-05 14:03
[Abstract]: In recent years, rapid advances in data acquisition and storage have left nearly every industry with massive amounts of data, and how to analyze such data has become a core problem for researchers in machine learning and data mining. Labeling the class of every instance in order to build a classification model, for example, greatly increases the cost in labor, resources, and time; active learning is an effective tool for reducing that cost. Many effective active learning algorithms have been proposed over the years, but they overlook an important question: whether they remain effective when the samples are distributed unevenly across classes. This thesis therefore studies how to preserve the efficiency and performance of active learning on class-imbalanced data, that is, how to improve active learning algorithms so that they achieve the best classification performance under an imbalanced data distribution. The main work covers two aspects:

1) When active learning is run on imbalanced data, the classification boundary is easily biased, which can cause active learning to fail. To address this, sampling techniques are adopted as the balance-control strategy during learning. After a survey of several existing sampling algorithms, a boundary oversampling algorithm is proposed and combined with active learning. Because the extreme learning machine (ELM) generalizes well and trains quickly, it is used as the base classifier to speed up the active learning process. The active learning algorithms with the balance-control strategy are evaluated on 12 benchmark data sets. The results show that active learning is indeed affected in imbalanced scenarios and that active learning combined with sampling performs better.

2) To train even faster, online learning is introduced and an online weighted extreme learning machine algorithm, OS-W-ELM, is proposed. Cost-sensitive learning is adopted as the balance-control strategy and combined with active learning, again with the ELM as the base classifier. On the same 12 benchmark data sets, the performance of the AL-OS-W-ELM, AL-OS-ELM, and RS-OS-W-ELM algorithms is compared, and the running times of AL-OS-W-ELM and AL-OS-ELM are compared with those of the sampling-based active learning algorithms. The results show that active learning combined with online learning and cost-sensitive learning performs better in imbalanced scenarios.
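The abstract does not give the update equations of OS-W-ELM or the query rule used by AL-OS-W-ELM, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a textbook weighted regularized least-squares ELM, a Sherman-Morrison update to absorb one newly labeled sample at a time, and a smallest-absolute-output uncertainty query. The function names (`hidden`, `batch_fit`, `online_update`, `active_learn`), the inverse-class-count weighting, and all hyperparameter values are illustrative choices, not taken from the thesis.

```python
import numpy as np

def hidden(X, A, b):
    """Random-feature hidden layer with sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

def batch_fit(H, t, w, C):
    """Weighted ridge solution: beta = (I/C + H^T W H)^-1 H^T W t, W = diag(w)."""
    HtW = H.T * w  # same as H.T @ diag(w)
    P = np.linalg.inv(np.eye(H.shape[1]) / C + HtW @ H)
    return P @ (HtW @ t), P

def online_update(beta, P, h, t, w):
    """Absorb one sample (h, t) with weight w using the Sherman-Morrison identity."""
    Ph = P @ h
    P = P - np.outer(Ph, Ph) / (1.0 / w + h @ Ph)
    beta = beta + w * (P @ h) * (t - h @ beta)
    return beta, P

def active_learn(X, y, init_idx, n_queries, n_hidden=30, C=100.0, seed=0):
    """Uncertainty-sampling loop; labels are assumed to be -1 / +1.

    init_idx must contain at least one sample of each class.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(X.shape[1], n_hidden))  # hidden weights stay fixed
    b = rng.normal(size=n_hidden)
    H = hidden(X, A, b)

    labeled = list(init_idx)
    counts = {c: int(np.sum(y[labeled] == c)) for c in (-1, 1)}
    # Assumed weighting: each sample weighs 1 / (labeled size of its class).
    w0 = np.array([1.0 / counts[int(c)] for c in y[labeled]])
    beta, P = batch_fit(H[labeled], y[labeled].astype(float), w0, C)

    seen = set(labeled)
    pool = [i for i in range(len(X)) if i not in seen]
    for _ in range(n_queries):
        # Query the pool sample whose output is closest to the decision boundary.
        q = pool.pop(int(np.argmin(np.abs(H[pool] @ beta))))
        c = int(y[q])  # the "oracle" reveals the label
        counts[c] += 1
        beta, P = online_update(beta, P, H[q], float(c), 1.0 / counts[c])
        labeled.append(q)
    return beta, labeled, H

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 19:1 imbalanced data, used only for illustration.
    X = np.vstack([rng.normal(0.0, 1.0, (190, 2)), rng.normal(2.5, 1.0, (10, 2))])
    y = np.array([-1] * 190 + [1] * 10)
    beta, labeled, H = active_learn(X, y, [0, 1, 2, 190, 191], n_queries=30)
    print("minority samples among labeled:", int(np.sum(y[labeled] == 1)))
```

In this sketch each sample keeps the weight it received when it was labeled, so the online solution matches a batch solve with those same weights; a weighting that is recomputed as class counts change would need a different update.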
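[Degree-granting institution]: Jiangsu University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP18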

Article ID: 2312301

Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2312301.html
