Research on a Semi-Supervised Classification Method Robust to Label Noise Based on Self-Paced Learning
Published: 2018-01-26 15:28
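Keywords: semi-supervised classification, manifold learning, self-paced learning, sparse coding, semi-supervised learning. Source: Capital University of Economics and Business, 2017 master's thesis. Thesis type: degree thesis.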
[Abstract]: Labeling data is tedious, time-consuming, and error-prone. On the one hand, this keeps the amount of labeled data limited, while unlabeled data are comparatively plentiful; on the other hand, labels assigned during annotation are prone to errors, and many machine learning algorithms are sensitive to data with noisy labels. There is therefore a need for algorithms that can exploit large amounts of unlabeled data while remaining robust to noisy labels. To address this problem, this thesis proposes a framework called self-paced manifold regularization, which performs semi-supervised classification with a small amount of labeled data and a large amount of unlabeled data and is robust to data with noisy labels. Specifically, the method introduces a self-paced learning regularization term into the manifold regularization framework for semi-supervised classification, so that labeled training samples are selected robustly; at the same time, it uses the discriminative information contained in the sparse codes of the data to control the smoothness of the classifier and improve classification performance. Finally, an alternating search strategy is designed for the resulting optimization problem, yielding a classification function with an explicit expression. The method applies to multi-class problems; while remaining robust to noisy labels, it balances the complexity and smoothness of the classifier, so that the model has a relatively small generalization error. Experimental results on three datasets show that the method outperforms traditional semi-supervised classification methods on data with noisy labels.
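The abstract describes the method only at a high level. The following is a minimal NumPy sketch of the general idea, not the thesis's actual formulation. It assumes a Laplacian-regularized least-squares classifier f(x) = Σ_j α_j k(x_j, x) with per-sample weights v_i ∈ {0, 1} on the labeled data, chosen by the standard hard self-paced regularizer. The assumed objective is Σ_i v_i‖y_i − f(x_i)‖² − λΣ_i v_i + γ_A tr(αᵀKα) + γ_I tr(FᵀLF) with F = Kα. For fixed v, setting the gradient to zero gives (VK + γ_A I + γ_I LK)α = VY. For fixed α, the minimizer is v_i = 1 exactly when loss_i < λ. The RBF kernel, the kNN heat-kernel graph (substituted here for the sparse-coding graph, which the abstract does not specify), the function names, and all parameter values (gamma_a, gamma_i, lam, mu, k) are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def knn_laplacian(X, k=10, t=1.0):
    """Unnormalized Laplacian L = D - W of a symmetric kNN heat-kernel graph.

    Assumption: a stand-in for the sparse-coding graph described in the thesis.
    """
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]  # skip self at position 0
    rows = np.repeat(np.arange(n), k)
    W = np.zeros((n, n))
    W[rows, nn.ravel()] = np.exp(-d2[rows, nn.ravel()] / t)
    W = np.maximum(W, W.T)  # symmetrize
    return np.diag(W.sum(1)) - W


def _solve_alpha(K, L, Y, labeled_idx, v, gamma_a, gamma_i):
    """For fixed weights v, solve (V K + gamma_a I + gamma_i L K) alpha = V Y."""
    n = K.shape[0]
    vdiag = np.zeros(n)
    vdiag[labeled_idx] = v  # unlabeled rows get weight 0 in the data term
    V = np.diag(vdiag)
    A = V @ K + gamma_a * np.eye(n) + gamma_i * L @ K
    return np.linalg.solve(A, V @ Y)


def spl_laprls(X, labeled_idx, y_labeled, n_classes, kernel_gamma=0.5,
               gamma_a=1e-2, gamma_i=1.0, lam=0.5, mu=1.2, n_iter=5, k=10):
    """Alternate between the classifier (alpha) and self-paced weights (v)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, kernel_gamma)
    L = knn_laplacian(X, k=k)
    Y = np.zeros((n, n_classes))
    Y[labeled_idx, y_labeled] = 1.0  # one-hot targets; unlabeled rows stay zero
    v = np.ones(len(labeled_idx))  # start with every labeled sample selected
    for _ in range(n_iter):
        alpha = _solve_alpha(K, L, Y, labeled_idx, v, gamma_a, gamma_i)
        F = K @ alpha
        loss = ((Y[labeled_idx] - F[labeled_idx]) ** 2).sum(1)
        # closed-form minimizer of sum_i v_i*loss_i - lam*sum_i v_i over {0,1}
        v = (loss < lam).astype(float)
        if not v.any():
            v[np.argmin(loss)] = 1.0  # keep at least one sample
        lam *= mu  # raise the "age" so harder samples can enter later
    alpha = _solve_alpha(K, L, Y, labeled_idx, v, gamma_a, gamma_i)
    return alpha, v


def predict(X_train, alpha, X_new, kernel_gamma=0.5):
    """Explicit classifier f(x) = sum_j alpha_j k(x_j, x); returns argmax class."""
    return (rbf_kernel(X_new, X_train, kernel_gamma) @ alpha).argmax(1)


# Synthetic demo: two Gaussian blobs, 10 labels per class, one flipped per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-3, 0], 0.5, (100, 2)),
               rng.normal([3, 0], 0.5, (100, 2))])
y_true = np.repeat([0, 1], 100)
labeled_idx = np.r_[0:10, 100:110]
y_obs = y_true[labeled_idx].copy()
y_obs[[0, 10]] = 1 - y_obs[[0, 10]]  # inject label noise
alpha, v = spl_laprls(X, labeled_idx, y_obs, n_classes=2)
acc = (predict(X, alpha, X) == y_true).mean()
print("accuracy:", acc, "selected weights:", v)
```

Growing λ by the factor mu lets higher-loss samples enter the training set in later rounds. Samples whose loss stays above λ remain excluded. In a toy setup like this one, those are typically the flipped labels, because the graph term pulls their predictions toward their neighbors' class.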
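[Degree-granting institution]: Capital University of Economics and Business
[Degree level]: Master's
[Year of degree]: 2017
[Classification number (CLC)]: TP18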