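Research on Online Learning Algorithms for Multi-Label Classification
Published: 2018-04-25 10:19
Topic: multi-label classification + online learning; Source: master's thesis, Nanjing Normal University, 2017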
【Abstract】: In multi-label classification, a sample can belong to several class labels at the same time, and the labels are no longer mutually exclusive. Multi-label classification is now widely applied in text classification, natural scene classification, music emotion annotation, and other fields, and many multi-label classification algorithms have been proposed as a result. In recent years, most of these algorithms have relied on batch learning, which requires the entire training set to be loaded into memory and produces the final classification model in a single round of learning. In practice, and especially on large-scale datasets, batch learning consumes a great deal of time and memory. To address this problem, this thesis draws on online learning theory to study large-scale multi-label classification and proposes two online multi-label classification algorithms. The main contributions are as follows:

1. Using the binary relevance decomposition strategy together with an existing binary online passive-aggressive active learning algorithm, the thesis proposes a decomposition-based multi-label online passive-aggressive active learning algorithm (MLPAA). The algorithm queries the label information of multi-label samples through active learning, so it both updates the multi-label classifier continually in an online fashion and explores unlabeled samples, which reduces the cost and time of manual annotation. In experiments on eight multi-label datasets, scored by five multi-label evaluation criteria, MLPAA is compared with three other algorithms. The results show that MLPAA achieves better classification performance than MLRPE, MLPEA, and MLRPA.

2. Building on the idea of label ranking, the thesis improves the multi-class online passive-aggressive classification algorithm and proposes a multi-label online passive-aggressive classification algorithm that takes label correlation into account (MLLRPA). The algorithm maximizes the margin between the relevant and irrelevant label subsets of each multi-label sample, builds the set of ranking errors between relevant and irrelevant labels from the predicted label ranking pairs, and then updates the classifier according to the size of that error set. In experiments on ten multi-label benchmark datasets, scored by four multi-label evaluation metrics, MLLRPA is compared with three algorithms: online MMP, BR-PE, and BR-PA. The results show that the proposed MLLRPA algorithm performs well.
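The two kinds of update described in the abstract can be illustrated with a minimal sketch. It is not the thesis's MLPAA or MLLRPA: the abstract gives no formulas, so the code below assumes the standard PA-I update for each binary relevance classifier, a randomized query rule with probability delta / (delta + smallest absolute score), and a pairwise step on the single most violated relevant/irrelevant label pair. All names (`pa_step`, `BinaryRelevancePAActive`, `rank_pa_step`, `C`, `delta`, `label_oracle`) are hypothetical.

```python
import numpy as np


def pa_step(w, x, y, C=1.0):
    """One PA-I update for a single binary label y in {-1, +1}."""
    loss = max(0.0, 1.0 - y * float(w @ x))  # hinge loss
    sq_norm = float(x @ x)
    if loss == 0.0 or sq_norm == 0.0:
        return w  # passive: margin already satisfied
    tau = min(C, loss / sq_norm)  # aggressive: capped step size
    return w + tau * y * x


class BinaryRelevancePAActive:
    """One PA classifier per label, with randomized label queries (assumed rule)."""

    def __init__(self, n_features, n_labels, C=1.0, delta=1.0, seed=0):
        self.W = np.zeros((n_labels, n_features))
        self.C, self.delta = C, delta
        self.rng = np.random.default_rng(seed)
        self.n_queries = 0

    def predict(self, x):
        return np.where(self.W @ x >= 0, 1, -1)

    def partial_fit(self, x, label_oracle):
        # Assumption: query more often when the least confident label score is small.
        margin = float(np.min(np.abs(self.W @ x)))
        if self.rng.random() >= self.delta / (self.delta + margin):
            return False  # labels not requested, model unchanged
        y = label_oracle()  # annotation cost is paid only here
        self.n_queries += 1
        for k in range(self.W.shape[0]):
            self.W[k] = pa_step(self.W[k], x, y[k], self.C)
        return True


def rank_pa_step(W, x, relevant, C=1.0):
    """Pairwise PA-I step on the most violated (relevant, irrelevant) label pair.

    Returns the updated weights and the size of the ranking error set.
    """
    relevant = np.asarray(relevant, dtype=bool)
    rel, irr = np.flatnonzero(relevant), np.flatnonzero(~relevant)
    sq_norm = float(x @ x)
    if rel.size == 0 or irr.size == 0 or sq_norm == 0.0:
        return W, 0
    scores = W @ x
    gaps = scores[rel][:, None] - scores[irr][None, :]  # relevant minus irrelevant
    n_errors = int(np.sum(gaps < 1.0))  # pairs ranked with less than unit margin
    if n_errors == 0:
        return W, 0
    i, j = np.unravel_index(np.argmin(gaps), gaps.shape)
    tau = min(C, (1.0 - gaps[i, j]) / (2.0 * sq_norm))
    W = W.copy()
    W[rel[i]] += tau * x  # push the relevant label up
    W[irr[j]] -= tau * x  # push the irrelevant label down
    return W, n_errors
```

In a streaming loop, `partial_fit` would be called once per incoming sample with a callable that fetches the true labels only when the model decides to query, while `rank_pa_step` assumes the full label set is available for every sample.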
【Degree-granting institution】: Nanjing Normal University
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: TP181
Article ID: 1800907
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1800907.html