Weak-Label Ensemble Classification Based on Maximizing Label-Feature Dependency

Published: 2019-01-08 12:32

[Abstract]: Weak-label learning is an important branch of multi-label learning. In recent years it has been widely studied and applied to problems such as completing and predicting the missing labels of multi-label samples. However, high-dimensional data have large feature sets, tend to carry multiple semantic labels, and are more prone to missing labels, and existing weak-label learning methods are generally susceptible to the noise and redundant features such data contain. To classify high-dimensional multi-label data accurately, this paper proposes EnWL, a weak-label ensemble classification method based on maximizing the dependency between labels and features. EnWL first applies affinity propagation clustering to the feature space of the high-dimensional data multiple times; each time, the cluster centers are selected to form a representative feature subset, which reduces the interference of noise and redundant features. It then trains, on each feature subset, a semi-supervised multi-label classifier based on label-feature dependency maximization. Finally, it combines these classifiers by voting to perform multi-label classification. Experimental results on a variety of high-dimensional datasets show that EnWL outperforms existing related methods on multiple evaluation metrics.
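The three-stage pipeline described in the abstract (repeated clustering of features, one base learner per feature subset, voting) can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: feature similarity is taken as the negative squared Euclidean distance between feature columns, the affinity propagation preference is the median similarity, and rounds are diversified by random instance subsampling (the abstract does not say how rounds differ). The base learner `fit_centroids`/`predict_member` is a hypothetical nearest-centroid stand-in that simply ignores missing labels (`None`); it is NOT the paper's semi-supervised dependency-maximization classifier. All function names are hypothetical.

```python
import random
from statistics import median


def affinity_propagation(S, preference, damping=0.5, iters=200):
    """Return exemplar indices found by affinity propagation on similarity matrix S."""
    n = len(S)
    if n == 1:
        return [0]
    S = [row[:] for row in S]
    for k in range(n):
        S[k][k] = preference  # self-similarity controls how many exemplars emerge
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        # Responsibility: how well k suits as exemplar of i versus the best rival.
        for i in range(n):
            AS = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=AS.__getitem__)
            second = max(AS[k] for k in range(n) if k != best)
            for k in range(n):
                r_new = S[i][k] - (second if k == best else AS[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * r_new
        # Availability: accumulated evidence that k should be an exemplar.
        for k in range(n):
            rp = [R[i][k] if i == k else max(0.0, R[i][k]) for i in range(n)]
            total = sum(rp)
            for i in range(n):
                a_new = total - rp[i]
                if i != k:
                    a_new = min(0.0, a_new)
                A[i][k] = damping * A[i][k] + (1 - damping) * a_new
    score = [A[k][k] + R[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if score[k] > 0]
    return exemplars or [max(range(n), key=score.__getitem__)]


def select_features(X, rows):
    """Cluster feature columns (restricted to `rows`) and keep the exemplar features."""
    d = len(X[0])
    if d == 1:
        return [0]
    cols = [[X[i][j] for i in rows] for j in range(d)]
    S = [[-sum((a - b) ** 2 for a, b in zip(cols[j], cols[k])) for k in range(d)]
         for j in range(d)]
    pref = median(S[j][k] for j in range(d) for k in range(d) if j != k)
    return affinity_propagation(S, pref)


def fit_centroids(X, Y, feats):
    """Hypothetical stand-in base learner: per-label class centroids; None = missing label."""
    def centroid(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in feats]
    model = []
    for l in range(len(Y[0])):
        pos = [X[i] for i in range(len(X)) if Y[i][l] == 1]
        neg = [X[i] for i in range(len(X)) if Y[i][l] == 0]
        model.append((centroid(pos) if pos else None, centroid(neg) if neg else None))
    return model


def predict_member(model, feats, x):
    out = []
    xs = [x[j] for j in feats]
    for cpos, cneg in model:
        if cpos is None or cneg is None:
            out.append(0 if cpos is None else 1)  # only one class was observed
            continue
        dpos = sum((a - b) ** 2 for a, b in zip(xs, cpos))
        dneg = sum((a - b) ** 2 for a, b in zip(xs, cneg))
        out.append(1 if dpos < dneg else 0)
    return out


def fit_ensemble(X, Y, n_rounds=5, frac=0.8, seed=0):
    """Each round: subsample instances (assumption), pick exemplar features, train a member."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_rounds):
        rows = rng.sample(range(len(X)), max(2, int(frac * len(X))))
        feats = select_features(X, rows)
        members.append((feats, fit_centroids(X, Y, feats)))
    return members


def predict_ensemble(members, x):
    """Majority vote per label across ensemble members."""
    votes = [0] * len(members[0][1])
    for feats, model in members:
        for l, v in enumerate(predict_member(model, feats, x)):
            votes[l] += v
    return [1 if 2 * v > len(members) else 0 for v in votes]
```

Replacing `fit_centroids`/`predict_member` with a real semi-supervised, dependency-maximizing multi-label learner would leave the clustering and voting stages unchanged, which is the point of separating them this way.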
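[Author affiliations]: College of Computer and Information Science, Southwest University; School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture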
[CLC classification number]: TP181
