Weak-Label Ensemble Classification Based on Label-Feature Dependency Maximization
Published: 2019-01-08 12:32
[Abstract]: Weak-label learning is an important branch of multi-label learning. In recent years it has been widely studied and applied to completing missing labels and predicting labels for multi-label samples. High-dimensional data, however, have large feature sets, tend to carry more semantic labels, and are more prone to missing labels, and existing weak-label learning methods are generally susceptible to the noise and redundant features such data contain. To classify high-dimensional multi-label data accurately, this paper proposes EnWL, a weak-label ensemble classification method based on maximizing the dependency between labels and features. EnWL first applies affinity propagation clustering to the feature space of the high-dimensional data several times; in each run, the cluster centers are selected to form a representative feature subset, which reduces the interference of noisy and redundant features. It then trains, on each feature subset, a semi-supervised multi-label classifier based on label-feature dependency maximization. Finally, it combines these classifiers by voting to perform multi-label classification. Experimental results on a variety of high-dimensional datasets show that EnWL outperforms existing related methods on multiple evaluation metrics.
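The three steps in the abstract (repeated clustering of the features, one base classifier per exemplar subset, voting) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' EnWL implementation. The clustering uses the standard affinity propagation message updates; the abstract does not say how the repeated runs differ, so here each run compares features on a bootstrap sample of instances, which is a hypothetical choice. The per-subset learner is a simple nearest-centroid stand-in, not the paper's semi-supervised label-feature dependency-maximizing classifier, and missing labels are encoded as 0. All function names (`affinity_propagation`, `exemplar_feature_subset`, `fit_predict_centroid`, `ensemble_predict`) are hypothetical.

```python
import numpy as np


def affinity_propagation(S, damping=0.7, max_iter=200):
    """Affinity propagation on a similarity matrix S whose diagonal holds the
    preferences; returns the indices of the exemplars (cluster centers)."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i, k) = s(i, k) - max_{k' != k} [a(i, k') + s(i, k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i, k) = min(0, r(k, k) + sum_{i' not in {i, k}} max(0, r(i', k)))
        # a(k, k) = sum_{i' != k} max(0, r(i', k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
    score = np.diag(A) + np.diag(R)
    exemplars = np.flatnonzero(score > 0)
    return exemplars if exemplars.size else np.array([np.argmax(score)])


def exemplar_feature_subset(X, rng):
    """One clustering run over the features (columns) of X. Assumption: runs
    differ because features are compared on a bootstrap sample of rows."""
    rows = rng.integers(0, X.shape[0], size=X.shape[0])
    F = X[rows].T  # one point per feature
    F = (F - F.mean(axis=1, keepdims=True)) / (F.std(axis=1, keepdims=True) + 1e-12)
    sq = np.sum(F ** 2, axis=1)
    S = -(sq[:, None] + sq[None, :] - 2 * F @ F.T)  # negative squared distance
    n = S.shape[0]
    np.fill_diagonal(S, np.median(S[~np.eye(n, dtype=bool)]))  # median preference
    return affinity_propagation(S)


def fit_predict_centroid(X_tr, Y_tr, X_te):
    """Stand-in base learner, NOT the paper's dependency-maximizing classifier.
    Per label, predict 1 when a sample is closer to the centroid of known
    positives than to the centroid of the other training samples; missing
    labels (encoded as 0) are simply treated as negatives."""
    out = np.zeros((X_te.shape[0], Y_tr.shape[1]), dtype=int)
    for j in range(Y_tr.shape[1]):
        pos = Y_tr[:, j] == 1
        if not pos.any() or pos.all():  # degenerate label: constant prediction
            out[:, j] = int(pos.all())
            continue
        d_pos = np.sum((X_te - X_tr[pos].mean(axis=0)) ** 2, axis=1)
        d_neg = np.sum((X_te - X_tr[~pos].mean(axis=0)) ** 2, axis=1)
        out[:, j] = (d_pos < d_neg).astype(int)
    return out


def ensemble_predict(X_tr, Y_tr, X_te, n_members=5, seed=0):
    """Train one base learner per exemplar feature subset; majority-vote each label."""
    rng = np.random.default_rng(seed)
    votes = np.zeros((X_te.shape[0], Y_tr.shape[1]))
    subsets = []
    for _ in range(n_members):
        feats = exemplar_feature_subset(X_tr, rng)
        subsets.append(feats)
        votes += fit_predict_centroid(X_tr[:, feats], Y_tr, X_te[:, feats])
    return (votes > n_members / 2).astype(int), subsets


if __name__ == "__main__":
    # Toy data: 6 features forming 2 redundant groups, one label per group.
    rng = np.random.default_rng(1)
    z = rng.normal(size=(60, 2))
    X = np.hstack([z[:, [0]] + 0.05 * rng.normal(size=(60, 3)),
                   z[:, [1]] + 0.05 * rng.normal(size=(60, 3))])
    Y = (z > 0).astype(int)
    pred, subsets = ensemble_predict(X, Y, X)
    print("feature subsets:", [s.tolist() for s in subsets])
    print("label accuracy:", (pred == Y).mean())
```

Affinity propagation suits this step because its cluster centers (exemplars) are actual data points: each subset is a set of original feature indices, so a base learner can be trained directly on `X[:, subset]`. Swapping in a dependency-maximizing semi-supervised learner would only change `fit_predict_centroid`.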
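[Affiliations]: College of Computer and Information Science, Southwest University; School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture
[CLC number]: TP181
Article ID: 2404599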
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2404599.html