Feature Dimensionality Reduction Methods for Multi-Label Learning

Published: 2018-02-12 04:45

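Keywords: multi-label learning; feature dimensionality reduction; principal component analysis; non-negative matrix factorization; similarity matrix. Source: Minnan Normal University, 2017 master's thesis. Document type: degree thesis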

[Abstract]: In multi-label learning, each sample carries several labels, and the labels are not independent of one another. Multi-label data are typically high-dimensional, which increases the complexity and difficulty of data mining. How to process multi-label data efficiently has therefore become an active research topic in recent years. Feature dimensionality reduction lowers the dimensionality of multi-label data, shrinks the data scale, and improves the performance of multi-label learning. This thesis proposes two feature dimensionality reduction algorithms for multi-label learning. (1) A multi-label feature dimensionality reduction algorithm based on principal component analysis (MLFR-PCA). First, the algorithm uses PCA to project the original data into a low-dimensional space, which compacts and denoises the data. Second, it treats all labels of the data as a whole and introduces sparse regression between the labels and the features, linking the label space to the feature space and thereby constructing the objective function for dimensionality reduction. The algorithm is then optimized with the ℓ2,1-norm, which reduces the dimensionality of the multi-label data. (2) A multi-label feature dimensionality reduction algorithm based on non-negative matrix factorization (MLFR-NMF). First, the algorithm builds the similarity matrix of the feature space from the product of the feature matrix and a non-negative matrix. Second, it treats all labels of the data as a whole and constructs the similarity matrix of the label space with an existing method. It then applies least squares between the feature-space similarity matrix and the label-space similarity matrix, linking the label space to the feature space and thereby constructing the objective function for dimensionality reduction. Finally, the algorithm is optimized with the ℓ2-norm, which reduces the dimensionality of the multi-label data. Both algorithms reduce the dimensionality of multi-label data directly, without first converting it into single-label data. This avoids the extra workload of the conversion step as well as the downstream problems caused by an inaccurate conversion. In addition, because all labels enter the objective function as a whole, the label information is exploited for dimensionality reduction without breaking the label structure. Experiments on real-world data sets show that both algorithms perform well.
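The abstract does not state the exact objective function of MLFR-PCA, so the following is only a minimal sketch of the two ingredients it names: a PCA projection followed by a regression from features to the full label matrix with an ℓ2,1-norm penalty. The objective ||XW - Y||_F^2 + lam * ||W||_{2,1}, the iteratively reweighted least-squares solver, and all function and parameter names (`pca_project`, `l21_regression`, `rank_features`, `lam`, `n_iter`, `eps`) are assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np


def pca_project(X, k):
    """Project mean-centered data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def l21_regression(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """Approximately minimise ||XW - Y||_F^2 + lam * ||W||_{2,1}.

    Assumed objective (not from the thesis), solved by iteratively
    reweighted least squares: each step solves a weighted ridge problem.
    """
    d = X.shape[1]
    D = np.eye(d)  # reweighting matrix, refreshed every iteration
    for _ in range(n_iter):
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
    return W


def rank_features(W):
    """Order input dimensions by the l2 norm of their row in W, largest first."""
    return np.argsort(-np.linalg.norm(W, axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 6))
    # Synthetic label scores that depend on the first two features only.
    B = np.array([[1.0, -1.0, 0.5], [0.5, 1.0, -1.0]])
    Y = X[:, :2] @ B + 0.01 * rng.normal(size=(100, 3))
    Z = pca_project(X, 4)          # low-dimensional, denoised representation
    W = l21_regression(Z, Y)       # all labels enter the regression jointly
    print(rank_features(W))
```

Because the ℓ2,1-norm sums the ℓ2 norms of the rows of W, it pushes whole rows toward zero, so a dimension is either kept or dropped for all labels at once. That matches the abstract's point that the labels are treated as a whole rather than one at a time.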
[Degree-granting institution]: Minnan Normal University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP181

[References]