Research on Unsupervised Feature Learning Algorithms Based on Sparsity and Information Theory

Published: 2018-02-10 03:44

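Keywords: feature selection; subspace learning; sparse modeling; dimensionality reduction; machine learning　Source: University of Electronic Science and Technology of China, 2017 doctoral dissertation　Type: degree thesis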

[Abstract]: As data acquisition technology advances, the dimensionality of raw data keeps growing. Higher-dimensional data carry more useful information, but they also introduce considerable redundancy and increase the computational complexity of algorithms. To cope with growing dimensionality, process sample data effectively, and reduce redundancy, learning low-dimensional features from high-dimensional data has become a pressing problem in data processing and big data. As acquisition methods diversify and the number of samples grows, manually labeling training samples costs a great deal of time and labor, so unsupervised feature learning methods for dimensionality reduction are receiving increasing attention. This dissertation takes unsupervised feature learning as its subject and focuses on algorithms based on sparsity and information theory. It has two main parts. First, it studies the modeling and algorithm design of unsupervised feature selection based on sparse modeling. Feature selection is formulated as a subspace learning model, and sparsity constraints are added to the model so that more useful features are selected. This part proposes three subspace-learning-based feature selection methods: 1) to select features more effectively and to eliminate the effect of negative contributions, a non-negative subspace learning model is proposed; to mine the internal information of the data more effectively, the adaptive sparsity framework ISD is incorporated into subspace learning, yielding a non-negative subspace learning model with adaptive sparsity constraints; 2) to make fuller use of the internal information of the data, the local structure of the data is incorporated into subspace learning, yielding a subspace learning model that preserves both global and local structure; 3) to introduce discriminative information in the unsupervised setting, sample clustering information is incorporated into subspace learning as a form of discriminative information, yielding a discriminative subspace learning method. Second, the dissertation studies robust unsupervised feature learning based on information-theoretic learning. When the data contain outliers, objective functions built on the Frobenius norm are severely affected by them. The dissertation therefore models the objective function with the maximum correntropy criterion from information-theoretic learning and proposes two robust feature learning models based on it: 1) to make unsupervised feature selection more robust to outliers, the maximum correntropy criterion is combined with local-structure-preserving subspace learning, yielding a robust unsupervised feature selection model based on the maximum correntropy criterion; 2) to make sparse principal component analysis (SPCA) more robust to outliers, SPCA is modeled with the maximum correntropy criterion. In addition, to make fuller use of the internal information of the samples, a multi-hypergraph learning regularization term is added to the SPCA model so that it can exploit the manifold information within the samples, yielding a sparse principal component analysis model based on the maximum correntropy criterion and high-order manifold constraints.
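The abstract's point about outliers can be illustrated with a small numerical sketch. It is not the dissertation's model: it only compares a squared Frobenius-norm loss with an empirical correntropy measure on a synthetic residual matrix. The per-sample Gaussian kernel, the kernel width `sigma`, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def frobenius_loss(residual):
    """Squared Frobenius norm of a residual matrix (samples in rows)."""
    return float(np.sum(residual ** 2))


def correntropy(residual, sigma=1.0):
    """Empirical correntropy: mean Gaussian kernel of each sample's residual.

    Assumption: one kernel value per row; `sigma` is a hypothetical width.
    """
    sq_norms = np.sum(residual ** 2, axis=1)
    return float(np.mean(np.exp(-sq_norms / (2.0 * sigma ** 2))))


# Synthetic residuals: 100 samples, 5 dimensions, small noise.
rng = np.random.default_rng(0)
clean = 0.1 * rng.standard_normal((100, 5))

# Corrupt a single sample so that it behaves like an outlier.
corrupted = clean.copy()
corrupted[0] = 50.0

frob_clean = frobenius_loss(clean)
frob_corrupted = frobenius_loss(corrupted)
corr_clean = correntropy(clean)
corr_corrupted = correntropy(corrupted)

print("Frobenius loss:", frob_clean, "->", frob_corrupted)
print("Correntropy:   ", corr_clean, "->", corr_corrupted)
```

Because every kernel value lies between 0 and 1, one corrupted sample out of n can shift this correntropy estimate by at most 1/n, whereas its contribution to the squared Frobenius loss grows without bound as the corruption grows. Maximizing correntropy therefore effectively down-weights samples with large residuals.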
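[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Doctorate
[Year degree awarded]: 2017
[Classification number]: TP181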

[Related literature]