New Algorithms for Subspace Clustering Analysis and Their Applications
Published: 2018-01-05 21:18
Source: Jiangnan University, doctoral dissertation, 2017. Type: degree thesis.
Related topics: subspace clustering; sparse representation; low-rank representation; semi-supervised learning; transfer learning
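[Abstract, Chinese version]: High-dimensional data are ubiquitous across many fields, and the arrival of the big-data era poses a major challenge to traditional clustering algorithms. Subspace clustering, an effective way to cluster high-dimensional data, has therefore attracted wide attention from researchers. Recently, subspace clustering algorithms based on sparse representation (SR) and low-rank representation (LRR) have become a new research focus because of their strong performance. This thesis likewise concentrates on SR- and LRR-based subspace clustering. It analyzes these algorithms in depth and proposes improvements that raise their performance on concrete problems. The main contributions are as follows:
1. A robust structure-constrained low-rank representation algorithm (RSLRR) is proposed. LRR has been applied successfully to uncovering the subspace structure of data. However, LRR-based algorithms usually proceed in two separate steps. First, an affinity graph is constructed by solving a rank-minimization problem. Second, spectral clustering partitions the affinity graph to produce the final segmentation. The two steps are interdependent yet optimized separately, so conventional LRR-based algorithms cannot guarantee that the final result is globally optimal. RSLRR combines affinity-graph construction and spectral clustering in one unified optimization framework. Joint optimization then yields both the clustering result and the low-rank representation structure of the dataset. Experiments on multiple datasets demonstrate the effectiveness of the algorithm.
2. A manifold locality-constrained low-rank representation algorithm (MLCLRR) is proposed. LRR can effectively uncover the low-dimensional subspace structure of a dataset, but most LRR-based algorithms ignore the nonlinear geometric structure of the data. As a result, the local structure and similarity information of the data are lost during processing, although this information also plays an important role in data analysis. To address this, MLCLRR introduces the local manifold structure of the data into the LRR framework. It therefore preserves the global low-dimensional subspace structure of the data while also capturing local nonlinear geometric structure. Experiments on several computer vision tasks demonstrate its effectiveness.
3. A latent-space structure-constrained low-rank representation algorithm (Lat RSLRR) is proposed. Most existing SR- and LRR-based subspace clustering algorithms process the data in the original space, which greatly increases running time when the original dimension is high. Lat RSLRR solves for the low-rank representation coefficients in a low-dimensional latent space, which substantially improves computational efficiency. In addition, most LRR algorithms use the dataset itself as the dictionary, and this seriously degrades final performance when the data contain substantial noise and many outliers. Lat RSLRR instead uses a discriminative dictionary, obtained by matrix recovery, as the dictionary for the low-rank representation. Experiments on subspace clustering problems demonstrate its effectiveness.
4. Semi-supervised learning and low-rank representation are integrated into an LRR-based semi-supervised learning algorithm that unifies graph-embedding learning and sparse regression in a single optimization framework. Most current graph-based semi-supervised learning algorithms consider the local neighborhood information of the data but ignore the global structure of the samples. The proposed method learns a low-rank weight matrix by projecting the data into a low-dimensional subspace, and it makes full use of the labeled samples when constructing the affinity graph. During dimensionality reduction the algorithm preserves the global structure of the dataset, and the learned low-rank weight matrix reduces the influence of noisy data on the final result. Experiments on multiple datasets show that the algorithm achieves high classification accuracy.
5. An entropy-weighted transfer soft subspace clustering algorithm is proposed. To reach high accuracy, traditional clustering algorithms usually depend on a large amount of historical sample data. Consequently, they can fail when information is missing in the current data-collection environment or when the partition structure of the data is unclear. Transfer learning is effective for dealing with insufficient samples, so this thesis proposes an entropy-weighted soft subspace clustering algorithm that exploits historical information about the data. Experiments on several UCI benchmark datasets and high-dimensional gene-expression datasets show that the algorithm makes full use of historical information to compensate for the shortage of current samples, and that it improves clustering accuracy.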
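The conventional two-step LRR pipeline that contribution 1 criticizes can be made concrete with a small sketch. It is not the thesis's RSLRR, and it assumes noise-free data lying exactly on independent subspaces. Under that assumption, step 1 uses the known closed-form minimiser of `min ||Z||_* s.t. X = XZ`, namely `Z = V_r V_r^T` built from the skinny SVD of `X` (samples are columns). Step 2 runs normalized spectral clustering on the affinity `|Z| + |Z|^T`. The function names, the farthest-first k-means and the synthetic data are hypothetical choices for illustration. The robust noise model and the joint optimization of RSLRR are not reproduced.

```python
import numpy as np


def lrr_noiseless(X, tol=1e-8):
    """Closed-form minimiser of ||Z||_* subject to X = X Z (noise-free LRR, X is d x n).

    The solution is the shape interaction matrix V_r V_r^T, where V_r holds the
    right singular vectors of X that belong to its non-zero singular values.
    """
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))  # numerical rank of X
    Vr = Vt[:r].T                     # n x r
    return Vr @ Vr.T


def kmeans(P, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centres = [P[0]]
    for _ in range(1, k):
        d2 = np.min([((P - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(P[np.argmax(d2)])
    centres = np.array(centres)
    for _ in range(n_iter):
        labels = ((P[:, None, :] - centres[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


def spectral_clustering(W, k):
    """Normalised spectral clustering (Ng-Jordan-Weiss) on a symmetric affinity W."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                       # eigenvalues in ascending order
    U = vecs[:, :k]                                   # eigenvectors of the k smallest
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # normalise each row
    return kmeans(U, k)


def lrr_two_step(X, k):
    Z = lrr_noiseless(X)               # step 1: low-rank representation
    A = np.abs(Z) + np.abs(Z).T        # symmetric, non-negative affinity graph
    return spectral_clustering(A, k)   # step 2: partition the graph


# Usage: 40 samples drawn from two random 2-D subspaces of R^10 (columns are samples).
rng = np.random.default_rng(0)
bases = [np.linalg.qr(rng.standard_normal((10, 2)))[0] for _ in range(2)]
X = np.hstack([B @ rng.standard_normal((2, 20)) for B in bases])
labels = lrr_two_step(X, k=2)
```

The affinity is fixed before spectral clustering starts, so nothing learned in step 2 can feed back into step 1. This decoupling is what RSLRR is described as removing by optimizing both steps in one framework.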
[Abstract]: High-dimensional data are found in many fields, and the era of big data poses a serious challenge to traditional clustering algorithms. Subspace clustering, an effective approach to clustering high-dimensional data, has attracted wide attention from researchers. In recent years, subspace clustering algorithms based on sparse representation (SR) and low-rank representation (LRR) have become a research focus because of their excellent performance. This thesis concentrates on SR- and LRR-based subspace clustering, analyzes these algorithms in depth, and proposes improvements that enhance their performance on specific problems. The main work of the thesis is as follows:
1. A robust structure-constrained low-rank representation algorithm (RSLRR). LRR has been used successfully to mine the subspace structure of data, but LRR-based algorithms usually work in two separate steps. First, an affinity graph is constructed by solving a rank-minimization problem. Second, spectral clustering partitions the affinity graph to produce the final segmentation result. Affinity-graph construction and spectral clustering are interdependent, so the traditional LRR-based approach cannot guarantee a globally optimal final result. RSLRR combines affinity-graph construction and spectral clustering in a unified optimization framework, so joint optimization yields both the clustering result and the low-rank representation structure of the data. Experiments on multiple datasets demonstrate the effectiveness of the algorithm.
2. A low-rank representation algorithm with manifold local constraints (MLCLRR). LRR effectively uncovers the low-dimensional subspace structure of data, but most LRR-based algorithms ignore the nonlinear geometric structure of the data set. As a result, local structure and similarity information are lost, even though they play an important role in data analysis. MLCLRR introduces the local manifold structure of the data into the algorithmic framework. It therefore preserves the global low-dimensional subspace structure of the data while also capturing its local nonlinear geometric structure. Experiments on different computer vision tasks demonstrate its effectiveness.
3. A latent-space structure-constrained low-rank representation algorithm (Lat RSLRR). Most existing SR- and LRR-based subspace clustering algorithms process the data in the original space, which greatly increases running time when the original dimension is high. Lat RSLRR computes the low-rank representation coefficients in a low-dimensional latent space, which greatly improves computational efficiency. Moreover, most LRR algorithms use the data set itself as the dictionary, and this seriously degrades final performance when the data contain much noise and many outliers. Lat RSLRR instead uses a discriminative dictionary, obtained by matrix recovery, as the dictionary of the low-rank representation. Subspace clustering experiments show the effectiveness of the algorithm.
4. A semi-supervised learning algorithm based on low-rank representation. It combines semi-supervised learning with LRR by unifying graph-embedding learning and sparse regression in one optimization framework. Most current graph-based semi-supervised learning algorithms consider the local neighborhood information of the data but ignore the global structure of the samples. The proposed method learns a low-rank weight matrix by projecting the data onto a low-dimensional subspace, and it makes full use of the labeled samples when constructing the affinity graph. During dimensionality reduction the algorithm preserves the global structure of the data set, and the learned low-rank weight matrix reduces the effect of noisy data on the final result. Experiments on multiple data sets show that the algorithm achieves high classification accuracy.
5. An entropy-weighted transfer soft subspace clustering algorithm. To obtain high clustering accuracy, traditional clustering algorithms usually need a large amount of historical data. As a result, they can fail if information is lost in the current data-acquisition environment or if the partition of the data is uncertain. Transfer learning is effective against insufficient data, so this thesis uses historical information about the data to build an entropy-weighted soft subspace clustering algorithm. Experiments on multiple UCI data sets and high-dimensional gene-expression data sets show that the algorithm makes full use of historical information to compensate for the lack of current samples, and that it improves clustering accuracy.
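Contribution 5 builds on entropy-weighted soft subspace clustering, which can be illustrated with a minimal sketch. The code implements a plain, non-transfer EWKM-style formulation, assumed here for illustration. The transfer term that lets the thesis's algorithm reuse historical information is not reproduced. Each cluster `l` keeps feature weights `w_lj` that minimize its weighted within-cluster dispersion plus an entropy penalty `gamma * sum_j w_lj log w_lj`. With the constraint `sum_j w_lj = 1`, this gives the closed form `w_lj ∝ exp(-D_lj / gamma)`, where `D_lj` is the dispersion of feature `j` inside cluster `l`. The names `ewkm` and `gamma`, the farthest-first initialization and the synthetic data are hypothetical.

```python
import numpy as np


def ewkm(X, k, gamma=1.0, n_iter=100):
    """Entropy-weighted soft subspace k-means (plain, non-transfer sketch).

    Minimises  sum_l sum_{i in C_l} sum_j w_lj (x_ij - z_lj)^2
             + gamma * sum_l sum_j w_lj log w_lj,   with sum_j w_lj = 1.
    X is n x d (rows are samples). Returns labels, centres Z (k x d), weights W (k x d).
    """
    n, d = X.shape
    # Deterministic farthest-first initialisation starting from the first sample.
    centres = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d2)])
    Z = np.array(centres, dtype=float)
    W = np.full((k, d), 1.0 / d)          # start from uniform feature weights
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # 1) Assign each sample to the centre with the smallest weighted distance.
        dist = (((X[:, None, :] - Z[None, :, :]) ** 2) * W[None, :, :]).sum(axis=-1)
        new_labels = dist.argmin(axis=1)
        for l in range(k):
            members = X[new_labels == l]
            if len(members) == 0:
                continue                  # empty cluster: keep its centre and weights
            # 2) Centre update: the plain mean (the weights do not change it).
            Z[l] = members.mean(axis=0)
            # 3) Weight update: w_lj proportional to exp(-D_lj / gamma).
            D = ((members - Z[l]) ** 2).sum(axis=0)   # per-feature dispersion
            e = np.exp(-(D - D.min()) / gamma)        # shift by min for stability
            W[l] = e / e.sum()
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, Z, W


# Usage: two clusters separated only along feature 0; features 1-4 are pure noise.
rng = np.random.default_rng(0)
n = 50
A = np.column_stack([rng.normal(0.0, 0.1, n), rng.normal(0.0, 1.0, (n, 4))])
B = np.column_stack([rng.normal(10.0, 0.1, n), rng.normal(0.0, 1.0, (n, 4))])
X = np.vstack([A, B])
labels, Z, W = ewkm(X, k=2, gamma=1.0)
```

A larger `gamma` pushes each cluster's weights toward uniform. A smaller `gamma` concentrates weight on the few features along which that cluster is compact, and this per-cluster feature subspace is what makes the method "soft subspace" clustering.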
[Degree-granting institution]: Jiangnan University
[Degree level]: Doctoral
[Year conferred]: 2017
[Classification number]: TP311.13
Article ID: 1384872
Link: https://www.wllwen.com/shoufeilunwen/xxkjbs/1384872.html