Research on a Latent Subspace Clustering Algorithm Based on Improved Dictionary Learning

Published: 2019-06-12 14:02

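【Abstract】: Cluster analysis is a data-analysis tool that groups abstract data objects into multiple clusters. It is widely used in pattern recognition, machine learning, document retrieval, and data mining. In recent years, the spread of the Internet and advances in computer imaging have produced large volumes of new image and video data. As demand for higher-resolution images and video keeps growing, high-dimensional datasets of up to hundreds of terabytes have appeared. Most traditional clustering algorithms are designed for low-dimensional data, so they struggle to process high-dimensional data efficiently. Subspace clustering, an extension of traditional clustering, is an effective approach to clustering high-dimensional data. This thesis improves the sparse-representation-based latent subspace clustering algorithm to raise its clustering performance. The main contributions are as follows:
1. The basic principles of the sparse representation model and the dictionary learning model are described in detail. The steps, strengths, and weaknesses of classic algorithms in both fields are explained, including MP, OMP, MOD, and K-SVD. Background on subspace clustering and spectral clustering follows, together with a detailed derivation of the spectral clustering procedure, which lays the groundwork for the improved algorithms.
2. A subspace clustering algorithm that combines spectral clustering, sparse representation, and dictionary learning, namely the latent subspace clustering algorithm (LSC), is presented in full, including its main idea and the associated derivations.
3. The dictionary trained by LSC lacks stability and discriminative power. To address this, an improved algorithm based on discriminative dictionary learning (ILSC) is proposed. In the dictionary learning stage, ILSC uses the label information of a small subset of training samples to modify the dictionary learning model. In addition to the original reconstruction error term, it adds a sparse coding error term, which yields a discriminative adaptive dictionary. This makes the sparse representations of signals more accurate and thereby improves clustering accuracy.
4. ILSC adds two error terms to strengthen the dictionary's discriminative power, which multiplies the time spent in the dictionary learning stage. To address this, I2LSC, an improvement of ILSC based on incremental dictionary training, is proposed. Following the idea of incremental learning, it reads a small batch of training data at a time and incrementally updates the dictionary and the corresponding error terms. This greatly reduces the time spent on dictionary learning while preserving the dictionary's discriminative power.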
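The abstract lists OMP among the classic sparse-coding algorithms reviewed in the thesis. For reference, here is a minimal NumPy sketch of textbook OMP. It is not the thesis's code. It assumes a dictionary `D` with unit-norm columns and a known sparsity level `k`. The function name `omp` and the `tol` early-stopping threshold are illustrative choices.

```python
import numpy as np


def omp(D: np.ndarray, x: np.ndarray, k: int, tol: float = 1e-10) -> np.ndarray:
    """Orthogonal Matching Pursuit: greedily approximate
    min_a ||x - D a||_2  subject to  ||a||_0 <= k.
    Assumes the columns (atoms) of D have unit l2 norm and k <= D.shape[1]."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Selection step (shared with MP): atom most correlated with the residual.
        scores = np.abs(D.T @ residual)
        if support:
            scores[support] = 0.0  # an atom is never selected twice
        support.append(int(np.argmax(scores)))
        # Projection step (what makes it "orthogonal"): re-fit all selected
        # coefficients jointly, so the residual is orthogonal to the chosen atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
        if np.linalg.norm(residual) <= tol:
            break  # x is already represented exactly
    a = np.zeros(D.shape[1])
    a[support] = coef
    return a


# Usage: with the identity as dictionary, OMP recovers the sparse vector itself.
print(omp(np.eye(5), np.array([0.0, 2.0, 0.0, 0.0, -1.0]), k=2))
```

Plain MP skips the least-squares re-fit. It only subtracts the projection onto the newly chosen atom, which is cheaper per iteration but can pick the same atom repeatedly. OMP's joint re-fit avoids this at the cost of one small least-squares solve per step.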
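The abstract does not give I2LSC's update rule, only its idea: read a small batch, update the dictionary and the accumulated terms incrementally, and repeat. As a stand-in, the sketch below uses the online dictionary learning update of Mairal et al. (2009), a published mini-batch scheme with the same shape. It is not the thesis's algorithm, and it omits ILSC's label-dependent error terms, whose exact form the abstract does not specify. Sparse codes come from ISTA (lasso via soft thresholding) rather than OMP. The names `ista`, `IncrementalDictionary`, `partial_fit`, `lam`, and `n_sweeps` are hypothetical.

```python
import numpy as np


def ista(D: np.ndarray, X: np.ndarray, lam: float, n_iter: int = 100) -> np.ndarray:
    """Lasso sparse codes: min_A 0.5*||X - D A||_F^2 + lam*||A||_1, via ISTA."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term's gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        G = A - D.T @ (D @ A - X) / L  # gradient step on the reconstruction error
        A = np.sign(G) * np.maximum(np.abs(G) - lam / L, 0.0)  # soft thresholding
    return A


class IncrementalDictionary:
    """Hypothetical helper: mini-batch dictionary learning with accumulated
    statistics (online dictionary learning, Mairal et al. 2009)."""

    def __init__(self, D0: np.ndarray, lam: float):
        D0 = np.asarray(D0, dtype=float)
        self.D = D0 / np.linalg.norm(D0, axis=0)  # unit-norm initial atoms
        self.lam = lam
        n_atoms = self.D.shape[1]
        self.A = np.zeros((n_atoms, n_atoms))          # running sum of alpha @ alpha.T
        self.B = np.zeros((self.D.shape[0], n_atoms))  # running sum of x @ alpha.T

    def partial_fit(self, X_batch: np.ndarray, n_sweeps: int = 1) -> np.ndarray:
        # 1) Sparse-code only the new batch with the current dictionary.
        alpha = ista(self.D, X_batch, self.lam)
        # 2) Fold the batch into the statistics; earlier batches are not revisited.
        self.A += alpha @ alpha.T
        self.B += X_batch @ alpha.T
        # 3) Block coordinate descent over atoms on the accumulated surrogate.
        for _ in range(n_sweeps):
            for j in range(self.D.shape[1]):
                if self.A[j, j] < 1e-12:
                    continue  # atom not used by any code yet; leave it unchanged
                u = self.D[:, j] + (self.B[:, j] - self.D @ self.A[:, j]) / self.A[j, j]
                self.D[:, j] = u / max(np.linalg.norm(u), 1.0)  # keep ||d_j|| <= 1
        return alpha


# Usage: random batches stand in for real training signals.
rng = np.random.default_rng(0)
model = IncrementalDictionary(rng.standard_normal((20, 40)), lam=0.1)
for _ in range(50):
    model.partial_fit(rng.standard_normal((20, 16)))
```

Only `A` and `B` carry information from earlier batches, so memory stays fixed no matter how many batches have been seen. This property lets an incremental scheme avoid re-solving over the whole training set after every update.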
【Degree-granting institution】: Jiangnan University
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: TP311.13
