Research and Application of Deep-Learning-Based Image Feature Learning and Classification Methods

Published: 2018-07-07 18:04

Topic: deep learning + feature learning; source: South China University of Technology, doctoral dissertation, 2016

【Abstract】: Image classification is one of the most active research directions in computer vision and underpins many other image applications. An image classification system usually consists of three main components: low-level feature extraction, image representation, and a classifier. Of these, the features often determine the quality of the whole system, since good features accurately capture the information that helps solve the problem. Designing an effective feature usually requires prior knowledge of the target domain, so researchers have proposed many domain-specific features. Applying these low-level features directly to tasks such as large-scale image classification, however, often gives unsatisfactory results. Hand-designed features also take a great deal of time to design and tune, which has slowed their development and become a bottleneck for further progress in image classification. Researchers have therefore shifted from designing features to learning them, in the hope of learning effective features automatically from images. Deep convolutional networks have been found to learn features from low level to high level autonomously from massive image collections, bringing image classification close to human-level performance. Feature learning has thus become a key direction in image classification, with broad application value.

To address feature learning in image classification, this thesis extends single-layer feature learning to multi-layer feature learning and then applies deep feature learning methods to practical problems. The main research content and contributions are as follows:

1. Single-layer feature learning methods and multi-layer feature learning and classification methods are studied. Restricted Boltzmann machines, autoencoders, sparse coding, and subspace learning are all treated as single-layer feature learning methods. The study of multi-layer feature learning and classification methods shows that supervised single-layer feature learning methods can be applied within convolutional networks.

2. DLANet, a layer-wise discriminative feature learning method based on manifold learning, is proposed. The method adopts a convolutional network structure and uses Discriminative Locality Alignment (DLA) to learn the filter banks of the convolutional structure, so that the features are more discriminative in the dimension-reduced subspace. DLANet features are used as low-level features in the LLC-SPM image classification framework and applied to scene classification. Experiments on three scene classification datasets (NYU Depth V1, Scene-15, and MIT Indoor-67) show that the learned DLANet features outperform hand-crafted features as well as the comparable PCANet and LDANet features. The proposed scene classification system is also competitive with other methods.

3. A new criterion for training deep neural networks is proposed: Max-margin Minimum Classification Error (M3CE). Unlike the softmax and cross-entropy criteria, the Minimum Classification Error (MCE) criterion aims to raise the posterior probability of the labeled class and lower the posterior probabilities of confusable classes. To train deep networks better and prevent vanishing gradients, the loss function in MCE is modified, yielding M3CE. Experiments on the MNIST and CIFAR-10 datasets show that M3CE, as an effective complement to cross-entropy, achieves good results.

4. Deep convolutional networks are applied to two problems: language classification of text lines and handwritten versus printed text classification. To train convolutional neural networks that are better suited to text-line databases, a text-line input scheme is proposed that handles text lines at three scales simultaneously. With this scheme, the network covers more text content during training and therefore learns more discriminative features. A Self-Reappeared Padding Scheme (SRPS) for text-line images is proposed to address the shortage of samples. In addition, to solve language classification and handwritten versus printed classification at the same time, a two-stage multi-task learning framework is proposed to learn robust shared features. Finally, the proposed methods are tested and analyzed on three convolutional neural network architectures. The results show that the text-line input scheme clearly improves the recognition rate, and that the networks obtained by two-stage multi-task learning achieve accuracies above 95% on language classification and above 99% on handwritten versus printed classification.
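The abstract describes the MCE criterion only in words: raise the score of the labeled class and lower the scores of confusable classes. As a rough illustration of that idea, the NumPy sketch below implements the classical MCE loss (a sigmoid applied to a misclassification measure) next to ordinary cross-entropy. It is not the thesis's M3CE, whose modified loss function is not given in the abstract. The function names, the use of raw network outputs as class scores, and the smoothing parameters `eta` and `alpha` are all assumptions made for this example.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Per-sample cross-entropy: -log of the labeled class posterior."""
    p = softmax(logits)
    rows = np.arange(logits.shape[0])
    return -np.log(p[rows, labels])

def mce_loss(logits, labels, eta=1.0, alpha=1.0):
    """Classical MCE loss per sample (assumed form, not the thesis's M3CE).

    d    = soft-max of the rival class scores minus the labeled class score
    loss = sigmoid(alpha * d), a smooth stand-in for the 0/1 error
    """
    n, k = logits.shape
    rows = np.arange(n)
    g_true = logits[rows, labels]

    # Collect the k-1 rival scores for every sample.
    mask = np.ones((n, k), dtype=bool)
    mask[rows, labels] = False
    rivals = logits[mask].reshape(n, k - 1)

    # (1/eta) * log(mean(exp(eta * g_j))), computed stably.
    m = rivals.max(axis=1, keepdims=True)
    g_rival = m[:, 0] + np.log(np.mean(np.exp(eta * (rivals - m)), axis=1)) / eta

    d = g_rival - g_true              # d > 0 means a rival class is ahead
    return 1.0 / (1.0 + np.exp(-alpha * d))

# Hypothetical scores: sample 0 is classified correctly, sample 1 is not.
logits = np.array([[4.0, 1.0, 0.0],
                   [0.5, 3.0, 1.0]])
labels = np.array([0, 2])

ce = cross_entropy(logits, labels)
mce = mce_loss(logits, labels)
print("cross-entropy:", ce)
print("MCE loss     :", mce)
```

In this sketch the MCE loss stays below 0.5 for the correctly classified sample and rises above 0.5 for the misclassified one. Because the sigmoid saturates, its gradient becomes small when a sample is badly misclassified, which is consistent with the abstract's stated motivation of preventing vanishing gradients by modifying the MCE loss.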
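【Degree-granting institution】: South China University of Technology
【Degree level】: Doctoral
【Year degree awarded】: 2016
【Classification number】: TP391.41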

【Similar literature】