Research on Multi-Label Image Classification Algorithms Based on Deep Learning

Published: 2018-07-12 15:07

Topics: deep learning + convolutional neural networks; Source: Beijing University of Posts and Telecommunications, master's thesis, 2016

【Abstract】: With the arrival of the mobile-Internet era, image and video data have grown explosively, ushering in the era of image big data. As a result, traditional single-label image classification can no longer meet the demand for classifying and recognizing images with complex semantics, and fast, accurate multi-label classification techniques are urgently needed. This thesis studies the image preprocessing, feature extraction and multi-label classifier training algorithms involved in multi-label semantic image classification. Good image features are crucial to the performance of an image classification and recognition system. The quality of preprocessing strongly affects whether the essential features of an image can be extracted and how complex that extraction is, while the multi-label classifier determines whether the whole system can exploit those features, together with the properties of the labels themselves, to improve final classification performance. The main contributions are as follows:
1. The basic theory of image preprocessing is reviewed. Because images vary in scale, their pixels are strongly correlated and high-dimensional, and contrast differences can harm feature extraction, the thesis proposes preprocessing images with a combination of scale normalization, brightness and contrast normalization, and whitening.
2. The basic theory of multi-label classification is reviewed, leading to the conclusion that exploiting the correlations between labels is important for improving classification performance. Since the RAkEL algorithm needs many parameters to be set and cross-validation over a large amount of data to reach its best performance, a GPU-based parallel cross-validation algorithm is proposed. It uses the GPU's parallel processing power to run the validation procedures for different parameter settings simultaneously, which speeds up training.
3. The basic theory of deep learning is reviewed, with emphasis on the convolutional neural network (CNN) model in terms of the number of hidden layers, weight sharing and related aspects. The final network consists of one input layer, three convolutional layers and three feature-mapping layers, and pooling is used to reduce the dimensionality of the feature vectors so as to avoid overfitting during training. The thesis then proposes its core algorithm, CNN-RAkEL, and describes how CNN and RAkEL are combined and how the CNN-RAkEL-based multi-label classification system is trained.
System experiments and parameter tuning were carried out on the PASCAL VOC 2007 image dataset using a GPU and the Pylearn2 deep-learning library. Simulation experiments show that, for multi-label image classification, the recognition rate of the proposed CNN-RAkEL system is up to 9.416 percentage points higher than that of CNN-SVM (the best-performing method on PASCAL VOC 2007).
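Item 1 of the abstract combines scale normalization with brightness and contrast normalization ahead of whitening. The sketch below shows only the first two steps on a grayscale image stored as a list of rows: nearest-neighbour resampling for scale normalization, then per-image mean/standard-deviation normalization for brightness and contrast. The function names, the 4x4 target size and the `eps` guard are assumptions made for illustration, not details taken from the thesis; whitening (for example ZCA over a collection of images or patches) would be a separate later step and is not shown.

```python
import math

def resize_nearest(img, out_h, out_w):
    """Scale normalization: resample a 2-D grayscale image (list of rows)
    to out_h x out_w with nearest-neighbour interpolation."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def normalize_brightness_contrast(img, eps=1e-8):
    """Brightness/contrast normalization: subtract the image mean (brightness)
    and divide by the standard deviation (contrast)."""
    pixels = [v for row in img for v in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((v - mean) ** 2 for v in pixels) / len(pixels))
    # eps (an assumed safeguard) avoids division by zero on constant images
    return [[(v - mean) / (std + eps) for v in row] for row in img]

def preprocess(img, size=(4, 4)):
    """Hypothetical pipeline: resize to a fixed size, then normalize."""
    return normalize_brightness_contrast(resize_nearest(img, *size))

# An 8x8 horizontal gradient is shrunk to 4x4 and normalized to mean 0, std 1
bright = [[40 + 20 * c for c in range(8)] for _ in range(8)]
out = preprocess(bright)
print(len(out), len(out[0]))               # 4 4
print([round(v, 3) for v in out[0]])       # [-1.342, -0.447, 0.447, 1.342]
```

Normalizing after resizing means every image enters the network with the same shape and the same intensity statistics, regardless of how bright or how large the original was.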
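Items 2 and 3 rely on RAkEL (RAndom k-labELsets): the label set is covered by m randomly chosen subsets of size k, a Label Powerset classifier is trained on each subset (every distinct combination of those k labels becomes one class), and a label is predicted positive when the average vote of the subsets that contain it exceeds a threshold t. The sketch below follows that scheme in plain Python. It assumes a nearest-centroid base learner and toy two-dimensional feature vectors; in CNN-RAkEL the inputs would be CNN feature vectors, and the abstract does not name the base learner, so both are stand-ins. The class and parameter names (`RAkEL`, `k`, `m`, `threshold`, `seed`) are chosen for illustration.

```python
import random
from itertools import combinations

def nearest_centroid_fit(X, y):
    """Stand-in multi-class base learner: one mean vector per class."""
    sums, counts = {}, {}
    for x, c in zip(X, y):
        s = sums.setdefault(c, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def nearest_centroid_predict(centroids, x):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))

class RAkEL:
    def __init__(self, k=2, m=4, threshold=0.5, seed=0):
        self.k, self.m, self.threshold = k, m, threshold
        self.rng = random.Random(seed)

    def fit(self, X, Y):
        self.n_labels = len(Y[0])
        all_sets = list(combinations(range(self.n_labels), self.k))
        # Draw m distinct k-labelsets at random
        self.labelsets = self.rng.sample(all_sets, min(self.m, len(all_sets)))
        self.models = []
        for ls in self.labelsets:
            # Label Powerset: each combination of the k labels is one class
            classes = [tuple(row[j] for j in ls) for row in Y]
            self.models.append(nearest_centroid_fit(X, classes))
        return self

    def predict(self, x):
        votes = [0] * self.n_labels
        counts = [0] * self.n_labels
        for ls, model in zip(self.labelsets, self.models):
            combo = nearest_centroid_predict(model, x)
            for j, bit in zip(ls, combo):
                votes[j] += bit
                counts[j] += 1
        # Positive if the mean vote of the labelsets containing label j exceeds t
        return [1 if counts[j] and votes[j] / counts[j] > self.threshold else 0
                for j in range(self.n_labels)]

# Toy data: two clusters with different label combinations over 3 labels
X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.2], [4.8, 5.0]]
Y = [[1, 0, 0], [1, 0, 0], [0, 1, 1], [0, 1, 1]]
clf = RAkEL(k=2, m=3).fit(X, Y)
print(clf.predict([0.1, 0.2]), clf.predict([5.1, 4.9]))  # [1, 0, 0] [0, 1, 1]
```

Because each subset is learned jointly, a classifier trained on labels 1 and 2 sees that they co-occur, which is how RAkEL exploits label correlations while keeping each Label Powerset problem small.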
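Item 2 proposes running the cross-validation for different RAkEL parameter settings at the same time on a GPU. The sketch below illustrates only the scheduling idea on a CPU: each (k, m) setting is cross-validated independently, so the settings can be dispatched concurrently, here with a `ThreadPoolExecutor` standing in for GPU parallelism. The `train_and_score` callback, the grid values and `toy_train_and_score` are hypothetical; a real version would train RAkEL on the training fold and score it on the validation fold.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def kfold_indices(n, folds):
    """Yield (train, validation) index lists for contiguous k-fold splits."""
    bounds = [n * f // folds for f in range(folds + 1)]
    for f in range(folds):
        val = list(range(bounds[f], bounds[f + 1]))
        train = list(range(bounds[f])) + list(range(bounds[f + 1], n))
        yield train, val

def cross_validate(params, n_samples, folds, train_and_score):
    """Mean validation score of one parameter setting over all folds."""
    scores = [train_and_score(params, tr, va)
              for tr, va in kfold_indices(n_samples, folds)]
    return sum(scores) / len(scores)

def parallel_grid_search(grid, n_samples, folds, train_and_score, workers=4):
    """Cross-validate every setting in `grid` concurrently; return the best one."""
    names = list(grid)
    settings = [dict(zip(names, values))
                for values in product(*(grid[n] for n in names))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        means = list(pool.map(
            lambda p: cross_validate(p, n_samples, folds, train_and_score),
            settings))
    best = max(range(len(settings)), key=means.__getitem__)
    return settings[best], means[best]

def toy_train_and_score(params, train, val):
    """Hypothetical scorer that ignores the folds and peaks at k=3, m=6."""
    return -((params["k"] - 3) ** 2) - (params["m"] - 6) ** 2

best, score = parallel_grid_search({"k": [2, 3, 4], "m": [4, 6, 8]},
                                   n_samples=100, folds=5,
                                   train_and_score=toy_train_and_score)
print(best, score)  # {'k': 3, 'm': 6} 0.0
```

Threads only show the structure: in CPython, CPU-bound pure-Python training would need a process pool or, as in the thesis, a GPU to actually run the settings in parallel.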
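Item 3 describes a network with one input layer, three convolutional layers and three feature-mapping layers, with pooling used to shrink the feature vector. The sketch below traces output shapes through such a stack, assuming each feature-mapping layer is a non-overlapping max-pooling layer placed after a convolution. The 32x32x3 input, kernel sizes and filter counts are hypothetical values; the abstract does not give them.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution: (size + 2*padding - kernel) // stride + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window):
    """Spatial size after non-overlapping pooling (stride equal to the window)."""
    return (size - window) // window + 1

def trace_shapes(height, width, channels, layers):
    """Return (layer kind, height, width, channels) after each layer."""
    shapes = []
    for kind, *args in layers:
        if kind == "conv":
            kernel, filters = args
            height, width = conv_out(height, kernel), conv_out(width, kernel)
            channels = filters
        elif kind == "pool":
            (window,) = args
            height, width = pool_out(height, window), pool_out(width, window)
        else:
            raise ValueError(f"unknown layer type: {kind}")
        shapes.append((kind, height, width, channels))
    return shapes

# Hypothetical configuration: three conv layers, each followed by pooling
LAYERS = [("conv", 5, 32), ("pool", 2),
          ("conv", 5, 64), ("pool", 2),
          ("conv", 3, 128), ("pool", 3)]

for kind, h, w, c in trace_shapes(32, 32, 3, LAYERS):
    print(f"{kind} -> {h}x{w}x{c} = {h * w * c} values")
# last line printed: pool -> 1x1x128 = 128 values
```

With these values each pooling layer divides the number of activations by the square of its window (25088 to 6272, 6400 to 1600, 1152 to 128), which is the dimensionality reduction the abstract relies on to limit overfitting before the features are passed to the multi-label classifier.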
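【Degree-granting institution】: Beijing University of Posts and Telecommunications
【Degree level】: Master's
【Year conferred】: 2016
【Classification numbers】: TP391.41; TP181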
