Research on Learning Algorithms for Image Feature Representation

Published: 2018-08-14 16:32

[Abstract]: In many computer vision tasks, one of the fundamental difficulties is generating image representations with good discriminative power, that is, high-performance image features. Because an image feature must be both robust to intra-class variation and discriminative with respect to inter-class variation, designing a good image feature is highly challenging. Image features fall broadly into patch-level features and image-level features (local and global features, respectively): the former describe an image patch, the latter a complete image. This thesis studies learning methods for image feature representation and proposes new algorithms for generating patch-level and image-level features, with the aim of improving scene/object recognition performance. The main results are summarized as follows.
(1) First, the thesis proposes a new image-level feature representation and applies it to image classification. The traditional Bag-of-Words model discards the spatial distribution of features entirely and thereby loses some discriminative power. To address this, we propose the Spatial Correlogram representation, which captures how often pairs of visual words co-occur within a spatial range, describes the spatial distribution of local features, and so improves discriminative power in image recognition. The method still lacks a description of the overall spatial structure of the image's features, so to further improve its discriminability we combine the correlogram feature with the spatial pyramid model to form a hybrid feature. Detailed experimental comparisons on scene/object databases show that the proposed correlogram and hybrid features achieve higher image classification accuracy than the traditional Bag-of-Words model.
(2) Second, the thesis proposes a new patch-level feature representation, the Efficient Kernel Descriptor (EKD). Designing patch-level features is likewise a basic research topic in computer vision, and a good patch-level representation can effectively improve the performance of image classification, object recognition, and related algorithms; however, differences between hand-designed patch features often fail to reflect the similarity between patches well enough. The Kernel Descriptor (KD) method offers a new way to generate patch-level features: it applies Kernel Principal Component Analysis (KPCA) on top of a match kernel between image patches and performs well in image classification. That method, however, needs all joint basis vectors to generate the kernel descriptor, which leads to high time complexity. We therefore design the efficient kernel descriptor algorithm, which builds on incomplete Cholesky decomposition to automatically select a small number of pivot joint basis vectors and thus improve efficiency. Experiments show that EKD outperforms the original KD in image/scene classification.
(3) Third, building on the idea behind EKD, we propose a new image-level feature representation, the Efficient Hierarchical Kernel Descriptor (EHKD). The original KD feature can only describe image patches, so Bo et al. proposed the Hierarchical Kernel Descriptor (HKD) within the KD framework to describe whole images. Because HKD is constructed in much the same way as KD, the algorithm that generates HKD runs into the same computational efficiency problem as KD generation. To overcome this, we design the efficient hierarchical kernel descriptor algorithm. It likewise relies on incomplete Cholesky decomposition and recursively invokes the EKD computation layer by layer to form an image-level feature representation. Experiments show that EHKD has advantages over HKD in both computational efficiency and representational power.
(4) Finally, the thesis proposes a supervised patch-level feature representation, the Supervised Efficient Kernel Descriptor (SEKD). Both the KD and EKD methods discussed above are unsupervised: they design patch-level features from the similarity between image patches and have shown better performance than hand-designed patch features in areas such as object recognition. Both methods interpret histograms of oriented gradients from a kernel perspective and use pixel-level information to "grow" patch-level features. Their biggest shortcoming is that the similarity between patches is computed without considering the class labels of the patches, so a supervised feature learning method that incorporates image class labels is needed. We therefore propose the supervised efficient kernel descriptor algorithm, which is based on an incomplete Cholesky decomposition that incorporates image class labels. Experiments show that SEKD yields lower-dimensional representations with stronger discriminative power than features learned without supervision.
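The abstract names incomplete Cholesky decomposition as the mechanism that selects a small set of pivot basis vectors, but it does not give the algorithm. The following is only a minimal sketch of the generic greedy pivoted incomplete Cholesky decomposition, assuming a precomputed symmetric positive semidefinite kernel matrix `K`. The names `incomplete_cholesky`, `project`, `tol`, and `max_rank` are illustrative and not taken from the thesis, and the sketch does not reproduce the thesis's match kernels, joint basis vectors, or the label-aware variant used for SEKD.

```python
import numpy as np


def incomplete_cholesky(K, tol=1e-8, max_rank=None):
    """Greedy pivoted incomplete Cholesky of a symmetric PSD matrix K.

    Returns G (n x r) with K ~= G @ G.T and the list of r pivot indices.
    Generic textbook procedure; hypothetical helper, not the thesis code.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    max_rank = n if max_rank is None else min(max_rank, n)
    residual_diag = np.diag(K).copy()  # diagonal of K - G @ G.T
    G = np.zeros((n, max_rank))
    pivots = []
    for j in range(max_rank):
        i = int(np.argmax(residual_diag))  # currently worst-approximated item
        if residual_diag[i] <= tol:
            break  # remaining approximation error is negligible
        pivots.append(i)
        # New column: residual of column i, scaled so G @ G.T matches K there.
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(residual_diag[i])
        residual_diag -= G[:, j] ** 2
    return G[:, :len(pivots)], pivots


def project(K_new_pivots, G, pivots):
    """Map kernel values against the pivots only (m x r) to r-dim features."""
    L = G[pivots, :]  # r x r, lower triangular by construction
    return np.linalg.solve(L, K_new_pivots.T).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))  # stand-in vectors, not real image data
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.05 * sq_dists)  # Gaussian kernel matrix (assumed kernel)
    G, pivots = incomplete_cholesky(K, tol=1e-3, max_rank=40)
    print(len(pivots), np.abs(K - G @ G.T).max())
```

In this sketch the efficiency gain comes from `project`: once the pivots are fixed, a new item needs kernel evaluations against the pivots only, not against every basis vector.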
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: TP391.41
