Image Retrieval Techniques Based on Deep Learning Representations

Posted: 2018-04-03 07:17

Topic: image retrieval. Focus: deep representations. Source: University of Science and Technology of China, doctoral dissertation, 2017.

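【Abstract】: In recent years, digital cameras and smartphones have become ubiquitous and storage capacity has kept growing. As a result, multimedia content, and visual data in particular, has grown explosively. Retrieving relevant items from massive collections of visual content quickly and effectively has therefore long been an active research topic in academia and industry, both in China and abroad.

Early image retrieval systems typically relied on text queries. The query text supplied by the user was matched against the content of web pages to retrieve images associated with that text. As computer vision developed, content-based image retrieval (CBIR) came to complement text queries in interpreting user intent and improving the user experience. It also plays a prominent role in commercial applications such as product search, landmark retrieval, and trademark duplicate checking.

Deep learning has made remarkable progress in recent years. For image content representation, representations based on deep learning (deep representations for short) have shown excellent performance across many computer vision tasks. Among deep learning models, the deep convolutional neural network (CNN) is especially good at abstracting and describing image content. It has therefore received wide attention and in-depth study in image retrieval. Unlike traditional image representations, a deep representation emphasizes a global, semantic-level description of the image. An end-to-end model extracts the important information in the image, and a compact feature vector describes its content effectively.

Existing image retrieval methods based on deep representations have achieved impressive retrieval performance, but several problems remain difficult to overcome:

(1) Traditional representations are built on local visual features, whereas a deep representation characterizes the image as a whole at the semantic level. As a result, it captures local details poorly and is relatively sensitive to the spatial position of image content and to geometric deformation.

(2) Methods based on local representations can exploit the spatial relationships among local features to verify image matches geometrically, which makes matching more precise. Deep representations can hardly exploit this property to improve retrieval performance.

(3) Existing methods mostly evaluate retrieval algorithms on public benchmark datasets with manual annotations. They therefore cannot assess retrieval quality in real time for an arbitrary query, which makes it hard for a search engine to correct its results as needed.

To address these problems, this dissertation studies image retrieval based on deep representations. It covers three questions: how to construct good representations, how to improve the quality of retrieval results, and how to evaluate retrieval results effectively in real time. Its contributions are as follows.

(1) A deep representation built on generic object detection, which combines the semantic representational power of deep learning with the discriminative power of salient image regions. A generic object detector first finds a small number of regions in the image that are most likely to contain objects. Deep representations are then extracted from these regions. To describe local attributes within the regions, local invariant features are also extracted from them and fused with the deep representations, yielding a richer image representation.

(2) Database augmentation and query-result re-ranking at the level of deep representations. These are applied in the offline indexing stage and the online query stage respectively, and improve retrieval performance at very small computational and storage cost. In the indexing stage, the relationships among database images are used to update features without supervision through neighborhood information, so that the updated features retrieve better. In the query stage, the initial retrieval results are expressed as residuals, and the neighborhood information of the query feature is used to re-rank the results.

(3) A method that automatically assesses retrieval quality from the correlations among retrieval results. It enables online applications such as selecting the best among multiple retrieval results. For each retrieval result, a feature matrix is constructed from the correlations among its deep representations, and a convolutional neural network is trained to regress retrieval quality from that matrix. Correlation matrices obtained from several kinds of representations can be concatenated, giving a quality assessment method based on multi-feature fusion.

Starting from deep representations, this dissertation studies image retrieval thoroughly, covering feature construction, offline indexing, online re-ranking, and quality assessment. The proposed methods are presented and validated at the levels of method, experiment, and application, demonstrating their reliability and practicality.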
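The abstract does not specify two details of the region-based representation in contribution (1): how region features are pooled, and how the local invariant features are encoded.

The following is a minimal sketch under these assumptions:

- A CNN feature map is already available.
- The detector's boxes are given in feature-map cells.
- Each region is max-pooled, L2-normalized, and summed into one global vector. This R-MAC-style aggregation is chosen here for illustration and is not necessarily the thesis's choice.

The hypothetical `fuse` helper then combines this vector with an already-encoded local-feature vector by weighted concatenation.

```python
import numpy as np

def l2n(v, eps=1e-12):
    """L2-normalize a vector (eps avoids division by zero)."""
    return v / (np.linalg.norm(v) + eps)

def region_descriptor(feature_map, boxes):
    """Aggregate max-pooled CNN activations over detected regions.

    feature_map: (H, W, C) activations from some CNN layer (assumed input).
    boxes: (x0, y0, x1, y1) tuples in feature-map cells, end-exclusive,
           assumed to come from a generic object detector (hypothetical).
    """
    regional = []
    for x0, y0, x1, y1 in boxes:
        patch = feature_map[y0:y1, x0:x1, :]
        regional.append(l2n(patch.max(axis=(0, 1))))  # max-pool each channel over the region
    # Sum the normalized regional vectors, then renormalize (R-MAC-style, an assumption).
    return l2n(np.sum(regional, axis=0))

def fuse(deep_vec, local_vec, w=0.5):
    """Weighted concatenation of normalized deep and local descriptors (hypothetical helper)."""
    return l2n(np.concatenate([np.sqrt(w) * l2n(deep_vec),
                               np.sqrt(1.0 - w) * l2n(local_vec)]))
```

Both inputs are normalized before scaling by `sqrt(w)` and `sqrt(1 - w)`. As a result, the inner product of two fused vectors equals `w` times the deep cosine similarity plus `1 - w` times the local one, so `w` directly sets the relative weight of the two cues.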
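For the offline indexing stage of contribution (2), the abstract describes an unsupervised update of database features from their neighborhoods. A well-known instance of this idea is database-side augmentation: each database vector is replaced by the average of itself and its nearest neighbors in the database.

The sketch below assumes:

- unit-norm descriptors,
- uniform weights,
- a brute-force neighbor search.

The thesis's actual update rule may differ.

```python
import numpy as np

def database_augment(X, k=2):
    """Replace each database vector by the normalized mean of itself and its k nearest neighbors.

    X: (N, D) array of L2-normalized database descriptors.
    Uniform weighting is an assumption; weighted variants are also common.
    """
    sims = X @ X.T                          # cosine similarities (rows are unit-norm)
    np.fill_diagonal(sims, -np.inf)         # exclude each vector from its own neighbor list
    nn = np.argsort(-sims, axis=1)[:, :k]   # (N, k) indices of the k most similar vectors
    Xa = (X + X[nn].sum(axis=1)) / (k + 1)  # average of self and neighbors
    return Xa / np.linalg.norm(Xa, axis=1, keepdims=True)
```

This runs once at indexing time and the augmented vectors replace the originals, so query-time cost and index size stay unchanged. For large databases, an approximate nearest-neighbor index would typically replace the dense `X @ X.T`.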
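The abstract gives no formula for the residual representation used in online re-ranking. One simple way to use residuals of the initial results relative to the query is to shift the query along the mean residual of its top-`m` results and search again. Algebraically, this equals a weighted average query expansion, which is the technique this sketch actually implements as a stand-in. The function name and the parameters `m` and `alpha` are illustrative assumptions.

```python
import numpy as np

def residual_rerank(q, X, m=3, alpha=0.5):
    """Re-rank X for query q by shifting q along the mean residual of its top-m results.

    q: (D,) unit-norm query; X: (N, D) unit-norm database descriptors.
    q + alpha * mean(x_i - q) == (1 - alpha) * q + alpha * mean(x_i),
    i.e. a weighted average query expansion (a stand-in, not the thesis's method).
    """
    initial = np.argsort(-(X @ q))          # initial ranking by cosine similarity
    residuals = X[initial[:m]] - q          # (m, D) residuals of the top-m results
    q_new = q + alpha * residuals.mean(axis=0)
    q_new = q_new / np.linalg.norm(q_new)
    return np.argsort(-(X @ q_new))         # re-ranked database indices
```

For example, take unit vectors at 10°, 20° and -15° and a query at 0°. The initial order is `[0, 2, 1]`. With `m=1`, the query moves to 5°, and the order becomes `[0, 1, 2]`.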
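For quality assessment in contribution (3), the abstract builds a matrix from correlations among deep representations and stacks matrices from several representations before CNN regression. A minimal sketch of the input construction follows. It assumes cosine similarity as the correlation measure and places the query in row and column 0. The CNN regressor itself is not shown, and its training target (for example, average precision) would be an additional assumption.

```python
import numpy as np

def correlation_matrix(q, results):
    """Pairwise cosine similarities among the query and its top-k results.

    q: (D,) query descriptor; results: (k, D) descriptors of the ranked results.
    Returns a (k+1, k+1) matrix; index 0 is the query (an assumed layout).
    """
    F = np.vstack([q, results])
    F = F / np.linalg.norm(F, axis=1, keepdims=True)
    return F @ F.T

def stack_representations(mats):
    """Stack matrices from several representations as channels: (C, k+1, k+1)."""
    return np.stack(mats, axis=0)
```

Each channel of the stacked tensor comes from one representation. Intuitively, a good result list tends to show a block of high similarities among the query and its top results, and a small CNN could learn to score this kind of pattern.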

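【Degree-granting institution】: University of Science and Technology of China
【Degree level】: Doctoral (PhD)
【Year awarded】: 2017
【Classification number】: TP391.41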
