Deep Image Representation and Coding for Retrieval

Published: 2018-04-25 09:36

Topic: content-based image retrieval + deep neural networks; source: University of Science and Technology of China, 2017 master's thesis

[Abstract]: With the arrival of the mobile Internet era and the rapid development of multimedia technology, the Internet now holds massive amounts of multimedia data, typified by images, and the volume grows every day. Images on the Internet are usually converted into binary bitstreams for storage in order to save space. For content-based image retrieval (CBIR), an important use of image data on the Internet, the common practice is likewise to extract an image feature vector, convert it into a binary bitstream, and store it. The codewords produced by these two processes are stored separately, which consumes large amounts of computing and storage resources now that multimedia data volumes are exploding. Can image compression and image features share a single set of codewords? If they can, so that codewords are reused, a great deal of computation and storage can be saved at this scale. For image compression there are many classic standards, such as JPEG. They work by retaining the main information in an image and discarding the relatively unimportant information. Image retrieval is similar: the extracted feature vector also preserves the main information of the image, which is why it can be used for retrieval. Since both tasks extract the main information of an image and store it as binary bits, the two stored representations must share some redundant information. The goal of this research is to reduce or even eliminate that redundancy and thereby reduce the system resources needed to store the binary codewords. There are two ways to measure whether the redundancy has been eliminated.
The first is that, with no loss in task performance, the unified bitstream is smaller than the sum of the separate compression bitstream and feature bitstream. The second is that, with the unified bitstream equal in size to that sum, task performance improves. This study adopts the second measure as its research objective and experimental method. To solve the problem, we propose a unified deep image coding method. It targets the image search engine, a typical content-based image retrieval scenario on the Internet, in which the objects to be compressed and restored are small thumbnails. A deep neural network encodes the input image so that the resulting codeword can both reconstruct the original thumbnail and be used directly for image retrieval. During retrieval, the similarity between images is defined by the Hamming distance between their binary codewords. The resulting coding system reuses codewords and thus reduces the information redundancy between image compression and retrieval. First, we train a convolutional neural network encoder for thumbnail compression; it compresses a thumbnail into a binary bitstream, from which a decoder can recover the original thumbnail. Next, we train a deep neural network to extract image features and quantize those features. The quantized binary features can be stored as binary bits and can also be used for image retrieval. We then combine the two networks and fine-tune their parameters on triplet image data from content-based image retrieval, so that the whole codeword produced by the two parts is used for retrieval.
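The retrieval step described above, ranking images by the Hamming distance between binary codewords, can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy bit patterns, the function names `hamming_distances` and `rank_by_hamming`, and the choice to form the unified code by concatenating the compression bits with the quantized feature bits are all assumptions made for illustration.

```python
import numpy as np


def hamming_distances(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Hamming distance from one binary code to every row of a code matrix.

    query has shape (n_bits,); database has shape (n_items, n_bits).
    """
    # Count the bit positions where each database code differs from the query.
    return np.count_nonzero(database != query, axis=1)


def rank_by_hamming(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Database indices ordered from most similar to least similar."""
    # A stable sort keeps the original order among equal distances.
    return np.argsort(hamming_distances(query, database), kind="stable")


# Hypothetical toy data: 3 images, 4 compression bits and 2 feature bits each.
compression_bits = np.array([[0, 1, 1, 0],
                             [1, 1, 0, 0],
                             [0, 1, 1, 1]], dtype=np.uint8)
feature_bits = np.array([[1, 0],
                         [0, 0],
                         [1, 0]], dtype=np.uint8)

# Assumption: the unified code is the concatenation of both parts.
unified = np.concatenate([compression_bits, feature_bits], axis=1)

# Query with the code of image 0.
query = np.array([0, 1, 1, 0, 1, 0], dtype=np.uint8)
distances = hamming_distances(query, unified)
ranking = rank_by_hamming(query, unified)

print(distances.tolist())  # [0, 3, 1]
print(ranking.tolist())    # [0, 2, 1]
```

Because the distance is taken over the whole code, the compression bits contribute to the ranking as well as the feature bits, which is the sense in which the codeword is reused.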
In the experiments on image compression, the unified coding system we trained compresses 32 × 32 × 3 thumbnails at a compression ratio of 5.3; compared with standard JPEG, it achieves higher compression efficiency at comparable reconstruction quality. In the content-based image retrieval experiments, the codewords produced by the unified coding system retrieve better than the binary feature vectors obtained from the image feature extractor alone. Retrieval performance therefore improves without any additional codeword space, which in relative terms reduces the information redundancy between image compression and retrieval. Our work points to a very promising direction for joint image compression and retrieval.
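To give a sense of scale for the reported compression ratio, the short calculation below estimates the bit budget per thumbnail. It assumes 8 bits per color sample and that the compression ratio means raw size divided by compressed size; the abstract states neither, and the function name `bit_budget` is hypothetical.

```python
def bit_budget(height: int, width: int, channels: int,
               compression_ratio: float, bits_per_sample: int = 8):
    """Return (raw_bits, compressed_bits) for one image.

    Assumes compression_ratio = raw size / compressed size.
    """
    raw_bits = height * width * channels * bits_per_sample
    return raw_bits, raw_bits / compression_ratio


raw_bits, code_bits = bit_budget(32, 32, 3, compression_ratio=5.3)

print(raw_bits)              # 24576 bits for the uncompressed thumbnail
print(round(code_bits))      # about 4637 bits after compression
print(round(code_bits / 8))  # about 580 bytes
```

Under these assumptions each thumbnail is stored in roughly 580 bytes.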

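[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.41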

[Related literature]