Research on Key Technologies and System Implementation for Large-Scale Multimodal Image Retrieval

Published: 2018-05-18 17:31

Topics: hashing + multimodal; Source: Shandong University, master's thesis, 2017

【Abstract】: With the spread of the Internet and steady advances in information technology, people increasingly prefer images over text to express, convey, and obtain information. As a result, the number of images on the Internet is growing explosively, and making use of large-scale image data poses great challenges. How to retrieve similar images quickly and accurately from large-scale image data has long been one of the central research problems in image retrieval. Machine learning is an important tool for handling large-scale data, and within it hash learning has become a research focus in recent years because it reduces storage requirements and computes efficiently. Hash learning maps data from a high-dimensional space into a low-dimensional Hamming space, producing compact binary codes (hash codes) and thereby reducing storage. The learned hash codes are also required to preserve the similarity between data points in the original space, so that at retrieval time only the Hamming distance between hash codes needs to be computed to obtain the similarity between data points quickly, which lowers the computational cost and improves retrieval efficiency.
Building on spectral hashing, this thesis proposes a discrete hashing method for multimodal image retrieval, Discrete Multi-view Hashing (DMVH). The method exploits the rich multimodal information of images to improve retrieval performance. First, multimodal features of the images (such as GIST and SIFT) are extracted and preprocessed so that their dimensions are consistent. Then, a new method for constructing the similarity matrix between images is used, which preserves the local structural similarity of the data while also preserving the semantic similarity between data points. Finally, the high-dimensional data are mapped to a low-dimensional space through a learned mapping matrix to obtain hash codes. Because hash codes are discrete, optimizing them directly is difficult. Two auxiliary variables are therefore introduced so that the discrete constraints need not be relaxed during optimization, which reduces the optimization error and yields higher-quality hash codes. The performance of DMVH is evaluated on three public datasets and compared with several relatively advanced hashing methods; the experimental results show that DMVH outperforms the compared methods.
Finally, a multimodal image retrieval system is designed and implemented with DMVH as its core algorithm. The system mainly provides users with a "search by image" function. A user uploads a query image; after feature extraction and feature fusion, the hash function learned by DMVH maps the fused features into Hamming space to generate a hash code. The system then computes the Hamming distances between this code and the hash codes stored in the database and returns the images whose codes have a small Hamming distance to the query code as the retrieval results.
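The retrieval step described in the abstract (map features to a hash code, then rank stored codes by Hamming distance) can be illustrated with a minimal sketch. It assumes that the hash function is a sign-thresholded linear projection, which the abstract does not state explicitly. The names `hash_code`, `hamming_distance`, and `search`, as well as the toy projection matrix and database, are hypothetical and are not taken from the thesis. Feature extraction, feature fusion, and the discrete optimization used to learn the mapping are not shown.

```python
from typing import List, Sequence, Tuple


def hash_code(features: Sequence[float],
              projection: Sequence[Sequence[float]]) -> int:
    """Map a feature vector to a binary code packed into an int.

    Assumption: bit k is 1 when row k of the projection matrix has a
    positive dot product with the features. The matrix passed in is a
    stand-in for a learned mapping, not the one learned by DMVH.
    """
    code = 0
    for row in projection:
        dot = sum(w * x for w, x in zip(row, features))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def search(query_code: int,
           database: List[Tuple[str, int]],
           top_k: int = 3) -> List[Tuple[str, int]]:
    """Return the top_k (image_id, code) pairs closest to the query."""
    ranked = sorted(database,
                    key=lambda item: hamming_distance(query_code, item[1]))
    return ranked[:top_k]


# Toy data for illustration only (hypothetical values).
toy_projection = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
toy_database = [("a", 0b1001), ("b", 0b0110), ("c", 0b1011)]

query = hash_code([1.0, -2.0], toy_projection)
results = search(query, toy_database, top_k=2)
print(results)
```

Packing each code into an integer keeps the distance computation to one XOR and a bit count, which is the property that makes Hamming-space ranking cheap compared with distances on the original high-dimensional features.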
【Degree-granting institution】: Shandong University
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: TP391.41
