Research on Hashing Methods for Large-Scale Multimodal Multi-Label Data

Published: 2018-06-08 18:24

Topic: hashing + multimodal; Source: Shandong University master's thesis, 2017

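[Abstract]: In recent years, mobile Internet technology and mobile devices have developed at an accelerating pace in China and around the world. As a result, data volumes keep growing and data are stored in increasingly diverse ways and forms. Processing multimodal, multi-label data has become very important in everyday life. Web pages and news items, for example, are commonly represented as a combination of text, images and video, and each web page often carries several keywords as labels. Cross-modal retrieval, which arose from the need to find suitable content in multimodal data, has therefore become a pressing problem. Hashing methods are especially well suited to this problem because of their advantages in storage and computation. A hashing method maps the features of the original data into Hamming space to obtain binary codes. It then produces retrieval results by computing Hamming distances between these hash codes, which greatly improves retrieval efficiency. Storing hash codes instead of the original data also greatly improves storage efficiency. Hashing methods can be divided into supervised, semi-supervised and unsupervised methods, depending on whether the model uses the samples' supervision information during training. Unsupervised hashing methods train on unlabeled data to obtain hash codes. Supervised hashing methods use labeled data to improve retrieval performance. Semi-supervised hashing methods use supervision information for only part of the data. Compared with unsupervised methods, they exploit label information to improve performance. Compared with supervised methods, they place lower demands on the data, because not every sample needs labels. In practice, however, these single-modal hashing methods do not always solve the problem effectively, and multimodal hashing methods emerged in response.

The main goal of multimodal hashing is to use data in one modality to retrieve similar data in another modality. Concretely, given a hash code, it retrieves hash codes in the other modality that lie close to it in Hamming distance. Large-scale data are now used very widely, so retrieving multimodal, multi-label data has good application prospects and considerable academic value. In addition, the machine learning that underlies hashing and multi-label learning is a current research focus of the Internet and computer industries, and it has great commercial value. Good multimodal hashing algorithms can speed up retrieval in web search, image retrieval and similar applications, and thereby improve the user experience. Traditional single-label learning differs from multi-label learning, in which each training sample carries multiple labels, which is closer to real-world conditions. However, applying single-label learning methods to such data does not produce good results, and multi-label learning is more expensive and time-consuming. Using hashing to retrieve multi-label data reduces both the time and space complexity of retrieval.

This thesis uses hashing to perform cross-modal retrieval on large-scale multimodal, multi-label data. We design a hashing algorithm for multimodal, multi-label datasets that uses images to retrieve similar texts from a database and uses texts to retrieve similar images. Previous multimodal hashing methods seldom consider the effect of multiple labels. Alternatively, they use the labels only to build a simple similarity matrix whose entry is 1 when two samples share at least one label and 0 otherwise. Such a matrix does not fully exploit the information that multiple labels carry, so we seek a model that makes full use of it to improve performance. We assume that each label corresponds to a hash code in Hamming space. We further assume that the hash code of a data sample can be generated as a linear combination of its labels and the label hash codes. For test samples, we minimize the Hamming distance between similar sample pairs while maximizing the Hamming distance between dissimilar pairs. We then use the resulting projection matrix to obtain hash codes for the test set. The learned hash codes are generated as linear combinations of anchor points, so we consider that they carry a large amount of label information. We therefore extend their use by applying a multi-label classification function to them, that is, by using the hash codes as the training features for multi-label classification. Because these features are compressed binary codes, classification on them is also cheap and fast. We compared the proposed method with state-of-the-art multimodal hashing methods on three commonly used public datasets. The experimental results show that it outperforms all the compared multimodal hashing methods. We also ran comparative multi-label classification experiments using the hash codes, and these experiments show that our approach is effective.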
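
The Hamming-space retrieval step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's method. The two projection matrices `W_img` and `W_txt` and all feature values are hypothetical and hand-picked rather than learned. Each code bit is taken as the sign of one projection, and codes are packed into Python integers.

```python
from typing import List, Sequence

def encode(features: Sequence[float], projection: Sequence[Sequence[float]]) -> int:
    """Pack sign(w_k . x) into an integer: bit k is 1 when w_k . x >= 0."""
    code = 0
    for k, w in enumerate(projection):
        if sum(x * wk for x, wk in zip(features, w)) >= 0:
            code |= 1 << k
    return code

def hamming(a: int, b: int) -> int:
    """Hamming distance: the number of set bits in a XOR b."""
    return bin(a ^ b).count("1")

def retrieve(query_code: int, database_codes: Sequence[int], top_k: int) -> List[int]:
    """Indices of the top_k database codes nearest to the query (ties broken by index)."""
    order = sorted(range(len(database_codes)),
                   key=lambda i: (hamming(query_code, database_codes[i]), i))
    return order[:top_k]

# Hypothetical projections into a shared 3-bit Hamming space (not learned).
W_img = [[1, 0], [0, 1], [1, -1]]          # 2-D image features -> 3 bits
W_txt = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # 3-D text features  -> 3 bits

# Cross-modal query: an image searches a small database of texts.
text_db = [[0.5, 0.2, 0.9], [-0.4, 0.7, -0.1], [-0.3, -0.6, 0.2]]
text_codes = [encode(t, W_txt) for t in text_db]   # [7, 2, 4]
image_code = encode([0.8, 0.3], W_img)             # 7
print(retrieve(image_code, text_codes, top_k=2))   # [0, 1]
```

Each comparison is only an XOR and a bit count, which is why ranking by hash codes is cheap compared with comparing real-valued feature vectors. In a real system the projections would be learned so that semantically similar image and text items land near each other in the shared code space.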
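
A small Python sketch can illustrate two ideas from the abstract: label-driven hash codes and their reuse as multi-label classification features. Everything here is an assumption for illustration. The label codes are hand-picked rather than learned, and the "linear combination" uses 0/1 label weights followed by a sign. The projection matrix that the thesis learns for test samples is omitted. k-nearest-neighbour voting in Hamming space stands in for the unspecified multi-label classification function. The helper names (`compose_code`, `knn_multilabel`, and so on) are hypothetical.

```python
from typing import List, Sequence

Code = List[int]  # a hash code with entries in {-1, +1}

def naive_similarity(y_i: Sequence[int], y_j: Sequence[int]) -> int:
    """Baseline from the abstract: 1 if the 0/1 label vectors share
    at least one label, otherwise 0."""
    return int(any(a and b for a, b in zip(y_i, y_j)))

def compose_code(y: Sequence[int], label_codes: Sequence[Code]) -> Code:
    """Sample code = sign of the sum of its active labels' codes
    (a linear combination with 0/1 weights); a zero sum maps to +1."""
    totals = [0] * len(label_codes[0])
    for active, code in zip(y, label_codes):
        if active:
            totals = [t + c for t, c in zip(totals, code)]
    return [1 if t >= 0 else -1 for t in totals]

def hamming(a: Code, b: Code) -> int:
    """Number of positions where two +/-1 codes disagree."""
    return sum(1 for x, z in zip(a, b) if x != z)

def knn_multilabel(query: Code, train_codes: Sequence[Code],
                   train_labels: Sequence[Sequence[int]], k: int) -> List[int]:
    """Predict each label by strict majority vote among the k nearest codes."""
    nearest = sorted(range(len(train_codes)),
                     key=lambda i: (hamming(query, train_codes[i]), i))[:k]
    return [int(2 * sum(train_labels[i][l] for i in nearest) > k)
            for l in range(len(train_labels[0]))]

# Hypothetical 4-bit codes for 3 labels (hand-picked, not learned).
label_codes = [[1, 1, -1, -1], [1, -1, 1, -1], [-1, 1, 1, 1]]
labels = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [1, 1, 1]]
codes = [compose_code(y, label_codes) for y in labels]

# The baseline scores both pairs as 1; the composed codes separate them.
print(naive_similarity(labels[0], labels[3]), naive_similarity(labels[0], labels[1]))  # 1 1
print(hamming(codes[0], codes[3]), hamming(codes[0], codes[1]))  # 0 1
print(knn_multilabel([1, 1, -1, -1], codes, labels, k=2))  # [1, 0, 0]
```

The example shows the weakness the abstract attributes to the 0/1 similarity matrix. Sample 0 shares two labels with sample 3 but only one with sample 1, yet the baseline scores both pairs identically. The composed codes place the pair with greater label overlap closer in Hamming space.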
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.3

[Similar Literature]