Research on Image Multi-Label Ranking Algorithms Based on Sparse Coding Theory

Published: 2018-09-05 18:42

【Abstract】: In today's high-speed Internet era, the spread of digital imaging devices and advances in Internet technology have given Internet images an increasingly important role in daily life, and web image search has become a very active and challenging research topic in computer vision. Unlike a decade ago, digital images can now be created, uploaded, shared, and distributed on the Internet with ease. Social media sites such as Facebook, YouTube, and Flickr let the uploader attach a set of keywords describing an image (also called Social Tags), which the system then uses to index the image. Because the semantic annotation is produced collaboratively by users over the network, such collections are also called Collaboratively-Tagged Images. These annotations can serve directly as an index for Web images and can also be used as training samples in research on automatic image annotation. Since image-sharing sites such as Flickr host large collections of tagged images, sharing based on Social Tagging can greatly improve the organization and retrieval of massive Internet image collections, and making more effective use of these tagged collections has become one of the key problems in improving automatic image annotation.

Users, however, usually supply the tags of an image in Random Order: the submitted tag set is generally not sorted by the semantic relevance of each tag to the image content (Tag Relevance). User-supplied tag sets also contain many Noisy Tags. Flickr currently offers two ways of ordering images: (1) Most Recent, which sorts by upload timestamp; and (2) Most Interesting, which sorts by click-through rate, number of comments, and similar signals. It does not yet offer Relevance-based Ranking for retrieval. The random ordering of tag sets limits further gains in large-scale image retrieval, so ranking tags by how relevant each keyword is to the image (Tag Ranking) has become a new research focus in multimedia. Image collections ranked by semantic relevance can in turn serve as effective training samples for semantic keywords and help address the small-sample learning problem in region-level image annotation.

Social tagging has become a popular way to describe, categorize, and retrieve content on the Internet and has been applied successfully in managing and searching real social media systems. Given its importance for web image retrieval, a growing number of researchers study social image tags. Although users provide tags to describe image content, the tags are typed in manually by users with different cultural backgrounds and knowledge, according to their own subjective understanding of the image, so their quality is not yet good enough for them to serve directly as reliable index keywords for keyword-based image retrieval. The main problems are that tags are unordered and that their content is imprecise, so research on the semantic understanding of tagged social images centers on improving tag ranking and tag accuracy. Some research institutions (such as MSRA) have already begun studying the Tag Ranking problem.

Because one image may carry several semantic concept labels at once, and the image itself is semantically ambiguous to some degree, this is a typical multi-label learning problem. Ordering a tag set by semantic relevance can be abstracted as a typical Multi-Label Ranking problem. Multi-label learning has been studied extensively, whereas multi-label ranking has received comparatively little attention. Closely related lines of work include Typicality Ranking of samples and Tag Ranking. Most existing Tag Ranking algorithms perform Relevance-based Tag Ranking. Intuitively, given an image and its tag set, if tag A is more relevant than tag B, the image is a more typical example of tag A than of tag B; that is, tag A better represents the image's semantic content. Put differently, tag A appears more often than tag B among the tags of the image's K nearest neighbor images.

There are two representative families of such algorithms: (1) ranking based on Statistical Modeling and (2) Data-driven ranking. Statistical-modeling algorithms use kernel density estimation to estimate the semantic relevance between each tag and the image, which in essence estimates the Typicality of the sample: if the low-level visual features of the image region depicting a tag are typical, meaning that they lie close in feature space to the feature vectors of other regions carrying the same tag, the tag's relevance is high. These algorithms also take the semantic correlation between tags into account and refine the result with a random walk to obtain the final ranking. However, they represent multi-tag images with global low-level visual features and therefore cannot estimate each tag's density in feature space well.

Data-driven algorithms find a subset of neighbor images for the given image through simple global feature matching, use a Neighbor-voting strategy to count how often each keyword in the image's tag list occurs among those neighbors, and rank the tags by frequency. Unlike statistical-modeling algorithms, they select neighbors using visual features only, without considering tag information. Being simple, neighbor-voting tag ranking scales well to massive image datasets. But because it ignores the semantic correlation between tags, its ranking performance is unsatisfactory; and because it too represents images with global visual features, its image similarity measure is also unsatisfactory.

This thesis therefore proposes an improved image multi-label ranking algorithm. By introducing sparse representation theory from signal processing, it recasts neighbor image retrieval as a sparse reconstruction problem, which improves the semantic relevance of the selected neighbor set and thereby the performance of multi-label ranking. In recent years, combining Compressed Sensing with feature selection to build more effective Sparse Representations of images has become an active research topic in computer vision and machine learning. Tibshirani of Stanford University and Breiman of the University of California, Berkeley, among others, proposed at almost the same time the Lasso idea of imposing an ℓ1-norm constraint on feature selection so that the selected features are as sparse as possible, improving both the interpretability and the accuracy of data processing. Variable selection methods typified by the Lasso have become a mainstream tool in statistics for analyzing high-dimensional data, so image semantic understanding can be studied on the basis of sparse representation.

The proposed algorithm works as follows. It is, in essence, a multi-label ranking algorithm that ranks by semantic relevance. Given a test image whose tags are to be ranked and a massive collection of tagged social images, we treat the test image as a sample to be reconstructed and the tagged collection as an overcomplete dictionary. Under sparse representation theory, the test sample can be sparsely reconstructed from a small number of dictionary samples, and each dimension of the learned sparse coefficient vector expresses the semantic similarity and relevance between the test image and one tagged image in the dictionary. The learned relevance is used to select the test image's neighbor subset, a neighbor-voting strategy counts how often each keyword in the tag list occurs, and the tags are ranked by frequency. The algorithm also takes the semantic correlation between labels (their co-occurrence) into account and refines the result with a random walk to produce the final ranking.

We implemented the algorithm in MATLAB and evaluated it on the NUS-WIDE image dataset. A comparison with the classical K-nearest-neighbor tag ranking algorithm verifies the effectiveness of the proposed sparse-representation-based tag ranking algorithm.
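
The three stages the abstract describes (sparse reconstruction of the test image from the tagged collection, neighbor voting, and random-walk refinement) can be illustrated with a small Python sketch. This is only a toy illustration under stated assumptions, not the thesis's MATLAB implementation: the function names, the coordinate-descent Lasso solver, the penalty `lam`, the rule that every image with a non-zero coefficient counts as a neighbor, the random-walk update with damping factor `alpha`, and all the data are hypothetical choices made for the example.

```python
import math


def soft_threshold(x, t):
    """Proximal operator of the l1 norm: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sparse_code(y, atoms, lam=0.05, n_iter=100):
    """Coordinate descent for min_a 0.5*||y - sum_j a[j]*atoms[j]||^2 + lam*||a||_1.

    y is the test image's feature vector; atoms are the feature vectors of the
    tagged images forming the dictionary (solver and lam are assumptions).
    """
    a = [0.0] * len(atoms)
    residual = list(y)  # always equals y - sum_j a[j] * atoms[j]
    sq_norms = [sum(v * v for v in d) for d in atoms]
    for _ in range(n_iter):
        for j, d in enumerate(atoms):
            if sq_norms[j] == 0.0:
                continue
            # Correlation of atom j with the residual that leaves atom j out.
            rho = sum(di * ri for di, ri in zip(d, residual)) + a[j] * sq_norms[j]
            new = soft_threshold(rho, lam) / sq_norms[j]
            delta = new - a[j]
            if delta != 0.0:
                residual = [ri - delta * di for ri, di in zip(residual, d)]
                a[j] = new
    return a


def neighbor_votes(test_tags, coeffs, atom_tags, eps=1e-8):
    """Count, for each tag of the test image, the selected neighbors carrying it.

    Assumption: the neighbor subset is every dictionary image whose
    coefficient is non-zero (|c| > eps).
    """
    votes = {t: 0 for t in test_tags}
    for c, tags in zip(coeffs, atom_tags):
        if abs(c) > eps:
            for t in tags:
                if t in votes:
                    votes[t] += 1
    return votes


def random_walk(votes, sim, alpha=0.5, n_iter=50):
    """Refine vote scores on a tag graph: r <- alpha * P^T r + (1 - alpha) * v.

    sim maps frozenset({tag_a, tag_b}) to a co-occurrence similarity; this
    update rule and alpha are assumptions, not taken from the thesis.
    """
    tags = list(votes)
    total = sum(votes.values()) or 1.0
    v = {t: votes[t] / total for t in tags}
    P = {}  # row-normalized transition probabilities between distinct tags
    for s in tags:
        row = {t: sim.get(frozenset((s, t)), 0.0) for t in tags if t != s}
        norm = sum(row.values())
        P[s] = {t: (w / norm if norm else 0.0) for t, w in row.items()}
    r = dict(v)
    for _ in range(n_iter):
        r = {t: alpha * sum(r[s] * P[s].get(t, 0.0) for s in tags)
                + (1 - alpha) * v[t] for t in tags}
    return r


# Toy data (hypothetical): three tagged images and one test image.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
atom_tags = [{"sea", "sky"}, {"sea"}, {"car"}]
test_feature = [0.8, 0.6, 0.0]
test_tags = ["car", "sky", "sea"]  # in the order the user uploaded them
sim = {frozenset(("sea", "sky")): 0.8,
       frozenset(("sea", "car")): 0.1,
       frozenset(("sky", "car")): 0.1}

coeffs = sparse_code(test_feature, atoms)
votes = neighbor_votes(test_tags, coeffs, atom_tags)
scores = random_walk(votes, sim)
ranking = sorted(test_tags, key=lambda t: -scores[t])
print(ranking)  # ['sea', 'sky', 'car']
```

In this toy case the third dictionary image is orthogonal to the test feature, so its coefficient is exactly zero and it casts no vote; the tag "car" therefore drops to the bottom of the ranking. A real system would use high-dimensional visual features and a far larger dictionary, where an optimized Lasso solver would replace the plain loop above.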
【Degree-granting institution】: Beijing Jiaotong University
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: TP391.41

【Similar literature】