Image Retrieval Combining a Visual Attention Mechanism with Recurrent Neural Networks

Published: 2019-05-13 10:36

[Abstract]: Objective: Image retrieval is an important task in computer vision. Its key component is the description of image content, which is especially challenging for complex images. Traditional methods describe image content with a fixed-length vector. This paper instead proposes a variable-length sequence description model that aims to enrich the expressive power of the feature encoding and improve retrieval accuracy. Methods: The proposed sequence description model describes an image with a variable-length feature sequence. It first uses a CNN (convolutional neural network) to extract low-level features, then uses an intermediate LSTM (long short-term memory) layer to produce a representation of the correlations among local features, and finally uses a visual attention LSTM (attention LSTM) to produce a set of vectors that describes the image. Image retrieval is completed by computing the similarity between images with the Hungarian algorithm. The model is trained end to end with a label-level triplet loss function. Results: Image retrieval experiments were carried out on the MIRFLICKR-25K and NUS-WIDE datasets, with comparisons against related algorithms. Compared with the other methods, the proposed model improves retrieval accuracy by 5 to 12 percentage points. Compared with fixed-length image descriptions, it significantly improves retrieval performance on multi-label datasets. Conclusion: The proposed image sequence description model significantly improves retrieval performance and is suitable for multi-label image retrieval tasks.
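
The matching and training steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity, the normalization by the shorter sequence length, the margin value, and the function names `set_similarity` and `triplet_loss` are all assumptions made for illustration. The optimal one-to-one assignment is found here by brute-force enumeration, which gives the same optimum as the Hungarian algorithm but is only practical for very short sequences.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def set_similarity(seq_a, seq_b):
    """Similarity of two variable-length vector sequences.

    Each vector of the shorter sequence is matched to a distinct vector of
    the longer one so that the total cosine similarity is maximal. The
    search is exhaustive (a stand-in for the Hungarian algorithm), and the
    score is averaged over the shorter length (an assumption).
    """
    short, long_ = (seq_a, seq_b) if len(seq_a) <= len(seq_b) else (seq_b, seq_a)
    if not short:
        return 0.0
    best = max(
        sum(cosine(s, long_[j]) for s, j in zip(short, perm))
        for perm in permutations(range(len(long_)), len(short))
    )
    return best / len(short)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on set similarities (margin is hypothetical)."""
    return max(
        0.0,
        margin - set_similarity(anchor, positive) + set_similarity(anchor, negative),
    )
```

In this sketch the positive would be an image sharing labels with the anchor and the negative an image sharing none; how the paper actually selects triplets at the label level is not specified in the abstract.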
[Author affiliation]: College of Computer, National University of Defense Technology;
[Funding]: National Natural Science Foundation of China (U1435219)
[Classification number]: TP391.41; TP183
