An Image Captioning Model Combining Attention and Dynamic Semantic Guidance

Published: 2018-08-18 18:06

[Abstract]: Current image captioning models describe the details of objects in an image insufficiently. To address this, we propose an image captioning model that combines dynamic semantic guidance with an adaptive attention mechanism. The model predicts the word at the next time step from the information at the previous time step, and it uses the adaptive attention mechanism to select the image regions the model should process at the next time step. In addition, the model constructs dense attribute information for each image as additional supervision, so that it can describe image content by jointly using image semantic information and attention information. The model was trained and tested on the Flickr8K and Flickr30K datasets and validated with several different evaluation metrics. Experimental results show a substantial improvement. Compared with the Guiding-Long Short-Term Memory model, the scores increased by 4.1, 1.8, 2.4, 0.8 and 3.1, relative gains of 6.3%, 4.0%, 7.9%, 3.9% and 17.3%. Compared with Soft-Attention, the scores increased by 1.9, 2.4, 3.3, 1.5 and 2.74 respectively, relative gains of 2.8%, 5.5%, 11.1%, 7.5% and 14.8%.
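The abstract does not give the equations of its adaptive attention mechanism. As a hedged illustration only, the sketch below assumes the visual-sentinel form of adaptive attention (Lu et al., CVPR 2017): each image region and a "sentinel" vector are scored against the decoder state, a softmax over regions gives the attention weights, and a gate `beta` decides how much the next word should rely on the image versus language context. The function `adaptive_attention` and the parameters `W_v`, `W_g`, `W_s` and `w_h` are hypothetical names, not taken from the paper, and the dynamic semantic guidance and dense-attribute supervision are not modeled here.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_attention(
    regions: List[Vector],  # k image-region features v_i, each of size d
    h_t: Vector,            # decoder hidden state at step t, size d
    s_t: Vector,            # visual sentinel at step t, size d
    W_v: Matrix, W_g: Matrix, W_s: Matrix,  # hypothetical projections, each a x d
    w_h: Vector,            # hypothetical scoring vector, size a
) -> Tuple[Vector, Vector, float]:
    """Return (mixed context c_hat_t, region weights alpha_t, sentinel gate beta_t)."""
    g = matvec(W_g, h_t)

    def score(x: Vector, W: Matrix) -> float:
        # w_h^T tanh(W x + W_g h_t)
        return sum(w * math.tanh(p + q) for w, p, q in zip(w_h, matvec(W, x), g))

    z = [score(v, W_v) for v in regions]
    z_sentinel = score(s_t, W_s)

    alphas = softmax(z)                   # attention over image regions only
    beta = softmax(z + [z_sentinel])[-1]  # share of weight given to the sentinel

    # Visual context: attention-weighted sum of region features.
    c_t = [sum(a * v[j] for a, v in zip(alphas, regions)) for j in range(len(s_t))]
    # Mix visual context with the sentinel: beta near 1 means "rely less on the image".
    c_hat = [beta * s + (1.0 - beta) * c for s, c in zip(s_t, c_t)]
    return c_hat, alphas, beta


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    context, alphas, beta = adaptive_attention(
        regions=[[1.0, 0.0], [0.0, 1.0]], h_t=[0.0, 0.0], s_t=[0.0, 0.0],
        W_v=identity, W_g=zeros, W_s=identity, w_h=[1.0, 1.0],
    )
    print(alphas, round(beta, 4), [round(c, 4) for c in context])
```

In a real captioner, `regions` would be CNN feature-map vectors, `h_t` the LSTM hidden state, and `s_t` a sentinel derived from the LSTM memory; all projections would be learned rather than fixed as in this toy call.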
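[Author affiliation]: Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Jiangnan University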
[Funding]: Fundamental Research Funds for the Central Universities, No. JUSRP51510
[CLC number]: TP183; TP391.41
