An Image Description Generation Model Fusing Scene and Object Prior Knowledge
[Abstract]: Objective: Current image captioning methods built on a deep convolutional neural network (CNN) and a long short-term memory (LSTM) network usually extract CNN features of an image using only the prior knowledge of object categories, while ignoring the prior knowledge of the scene in the image. As a result, the generated sentences lack an accurate description of the scene and tend to misjudge the spatial relationships among objects. To address this problem, an image description generation model that fuses scene and object-category prior information (F-SOCPK) is designed: the scene priors and the object-category priors of the image are incorporated into the model and cooperate to generate the description sentence, thereby improving its quality. Methods: First, the parameters of the CNN-S model are trained on the large-scale scene dataset Places205 so that CNN-S carries rich scene prior information; these parameters are then migrated to CNNd-S by transfer learning to capture the scene information of the image to be described. At the same time, the parameters of the CNN-O model are trained on the large-scale object-category dataset ImageNet and then transferred to CNNd-O to capture the object information of the image. After the scene and object information of the image is extracted, it is fed into the language models LM-S and LM-O respectively. The outputs of LM-S and LM-O are transformed by a softmax function to obtain a probability score for each word in the vocabulary. Finally, the final score of each word is computed by weighted fusion, the word with the maximum probability is taken as the output of the current time step, and the description sentence of the image is generated step by step. Results: Experiments were carried out on three public datasets, MSCOCO, Flickr30k, and Flickr8k. On the BLEU metric, which reflects sentence coherence and precision, the METEOR metric, which reflects word-level precision and recall, and the CIDEr metric, which reflects semantic richness, the proposed model surpasses models that use object-category information alone. The advantage is most pronounced on the Flickr8k dataset under the CIDEr metric, where the proposed model scores about 9% higher than the Object-based model, which uses object-category information alone, and nearly 11% higher than the Scene-based model, which uses scene-category information alone. Conclusion: The proposed method is effective: its performance improves considerably over the baseline models and exceeds that of other mainstream methods. Its advantage is evident on larger datasets such as MSCOCO, while on smaller datasets such as Flickr8k its performance still needs further improvement. In future work, more visual prior information, such as action categories and object-to-object relationships, will be incorporated into the model to further improve the quality of the description sentences; in addition, more vision techniques, such as deeper CNN models, object detection, and scene understanding, will be combined to further improve the accuracy of the sentences.
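As an illustration of the two-branch feature extraction described in the Methods, the following is a minimal sketch only: it assumes a recent PyTorch/torchvision environment and uses ResNet-18 as a hypothetical stand-in for CNN-O and CNN-S, since the abstract does not name the backbone architecture. Only ImageNet weights ship with torchvision, so the Places205 weights are left as a commented placeholder path.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Hypothetical stand-ins: the abstract names CNN-O (ImageNet object priors) and
# CNN-S (Places205 scene priors) but not the architecture, so ResNet-18 is used
# here for illustration. Places weights would have to be obtained separately.
cnn_o = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
cnn_s = models.resnet18(weights=None)
# cnn_s.load_state_dict(torch.load("resnet18_places205.pth"))  # placeholder path

# Drop the classification heads: the pooled features feed the language models.
feat_o = torch.nn.Sequential(*list(cnn_o.children())[:-1]).eval()
feat_s = torch.nn.Sequential(*list(cnn_s.children())[:-1]).eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_priors(image_path):
    """Return (object_feature, scene_feature) for one image."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        v_o = feat_o(x).flatten(1)  # object prior, later fed to LM-O
        v_s = feat_s(x).flatten(1)  # scene prior, later fed to LM-S
    return v_o, v_s
```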
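The decoding step that combines the two branches can be sketched as follows. The convex combination with weight alpha is an assumption made for illustration; the abstract states only that the word probabilities produced by LM-S and LM-O are merged by weighted fusion, without specifying the weighting scheme.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary dimension.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def fused_word_probs(scene_logits, object_logits, alpha=0.5):
    """Fuse the per-word scores of the scene branch (LM-S) and the object
    branch (LM-O) at one decoding time step. alpha is a hypothetical fusion
    weight, not a value given in the abstract."""
    p_scene = softmax(scene_logits)    # P(word | scene prior)
    p_object = softmax(object_logits)  # P(word | object prior)
    return alpha * p_scene + (1.0 - alpha) * p_object

# Toy usage with a 6-word vocabulary: emit the word with the highest fused score.
vocab = ["a", "dog", "runs", "on", "the", "beach"]
scene_logits = np.array([0.1, 0.3, 0.2, 0.1, 0.2, 2.0])   # scene branch favours "beach"
object_logits = np.array([0.1, 2.2, 0.3, 0.1, 0.2, 0.4])  # object branch favours "dog"
probs = fused_word_probs(scene_logits, object_logits)
print(vocab[int(np.argmax(probs))])   # word chosen at this time step
```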
[Author Affiliations]: School of Mathematics and Physics, Jinggangshan University; Key Laboratory of Watershed Ecology and Geographical Environment Monitoring, National Administration of Surveying, Mapping and Geoinformation, Jinggangshan University; Department of Computer Science and Technology, Tongji University; School of Electronics and Information Engineering, Jinggangshan University
[Funding]: Fund Project of the Key Laboratory of Watershed Ecology and Geographical Environment Monitoring, National Administration of Surveying, Mapping and Geoinformation (WE2016015); Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ160750, GJJ150788); Research Fund Project of Jinggangshan University (JZ14012)
[CLC Number]: TP391.41