Research on Image Semantic Annotation and Description Based on Deep Learning
Published: 2018-04-11 07:37
Topics: image annotation + convolutional neural network; Reference: Guangxi Normal University, 2017 master's thesis
【Abstract】: With the rapid development of information science and technology, and thanks to the spread of digital devices and advances in storage technology, media data of many kinds are growing explosively. Faced with the resulting mass of unlabeled data such as text, audio, images, and video, how to manage and use it has become a pressing problem. Current image semantic annotation techniques can label images effectively, which not only helps people manage large collections of unlabeled images but also lets machines understand images more intelligently, so image semantic annotation is a highly meaningful line of research. The core of image understanding technology is to build on image processing and analysis, draw on theories from computer vision and natural language processing, analyze and understand image content, and feed it back to people in the form of textual semantic information. Completing image understanding therefore requires not only image annotation but also image description. The task of image annotation takes the image as its object and semantic information as its carrier, and studies what objects appear in an image and how those objects relate to each other. The task of image description uses natural language processing to analyze and produce annotation words and then assemble them into natural-language sentences. In recent years image description, like image annotation, has attracted great interest from the research community, and both have broad application prospects.

The thesis takes image semantic annotation as its main line of research, images in multimedia data as its research object, and image description as an application extension. Following the research path of feature extraction and representation, construction of a semantic mapping model, and analysis and understanding of semantics, it focuses on object recognition and semantic analysis in image annotation, including feature learning, multi-label classification, semantic correlation analysis, and word and sentence sequence generation. On this basis, the main contributions are as follows.

To narrow the semantic gap between data of different modalities, a hybrid multi-label image annotation architecture, CNN-ECC, is proposed based on a deep convolutional neural network (Deep Convolutional Neural Network, CNN) and ensembles of classifier chains (Ensembles of Classifier Chains, ECC). The framework consists of two stages: generative feature learning and discriminative semantic learning. First, an improved convolutional neural network learns high-level visual features fused from multiple instances of the image. Second, ensembles of classifier chains are trained on the extracted visual features and the images' semantic label sets; the ensembles not only learn the semantic information carried by the visual features but also fully exploit the correlations among semantic labels, so that the generated labels are more strongly related and redundant labels are avoided. The trained model is then used to annotate unseen images automatically.

Image annotation lays the groundwork for image description. To assemble the annotation words generated for an image into natural-language sentences, an image description model, CNN-DLSTM, is proposed based on a convolutional neural network (Convolutional Neural Network, CNN) and a bidirectional long short-term memory unit (Double Long-short Term Memory, DLSTM). The framework consists of a visual model and a language model. The visual model learns the visual content concepts of the image and generates its key semantic words. The language model learns vocabulary and grammar from human-written description sequences and combines the visual concept words with the corresponding grammar to generate the language description, completing the image description task. To make the generated sentences more human-like, CNN-DLSTM also introduces a confidence evaluation model for the quality of the generated descriptions and selectively outputs the higher-scoring description sentences.

Image content is not only complex and abstract but also fuzzy and ambiguous at the level of semantic concepts. The thesis therefore improves the key steps of feature learning and semantic learning in image annotation, realizes automatic image annotation, and improves the performance of image annotation and description.
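As a rough illustration of the discriminative stage of the CNN-ECC architecture summarized above, the following Python sketch treats the CNN features as already extracted (the thesis's improved multi-instance CNN is not reproduced here) and approximates the ensemble of classifier chains with scikit-learn's ClassifierChain. All function names, dimensions, and thresholds are illustrative assumptions, not the thesis's implementation.

# Minimal sketch of the discriminative stage of a CNN-ECC style annotator,
# assuming CNN features are already available as a numpy matrix.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

def train_ecc(features, labels, n_chains=10, seed=0):
    """Train an ensemble of classifier chains with random label orders.

    features: (n_images, d) matrix of CNN visual features
    labels:   (n_images, n_labels) binary label indicator matrix
    """
    chains = []
    for k in range(n_chains):
        chain = ClassifierChain(
            LogisticRegression(max_iter=1000),
            order="random",          # each chain sees a different label order
            random_state=seed + k,
        )
        chains.append(chain.fit(features, labels))
    return chains

def annotate(chains, features, threshold=0.5):
    """Average the chains' label probabilities and threshold them."""
    probs = np.mean([c.predict_proba(features) for c in chains], axis=0)
    return (probs >= threshold).astype(int)

# Hypothetical usage with random stand-in data:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 128))               # stand-in for CNN features
Y = (rng.random((200, 5)) < 0.3).astype(int)  # 5 candidate tags per image
model = train_ecc(X, Y, n_chains=5)
print(annotate(model, X[:3]))

Averaging over chains with different label orders is what lets the ensemble model label correlations without committing to a single chaining order, which matches the role the abstract assigns to ECC.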
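The CNN-DLSTM captioning model described in the abstract can be sketched in a similarly hedged way. The PyTorch skeleton below pairs a stubbed visual projection with a bidirectional LSTM language model that scores caption tokens; the layer sizes, names, and the way the image feature is injected into the sequence are assumptions made for illustration, not the thesis's code.

# Minimal PyTorch sketch of a CNN-DLSTM style captioning model: a projected
# CNN feature is prepended to the word sequence and a bidirectional LSTM
# language model produces per-position vocabulary scores.
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    def __init__(self, feat_dim=2048, vocab_size=10000, embed_dim=256, hidden=512):
        super().__init__()
        self.visual_proj = nn.Linear(feat_dim, embed_dim)  # CNN feature -> word space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)            # the bidirectional ("double") unit
        self.out = nn.Linear(2 * hidden, vocab_size)       # 2*hidden: both directions

    def forward(self, cnn_feats, captions):
        # Prepend the projected image feature as the first "token" of the sequence.
        img_tok = self.visual_proj(cnn_feats).unsqueeze(1)  # (B, 1, E)
        word_toks = self.embed(captions)                    # (B, T, E)
        seq = torch.cat([img_tok, word_toks], dim=1)        # (B, T+1, E)
        hidden_states, _ = self.lstm(seq)
        return self.out(hidden_states)                      # (B, T+1, V) token scores

# Hypothetical usage: score a batch of (image feature, caption) pairs.
model = CaptionModel()
feats = torch.randn(4, 2048)                # stand-in for CNN visual features
caps = torch.randint(0, 10000, (4, 12))     # stand-in caption token ids
print(model(feats, caps).shape)             # torch.Size([4, 13, 10000])

In the thesis, a separate confidence evaluation model re-ranks the generated descriptions; in a sketch like this, that step would amount to scoring several candidate sentences and keeping only the highest-scoring one.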
【Degree-granting institution】: Guangxi Normal University
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: TP391.41