Research on Machine-Learning-Based Methods for Text Detection and Multi-Script Identification in Natural Images

Published: 2018-06-21 08:54

Topics: text detection; script identification. Source: master's thesis, Yanbian University, 2017.

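[Abstract]: Writing is an essential symbolic tool for expressing human thought and emotion and for passing on culture, and it is important, indeed irreplaceable, in every area of social production and daily life. Text is ubiquitous in modern cities: posters, road signs, plaques and light boxes all carry a great deal of textual information. The semantics conveyed by the text in a natural image are an important cue for understanding the image's content, and identifying the script of that text is an important research direction for content-based image retrieval and for the development of multilingual systems. Detecting text in natural scenes and identifying its script are nevertheless difficult. Text differs from scene to scene in color, quantity, size and spacing; its background is often cluttered; and the images may suffer from noise, tilt and perspective distortion. How to process natural images that contain text in several languages effectively is therefore an open problem in natural-scene analysis and understanding.

This thesis proposes a text-region detection method based on visual saliency and edge density, and a script identification method based on image features and machine learning. The detection method first finds visually salient regions with a multi-scale spectral residual method. Inside those regions it detects edges with the Sobel operator and computes the edge density of the image, preprocesses the edges with mathematical morphology, and finally locates text regions using prior knowledge of how text is arranged in natural images. The identification method builds sample character images for six scripts (Arabic numerals, English, Russian, Japanese kana, Simplified Chinese and Korean), extracts their skeletons, and constructs a feature set for each script from basic image features of the skeleton. Based on the structural characteristics of the scripts and the properties of the classifiers, identification proceeds in two stages. In the coarse stage, a support vector machine (SVM) assigns each character to one of two groups: group 1 contains Arabic numerals, English, Russian and Japanese kana, and group 2 contains Simplified Chinese and Korean. In the fine stage, an SVM identifies the script of group-1 characters and a back-propagation (BP) neural network identifies the script of group-2 characters.

In experiments, the proposed saliency- and edge-density-based detection method achieved a detection rate of 73%, and the proposed feature- and machine-learning-based identification method achieved an identification rate of 73.33%. These results show that the methods address text detection and script identification in natural images and support their correctness and feasibility.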
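The detection method summarized in the abstract (salient regions from a multi-scale spectral residual, Sobel edges inside those regions, local edge density, morphological preprocessing, then a layout prior) can be sketched in Python with NumPy. This is a minimal illustration under assumptions, not the thesis's implementation: the two scales, the 3 x 3 and 9 x 9 windows, every threshold, the nearest-neighbour resizing, the box blur used in place of Gaussian smoothing, a single rectangular dilation as the morphology step, and the aspect-ratio rule standing in for the unspecified "prior knowledge of text arrangement" are all hypothetical choices.

```python
import numpy as np
from collections import deque

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize using integer index maps."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows[:, None], cols[None, :]]

def box_filter(img, k):
    """Mean over a k x k window (k odd), edge-padded so the output keeps the input shape."""
    h, w = img.shape
    p = np.pad(img.astype(float), k // 2, mode="edge")
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + h, dx:dx + w]
    return out / (k * k)

def spectral_residual_saliency(gray, scales=(32, 64)):
    """Spectral residual saliency at several scales, averaged and rescaled to [0, 1]."""
    h, w = gray.shape
    acc = np.zeros((h, w))
    for s in scales:
        f = np.fft.fft2(resize_nearest(gray.astype(float), s, s))
        log_amp = np.log(np.abs(f) + 1e-8)
        residual = log_amp - box_filter(log_amp, 3)          # log spectrum minus its local mean
        sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * np.angle(f)))) ** 2
        acc += resize_nearest(box_filter(sal, 3), h, w)     # box blur in place of a Gaussian
    acc -= acc.min()
    return acc / (acc.max() + 1e-12)

def sobel_magnitude(img):
    """Gradient magnitude from the 3 x 3 Sobel kernels (edge padding)."""
    h, w = img.shape
    p = np.pad(img.astype(float), 1, mode="edge")
    gx = (p[:h, 2:] + 2 * p[1:h + 1, 2:] + p[2:, 2:]) - (p[:h, :w] + 2 * p[1:h + 1, :w] + p[2:, :w])
    gy = (p[2:, :w] + 2 * p[2:, 1:w + 1] + p[2:, 2:]) - (p[:h, :w] + 2 * p[:h, 1:w + 1] + p[:h, 2:])
    return np.hypot(gx, gy)

def dilate(mask, kh, kw):
    """Binary dilation with a kh x kw rectangle (odd sizes), zero-padded at the border."""
    h, w = mask.shape
    p = np.pad(mask.astype(bool), ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((h, w), dtype=bool)
    for dy in range(kh):
        for dx in range(kw):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def text_boxes(mask, min_area=20, min_aspect=1.5):
    """Boxes (y0, x0, y1, x1) of 4-connected components that are large and wide enough.
    Hypothetical layout prior: text lines run roughly horizontally (width >= 1.5 x height)."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            seen[y, x] = True
            queue, area = deque([(y, x)]), 0
            y0, x0, y1, x1 = y, x, y, x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                area += 1
                y0, x0, y1, x1 = min(y0, cy), min(x0, cx), max(y1, cy), max(x1, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if area >= min_area and (x1 - x0 + 1) / (y1 - y0 + 1) >= min_aspect:
                boxes.append((y0, x0, y1, x1))
    return boxes

def detect_text_regions(gray, sal_factor=1.0, edge_thr=0.25, density_thr=0.1, win=9):
    """Saliency -> Sobel edges inside salient areas -> edge density -> dilation -> boxes."""
    gray = gray.astype(float) / max(float(gray.max()), 1e-12)
    sal = spectral_residual_saliency(gray)
    salient = sal > sal_factor * sal.mean()                   # above-average saliency
    mag = sobel_magnitude(gray)
    edges = (mag > edge_thr * (mag.max() + 1e-12)) & salient   # strong edges in salient areas
    density = box_filter(edges, win)                           # local fraction of edge pixels
    return text_boxes(dilate(density > density_thr, 3, 9))     # bridge gaps between characters
```

`detect_text_regions` takes a 2-D grayscale array and returns inclusive (y0, x0, y1, x1) boxes. On real photographs the thresholds and window sizes would have to be tuned against labelled data; they are placeholders here.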
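The two-stage script identification scheme (a coarse SVM splitting the six scripts into two groups, then an SVM for group 1 and a BP network for group 2) can be sketched in Python with NumPy. The sketch assumes the skeleton-based feature vectors have already been extracted (feature extraction is not shown) and that labels are plain script-name strings. The classifier details are assumptions, because the abstract gives no kernels, network size or training settings: the SVMs here are linear and trained with the Pegasos sub-gradient method, group 1 uses one-vs-rest SVMs, and the BP network has one sigmoid hidden layer trained by full-batch gradient descent. All names (`GROUP1`, `GROUP2`, `LinearSVM`, `OneVsRestSVM`, `BPNetwork`, `TwoStageScriptClassifier`) and hyperparameters are hypothetical.

```python
import numpy as np

# Hypothetical label names for the six scripts, split into the two coarse groups.
GROUP1 = ("digits", "english", "russian", "kana")
GROUP2 = ("chinese", "korean")

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LinearSVM:
    """Binary linear SVM (labels +1/-1) trained with Pegasos sub-gradient steps.
    The bias is folded in as a constant feature, so it is regularized too."""
    def __init__(self, lam=0.01, epochs=200, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    @staticmethod
    def _aug(X):
        return np.hstack([X, np.ones((len(X), 1))])

    def fit(self, X, y):
        rng, Xb = np.random.default_rng(self.seed), self._aug(X)
        self.w, t = np.zeros(Xb.shape[1]), 0
        for _ in range(self.epochs):
            for i in rng.permutation(len(Xb)):
                t += 1
                eta = 1.0 / (self.lam * t)
                violated = y[i] * (Xb[i] @ self.w) < 1   # hinge loss active for this sample?
                self.w *= 1.0 - eta * self.lam            # step on the regularizer
                if violated:
                    self.w += eta * y[i] * Xb[i]          # step on the hinge sub-gradient
        return self

    def decision(self, X):
        return self._aug(X) @ self.w

class OneVsRestSVM:
    """Multi-class linear SVM: one binary LinearSVM per class, highest score wins."""
    def fit(self, X, labels):
        labels = np.asarray(labels)
        self.classes = sorted(set(labels.tolist()))
        self.models = [LinearSVM().fit(X, np.where(labels == c, 1.0, -1.0)) for c in self.classes]
        return self

    def predict(self, X):
        scores = np.column_stack([m.decision(X) for m in self.models])
        return [self.classes[k] for k in scores.argmax(axis=1)]

class BPNetwork:
    """Binary classifier with one sigmoid hidden layer, trained by back-propagation
    (full-batch gradient descent on the mean cross-entropy loss)."""
    def __init__(self, hidden=8, lr=0.5, epochs=2000, seed=0):
        self.hidden, self.lr, self.epochs, self.seed = hidden, lr, epochs, seed

    def predict_proba(self, X):
        return sigmoid(sigmoid(X @ self.W1 + self.b1) @ self.W2 + self.b2)

    def fit(self, X, y01):
        rng = np.random.default_rng(self.seed)
        self.W1 = rng.normal(0.0, 0.5, (X.shape[1], self.hidden))
        self.b1, self.W2, self.b2 = np.zeros(self.hidden), rng.normal(0.0, 0.5, self.hidden), 0.0
        for _ in range(self.epochs):
            h = sigmoid(X @ self.W1 + self.b1)
            p = sigmoid(h @ self.W2 + self.b2)
            d_out = (p - y01) / len(X)                        # dLoss / d(output logit)
            d_hid = np.outer(d_out, self.W2) * h * (1.0 - h)  # back-propagated to hidden layer
            self.W2 -= self.lr * h.T @ d_out
            self.b2 -= self.lr * d_out.sum()
            self.W1 -= self.lr * X.T @ d_hid
            self.b1 -= self.lr * d_hid.sum(axis=0)
        return self

class TwoStageScriptClassifier:
    """Coarse SVM routes a character to group 1 or 2; a fine classifier names the script."""
    def fit(self, X, labels):
        X, labels = np.asarray(X, dtype=float), np.asarray(labels)
        in_g2 = np.isin(labels, GROUP2)
        self.coarse = LinearSVM().fit(X, np.where(in_g2, 1.0, -1.0))
        self.fine1 = OneVsRestSVM().fit(X[~in_g2], labels[~in_g2])
        self.fine2 = BPNetwork().fit(X[in_g2], (labels[in_g2] == GROUP2[1]).astype(float))
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            x = x[None, :]
            if self.coarse.decision(x)[0] > 0:                # coarse stage chose group 2
                preds.append(GROUP2[int(self.fine2.predict_proba(x)[0] > 0.5)])
            else:                                             # coarse stage chose group 1
                preds.append(self.fine1.predict(x)[0])
        return preds
```

Usage: `TwoStageScriptClassifier().fit(X, labels).predict(X_new)`, where `X` is an n x d array of feature vectors and each label is one of the six script names. Routing keeps each fine classifier's task narrow: the BP network only ever separates Chinese from Korean, and the group-1 SVMs never see group-2 characters.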
[Degree-granting institution]: Yanbian University
[Degree level]: Master's
[Year awarded]: 2017
[Chinese Library Classification number]: TP391.41
