Research on Text Localization and Extraction Algorithms for Natural Scene Images
Published: 2018-11-24 19:28
[Abstract]: In recent years, with the rapid development of Internet and information technology and the spread of portable devices such as mobile phones and digital cameras, people can capture images anytime and anywhere and upload them to the network. Text, as a medium of communication between people, is also an important carrier of information. However, extracting text from natural scene images remains a complex problem. First, text is a human-designed structure, and text in different languages exhibits different structural characteristics; for example, the scripts of East Asian countries such as China, Japan, and Korea have large character sets, complex character structures, and diverse glyphs. A single simple method that detects text in all languages is therefore still hard to achieve. Second, image acquisition is inevitably affected by factors such as uneven illumination and complex background patterns, all of which make text detection difficult. Text localization and recognition in natural scene images therefore remains an active research topic. Text localization is a key step in extracting text information from images, and its results directly affect the subsequent text recognition (OCR) process. This thesis focuses on horizontal English text and designs a multi-resolution text localization framework for natural scene images that localizes text from coarse to fine and outputs text region images. First, in the coarse localization stage, the framework converts each image to three scales so that the algorithm can detect characters of different sizes. A trained convolutional neural network then classifies the extracted candidate object regions. In this stage, candidate regions are obtained mainly by two methods: one based on maximally stable extremal regions (MSER) and one based on the stroke width transform (SWT).
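The three-scale conversion described above can be illustrated with a minimal sketch. It is not the thesis implementation: the scale factors in `SCALES`, the nearest-neighbour resampling, and the helper names `resize_nearest`, `build_pyramid`, and `to_original_coords` are all assumptions made for illustration, since the abstract only states that each image is converted to three scales.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x, y, width, height)

# Assumed scale factors; the abstract only says "three scales".
SCALES = (0.5, 1.0, 2.0)


def resize_nearest(img: Image, scale: float) -> Image:
    """Resample an image with nearest-neighbour interpolation."""
    src_h, src_w = len(img), len(img[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    return [
        [img[min(src_h - 1, int(y / scale))][min(src_w - 1, int(x / scale))]
         for x in range(dst_w)]
        for y in range(dst_h)
    ]


def build_pyramid(img: Image, scales=SCALES) -> List[Tuple[float, Image]]:
    """Return (scale, resized image) pairs, one per scale factor."""
    return [(s, resize_nearest(img, s)) for s in scales]


def to_original_coords(box: Box, scale: float) -> Box:
    """Map a box found at `scale` back to the original image frame."""
    x, y, w, h = box
    return (round(x / scale), round(y / scale),
            round(w / scale), round(h / scale))
```

A candidate detector would run on every image in the pyramid, and each detected box would be mapped back with `to_original_coords` so that results from the three scales share one coordinate frame before they are fused.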
Experimental results show that, because the convolutional neural network can effectively detect possible character regions, the performance of the coarse localization stage depends mainly on the set of extracted candidate regions. In the experiments of this thesis, the method based on maximally stable extremal regions yields a more complete candidate set than the method based on the stroke width transform. Then, in the fine extraction stage, a set of rules based on gray-level co-occurrence matrix (GLCM) features and contrast saliency features is designed to fuse the coarse localization results from the multi-resolution images. Next, to remove false-positive text regions, the fused results are passed to an AdaBoost classifier, which yields the final text lines. The AdaBoost classifier is trained with histogram of oriented gradients (HOG) features as the descriptor. Experimental results show that the methods in this stage effectively improve the accuracy of text localization. The results produced by the framework can be further segmented with image binarization and then passed directly to an OCR program for text recognition.
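As a rough illustration of the kind of gray-level co-occurrence feature mentioned above, the sketch below computes a directed co-occurrence matrix and its standard contrast statistic, the sum over (i, j) of (i - j)^2 * p(i, j). The abstract does not give the actual fusion rules or the specific co-occurrence features used, so the function names `glcm` and `glcm_contrast`, the horizontal offset, and the choice of the contrast statistic are assumptions.

```python
from typing import List

Image = List[List[int]]  # quantised grayscale image, values in [0, levels)


def glcm(img: Image, levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalised, directed co-occurrence matrix for pixel pairs
    separated by the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    total = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                counts[img[y][x]][img[ny][nx]] += 1
                total += 1
    if total == 0:
        # No valid pixel pairs for this offset (image too small).
        return [[0.0] * levels for _ in range(levels)]
    return [[c / total for c in row] for row in counts]


def glcm_contrast(p: List[List[float]]) -> float:
    """Contrast statistic: large when neighbouring gray levels differ."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))
```

On a flat patch such as `[[1, 1, 1], [1, 1, 1]]` the contrast is 0, while a striped patch such as `[[0, 3, 0, 3], [0, 3, 0, 3]]` with 4 gray levels gives a contrast of 9 (up to floating-point rounding). A fusion rule could compare such values across candidate regions from different scales, but the thresholds and rule logic would have to come from the thesis itself.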
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.41
Article ID: 2354765
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2354765.html