Research on Scene Text Localization and Multi-Oriented Character Recognition Based on Convolutional Neural Networks
Topic: text localization + character recognition; Reference: PhD dissertation, Huazhong University of Science and Technology, 2016; Author: 朱安娜 (Zhu Anna)
【Abstract】: With the rapid development of intelligent transportation, navigation aids for the blind, and intelligent logistics, localizing and recognizing text in scene images, such as road signs, billboards, license plates, books, and product packaging, has become a research hotspot in computer vision. Scene text images suffer not only from low resolution, uneven illumination, defocus blur, and affine distortion, but also from complex and variable background textures such as trees, brick walls, and railings; the text itself varies in color, font, size, orientation, and layout. Applying existing optical character recognition (OCR) techniques directly therefore yields low accuracy and adapts poorly to changing environments, so localizing and recognizing scene text quickly, accurately, and robustly remains a challenging research problem. Extensive observation shows that although the background texture interference in scene text images is complex and variable, the texture features of character-stroke regions are relatively invariant. Building on this invariance, this dissertation uses convolutional neural networks (CNNs) to extract texture features of character-stroke regions, and combines them with the geometric features of strokes and with scene-context features of character regions to suppress background interference and improve the accuracy and adaptability of scene text localization. In addition, to make character recognition robust to changes in text orientation, we propose texture features and corresponding structural features computed at uniformly sampled points of a character, aggregated with a bag-of-features model and classified with a support vector machine (SVM). The dissertation therefore studies scene text localization and recognition along two directions, with the following results.

First, because the layered structure of a CNN learns rich high-level semantic information and can effectively extract features of target regions against complex background textures, we use a CNN to extract texture features of candidate characters and design a connected-component SVM classifier over joint geometric and texture features to suppress non-character components. Furthermore, to localize multi-oriented text regions precisely, skew-corrected candidate text regions are filtered using a geometric similarity measure and an SVM based on gradient statistics, removing background interference and yielding precise localization. The proposed method adapts well to changes in the position, angle, scale, and gray level of scene text, effectively suppresses complex background texture interference, and improves the precision and adaptability of scene text localization.

Second, using a scene-segmentation model, we propose a scene text localization method that combines scene context with a CNN. When classifying character and background regions, most methods consider only character-level features such as edge density, stroke width, or gradient distribution, and are easily misled by character-like backgrounds. We therefore propose using scene-context information from the area surrounding each candidate character to assist localization. TextonBoost and a fully connected conditional random field first estimate, for every pixel, the probability of belonging to each of 14 scene classes such as trees, road signs, walls, and sky; at the same time, maximally stable extremal regions (MSERs) are extracted from the image and expanded into rectangular blocks. The average of the probability vectors of all pixels in a block is then taken as that region's scene-context feature and combined with a CNN and an SVM classifier to separate characters from non-characters. Finally, character regions are grouped into text regions using the scene-context features together with geometric and color information. This method effectively suppresses complex background texture in scenes where text is unlikely to appear, improving localization accuracy.

Finally, to recognize text in different orientations, we propose a rotation-invariant character representation that combines region texture features with structural features. Existing scene text recognition techniques mostly handle horizontal characters and lack a general character representation. We design character features from the relative orientation and relative position of character structures: taking each uniform sampling point of the normalized character image in turn as a target, direction-free gradient statistics with respect to the other sampling points provide its texture features, while the corresponding spatial coordinate relations are recorded as structural features. Both feature types are aggregated with a bag-of-features model and classified with an SVM. Because the extracted features are rotation invariant, the model adapts to different text orientations. Experiments on a standard character dataset and an arbitrary-orientation character dataset show that the proposed method achieves high recognition accuracy.
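The scene-context step above reduces to averaging the per-pixel class-probability vectors inside each candidate rectangle. Below is a minimal NumPy sketch of that averaging, assuming the probability map from TextonBoost and the fully connected CRF has already been computed; the map shape, rectangle format, and class count here are illustrative, not the dissertation's actual interfaces:

```python
import numpy as np

def region_context_feature(prob_map: np.ndarray, rect: tuple) -> np.ndarray:
    """Average the per-pixel class-probability vectors inside a rectangle.

    prob_map: (H, W, C) array where prob_map[y, x] sums to 1 over C scene classes
    rect: (x, y, w, h) rectangular block enclosing a candidate character region
    """
    x, y, w, h = rect
    patch = prob_map[y:y + h, x:x + w]                       # (h, w, C) block
    return patch.reshape(-1, patch.shape[-1]).mean(axis=0)   # (C,) context vector

# toy example: 4x4 probability map over 3 hypothetical scene classes
H, W, C = 4, 4, 3
prob_map = np.full((H, W, C), 1.0 / C)
feat = region_context_feature(prob_map, (1, 1, 2, 2))
```

The resulting vector would then be concatenated with the CNN texture features before the SVM decides character vs. non-character.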
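The recognition model's rotation invariance comes from describing sampling points relative to one another rather than in absolute coordinates. The dissertation uses direction-free gradient statistics plus spatial relations; as a minimal, hypothetical illustration of why such pairwise relations survive rotation, the sketch below histograms the pairwise distances between sampling points and checks that the result is unchanged when the point set is rotated:

```python
import numpy as np

def pairwise_structure_descriptor(points: np.ndarray, n_bins: int = 8) -> np.ndarray:
    """Normalized histogram of pairwise distances between sampling points.

    Distances between points are unchanged by rotation, so the histogram is a
    rotation-invariant structural signature of the point set.
    points: (N, 2) coordinates of uniform sampling points, roughly in [0, 1]^2
    """
    diffs = points[:, None, :] - points[None, :, :]      # (N, N, 2) offsets
    dists = np.sqrt((diffs ** 2).sum(-1))                # (N, N) pairwise distances
    iu = np.triu_indices(len(points), k=1)               # count each pair once
    hist, _ = np.histogram(dists[iu], bins=n_bins, range=(0.0, np.sqrt(2.0)))
    return hist / max(hist.sum(), 1)                     # normalized histogram

# rotating the point set about its centroid leaves the descriptor unchanged
pts = np.array([[0.2, 0.2], [0.8, 0.2], [0.5, 0.9]])
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
center = pts.mean(axis=0)
rotated = (pts - center) @ R.T + center
```

In the full method, per-point descriptors like these would be quantized against a learned codebook (the bag-of-features step) and the resulting histogram fed to the SVM classifier.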
【Degree-granting institution】: Huazhong University of Science and Technology
【Degree level】: Doctoral
【Year conferred】: 2016
【Classification (CLC)】: TP391.41; TP183
Article ID: 1825371
Link: https://www.wllwen.com/shoufeilunwen/xxkjbs/1825371.html