Research on Text Localization and Character Recognition Methods for Scene Images

Published: 2019-05-24 09:04

[Abstract]: Text in scene images carries rich and precise information, and there is broad demand for it in industrial automation, traffic management, automatic translation, services for people with disabilities, and other fields. However, because scene images are affected by non-uniform illumination, background texture, and the diversity of text, existing methods extract scene text with low accuracy. How to extract text accurately from such images has therefore become an active research topic in pattern recognition, and this work has practical value for improving the accuracy and robustness of scene-text recognition systems. The main work and contributions of this thesis are as follows.
First, exploiting three properties of text regions (consistent character gray values, a convex distribution of the x-direction gradient magnitude, and the close proximity of neighboring characters), the thesis proposes a scene-text localization method based on a convolutional neural network (CNN) and support vector machine (SVM) output scores. Typical points of text regions are detected from the convex distribution of the x-direction gradient magnitude and the consistency of character gray values; candidate connected components are extracted by clustering these points on position and gray level; and k-means clustering is then applied to the regions outside those components to extract further candidates. Next, a CNN-based SVM classifier for text connected components is applied: the CNN extracts texture features of each connected component, the SVM output score is used to suppress non-text components, and neighboring components are grouped into candidate text regions. Finally, an SVM verifies each candidate region using its histogram of oriented gradients (HOG) features. On the ICDAR2011 and ICDAR2013 scene-text image datasets, the localization method achieves F-measures of 76% and 78%, respectively, indicating that it effectively suppresses interference from complex background textures.
Second, based on the color similarity of characters within a text line, a character segmentation method for text regions is proposed that combines color clustering and gradient vector flow. Pixels are first clustered with k-means according to their distribution in color space, yielding k candidate layers; geometric features of connected components, such as fill ratio and aspect ratio, are then used to select the layer containing the candidate character components. Points far from edges in homogeneous regions are taken as candidate cut pixels, and, with the squared gray-level difference as the cost, the cutting path with the minimum cumulative cost is found. On the ICDAR2013 scene-text dataset the method achieves an F-measure of 87.9%, and the experiments show that color clustering effectively suppresses interference from non-uniform illumination and occlusion.
Finally, based on the rotation invariance of character structure, a multi-orientation single-character recognition model is proposed. A variant HOG operator and sampling on a concentric-circle template are used to extract local joint HOG texture features and structural features describing the quadrant relationships between sampling points. The two are combined into a character feature, a bag-of-words character model is built by learning a feature dictionary, and an SVM then recognizes the characters. In character recognition experiments on the ICDAR character dataset, the Chars74K dataset, and a manually collected dataset, the proposed method achieves accuracies of 82%, 87%, and 73%, respectively, indicating that the model is robust to rotation.
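The cutting-path step of the segmentation method (squared gray-level difference as the cost, minimum cumulative cost path) can be illustrated with a small dynamic-programming sketch. This is not the thesis's implementation: the function name `min_cost_cut_path`, the `start_cols` parameter (a stand-in for the candidate cut pixels), and the rule that the path moves one row down and at most one column sideways per step are all assumptions made for illustration. The gradient vector flow component is not modeled.

```python
import numpy as np


def min_cost_cut_path(gray, start_cols=None):
    """Cheapest top-to-bottom cut through a gray-level text-line image.

    gray: 2-D array-like of gray levels.
    start_cols: optional iterable of columns allowed in the top row
        (hypothetical stand-in for the candidate cut pixels).
    Returns (total_cost, path), where path[r] is the cut column in row r.
    """
    g = np.asarray(gray, dtype=np.float64)
    h, w = g.shape
    cost = np.full((h, w), np.inf)           # cumulative cost to reach each pixel
    back = np.zeros((h, w), dtype=np.int64)  # column of the predecessor pixel

    cols = range(w) if start_cols is None else start_cols
    for c in cols:
        cost[0, c] = 0.0

    for r in range(1, h):
        for c in range(w):
            # Assumed move set: come from the column directly above or one to either side.
            for pc in (c - 1, c, c + 1):
                if 0 <= pc < w:
                    # Step cost: squared gray-level difference between the two pixels.
                    step = (g[r, c] - g[r - 1, pc]) ** 2
                    cand = cost[r - 1, pc] + step
                    if cand < cost[r, c]:
                        cost[r, c] = cand
                        back[r, c] = pc

    # Trace back from the cheapest pixel in the bottom row.
    end = int(np.argmin(cost[-1]))
    path = [end]
    for r in range(h - 1, 0, -1):
        path.append(int(back[r, path[-1]]))
    path.reverse()
    return float(cost[-1, end]), path
```

Because a path that stays inside any uniform region, including a character stroke, also costs zero, the starting points matter. Restricting `start_cols` to columns in the background gap between characters plays the role of the abstract's candidate cut pixels chosen far from edges.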
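【Degree-granting institution】: Huazhong University of Science and Technology (华中科技大学)
【Degree level】: Master's
【Year degree granted】: 2016
【Classification number】: TP391.41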
