Research on Localization and Segmentation of Unconstrained Text in Images

Published: 2018-06-19 21:05

Topics: iFAST detection algorithm + stroke-connected segmentation; Source: Guangxi Normal University, 2017 master's thesis

[Abstract]: Text recognition in still images and video frames proceeds in two stages. The first is text detection: text is detected and extracted by segmenting the text regions out of the original input image. The second is text recognition: the detected text regions are recognized to produce the corresponding text. Text detection and localization determines where text lies in the image and finds its bounding boxes, and it is the most critical step in the whole pipeline. Text segmentation removes as much of the background around the text as possible to ease subsequent recognition. For computer vision to process, analyze, and understand images, text detection and localization is an indispensable basic step and key stage, which is the motivation for this work. The literature shows that text recognition algorithms designed for traditional standard (constrained) images are hard to apply directly to natural scene images, because characters in natural scenes differ in size, orientation, font, degree of blur, illumination, and degree of occlusion; in addition, the real-time requirements are relatively high. All text is composed of strokes, and the key to stroke detection is detecting the corner points on the strokes. Among the commonly used corner detection algorithms SURF, AGAST, BRISK, FAST, SIFT, and ORB, FAST (Features from Accelerated Segment Test) is not scale invariant, but it has a degree of rotation and affine invariance and, more importantly, is markedly faster, which makes it better suited to real-time applications. This thesis therefore builds on FAST and the stroke width transform to propose iFAST (improved FAST), a fast text corner detection algorithm for locating and segmenting image regions that contain unconstrained text. iFAST first detects the corner points of strokes in the image, then extracts text fragments according to the corner attributes, and then uses a multi-scale adaptive pyramid model to train a cascade classifier that removes redundant non-text regions. The algorithm detects and segments text regions of different sizes quickly, robustly, and accurately. An effective text clustering algorithm based on text orientation voting then groups the detected regions into text lines so that later stages (for example, an OCR module) can process them. ICDAR2013 and MSRA-TD500, two datasets commonly used in text recognition, serve as the training and test sets, and performance is compared with other algorithms. The results show that iFAST performs well on diverse and multi-oriented text. Compared with the commonly used MSER text detection algorithm, iFAST produces half as many regions, detects 25% more characters, and has a detection speed 4 times higher. With the subsequent classification stage, iFAST reduces the number of segmented regions to 1/7 of the original and is nearly 3 times faster than MSER detection.
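The abstract names FAST as the base detector but does not describe how iFAST modifies it. As background only, the following is a minimal sketch of the standard FAST segment test, not the thesis's iFAST: a pixel counts as a corner when at least `n` contiguous pixels on a 16-pixel circle of radius 3 are all brighter than the center plus a threshold, or all darker than the center minus it. The function names, the threshold of 20, and `n=9` are illustrative assumptions, and the image is assumed to be a list of rows of grayscale values.

```python
# Offsets (dx, dy) of the 16-pixel Bresenham circle of radius 3 used by FAST.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def longest_circular_run(flags):
    """Length of the longest run of True values, treating the list as circular."""
    if all(flags):
        return len(flags)
    best = run = 0
    for flag in flags + flags:  # doubling the list lets runs wrap around the end
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def is_fast_corner(img, x, y, threshold=20, n=9):
    """Segment test at (x, y); the caller keeps (x, y) at least 3 px from the border."""
    center = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    brighter = [v > center + threshold for v in ring]
    darker = [v < center - threshold for v in ring]
    return max(longest_circular_run(brighter), longest_circular_run(darker)) >= n


def detect_corners(img, threshold=20, n=9):
    """Return (x, y) for every pixel that passes the segment test."""
    h, w = len(img), len(img[0])
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, threshold, n)]


if __name__ == "__main__":
    # Hypothetical test image: a bright square on a dark background.
    img = [[255 if (x >= 6 and y >= 6) else 0 for x in range(16)] for y in range(16)]
    print(is_fast_corner(img, 6, 6))   # the square's corner
    print(is_fast_corner(img, 10, 6))  # a point on the square's straight edge
```

The test needs only comparisons against the 16 ring pixels, which is why FAST is cheap enough for real-time use. This sketch omits the high-speed pre-test and non-maximum suppression that practical implementations usually add.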
[Degree-granting institution]: Guangxi Normal University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.41
