Digit Recognition in Natural Scenes Based on Convolutional Neural Networks

Published: 2018-07-18 18:50

[Abstract]: As society enters the era of big data, ever more multimedia data is flowing onto the Internet. Faced with massive volumes of image data, people urgently want computers to recognize and process it automatically, which has driven the development of computer vision. Within this field, extracting text from images with complex backgrounds has long been both an active and a difficult research topic. In recent years neural networks have achieved breakthroughs across computer vision. Compared with traditional hand-crafted image features, their greatest advantage is that they extract high-level features automatically, which matters especially for complex problems such as natural scenes. Convolutional neural networks, in turn, owe to their structure the ability to avoid the exponential growth in computation that high-dimensional data such as images would otherwise cause. Using convolutional neural networks for scene text recognition has therefore become increasingly mainstream.
Against this background, this thesis divides digit recognition in natural scenes into two tasks: character localization and character recognition. A convolutional neural network first locates the character region; once the exact position of that region is known, a recurrent neural network recognizes the character string it contains. For character localization, the thesis analyzes and compares the object detection task with character recognition in natural scenes, and applies Faster-RCNN, a mainstream object detection framework, to the localization task by treating a character string as a special kind of object. In applying Faster-RCNN, the framework is optimized for character recognition in several respects: its output, the network size, the anchor ratios, and the IoU (intersection over union) threshold. For character recognition, the thesis uses a network structure that combines a convolutional network with a recurrent network: the convolutional network extracts features and the recurrent network generates the final character sequence. The two networks are trained separately and assembled into a complete recognition system, which is validated on several public datasets; in character localization accuracy it achieves better results than other methods.
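
The abstract names anchor ratios and the IoU threshold as two of the points tuned when adapting Faster-RCNN to character strings. The following is a minimal Python sketch of why those two settings interact, not the thesis's implementation: the function names (`iou`, `make_anchors`, `best_anchor`), the width/height ratio convention, and every scale, ratio, and box value are illustrative assumptions.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def make_anchors(cx, cy, scales, ratios):
    """Build anchors centred at (cx, cy).

    Assumed convention: ratio = width / height and area = scale ** 2,
    so a ratio above 1 gives a wide, short box.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale * math.sqrt(ratio)
            h = scale / math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def best_anchor(anchors, gt_box):
    """Return (best IoU, index of the anchor that achieves it)."""
    return max((iou(a, gt_box), i) for i, a in enumerate(anchors))


# Hypothetical ground-truth box for a horizontal digit string: 120 x 30 pixels.
gt = (-60.0, -15.0, 60.0, 15.0)
# Illustrative values only: one scale, a square ratio and a wide ratio.
anchors = make_anchors(0.0, 0.0, scales=[64], ratios=[1.0, 4.0])
score, index = best_anchor(anchors, gt)
```

In this toy setup the square anchor overlaps the wide digit string far less than the 4:1 anchor does, so with a typical positive-sample IoU threshold only the wide anchor would be matched to the string. This is one plausible reason to revisit anchor ratios and the IoU threshold together when the target objects are elongated text regions.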
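[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.41;TP183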

[References]