Research on Recognition Algorithms for Printed Numbers on Financial Bills
Published: 2019-07-01 09:31
[Abstract]: Color invoices in China have highly complex and varied layouts, and some of the characters on them are very small. Used invoices are printed by dot-matrix printers, so many of them are blurred, skewed, or distorted. Many also carry irregularly placed stamps and signatures. As a result, how to correctly locate the various characters in an invoice image, how to correctly segment characters of different font sizes, how to determine effective higher-dimensional features for small characters, how to design a correspondingly efficient classifier, and how to build an effective training sample library all remain problems that have not yet been solved well. To address invoice number recognition, this thesis builds on the theory of image processing and pattern recognition, combines improved layout analysis and recognition techniques, and proposes a reasonably effective invoice number recognition algorithm. In the preprocessing stage, median filtering is first combined with several other filtering techniques to remove salt-and-pepper noise from the invoice image. Skewed invoice images are corrected with an improved skew-correction method based on directional white run lengths. The image is then binarized with an iterative threshold method, and a method for locating the invoice number is designed from an analysis of invoice characteristics and the gray-level histogram. Finally, horizontal and vertical projection is used to segment the individual digits, and a template method is used to normalize the characters. In the feature extraction stage, 40-dimensional effective features are extracted from the printed numbers, so that even small-font numbers have enough features to be distinguished from one another.
In the recognition stage, an improved ranking-learning forward-masking pattern classifier is proposed; it optimizes the ranking-learning forward-masking model of Academician Wang Shoujue to improve classification performance. The thesis also does some work on building the training and test libraries: a training library of 400 invoice number samples and a test library of 300 invoice number samples. These were generated from an initial training library of 40 invoice number samples and an initial test library of 30 by adding different kinds of noise, rotating by different angles, and scaling by different ratios. The classifier designed on this training library achieves a high recognition rate and good noise robustness. Experiments show that its recognition rate on the numbers is clearly higher than that of a traditional BP network, that its noise robustness is also better, and that recognition speed is improved as well.
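Two of the preprocessing steps named in the abstract, iterative-threshold binarization and projection-based digit segmentation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the `eps` stopping tolerance, the dark-ink-on-light-paper convention, and the toy image are all hypothetical, and the number strip is assumed to be already located and deskewed.

```python
def iterative_threshold(pixels, eps=0.5):
    """Iterative global threshold over a flat list of gray levels.

    Starts from the mean, then repeatedly sets the threshold to the midpoint
    of the two class means until it moves by less than eps (assumed tolerance).
    """
    t = sum(pixels) / len(pixels)
    while True:
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:  # uniform image: nothing to separate
            return t
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < eps:
            return new_t
        t = new_t


def binarize(image, threshold):
    """Map dark pixels (assumed ink) to 1 and light pixels (paper) to 0."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]


def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(binary):
    """Use horizontal and vertical projections to find the text rows and
    the column range of each character in a binarized number strip."""
    row_spans = runs([sum(row) for row in binary])        # horizontal projection
    if not row_spans:
        return None, []
    col_spans = runs([sum(col) for col in zip(*binary)])  # vertical projection
    return (row_spans[0][0], row_spans[-1][1]), col_spans


# Toy 4x7 strip: a two-pixel-wide stroke and a one-pixel-wide stroke on light paper.
strip = [
    [250, 250, 250, 250, 250, 250, 250],
    [250,  20,  20, 250,  30, 250, 250],
    [250,  20,  20, 250,  30, 250, 250],
    [250, 250, 250, 250, 250, 250, 250],
]
t = iterative_threshold([p for row in strip for p in row])
rows, columns = segment_characters(binarize(strip, t))
print(rows, columns)  # (1, 3) [(1, 3), (4, 5)]
```

Touching or broken dot-matrix strokes would merge or split column runs in this simple form, so a practical system would likely need extra rules such as minimum and maximum character widths; those are outside this sketch.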
[Degree-granting institution]: Qingdao University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.41
Document ID: 2508369
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2508369.html