Research on the Recognition of CAPTCHAs with Touching Characters
Published: 2018-10-05 19:05
[Abstract]: The twenty-first century is an era of information explosion. The rapid development of Internet technology has brought great convenience to daily life, while the abuse of network resources has drawn serious attention from researchers. CAPTCHA mechanisms emerged in response and are deployed on major websites to prevent resources from being maliciously occupied by computer programs and to protect the privacy of information. Research on CAPTCHA recognition not only exposes weaknesses in CAPTCHA design, which helps defend against automated attacks, but also indirectly advances digital image processing, pattern recognition, and machine learning. This thesis studies the recognition of CAPTCHAs with touching characters, whose defining feature is that adjacent characters are joined, sometimes simply and sometimes in complex ways. Login or mailbox-registration CAPTCHAs from six websites were selected: Bank of Communications, CSDN, Zhihu, Sina Mail, Sina Weibo, and NetEase. A cracking algorithm tailored to the characteristics of each is proposed. The main work and results are as follows:
1. For the slanted, touching characters of the Bank of Communications CAPTCHA, an improved projection segmentation algorithm is proposed that adds dynamic rotation to projection segmentation, improving the accuracy with which segmentation points are identified. For the tightly joined characters of the CSDN CAPTCHA, projection segmentation is combined with character width and pixel-count sums. Both achieve segmentation rates of nearly 100%.
2. For the Zhihu CAPTCHA, whose character contours are incomplete after binarization, a contour repair algorithm based on hierarchical region point search is proposed, and the traditional connected-component segmentation algorithm is improved so that it still segments successfully when characters are broken into fragments. For the Sina CAPTCHA, which contains curve noise, the convexity and concavity of connected-component blocks and their peripheral pixel information are used directly to pick out character fragments within noise blocks.
3. For the distorted, slanted, touching characters of the Sina Weibo CAPTCHA, an improved drop-fall algorithm is proposed that adds the rotating calipers algorithm and vertical projection to the drop-fall algorithm, which both corrects the slant and separates the touching characters. For the more complex NetEase CAPTCHA, connected-component segmentation, vertical projection, and drop-fall segmentation are combined, using the strengths of each to separate the characters from the image step by step.
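The abstract does not give the details of the projection segmentation it builds on, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes a binary image stored as a list of rows (1 = character pixel), splits on empty columns first, and then cuts any over-wide segment at its lowest-projection column. The function names and the `max_char_width` and `min_char_width` parameters are hypothetical, and the dynamic rotation step described in the thesis is not included.

```python
def vertical_projection(image):
    """Count foreground pixels (value 1) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def split_on_gaps(projection):
    """Return (start, end) column ranges separated by empty columns; end is exclusive."""
    segments, start = [], None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, len(projection)))
    return segments


def split_wide_segment(projection, segment, max_char_width, min_char_width):
    """Recursively cut a segment wider than max_char_width at its lowest column.

    Cut candidates stay at least min_char_width away from both edges, so a
    thin stroke at the edge of a character is not mistaken for a joint.
    """
    if min_char_width < 1:
        raise ValueError("min_char_width must be at least 1")
    start, end = segment
    candidates = range(start + min_char_width, end - min_char_width + 1)
    if end - start <= max_char_width or not candidates:
        return [segment]
    cut = min(candidates, key=lambda x: projection[x])
    return (split_wide_segment(projection, (start, cut), max_char_width, min_char_width)
            + split_wide_segment(projection, (cut, end), max_char_width, min_char_width))


def segment_characters(image, max_char_width, min_char_width):
    """Segment a binary image into character column ranges."""
    projection = vertical_projection(image)
    result = []
    for segment in split_on_gaps(projection):
        result.extend(split_wide_segment(projection, segment,
                                         max_char_width, min_char_width))
    return result


# Toy example: two blocks joined by a one-pixel bridge in column 3,
# followed by an isolated block after an empty column.
IMAGE = [
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
]

print(segment_characters(IMAGE, max_char_width=4, min_char_width=2))
# prints [(0, 3), (3, 7), (8, 10)]
```

In the toy image the joined pair spans columns 0 to 6, which is wider than the assumed maximum character width, so it is cut at the bridge column where the projection is lowest. Slanted characters blur these projection minima, which is presumably why the thesis adds a rotation step before choosing cut points.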
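[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP309
Article ID: 2254529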
Article link: https://www.wllwen.com/shoufeilunwen/xixikjs/2254529.html