Ancient Chinese Character Image Recognition Based on Mixed-Kernel LS-SVM
Published: 2018-11-20 04:10
【Abstract】: Ancient Chinese characters record a wealth of political, economic, and historical material and have high documentary value. They have irregular strokes and many variant forms, and those that survive as stone inscriptions or silk manuscripts are often badly damaged, all of which makes recognition very difficult. Recognizing ancient characters with image processing techniques eases the circulation and preservation problems that arise when digitizing ancient books, and it matters for the inheritance and development of national culture. Because variant forms and local deformations are so common, existing image recognition methods struggle to produce accurate results. Support vector machines (SVMs) generalize well from small samples and tolerate noise, and they are widely used in image recognition. This thesis combines a mixed-kernel least squares support vector machine (LS-SVM) with image feature extraction and the curvelet transform to recognize images of ancient Chinese characters. The main work and conclusions are as follows:
1. To address the high misclassification rate caused by the strong similarity between ancient characters, the conventional SVM is modified and a mixed-kernel weighted LS-SVM is used for classification. The mixed-kernel weighted LS-SVM reduces the negative influence of abnormal samples and avoids the situation in which a sample is penalized more heavily the better or the worse it is classified, which improves classification accuracy.
2. A feature extraction method based on multi-feature fusion in the spatial domain (called the "time domain" in the thesis) is studied. Component-structure features and overall generalized density features are extracted as global features, which are robust and have low algorithmic complexity. Grid stroke features and local point-density features within a pseudo-two-dimensional elastic mesh are extracted as local features, which absorb local deformation well. The global and local features are fused and used as the classifier's input.
3. To address the low classification rate caused by the irregular, curved strokes of most ancient characters, frequency-domain features are extracted with the second-generation curvelet transform, and a feature extraction method based on frequency-domain multi-feature fusion is studied. The fast discrete second-generation curvelet transform decomposes each character image at multiple resolutions. A gray-level co-occurrence matrix is computed for the image at each resolution to obtain texture parameters for every sub-image. The parameters of all sub-images are then fused into a high-dimensional feature vector, and its principal components are extracted as the classifier's input. Simulation results confirm the effectiveness of the method.
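The abstract names a mixed (hybrid) kernel weighted LS-SVM but does not give its formulas. The following is a minimal sketch under stated assumptions, not the thesis's implementation: the mixed kernel is taken to be a convex combination of an RBF kernel and a polynomial kernel, the LS-SVM is written in its regression-on-labels form (equivalent to the classifier form when labels are ±1), and the sample weights follow the common robust reweighting scheme based on the median absolute deviation of the residuals. The function names and the parameters `lam`, `sigma`, `degree`, `gamma`, `c1`, and `c2` are all illustrative choices.

```python
import numpy as np

def mixed_kernel(A, B, lam=0.7, sigma=1.0, degree=2):
    """Assumed mixed kernel: lam * RBF (local) + (1 - lam) * polynomial (global)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    rbf = np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))
    poly = (A @ B.T + 1.0) ** degree
    return lam * rbf + (1.0 - lam) * poly

def solve_lssvm(K, y, gamma, v):
    """Solve the LS-SVM linear system with per-sample weights v."""
    n = len(y)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0                      # constraint: sum(alpha) = 0
    M[1:, 0] = 1.0                      # bias column
    M[1:, 1:] = K + np.diag(1.0 / (gamma * v))
    sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]              # bias b, multipliers alpha

def fit_weighted_lssvm(X, y, gamma=10.0, c1=2.5, c2=3.0, **kernel_kw):
    """Two-pass fit: unweighted solve, then a solve with robust weights."""
    K = mixed_kernel(X, X, **kernel_kw)
    b, alpha = solve_lssvm(K, y, gamma, np.ones(len(y)))
    e = alpha / gamma                   # residuals of the unweighted fit
    s = 1.483 * np.median(np.abs(e - np.median(e))) + 1e-12
    r = np.abs(e / s)
    # Weight 1 for small residuals, a linear ramp in between, ~0 for outliers.
    v = np.where(r <= c1, 1.0,
                 np.where(r <= c2, (c2 - r) / (c2 - c1), 1e-4))
    v = np.maximum(v, 1e-4)
    return solve_lssvm(K, y, gamma, v)

def predict(X_train, b, alpha, X_new, **kernel_kw):
    """Sign of the decision function for new samples."""
    return np.sign(mixed_kernel(X_new, X_train, **kernel_kw) @ alpha + b)
```

Samples with large residuals in the first pass get a weight near zero in the second, so a mislabeled or badly damaged character contributes little to the final decision function. A multi-class recognizer would need a one-vs-rest or one-vs-one wrapper around this binary sketch.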
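The fusion of global and local features in item 2 can be illustrated roughly as follows. This is a sketch under assumptions, since the abstract does not define the features precisely: the pseudo-two-dimensional elastic mesh is taken to mean that row boundaries are placed so each strip holds about the same number of stroke pixels, and column boundaries are then chosen separately inside each strip. A coarse uniform-grid density stands in for the global density feature, and fusion is plain concatenation of L2-normalized vectors. Component-structure and grid stroke features are not covered, and all function names are hypothetical.

```python
import numpy as np

def equal_mass_edges(hist, n):
    """Boundaries splitting a projection histogram into n strips of ~equal mass."""
    cum = np.cumsum(hist, dtype=float)
    total = cum[-1]
    if total == 0:                      # no stroke pixels: fall back to a uniform split
        return np.linspace(0, len(hist), n + 1).astype(int)
    targets = total * np.arange(1, n) / n
    inner = np.searchsorted(cum, targets) + 1
    return np.concatenate(([0], inner, [len(hist)]))

def elastic_mesh_density(img, n=4):
    """Local point density in an n x n mesh whose lines follow the stroke mass."""
    img = (np.asarray(img) > 0).astype(float)
    rows = equal_mass_edges(img.sum(axis=1), n)
    feats = []
    for r0, r1 in zip(rows[:-1], rows[1:]):
        strip = img[r0:r1]
        # Column boundaries are recomputed per strip (the "pseudo-2D" part).
        cols = equal_mass_edges(strip.sum(axis=0), n)
        for c0, c1 in zip(cols[:-1], cols[1:]):
            cell = strip[:, c0:c1]
            feats.append(cell.mean() if cell.size else 0.0)
    return np.array(feats)

def global_density(img, n=2):
    """Stand-in global feature: stroke density on a coarse uniform n x n grid."""
    img = (np.asarray(img) > 0).astype(float)
    rows = np.linspace(0, img.shape[0], n + 1).astype(int)
    cols = np.linspace(0, img.shape[1], n + 1).astype(int)
    return np.array([img[r0:r1, c0:c1].mean()
                     for r0, r1 in zip(rows[:-1], rows[1:])
                     for c0, c1 in zip(cols[:-1], cols[1:])])

def fuse(*parts):
    """Concatenate feature vectors after L2-normalizing each one."""
    return np.concatenate([p / (np.linalg.norm(p) + 1e-12) for p in parts])
```

Because the mesh lines move with the stroke mass, a character whose strokes are shifted or stretched locally tends to land in the same cells, which is one plausible reading of how these features absorb local deformation.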
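The frequency-domain pipeline in item 3 (curvelet sub-images, then gray-level co-occurrence matrices, then principal components) can be sketched in part. The curvelet decomposition itself is not implemented here; the code assumes the sub-images are already available as 2-D arrays, for example coefficient magnitudes from some curvelet library. The four texture statistics (contrast, energy, homogeneity, entropy), the single pixel offset, and the quantization to 8 gray levels are common choices rather than details taken from the thesis, and the function names are hypothetical.

```python
import numpy as np

def glcm(img, levels=8, dy=0, dx=1):
    """Symmetric, normalized gray-level co-occurrence matrix for offset (dy, dx) >= 0."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        q = np.zeros(img.shape, dtype=int)
    else:
        q = np.minimum((levels * (img - lo) / (hi - lo)).astype(int), levels - 1)
    h, w = q.shape
    a = q[:h - dy, :w - dx].ravel()     # reference pixels
    b = q[dy:, dx:].ravel()             # neighbors at the given offset
    P = np.zeros((levels, levels))
    np.add.at(P, (a, b), 1.0)           # accumulate co-occurrence counts
    P = P + P.T                         # make the matrix symmetric
    return P / P.sum()

def texture_features(P):
    """Contrast, energy, homogeneity, and entropy of a normalized GLCM."""
    i, j = np.indices(P.shape)
    nz = P[P > 0]
    return np.array([
        np.sum(P * (i - j) ** 2),
        np.sum(P ** 2),
        np.sum(P / (1.0 + (i - j) ** 2)),
        -np.sum(nz * np.log2(nz)),
    ])

def describe(subbands, levels=8):
    """Fuse the texture features of all sub-images into one vector."""
    return np.concatenate([texture_features(glcm(s, levels)) for s in subbands])

def pca(F, k):
    """Project the rows of feature matrix F onto its first k principal components."""
    Fc = F - F.mean(axis=0)
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:k].T
```

In use, `describe` would be applied to the sub-images of each character to build one row of a feature matrix, and `pca` would then reduce that matrix before it is passed to the classifier.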
【Degree-granting institution】: Anhui University
【Degree level】: Master's
【Year degree conferred】: 2015
【Classification number】: TP391.41
Article ID: 2343670
Article link: https://www.wllwen.com/jingjilunwen/zhengzhijingjixuelunwen/2343670.html