Research on Key Issues of SVM for Noise-Robust Speech Recognition

Published: 2018-07-18 10:04

[Abstract]: Speech recognition is an important technology for human-computer interaction and pattern recognition, with broad application prospects, and research on it has significant theoretical and practical value. Most current speech recognition systems are suited only to recognizing "clean" speech: when noise is present, or when training and test conditions differ, their performance drops sharply. Performance therefore still needs improvement, yet conventional speech recognition methods struggle to achieve good results. The support vector machine (SVM) is a relatively new pattern recognition method grounded in the structural risk minimization principle and VC-dimension theory. Using techniques such as the maximum-margin principle and kernel functions, it handles classification problems involving small sample sizes, high dimensionality, nonlinearity, and local optima well; it suits the characteristics of speech signals and has already seen initial use in speech recognition. This dissertation focuses on improving the overall performance of SVM-based speech recognition systems, studying the application of SVMs from three angles: the choice of multi-class classification method for a noise-robust recognizer, the selection and construction of SVM kernel functions, and a local SVM algorithm that speeds up system training. The main results are as follows.
(1) The theoretical foundations and basic principles of the SVM are studied in detail, and the robustness of the SVM algorithm is analyzed theoretically. On this basis the SVM is chosen as the recognition method, and an SVM-based speech recognition system is built. The basic principles, overall workflow, model training, and pattern matching of speech recognition are analyzed in detail, the design and recording procedure for a speech database is studied, and a Chinese 500-word speech database is built.
(2) To improve the noise robustness of the recognizer, strategies for solving multi-class classification problems with SVMs are studied in depth, and the M-ary and error-correcting output coding (ECOC) principles from communication systems are introduced into SVM-based speech recognition for the first time. Simulation experiments show that in both clean and noisy speech conditions, the ECOC method offers good robustness, noise resistance, and generalization.
(3) The kernel function is critical to an SVM because it directly determines the SVM's final performance, so kernel selection and construction occupy an important place in SVM theory. Two new kernel functions, the Logistic and ORF kernels, are proposed, and each is proved to be a Mercer kernel. Experiments on kernel mapping trends, the two-spiral (double-helix) benchmark, and two isolated-word speech corpora, Vowel and TIDigits, verify that the new kernels are effective and, applied to speech recognition, offer good generalization performance and noise robustness.
(4) To improve the real-time performance of the recognizer and speed up training of the standard SVM, and taking into account the local similarity of speech samples in feature space and the weak correlation between non-adjacent samples, an improved local SVM (LSVM) algorithm model is proposed. The description of the improved algorithm, a proof for its local kernel function, and its detailed procedure are given. Experiments on four speech corpora (Vowel, CASIA Chinese digit strings, ISOLET, and the Chinese 500-word database) verify that the improved local SVM algorithm effectively shortens the training time of the speech recognition system.
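Item (2) contrasts M-ary coding with error-correcting output codes (ECOC) for multi-class SVMs. The sketch below is a generic illustration, not the dissertation's code: the 4-class code matrices are hypothetical, each column stands in for one binary SVM whose output bit is assumed to be given, and decoding uses plain Hamming distance. It shows why ECOC can tolerate one wrong binary classifier while minimal M-ary coding cannot.

```python
from typing import List, Sequence

# Hypothetical 4-class code matrices. Each row is a class codeword; each
# column corresponds to one binary SVM trained to predict that bit.
M_ARY_CODE: List[List[int]] = [  # ceil(log2 4) = 2 columns, no redundancy
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
]
ECOC_CODE: List[List[int]] = [  # exhaustive code: 2**(4 - 1) - 1 = 7 columns
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(code: List[List[int]]) -> int:
    """Smallest Hamming distance between any two codewords (rows)."""
    return min(
        hamming(code[i], code[j])
        for i in range(len(code))
        for j in range(i + 1, len(code))
    )


def decode(code: List[List[int]], bits: Sequence[int]) -> int:
    """Return the class whose codeword is nearest to the predicted bits.

    Ties are broken in favour of the lowest class index.
    """
    distances = [hamming(row, bits) for row in code]
    return distances.index(min(distances))


if __name__ == "__main__":
    true_class = 2
    # Simulate the binary classifier for column 0 making a mistake.
    ecoc_bits = list(ECOC_CODE[true_class])
    ecoc_bits[0] ^= 1
    mary_bits = list(M_ARY_CODE[true_class])
    mary_bits[0] ^= 1
    print("ECOC decodes to", decode(ECOC_CODE, ecoc_bits))    # 2 (corrected)
    print("M-ary decodes to", decode(M_ARY_CODE, mary_bits))  # 0 (wrong)
```

With a minimum distance of 4, this exhaustive code corrects any single wrong bit (floor((4 - 1) / 2) = 1), at the cost of training 7 binary SVMs instead of 2. Practical systems often decode with SVM margins rather than hard bits; that refinement is left out here.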
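Item (3) states that the proposed Logistic and ORF kernels are proved to be Mercer kernels. Their formulas are not given in this abstract, so the sketch below does not reproduce them. It only illustrates the finite-sample form of Mercer's condition: a valid kernel must produce a symmetric positive semidefinite Gram matrix on any finite set of points. The Gaussian RBF kernel (a known Mercer kernel) and the plain Euclidean distance (not a Mercer kernel) are stand-ins chosen for illustration, and the PSD test is a jittered Cholesky factorization. A check like this can refute the Mercer property on a sample but cannot prove it; a proof needs an analytic argument.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def rbf_kernel(x: Point, y: Point, gamma: float = 0.5) -> float:
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2), a known Mercer kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_kernel(x: Point, y: Point) -> float:
    """Euclidean distance ||x - y||: symmetric, but NOT a Mercer kernel."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(kernel: Kernel, points: List[Point]) -> List[List[float]]:
    """Matrix of pairwise kernel values K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def is_psd(gram: List[List[float]], jitter: float = 1e-9) -> bool:
    """True if gram is symmetric and gram + jitter * I has a Cholesky factor."""
    n = len(gram)
    if any(abs(gram[i][j] - gram[j][i]) > 1e-12 for i in range(n) for j in range(n)):
        return False
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = gram[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                s += jitter  # small diagonal shift absorbs round-off
                if s <= 0.0:
                    return False  # non-positive pivot: an eigenvalue <= -jitter
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


if __name__ == "__main__":
    points = [[0.0], [1.0], [2.5], [4.0]]
    print(is_psd(gram_matrix(rbf_kernel, points)))       # True
    print(is_psd(gram_matrix(distance_kernel, points)))  # False
```

To screen a candidate kernel this way, one would swap in its formula for `rbf_kernel` and test many random point sets; passing every test is evidence, not a proof.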
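Item (4) builds on the observation that speech samples are locally similar while non-adjacent samples are only weakly correlated, so a classifier trained on the neighbourhood of each query can stand in for one global SVM. The sketch below is a generic k-nearest-neighbour local SVM, not the dissertation's improved algorithm, whose details are not in this abstract. Assumptions: binary labels in {-1, +1}, Euclidean neighbourhoods, and a linear SVM trained by Pegasos-style subgradient descent (cycling through samples in a fixed order, with the bias regularized for simplicity) in place of a full quadratic-programming solver. The function names are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(xs: List[Point], ys: List[int],
                     lam: float = 0.1, epochs: int = 200) -> List[float]:
    """Pegasos-style subgradient descent on the regularized hinge loss.

    A constant 1.0 feature is appended, so the last weight acts as the bias.
    """
    w = [0.0] * (len(xs[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            xa = list(x) + [1.0]
            margin = y * dot(w, xa)                    # measured before updating
            w = [(1.0 - eta * lam) * wi for wi in w]   # regularization shrink
            if margin < 1.0:                           # hinge loss is active
                w = [wi + eta * y * xi for wi, xi in zip(w, xa)]
    return w


def local_svm_predict(train_x: List[Point], train_y: List[int],
                      query: Point, k: int = 4) -> int:
    """Classify query with an SVM trained only on its k nearest neighbours."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: sq_dist(train_x[i], query))[:k]
    nx = [train_x[i] for i in nearest]
    ny = [train_y[i] for i in nearest]
    if all(y == ny[0] for y in ny):
        return ny[0]  # unanimous neighbourhood: no training needed
    w = train_linear_svm(nx, ny)
    return 1 if dot(w, list(query) + [1.0]) >= 0.0 else -1


if __name__ == "__main__":
    xs = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
    ys = [-1, -1, -1, 1, 1, 1]
    for q in ([-2.8], [-0.5], [0.5], [2.8]):
        print(q, local_svm_predict(xs, ys, q))
```

Each prediction trains on k samples instead of the whole training set, which is the general intuition behind local SVM speed-ups; the price is a nearest-neighbour search per query.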
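[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Doctoral (PhD)
[Year conferred]: 2014
[CLC classification number]: TN912.34; TP181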
