
Optimization of VTS Feature Compensation Algorithms in Speech Recognition Systems

Published: 2018-05-28 17:09

  Topics: vector Taylor series + feature compensation; Source: Southeast University, 2016 master's thesis


[Abstract]: In real-world conditions, environmental noise interferes with speech recognition systems, and their recognition performance is unsatisfactory. Vector Taylor series (VTS) feature compensation is a model-based feature compensation algorithm that is highly robust and can effectively address the drop in recognition performance caused by a mismatch between the training and test environments. To address the heavy computational load of VTS and its sharp performance degradation at low signal-to-noise ratios (SNR), this thesis optimizes a VTS-based isolated-word recognition system. The optimizations mainly comprise VTS feature compensation based on a two-layer Gaussian mixture model (GMM) structure and optimized initial values for noise parameter estimation under a multi-environment model. Together they raise the system's recognition speed and recognition rate and make the speech recognition system more practical. The main work is as follows:

(1) Structural analysis of a robust speech recognition system. The key techniques in robust speech recognition are analyzed, including an endpoint detection algorithm based on weighted sub-band spectral entropy, the VTS feature compensation algorithm, and the acoustic models. The acoustic models comprise the GMM used for feature compensation and the hidden Markov model (HMM) used for pattern recognition.

(2) Optimization of the VTS compensation algorithm based on a two-layer GMM. To address the heavy computational load of VTS feature compensation, the thesis proposes a two-layer GMM structure for the VTS algorithm that separates the noise parameter estimation step from the feature mapping step. In the training stage, two models are obtained: GMM1, with fewer Gaussian mixture components, and GMM2, with more. During feature compensation, GMM1 is first used to estimate the mean and variance of the noise in the test speech; GMM2 is then used, under the minimum mean square error (MMSE) criterion, to map the noisy feature parameters of the test speech to clean speech feature parameters. The optimization substantially reduces computation while maintaining recognition performance.

(3) Optimization of initial values for noise parameter estimation in multi-environment-model VTS. Multi-environment-model VTS speech recognition selects, from a set of basic environment models, the acoustic model that best matches the current environment and uses it for feature compensation, which effectively reduces the mismatch between the training and test environments. Setting the initial noise parameter values according to the best-matching GMM helps keep the expectation-maximization (EM) algorithm from converging to a local optimum during the iterative estimation of the noise parameters, so that EM converges to a more accurate estimate in fewer iterations and recognition performance improves.

(4) Offline simulation tests in MATLAB and real-time tests on a C platform. Extensive experiments on both platforms verify the effectiveness of the proposed optimizations. The experiments show that the two-layer GMM structure increases recognition speed by about 38% on a Chinese speech corpus, and that the optimized initial values for the EM iteration yield more accurate noise parameter estimates, which lowers the system's misrecognition rate, particularly at low SNR.
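The two-layer structure described in item (2) can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: it assumes log-spectral features with diagonal covariances, uses the textbook mismatch function y = log(e^x + e^n), updates only the noise mean (the noise variance is held fixed for brevity, whereas the thesis estimates both), and uses a zeroth-order MMSE mapping. All function and variable names are hypothetical.

```python
import numpy as np

def vts_noisy_gmm(mu_x, var_x, mu_n, var_n):
    """First-order VTS adaptation of clean GMM parameters to additive noise.

    Assumes log-spectral features and diagonal covariances.
    mu_x, var_x: (M, D) clean means/variances; mu_n, var_n: (D,) noise stats.
    """
    mu_y = np.logaddexp(mu_x, mu_n)      # y = log(e^x + e^n) at the expansion point
    u = np.exp(mu_n - mu_y)              # dy/dn, always in (0, 1]
    var_y = (1.0 - u) ** 2 * var_x + u ** 2 * var_n
    return mu_y, var_y, u

def posteriors(y, weights, mu, var):
    """Component posteriors P(m | y_t) of a diagonal GMM; returns (T, M)."""
    diff = y[:, None, :] - mu[None, :, :]
    log_p = -0.5 * np.sum(diff ** 2 / var + np.log(2 * np.pi * var), axis=2)
    log_p += np.log(weights)
    log_p -= log_p.max(axis=1, keepdims=True)   # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def estimate_noise_mean(y, gmm1, mu_n, var_n, n_iter=5):
    """Layer 1: EM-style noise-mean estimation with the small GMM1."""
    w, mu_x, var_x = gmm1
    for _ in range(n_iter):
        mu_y, var_y, u = vts_noisy_gmm(mu_x, var_x, mu_n, var_n)
        gamma = posteriors(y, w, mu_y, var_y)        # E-step, (T, M)
        resid = y[:, None, :] - mu_y[None, :, :]     # (T, M, D)
        num = np.einsum("tm,md,tmd->d", gamma, u / var_y, resid)
        den = np.einsum("tm,md->d", gamma, u ** 2 / var_y) + 1e-12
        mu_n = mu_n + num / den                      # M-step for the noise mean
    return mu_n

def mmse_clean_estimate(y, gmm2, mu_n, var_n):
    """Layer 2: zeroth-order MMSE mapping of noisy features with the large GMM2."""
    w, mu_x, var_x = gmm2
    mu_y, var_y, _ = vts_noisy_gmm(mu_x, var_x, mu_n, var_n)
    gamma = posteriors(y, w, mu_y, var_y)
    shift = mu_y - mu_x                  # per-component offset caused by the noise
    return y - gamma @ shift             # x_hat = y - sum_m P(m|y) * shift_m
```

Each EM iteration costs time proportional to the number of mixture components, so running the iterative estimation on the smaller GMM1 and reserving the larger GMM2 for the single mapping pass is where the computational saving would come from in this sketch. A caller would pass each GMM as a `(weights, means, variances)` tuple, call `estimate_noise_mean` first, and feed its result to `mmse_clean_estimate`.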
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TN912.34


Article ID: 1947512



Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/1947512.html

