Research on an MPE-Based Discriminative Acoustic Model for Lhasa Tibetan

Published: 2018-09-08 18:42

[Abstract]: The acoustic model is one of the most critical components of speech recognition, and its accuracy directly affects the recognition performance of the whole system. How to build a more accurate acoustic model has long been a focus of research. This thesis therefore aims to improve the accuracy of acoustic model parameters: it estimates the parameters of triphone models with the minimum phone error criterion and thereby obtains an acoustic model with better recognition performance. The key to large-vocabulary continuous speech recognition is building and training acoustic models that accurately describe the acoustic features, and the choice of training criterion has a large effect on the recognition rate. Many training methods exist for acoustic models. Traditional methods train each model only on its own data, so the models are not trained to be distinguished from one another. Discriminative training is commonly used to address this problem. Unlike traditional methods, discriminative training algorithms take the boundary information between models into account, so they can train acoustic models with better recognition performance. This thesis studies discriminative training of acoustic models on a large-vocabulary continuous speech recognition system for Lhasa Tibetan. It examines the traditional maximum likelihood estimation (MLE) training algorithm, which is based on a generative criterion, and the Minimum Phone Error (MPE) training algorithm, which is based on a discriminative criterion. An experimental platform was built with the HTK toolkit for each training algorithm, and Lhasa Tibetan acoustic models were built with both methods. Five experiments were carried out. Experiment 1 compared modeling units and found that the triphone model gives better recognition performance. Experiment 2 varied the number of Gaussian mixture components. Experiment 3 varied the penalty factor and found that a critical value must be located to improve recognition performance. Experiment 4 varied the size of the phone lattice and found that it must be set according to the actual conditions. Experiment 5 tested whether to add the I-smoothing function and found that recognition performance is better with it. The experimental results show that, compared with the traditional generative training method, the MPE training algorithm improves the phone recognition rate. Relative to the maximum likelihood estimation criterion, the correct recognition rate increased by 7.15% for monophones and by 7.78% for triphones.
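As a rough illustration of the criterion named in the abstract, the MPE objective can be read as the posterior-weighted average phone accuracy of competing hypotheses. The sketch below is a simplification and not the thesis's HTK setup: it uses a small N-best list instead of a phone lattice, exact edit-distance accuracy instead of the per-arc approximation used in lattice-based MPE, and the names `phone_accuracy`, `mpe_objective`, `kappa`, and all phone labels and scores are hypothetical.

```python
import math


def phone_accuracy(ref, hyp):
    """Raw phone accuracy: len(ref) minus the Levenshtein distance to hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return len(ref) - prev[-1]


def mpe_objective(ref, hyps, kappa=0.1):
    """Expected phone accuracy over an N-best list.

    hyps: list of (phones, acoustic_log_likelihood, lm_log_prob).
    kappa: assumed acoustic scaling factor (illustrative value).
    """
    scores = [kappa * am + lm for _, am, lm in hyps]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * phone_accuracy(ref, phones)
               for w, (phones, _, _) in zip(weights, hyps)) / total


# Hypothetical reference and two competing hypotheses with equal scores.
reference = ["a", "b", "c"]
nbest = [
    (["a", "b", "c"], -100.0, -2.0),
    (["a", "x", "c"], -100.0, -2.0),
]
value = mpe_objective(reference, nbest)
```

MPE training adjusts the model parameters to raise this value, which shifts posterior mass toward hypotheses with fewer phone errors; maximum likelihood training, by contrast, only raises the likelihood of the reference transcription.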
[Degree-granting institution]: Northwest Minzu University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TN912.34
