Research on Speech Segmentation Based on Adaptive Neuro-Fuzzy Inference and Hidden Markov Models

Published: 2018-12-27 09:00

[Abstract]: Modern speech technology and research require speech segmentation that is both highly accurate and highly reliable. Manual segmentation has long been regarded as the most reliable and accurate method. It is, however, time-consuming and labor-intensive, and it must be carried out by speech experts. In the era of big data, and especially for large speech corpora, this is a critical drawback, so developing highly accurate automatic speech segmentation is necessary. The dominant automatic technique is forced alignment. In this method, hidden Markov models (HMMs) are used to build acoustic models of the individual phonemes, and the speech signal is converted into a sequence of feature vectors, one per frame. The models yield approximate boundaries between phonemes, but the results are not precise enough. On the TIMIT corpus, conventional HMM-based forced alignment systems reach an accuracy of 80% to 89% at a 20 ms tolerance. Many methods have been proposed to improve HMM-based automatic segmentation. Some researchers have observed that the gap between HMM-based and manual segmentation stems from the segmentation knowledge that speech experts possess. Fuzzy logic can express such knowledge intuitively as fuzzy rules usable by a computer, but the rules must be carefully designed by experts, and their completeness cannot be guaranteed. The aim of this study is to propose a more suitable way of addressing these problems. The adaptive neuro-fuzzy inference system (ANFIS) is a machine learning method that combines a neural network with a fuzzy inference system. It has the advantages of both and performs well compared with other machine learning methods. It is simple to implement, nonlinear, and based on fuzzy inference rules, which makes it well suited to the problems described above. In this work, ANFIS learns how to adjust boundary positions so as to compensate both for the differences between manual and machine segmentation and for the systematic segmentation error introduced by the HMM itself. The experiment has two steps. First, context-independent HMMs are used to obtain the initial boundaries. Second, a trained ANFIS corrects the boundaries obtained in the first step. The experiments use the TIMIT database. The results show that ANFIS significantly improves the accuracy of HMM-based automatic speech segmentation: on TIMIT, at a 20 ms tolerance, accuracy rises from 86.25% to 92.08%. This demonstrates the effectiveness of ANFIS for speech segmentation. In addition, the approach is easier to build and apply. In future work, we will continue to improve the system's accuracy and apply it to other databases.
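The 20 ms tolerance criterion used above can be illustrated with a minimal sketch. It assumes that forced alignment yields exactly one predicted boundary per reference boundary, and that an error of exactly 20 ms counts as correct; the abstract does not specify either point. The function name `boundary_accuracy` and the boundary times are hypothetical and are not taken from the thesis or from TIMIT.

```python
from typing import Sequence


def boundary_accuracy(reference_ms: Sequence[float],
                      predicted_ms: Sequence[float],
                      tolerance_ms: float = 20.0) -> float:
    """Return the fraction of boundaries within tolerance_ms of the reference.

    Assumes the two sequences are paired one-to-one, as in forced
    alignment where the phoneme sequence is known in advance.
    """
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("reference and predicted boundaries must be paired")
    if len(reference_ms) == 0:
        raise ValueError("at least one boundary is required")
    # Count boundaries whose absolute deviation is inside the tolerance
    # (inclusive comparison is an assumption of this sketch).
    hits = sum(1 for ref, pred in zip(reference_ms, predicted_ms)
               if abs(pred - ref) <= tolerance_ms)
    return hits / len(reference_ms)


# Hypothetical boundary times in milliseconds, for illustration only.
reference = [100.0, 250.0, 400.0, 610.0]   # manual segmentation
hmm_only = [112.0, 275.0, 395.0, 640.0]    # step 1: HMM forced alignment
corrected = [105.0, 262.0, 398.0, 628.0]   # step 2: after boundary correction

print(boundary_accuracy(reference, hmm_only))
print(boundary_accuracy(reference, corrected))
```

In this toy data, two of the four HMM boundaries fall outside the tolerance, while all four corrected boundaries fall inside it. The figures reported in the abstract (86.25% and 92.08%) come from the full TIMIT evaluation, not from this example.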
[Degree-granting institution]: Tianjin University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TN912.3
