Research and Implementation of a DNN-Based Language Identification System

Published: 2019-05-19 13:19

[Abstract]: Spoken language is one of the most common means of everyday communication and an indispensable skill. As globalization advances, language barriers are becoming increasingly prominent, which creates an urgent need for automatic language identification. Language identification has therefore become an important topic in speech research in recent years. Existing systems still have many problems, such as extracting clean speech from complex acoustic backgrounds and isolating language-discriminative information in easily confused languages, so the task still calls for further research. Language identification (LID) is a biometric recognition technology that automatically determines, from speech, which language a speaker is using. Both phoneme-based features and low-level acoustic features have been shown to represent language-category information very effectively. Although machine learning has improved language identification performance, the recognition rate still falls short of requirements, especially for short utterances. In recent years, with the rise of the Deep Neural Network (DNN), its wide application, and its good results, DNN-based language identification has become a research focus in both academia and industry. This thesis focuses on DNN-based language identification and aims to build a complete language identification system with good performance. The main work is as follows: 1. Implementing a DNN-based language identification system. 2. Adopting Deep Bottleneck Features (DBF), a phoneme-related feature vector derived from low-level acoustic features, which represents language characteristics better than either low-level acoustic features or phoneme features. 3. Using a method that extracts i-vectors from statistics computed with a DBF-trained DNN, in which DBF takes the place of the UBM (Universal Background Model) in the GMM (Gaussian Mixture Model) framework to obtain more accurate statistics and thereby improve recognition efficiency. 4. Testing and analyzing the whole system. First, DBF features are compared with SDC (shifted delta cepstral) features; the results show that DBF features represent language more strongly and significantly improve performance on short-utterance tasks, long-utterance tasks, and confusable-language and dialect identification tasks. Next, the DBF-GMM-TV method and the DNN-TV method are compared in terms of performance; the results indicate that working in this model domain allows the model to be estimated more effectively. Finally, system performance is tested in two settings: local testing and online testing over the network.
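The abstract does not give the formulas behind the "statistics" used for i-vector extraction. As a rough illustration only, the sketch below computes the standard zeroth- and first-order Baum-Welch statistics from frame-level features and frame-level component posteriors. The function names `gmm_posteriors` and `baum_welch_stats`, the diagonal-covariance GMM, and the random stand-in features are assumptions made for this example and are not taken from the thesis. In a GMM-based setup the posteriors would come from a GMM evaluated on the features (for instance DBF frames); in a DNN-based setup they would instead be the DNN's output posteriors.

```python
import numpy as np


def gmm_posteriors(features, weights, means, variances):
    """Frame posteriors under a diagonal-covariance GMM (illustrative helper).

    features: (T, D), weights: (C,), means: (C, D), variances: (C, D).
    Returns an array of shape (T, C) whose rows sum to 1.
    """
    diff = features[:, None, :] - means[None, :, :]            # (T, C, D)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (C,)
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2) + log_norm)
    log_joint = np.log(weights) + log_prob                      # (T, C)
    # Subtract the row-wise maximum before exponentiating for stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def baum_welch_stats(features, posteriors):
    """Zeroth- and first-order statistics for one utterance.

    features: (T, D) frame features; posteriors: (T, C) frame posteriors.
    Returns N of shape (C,) and F of shape (C, D), where
    N[c] = sum_t gamma[t, c] and F[c] = sum_t gamma[t, c] * x[t].
    """
    features = np.asarray(features, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    if features.shape[0] != posteriors.shape[0]:
        raise ValueError("features and posteriors must have the same number of frames")
    zeroth = posteriors.sum(axis=0)
    first = posteriors.T @ features
    return zeroth, first


# Usage with random data standing in for real acoustic features.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 8))        # T=200 frames, D=8 dimensions
weights = np.full(4, 0.25)                # C=4 equally weighted components
means = rng.normal(size=(4, 8))
variances = np.ones((4, 8))

gamma = gmm_posteriors(frames, weights, means, variances)
N, F = baum_welch_stats(frames, gamma)
print(N.shape, F.shape)
```

Because `baum_welch_stats` depends only on the posterior matrix, swapping the source of the posteriors leaves the rest of the pipeline unchanged, which is one way to read the comparison between the two methods named in the abstract.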
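[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TN912.34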
