Construction and Application of an Online Uyghur Speech Corpus

Published: 2018-03-16 04:21

Topic: Uyghur　Focus: corpus　Source: Xinjiang University, 2017 master's thesis　Type: Degree thesis

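【Abstract】: A rich speech corpus is the foundation of speech technology research, and from an engineering standpoint it is one of the key components in improving speech technology. Existing Uyghur speech corpora are small, and their speakers are unbalanced in age and regional origin. To meet the needs of further research, the Uyghur corpus urgently needs to be expanded. In particular, the diversity of its speech data must be improved to make it more practical. This thesis therefore studies how to build a Uyghur speech corpus and how to apply it to speech recognition. The main contributions are as follows. 1) The traditional methods of speech collection and speech annotation were improved. Collecting speech data takes a great deal of labor and time. To address this, a custom Uyghur recording tool was developed to support Uyghur speech data collection. It simplifies the recording workflow and makes remote data collection and off-site support more convenient, and anyone can use the platform to contribute speech data. To make annotation more efficient, a corresponding speech annotation platform was also designed. These new methods worked well in practice. 2) Based on the linguistic and phonetic characteristics of Uyghur, a diverse, large-scale speech corpus was built. Each dialect also contains local subdialects that vary by region, and the same word may be pronounced differently in different dialects. In addition, pronunciation and prosody differ between speakers of different ages. Collecting dialect speech from various regions, together with speech from people of all age groups and educational backgrounds, is therefore valuable for feature extraction and acoustic model optimization in speech recognition. Speech variability is currently the key factor limiting improvements in recognition rate, so a corpus that contains as many linguistic phenomena as possible is essential for speech analysis and recognition. Another key to corpus quality is selecting texts that cover as many linguistic phenomena as possible. We select texts using two methods, common-word coverage and triphone coverage, and compare them with traditional random selection. The proposed triphone-based selection reaches a coverage rate of 91%. 3) Finally, using HMM and DNN, two methods widely used in speech recognition, acoustic features were extracted and acoustic models were trained on part of the collected corpus, and continuous speech recognition experiments were carried out. An N-gram language model was used, and comparative recognition experiments were run with the Kaldi speech recognition toolkit under Linux. The results show that on large-scale speech data the DNN acoustic model gives better recognition results. The DNN-based system reaches a recognition rate of 84.49%, 1.77% higher than the traditional model.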
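The abstract does not spell out how triphone-coverage text selection works. The following is a minimal sketch of one common approach, offered under stated assumptions. It uses a greedy maximum-coverage heuristic that repeatedly picks the candidate sentence adding the most not-yet-covered triphones, and it measures coverage against the triphones present in the candidate pool. The phone sequences, the `sil` padding symbol, the sentence IDs and every function name are hypothetical, and the toy corpus simply treats each Latin-script letter as a phone. The thesis's actual phone set and scoring may differ. A seeded random selection is included as the baseline the abstract compares against, and the common-word variant could be sketched the same way with words in place of triphones.

```python
import random
from typing import Dict, List, Set, Tuple

Triphone = Tuple[str, str, str]


def triphones(phones: List[str]) -> Set[Triphone]:
    """Left-center-right triphones of one sentence, padded with a
    hypothetical silence symbol 'sil' so edge phones also get context."""
    padded = ["sil"] + phones + ["sil"]
    return {tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)}


def greedy_select(corpus: Dict[str, List[str]], budget: int) -> List[str]:
    """Pick up to `budget` sentence IDs, each time taking the sentence that
    adds the most uncovered triphones (ties go to the smallest ID)."""
    remaining = {sid: triphones(ph) for sid, ph in corpus.items()}
    covered: Set[Triphone] = set()
    chosen: List[str] = []
    while remaining and len(chosen) < budget:
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # no candidate adds anything new
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def random_select(corpus: Dict[str, List[str]], budget: int, seed: int = 0) -> List[str]:
    """Baseline: a seeded random sample of sentence IDs."""
    ids = sorted(corpus)
    return random.Random(seed).sample(ids, min(budget, len(ids)))


def coverage(corpus: Dict[str, List[str]], chosen: List[str]) -> float:
    """Fraction of the pool's distinct triphones covered by `chosen`."""
    universe = set().union(*(triphones(p) for p in corpus.values()))
    got = set().union(*(triphones(corpus[s]) for s in chosen))
    return len(got) / len(universe)


# Toy pool: letters stand in for phones, purely for illustration.
toy = {
    "s1": list("salam"),
    "s2": list("kitab"),
    "s3": list("salamet"),
    "s4": list("mektep"),
}
picked = greedy_select(toy, budget=2)
print(picked, round(coverage(toy, picked), 3))  # ['s3', 's4'] 0.684
print(round(coverage(toy, random_select(toy, 2)), 3))
```

Greedy selection is a standard approximation for maximum coverage and is cheap enough to run over a large text pool. It is used here as an illustration, not as the thesis's confirmed algorithm.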
[Abstract]: Building a rich speech corpus is the basis of speech technology research, and from an engineering point of view the speech corpus is one of the most important links in improving speech technology. Existing Uyghur speech corpora are small, and their speakers are unbalanced in age and region. To meet the needs of further research and development, the Uyghur corpus urgently needs to be expanded. In particular, the diversity of its speech data must be improved to make it more practical. This thesis therefore studies how to build a Uyghur speech corpus and how to apply the corpus to speech recognition. The main research contents are as follows. 1) The traditional methods of speech collection and speech annotation are improved. Collecting speech data requires a great deal of labor and time. To solve this problem, a Uyghur recording tool was built for Uyghur speech data collection. It simplifies the recording process and makes remote data collection and remote support more convenient. Anyone can use this platform to collect speech data. To improve the efficiency of speech annotation, a corresponding annotation platform was designed. These new methods have achieved good results in practical application. 2) Based on the linguistic and phonetic features of Uyghur, a diverse, large-scale speech corpus was constructed. Each dialect also contains local subdialects that vary by region, and some words are pronounced differently in different dialects. In addition, pronunciation and prosody differ between speakers of different ages. Collecting dialect speech from different regions, and speech from people of all ages and educational levels, is therefore of research value for feature parameter extraction and acoustic model optimization in speech recognition.
At present, the key factor limiting improvements in recognition rate is speech variability, so building a corpus that contains as many linguistic phenomena as possible is essential for speech analysis and recognition. Another key to improving corpus quality is selecting text that covers as many linguistic phenomena as possible. In this thesis, texts are selected by common-word coverage and by triphone coverage, and both methods are compared with traditional random selection. The proposed triphone selection method reaches a coverage of 91%. 3) Finally, HMM and DNN, two methods widely used in speech recognition, are used to extract acoustic features and build acoustic models from part of the collected corpus, and continuous speech recognition experiments are carried out. With an N-gram language model, comparative recognition experiments are run using the Kaldi speech recognition toolkit under Linux. The results show that for large-scale speech data the DNN acoustic model gives better recognition results. The DNN-based system reaches a recognition rate of 84.49%, an improvement of 1.77% over the traditional model.
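The abstract reports a "recognition rate" without defining it. A common convention, assumed here rather than confirmed by the source, is word accuracy = 1 - WER. Under that convention, WER is the word-level Levenshtein distance between reference and hypothesis transcripts (substitutions + deletions + insertions), pooled over the test set and divided by the total number of reference words. Kaldi's `compute-wer` tool reports %WER in this sense. The function names and the Latin-script Uyghur sentence pairs below are invented for illustration.

```python
from typing import List, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference word r
                cur[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            )
        prev = cur
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """Pooled WER: total word errors divided by total reference words."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words


# Hypothetical (reference, hypothesis) pairs.
pairs = [
    ("men mektepke barimen", "men mektepke barmen"),  # 1 substitution
    ("bu kitab yaxshi", "bu kitab"),                  # 1 deletion
]
wer = corpus_wer(pairs)
print(f"WER = {wer:.2%}, word accuracy = {1 - wer:.2%}")
# WER = 33.33%, word accuracy = 66.67%
```

Accuracy defined as 1 - WER can turn negative when insertions are frequent. Some toolkits instead report "percent correct", (N - S - D) / N, which ignores insertions. The abstract does not state which definition it uses, or whether the 1.77% gain is absolute or relative.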
【Degree-Granting Institution】: Xinjiang University
【Degree Level】: Master's
【Year Conferred】: 2017
【Classification Number】: TN912.34

【Similar Documents】