Research on a Chinese-Tibetan Bilingual Cross-Language Voice Conversion Method

Published: 2019-06-22 09:00

【Abstract】: In recent years, with the rapid development of human-computer speech interaction, voice conversion has attracted the attention of many researchers and is expected to be applied in education, communications and many other fields. In China, research on voice conversion for mainstream varieties such as Mandarin and Cantonese has made considerable progress, but cross-language voice conversion systems for minority languages and dialects are still lacking. The Tibetans are one of China's ancient ethnic minorities; Tibetan has a large number of speakers spread over a wide area. This thesis takes the Lhasa dialect of Tibetan as its object of study. A corpus of 2,800 Lhasa Tibetan sentences was built, the initials and finals were segmented and labeled, and a Tibetan initial/final database was established. For Chinese-Tibetan cross-language voice conversion, the Tibetan text to be converted is first translated into the corresponding Chinese text; the Chinese text is analyzed to obtain all of its initials and finals, which are then looked up in the indexed initial/final database. With Tibetan initials and finals as the basic units, and making use of boundary information, a decision tree is built from a context-dependent question set and the spectral distances between candidate units. For the target Chinese sentence, the decision tree is used to select the initials and finals that best fit the context, choosing those whose place of articulation and sound quality match most closely; the corresponding Chinese speech is then generated with waveform concatenation synthesis and with the STRAIGHT algorithm, respectively, which completes the study of the Chinese-Tibetan cross-language voice conversion method. The main work and contributions of the thesis are as follows:
1. A corpus of 2,800 Lhasa Tibetan sentences was built, and a Tibetan initial/final database was extracted from it. The Tibetan text corpus was designed first, the speech corpus was then recorded, and segmentation and labeling yielded the information for every initial and final. Finally, the units were classified by Tibetan initial and final and a directory index was built. This completed the Tibetan initial/final database and laid the foundation for Chinese-Tibetan cross-language voice conversion.
2. The STRAIGHT algorithm was adopted for Chinese-Tibetan cross-language voice conversion. It allows flexible modification of the fundamental frequency, the aperiodicity index, the smoothed time-frequency spectrum and related parameters of the speech signal, which improves the quality of the converted target speech.
3. Chinese-Tibetan cross-language voice conversion was implemented. For the target Chinese sentence, the decision tree is used to select the initials and finals that best fit the context, choosing those whose place of articulation and sound quality are most suitable; the corresponding Chinese speech is then generated with waveform concatenation synthesis and with the STRAIGHT algorithm, respectively. The converted speech was evaluated with MOS, DMOS and ABX tests. The experimental results show that speech converted with the STRAIGHT algorithm has better quality than speech produced by waveform concatenation synthesis.
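The selection and concatenation steps described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not give the question set, the spectral features or the tree structure, so the decision tree is replaced here by a plainly simpler greedy cost-based selection (context mismatch plus Euclidean spectral distance to the previously chosen unit), followed by a linear crossfade at each join. The `Unit` fields, the `"sil"` boundary label, the cost weights and every function name are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Unit:
    """One recorded initial/final candidate (all fields are hypothetical)."""
    label: str             # initial or final label, e.g. "b" or "a"
    left: str              # label of the unit that preceded it in the recording
    right: str             # label of the unit that followed it in the recording
    spectrum: List[float]  # spectral feature vector taken near the boundary
    samples: List[float]   # waveform samples of the unit


def spectral_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two spectral feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def context_cost(unit: Unit, left: str, right: str) -> int:
    """Count how many of the two neighbouring labels fail to match the target."""
    return int(unit.left != left) + int(unit.right != right)


def select_units(targets: Sequence[str],
                 inventory: Dict[str, List[Unit]]) -> List[Unit]:
    """Greedy stand-in for decision-tree selection.

    For each target label, pick the candidate with the lowest sum of
    context mismatch and spectral distance to the previously chosen unit.
    """
    chosen: List[Unit] = []
    for i, label in enumerate(targets):
        left = targets[i - 1] if i > 0 else "sil"
        right = targets[i + 1] if i + 1 < len(targets) else "sil"

        def cost(u: Unit) -> float:
            join = spectral_distance(chosen[-1].spectrum, u.spectrum) if chosen else 0.0
            return context_cost(u, left, right) + join

        chosen.append(min(inventory[label], key=cost))
    return chosen


def concatenate(units: Sequence[Unit], overlap: int = 2) -> List[float]:
    """Join unit waveforms, linearly crossfading `overlap` samples at each join."""
    out: List[float] = []
    for u in units:
        seg = list(u.samples)
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming segment
            out[start + k] = (1 - w) * out[start + k] + w * seg[k]
        out.extend(seg[n:])
    return out


# Toy inventory: two candidates for "b", two for "a".
inventory = {
    "b": [Unit("b", "sil", "a", [0.0], [1.0, 1.0, 1.0]),
          Unit("b", "x", "y", [0.0], [9.0, 9.0, 9.0])],
    "a": [Unit("a", "b", "sil", [0.1], [0.0, 0.0, 0.0]),
          Unit("a", "b", "sil", [5.0], [7.0, 7.0, 7.0])],
}
wave = concatenate(select_units(["b", "a"], inventory))
```

In this toy run the first `"b"` candidate wins on context, and the first `"a"` candidate wins because its spectrum is closer to the chosen `"b"`. A STRAIGHT-style path would instead analyze each selected unit into fundamental frequency, aperiodicity and a smoothed spectrum, and then resynthesize from those parameters; that step is outside this sketch.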
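【Degree-granting institution】: Northwest Normal University (西北师范大学)
【Degree level】: Master's
【Year degree conferred】: 2015
【Classification number】: TN912.3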
