Research on Tibetan Speech Synthesis Based on Hybrid Units

Published: 2018-04-27 01:42

Topic: Tibetan information processing + speech synthesis; Source: Shaanxi Normal University, doctoral dissertation, 2016

[Abstract]: Speech synthesis is one of the core technologies of human-computer interaction and a difficult problem in Chinese information processing. Its goal is to convert text automatically into clear, fluent speech, and research on it has theoretical significance and practical value for the development of automatic control, intelligent robots, and human-machine voice communication systems. With advances in computing and communication technology, corpus-based speech synthesis has attracted growing attention. Tibetan information processing, an important part of Chinese information processing, has made considerable progress over more than twenty years in word segmentation, tagging, and word-frequency statistics, but research on Tibetan speech synthesis has only just begun. Many properties that are valuable for Tibetan speech synthesis have not yet been uncovered or described, and research on the Tibetan language itself is not deep enough. For example, existing systems cannot analyze the prosodic features of Tibetan qualitatively and quantitatively, nor can they supply the necessary control information to the system through text analysis. Grounded in the Tibetan language and script, this dissertation studies the textual features of written Tibetan and the prosodic features of spoken Tibetan from linguistic and phonetic perspectives and, building on corpus-based speech synthesis, designs and implements a practical Tibetan speech synthesis system based on hybrid units. The main work is as follows.

(1) Starting from Tibetan text, the dissertation studies preprocessing problems for speech synthesis, such as recognizing non-Tibetan characters and sentence boundaries, and, to meet the practical needs of Tibetan speech synthesis, proposes a Tibetan word segmentation algorithm based on part-of-speech constraints. Compared with traditional segmentation algorithms, it uses part-of-speech collocation rules to avoid most overlapping and inclusion ambiguities and improves the strategies for recognizing contracted words and out-of-vocabulary words, which markedly improves segmentation efficiency. In addition, to handle the synthesis of out-of-vocabulary words, an algorithm for decomposing Tibetan characters into components is given, and its performance is verified by developing a Tibetan character component analysis system. The component distribution statistics that this system gathers from a large-scale corpus are also used to guide unit selection and corpus construction. See Chapter 2.

(2) Starting from acoustic and grammatical features, the dissertation statistically analyzes the prosodic hierarchy, stress patterns, and intonation of Amdo Tibetan and studies prosodic control rules for Tibetan. First, a prosodic hierarchy prediction algorithm is proposed. It combines function-word frequency with prosodic-phrase length information to mark prosodic unit boundaries dynamically, which keeps the prosodic hierarchy from depending too heavily on segmentation results and preserves the integrity of the hierarchy. Second, relative coefficients for stress at each level are calculated. During synthesis, grammatical stress is first assigned to prosodic words, prosodic phrases, and intonation phrases, and the emphatic stress of the target sentence is then computed from the relative stress coefficients of the prosodic units at each level. Finally, the intonation characteristics and intonation rules of declarative, interrogative, imperative, and exclamatory sentences are given. Experimental data show that these prosodic rules play an important role in prosodic expression and that the naturalness of the speech improves considerably. See Chapter 3.

(3) Unit selection is the basis for building a well-structured, moderately sized corpus and is key to corpus-based speech synthesis. To improve the prosodic performance of the system while keeping the unit search space manageable, a hybrid unit inventory construction strategy is proposed together with a corresponding unit selection algorithm. Subjective and objective experimental data show that the strategy and algorithm effectively retain the integrity of large units and the flexibility and robustness of small units. To avoid excessive algorithmic modification of units during synthesis, the dissertation adopts a multi-sample waveform concatenation strategy on top of the hybrid unit inventory, in which one (text) unit corresponds to multiple candidate samples in the speech database. The organization strategy and search algorithm for the multi-sample speech database are also studied. Experiments show that, compared with traditional algorithms, this algorithm increases synthesis speed and improves the real-time performance of the system. See Chapter 4.

(4) Taking an Amdo Tibetan speech synthesis system as a representative, the dissertation presents the design ideas, goals, functional features, and performance evaluation results of the Tibetan speech synthesis system. The system is distinctive in text analysis and prosodic control and provides an experimental platform for further research on speech synthesis. See Chapter 5.
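
The abstract does not give the unit selection algorithm itself, so the following is only a minimal sketch of one common way a hybrid inventory can be searched: greedy longest matching that prefers large units and falls back to single syllables. The function name `select_units`, the `max_len` parameter, the inventory layout, and the sample file names are all assumptions made for illustration and are not taken from the dissertation. The syllables are romanized placeholders.

```python
from typing import Dict, List, Tuple

# Hypothetical hybrid inventory: each key is a unit (a tuple of syllables),
# each value lists the candidate recordings for that unit (multi-sample).
Inventory = Dict[Tuple[str, ...], List[str]]


def select_units(syllables: List[str], inventory: Inventory,
                 max_len: int = 4) -> List[Tuple[str, ...]]:
    """Split a syllable sequence into units by greedy longest match.

    Larger units are tried first; a single syllable is always accepted as a
    fallback, even when it is missing from the inventory, so the function
    always terminates and covers the whole input.
    """
    units: List[Tuple[str, ...]] = []
    i = 0
    while i < len(syllables):
        longest = min(max_len, len(syllables) - i)
        for n in range(longest, 0, -1):
            candidate = tuple(syllables[i:i + n])
            if candidate in inventory or n == 1:
                units.append(candidate)
                i += n
                break
    return units


# Illustrative data only: one two-syllable unit and two single syllables.
inventory: Inventory = {
    ("bkra", "shis"): ["bkra_shis_01.wav", "bkra_shis_02.wav"],
    ("bde",): ["bde_01.wav"],
    ("legs",): ["legs_01.wav", "legs_02.wav", "legs_03.wav"],
}

units = select_units(["bkra", "shis", "bde", "legs"], inventory)
for unit in units:
    # Units absent from the inventory get an empty candidate list.
    print(unit, inventory.get(unit, []))
```

In this sketch a unit with several candidate recordings would still need a second step that picks one sample, for example by comparing each candidate's prosody with the target. That step is left out because the abstract does not describe how the dissertation performs it.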

[Degree-granting institution]: Shaanxi Normal University
[Degree level]: Doctoral
[Year degree granted]: 2016
[Classification number]: TN912.33

[Similar literature]