Research on Voice Conversion Algorithms for Parallel and Non-Parallel Text Based on Improved BLFW

Published: 2018-06-09 02:51

Topic: voice conversion + adaptive Gaussian classification; source: Nanjing University of Posts and Telecommunications, 2017 master's thesis

[Abstract]: In speech signal processing, voice conversion transforms the speech of one speaker (the source speaker) so that it sounds as if it were uttered by another speaker (the target speaker), while keeping the semantic content unchanged. Speech carries rich information, including semantic, speaker-identity, linguistic, and emotional information; voice conversion focuses mainly on the essential acoustic features of speech, namely spectral characteristics and prosodic features. In many application scenarios, such as entertainment and cross-lingual conversion, a voice conversion system must deliver high-quality speech and support conversion from non-parallel text. Existing voice conversion systems face two main problems. On the one hand, the converted speech cannot achieve high similarity and good sound quality at the same time, so the two have to be traded off against each other. On the other hand, training the conversion function depends on parallel corpora, which limits the generality of the system. First, to obtain converted speech with both high sound quality and high similarity, this thesis proposes a bilinear frequency warping plus amplitude scaling algorithm based on adaptive Gaussian classification. It uses adaptive Gaussian classification to better model the distribution of the acoustic features of speech, and performs the conversion on the basis of a reasonable classification. In subjective and objective evaluations, compared with the bilinear frequency warping plus amplitude scaling algorithm that uses a fixed number of classes, the proposed method raises the average MOS of the converted speech by 4.7% and lowers the average MCD by 2.7%, which indicates a certain improvement in the performance of the voice conversion system.
Second, to remove the dependence of voice conversion methods on parallel corpora, this thesis aligns non-parallel corpora with a method based on unit selection and vocal tract length normalization, and then applies the bilinear frequency warping plus amplitude scaling method based on adaptive Gaussian classification to voice conversion under non-parallel text. Subjective and objective evaluation experiments confirm that, compared with the INCA method under non-parallel text, the converted speech has an average MOS that is 7.1% higher and an average MCD that is 4.0% lower, indicating better sound quality and higher similarity. Compared with the traditional Gaussian mixture model voice conversion method under parallel text, the average MCD is 5.1% higher and the average MOS is 3.9% lower, which indicates that a certain gap in conversion performance remains. However, the proposed method operates under non-parallel text conditions and is therefore more general.
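The abstract names two pieces of machinery without giving formulas: a bilinear frequency warping and the MCD objective metric. The sketch below illustrates both in their commonly used textbook forms, which may differ from the exact definitions in the thesis. It assumes the first-order all-pass (bilinear) warping of normalized angular frequency with a warping factor `alpha`, and the usual mel-cepstral distortion in dB averaged over time-aligned frames, with the energy coefficient c0 already removed. The function names and the list-of-lists frame layout are hypothetical choices for illustration only; MOS is a subjective listening score and is not computed here.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency omega in [0, pi].

    Assumed form: the standard first-order all-pass (bilinear) warping,
    omega' = omega + 2 * atan(alpha * sin(omega) / (1 - alpha * cos(omega))),
    with |alpha| < 1. alpha = 0 leaves the frequency axis unchanged.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def mel_cepstral_distortion(target_frames, converted_frames):
    """Average mel-cepstral distortion (dB) over time-aligned frames.

    Assumptions: both inputs are equal-length sequences of frames, each frame
    is a sequence of mel-cepstral coefficients of the same order, and c0
    (energy) has already been excluded by the caller.
    """
    if len(target_frames) != len(converted_frames) or not target_frames:
        raise ValueError("inputs must be non-empty and time-aligned")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for tgt, conv in zip(target_frames, converted_frames):
        if len(tgt) != len(conv):
            raise ValueError("frames must have the same cepstral order")
        squared = sum((t - c) ** 2 for t, c in zip(tgt, conv))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(target_frames)


if __name__ == "__main__":
    # The warping keeps the endpoints 0 and pi fixed and bends the axis between them.
    for w in (0.0, math.pi / 4, math.pi / 2, math.pi):
        print(f"omega={w:.3f} -> warped={bilinear_warp(w, 0.3):.3f}")

    # Toy frames (made-up numbers, not data from the thesis).
    target = [[0.5, -0.2, 0.1], [0.4, -0.1, 0.0]]
    converted = [[0.45, -0.25, 0.12], [0.38, -0.05, 0.02]]
    print(f"MCD = {mel_cepstral_distortion(target, converted):.3f} dB")
```

A lower MCD means the converted spectrum is closer to the target speaker's, which is how the reductions quoted in the abstract should be read.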
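[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TN912.3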

[References]