Research on Voice Conversion Algorithms for Parallel and Non-Parallel Text Based on Improved BLFW
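Published: 2018-06-09 02:51
Topics: voice conversion + adaptive Gaussian classification; Source: Nanjing University of Posts and Telecommunications, 2017 master's thesis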
[Abstract]: In speech signal processing, voice conversion transforms the speech of one speaker (the source speaker) so that it sounds as if it were uttered by another speaker (the target speaker), while keeping the semantic content unchanged. Speech carries rich information, including semantic, speaker-identity, linguistic, and emotional information; voice conversion focuses mainly on the essential acoustic features of speech, namely spectral and prosodic characteristics. In many application scenarios, such as entertainment and cross-lingual conversion, a voice conversion system must deliver high-quality speech and support conversion from non-parallel text. Existing systems face two main problems. First, the converted speech cannot achieve high similarity and good sound quality at the same time, so the two must be traded off against each other. Second, training the conversion function depends on parallel corpora, which limits the generality of the system. To achieve conversion with both high sound quality and high similarity, this thesis first proposes a bilinear frequency warping plus amplitude scaling algorithm based on adaptive Gaussian classification. It uses adaptive Gaussian classification to better model the distribution of the acoustic features of speech and performs the conversion on the basis of a reasonable classification. In subjective and objective evaluations, compared with the bilinear frequency warping plus amplitude scaling algorithm with a fixed number of classes, the proposed method raises the average mean opinion score (MOS) of the converted speech by 4.7% and lowers the average mel-cepstral distortion (MCD) by 2.7%, which indicates that it brings a modest improvement in the performance of the voice conversion system.
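The frequency warping and amplitude correction described above can be illustrated on a single log-magnitude spectrum. The sketch below uses the standard first-order all-pass (bilinear) warping function; the function names, the uniform frequency grid, the linear interpolation, and the additive per-bin `log_gain` are all assumptions made for illustration, and the thesis itself may apply the transform in the cepstral domain with a different parameterization.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized frequency omega (radians, 0..pi) with parameter alpha in (-1, 1)."""
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (-1, 1)")
    # Phase response of the first-order all-pass filter; alpha = 0 is the identity.
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_and_scale(log_spectrum, alpha, log_gain):
    """Warp a log-magnitude spectrum sampled uniformly on [0, pi], then add a per-bin gain.

    log_gain is a hypothetical additive correction in the log domain
    (the amplitude step); it must have the same length as log_spectrum.
    """
    n = len(log_spectrum)
    if n < 2 or len(log_gain) != n:
        raise ValueError("need at least two bins and one gain value per bin")
    step = math.pi / (n - 1)
    warped = []
    for k in range(n):
        # The source frequency that lands on bin k is found with the inverse
        # warping, which for the bilinear function is the warping with -alpha.
        source = bilinear_warp(k * step, -alpha)
        pos = min(max(source / step, 0.0), n - 1.0)
        lo = min(int(pos), n - 2)
        frac = pos - lo
        value = (1.0 - frac) * log_spectrum[lo] + frac * log_spectrum[lo + 1]
        warped.append(value + log_gain[k])
    return warped
```

With `alpha = 0` and an all-zero `log_gain`, `warp_and_scale` returns the input spectrum unchanged; a positive `alpha` moves spectral peaks toward higher frequencies and a negative one toward lower frequencies. In a class-based scheme such as the one the abstract describes, one `alpha` and one gain vector would be estimated per acoustic class; how those classes are chosen is not shown here.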
Second, to remove the dependence of voice conversion methods on parallel corpora, this thesis aligns non-parallel corpora using a method based on unit selection and vocal tract length normalization, and then applies the bilinear frequency warping plus amplitude scaling method based on adaptive Gaussian classification to voice conversion from non-parallel text. Subjective and objective evaluation experiments confirm that, compared with the INCA method for non-parallel text, this method raises the average MOS of the converted speech by 7.1% and lowers the average MCD by 4.0%, indicating better sound quality and higher similarity. Compared with the traditional Gaussian mixture model voice conversion method for parallel text, the average MCD is 5.1% higher and the average MOS is 3.9% lower, so a gap in conversion performance remains; however, the proposed method operates under non-parallel text conditions and is therefore more generally applicable.
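The objective comparisons above are stated as percentage changes in MCD. As a rough sketch, the commonly used definition of mel-cepstral distortion and a relative-change helper can be written as follows. It is an assumption here that the frames are already time-aligned, that the 0th (energy) coefficient is excluded, and that the reported percentages are relative changes against the baseline; the abstract does not specify any of these, and the function names are hypothetical.

```python
import math


def mel_cepstral_distortion(target_frames, converted_frames):
    """Average MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Each frame is a sequence [c0, c1, ..., cD]; c0 is skipped (assumption).
    """
    if not target_frames or len(target_frames) != len(converted_frames):
        raise ValueError("need the same non-zero number of frames on both sides")
    const = 10.0 / math.log(10.0)
    total = 0.0
    for target, converted in zip(target_frames, converted_frames):
        if len(target) != len(converted):
            raise ValueError("frames must have the same dimension")
        squared = sum((t - c) ** 2 for t, c in zip(target[1:], converted[1:]))
        total += const * math.sqrt(2.0 * squared)
    return total / len(target_frames)


def relative_change_percent(baseline, proposed):
    """Signed change of `proposed` relative to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (proposed - baseline) / baseline
```

Under these assumptions, a lower MCD means the converted spectrum is closer to the target, so a negative value from `relative_change_percent` for MCD and a positive one for MOS both indicate an improvement over the baseline.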
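[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TN912.3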
Article ID: 1998505
Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/1998505.html