Research on Vocal Tract Spectral Parameter Conversion Algorithms in Voice Conversion

Published: 2018-06-27 17:52

Topic: voice conversion + speech signal processing; Reference: Nanjing University of Posts and Telecommunications, 2017 master's thesis

[Abstract]: Voice conversion modifies the individual characteristics of a source speaker's voice, while keeping the linguistic content unchanged, so that the converted speech sounds closer to that of the target speaker. It is a research direction derived from speech signal processing and is closely tied to speech signal analysis, recognition, and synthesis, with each field advancing the others. It also has many practical applications, such as text-to-speech conversion, film and television dubbing, and medicine. This thesis focuses on the following:
(1) The role of each part of a voice conversion system is discussed. The work concentrates on the conversion of vocal tract spectral feature parameters and, on that basis, introduces several classical conversion models, including vector quantization, Gaussian mixture models, linear multivariate regression, and artificial neural networks.
(2) Radial basis function (RBF) neural networks are often used as the conversion model, and their kernel function parameters are usually trained with K-means clustering. That method has drawbacks: slow convergence, a tendency to fall into local optima, and weak generalization. The thesis proposes optimizing the RBF network with an improved particle swarm optimization (PSO) algorithm to improve the network's performance, so that the spectral envelope mapping between the source and target speakers is obtained more accurately, and studies the role of this method in a voice conversion system. Experimental results show that the proposed scheme effectively improves the network's performance and brings the converted speech closer to the target speech.
(3) Conventional voice conversion systems convert all vocal tract spectral feature parameters with a single RBF neural network rule, which can hardly fit every feature parameter and so lowers the quality of the converted speech. To address this, the thesis proposes converting the vocal tract spectral feature parameters with a self-organizing feature map (SOM) combined with the improved-PSO-optimized RBF neural network, using the SOM's good classification ability to build multiple conversion rules. Subjective and objective evaluations show that this multi-class mapping improves conversion accuracy and the quality of the converted speech.
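As a rough illustration of point (2), the sketch below trains a small Gaussian RBF network whose kernel centers and width are searched by particle swarm optimization, with the linear output layer solved by least squares. Everything in it is an assumption made for illustration: it uses plain global-best PSO rather than the thesis's improved variant, the toy data stands in for source and target spectral features, and the names `rbf_design`, `pso_rbf`, and `toy_data` and all hyperparameter values are hypothetical.

```python
import numpy as np


def rbf_design(X, centers, width):
    """Gaussian RBF activations plus a bias column: (n_samples, n_centers + 1)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((X.shape[0], 1))])


def fit_output_weights(X, Y, centers, width):
    """Least-squares solution of the linear output layer for fixed kernels."""
    W, *_ = np.linalg.lstsq(rbf_design(X, centers, width), Y, rcond=None)
    return W


def mse(X, Y, centers, width):
    """Training mean squared error after refitting the output layer."""
    W = fit_output_weights(X, Y, centers, width)
    return float(np.mean((rbf_design(X, centers, width) @ W - Y) ** 2))


def pso_rbf(X, Y, n_centers=6, n_particles=20, n_iter=60, seed=0):
    """Plain global-best PSO over RBF centers and one shared log-width.

    Assumption: this is standard PSO, not the improved variant of the thesis.
    """
    rng = np.random.default_rng(seed)
    dim = X.shape[1]

    def decode(p):
        # Last entry is log(width); the rest are flattened centers.
        return p[:-1].reshape(n_centers, dim), float(np.exp(p[-1]))

    def cost(p):
        centers, width = decode(p)
        return mse(X, Y, centers, width)

    # Each particle starts from centers drawn from the training samples.
    pos = np.empty((n_particles, n_centers * dim + 1))
    for i in range(n_particles):
        idx = rng.choice(X.shape[0], n_centers, replace=False)
        pos[i, :-1] = X[idx].ravel()
        pos[i, -1] = rng.normal(0.0, 0.3)
    vel = np.zeros_like(pos)

    pbest = pos.copy()
    pbest_cost = np.array([cost(p) for p in pos])
    g = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
    initial_cost = gbest_cost

    inertia, c1, c2 = 0.7, 1.5, 1.5  # illustrative values only
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        pos[:, -1] = np.clip(pos[:, -1], -3.0, 3.0)  # bound the log-width
        costs = np.array([cost(p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])

    centers, width = decode(gbest)
    W = fit_output_weights(X, Y, centers, width)
    return centers, width, W, initial_cost, gbest_cost


def toy_data(seed=1, n=200):
    """Synthetic 2-D to 2-D mapping standing in for source/target features."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    Y = np.column_stack([np.sin(3.0 * X[:, 0]), X[:, 0] * X[:, 1]])
    return X, Y


if __name__ == "__main__":
    X, Y = toy_data()
    centers, width, W, start, end = pso_rbf(X, Y)
    print(f"best initial MSE {start:.5f}, after PSO {end:.5f}")
```

Solving the output layer in closed form for every particle keeps the swarm's search space limited to the kernel parameters, which is the part the abstract says K-means handles poorly. For the multi-rule scheme in point (3), one such network could be trained per cluster of frames, but that routing step is not shown here.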
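[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TN912.3;TP18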

[Similar literature]