
Research on Speech Conversion Algorithms Based on Neural Networks

Published: 2018-02-05 22:01

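  Keywords: speech conversion; generalized regression neural network; PSO algorithm; LPC model; STRAIGHT model. Source: Xi'an University of Architecture and Technology, master's thesis, 2017. Document type: degree thesis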


[Abstract]: Speech conversion is a technique that transforms a source speaker's voice so that it sounds like a target speaker's voice. As a strongly interdisciplinary field, it already has important applications in text-to-speech conversion, medical assistance, and secure communication, and it shows broad prospects in other areas. Research on speech conversion advances the theory of signal processing and also promotes progress in the fields that intersect with it, so the topic is of considerable significance. The models most widely used for speech conversion today are the Gaussian mixture model (GMM) and the artificial neural network (ANN). Because the GMM suffers from over-smoothing and overfitting, this thesis uses ANN models for speech conversion. Among ANNs, the radial basis function (RBF) neural network has a simple structure and can approximate any nonlinear function. The generalized regression neural network (GRNN), a special case of the RBF network, offers strong nonlinear mapping ability, a simple network structure, and high robustness. Since the GRNN has exactly one model parameter, this thesis optimizes that parameter with particle swarm optimization (PSO), yielding the PSO-GRNN model. This model reduces the influence of manual parameter selection on the conversion model and improves the network's learning ability. The ANN models used in the thesis are therefore the RBF, GRNN, and PSO-GRNN models. Experimental results show that speech converted with the PSO-GRNN model is closer to the target speech than speech converted with the RBF or GRNN model. The linear predictive coding (LPC) model describes nasals and plosives with low accuracy when decomposing speech signals, whereas the STRAIGHT model decomposes a speech signal into mutually independent spectral and fundamental-frequency parameters and can reconstruct speech from them. This thesis therefore replaces the LPC model with the STRAIGHT model for speech analysis and synthesis and carries out the corresponding speech conversion experiments. Similarity evaluation shows that speech converted with STRAIGHT and PSO-GRNN is closer to the target speech than speech converted with LPC and PSO-GRNN.
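To make the PSO-GRNN idea in the abstract concrete, the following is a minimal sketch, not the thesis's implementation. A GRNN predicts a kernel-weighted average of the training targets, and its single parameter is usually called the smoothing factor (spread) sigma. The sketch tunes sigma with a basic one-dimensional particle swarm. Everything specific here is an assumption made for illustration: the function names (`grnn_predict`, `pso_search_sigma`, `make_toy_frames`), the validation-MSE fitness, the PSO settings, the search range, and the synthetic two-dimensional data that stands in for time-aligned source and target spectral frames.

```python
import numpy as np


def grnn_predict(X_train, Y_train, X_query, sigma):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    # Squared Euclidean distance between every query row and every training row.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Shifting each row by its minimum rescales all weights in that row by the
    # same factor, which cancels in the ratio and avoids a zero denominator.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ Y_train) / w.sum(axis=1, keepdims=True)


def validation_mse(sigma, X_tr, Y_tr, X_val, Y_val):
    """Assumed fitness: mean squared error on held-out frames."""
    pred = grnn_predict(X_tr, Y_tr, X_val, sigma)
    return float(np.mean((pred - Y_val) ** 2))


def pso_search_sigma(loss, lo=0.01, hi=2.0, n_particles=12, n_iter=30,
                     inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO over a single scalar (the GRNN spread)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, n_particles)       # candidate sigma values
    vel = np.zeros(n_particles)
    best_pos = pos.copy()                        # per-particle best position
    best_loss = np.array([loss(p) for p in pos])
    g = best_pos[best_loss.argmin()]             # swarm-wide best position
    for _ in range(n_iter):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        vel = inertia * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)         # keep sigma inside the range
        cur = np.array([loss(p) for p in pos])
        improved = cur < best_loss
        best_pos[improved] = pos[improved]
        best_loss[improved] = cur[improved]
        g = best_pos[best_loss.argmin()]
    return float(g), float(best_loss.min())


def make_toy_frames(n=200, seed=1):
    """Synthetic stand-in for aligned source (X) and target (Y) feature frames."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, (n, 2))
    Y = np.column_stack([np.sin(3.0 * X[:, 0]), X[:, 0] * X[:, 1]])
    return X, Y + 0.05 * rng.standard_normal((n, 2))


X, Y = make_toy_frames()
X_tr, Y_tr, X_val, Y_val = X[:150], Y[:150], X[150:], Y[150:]
loss = lambda s: validation_mse(s, X_tr, Y_tr, X_val, Y_val)
best_sigma, best_mse = pso_search_sigma(loss)
print(f"selected sigma = {best_sigma:.4f}, validation MSE = {best_mse:.5f}")
```

Because the search space is one-dimensional, a simple grid search would also work here. The sketch uses PSO only because that is the optimizer the abstract names. A plain GRNN baseline corresponds to calling `grnn_predict` with a hand-picked `sigma`.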
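[Degree-granting institution]: Xi'an University of Architecture and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TN912.3; TP183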

[Citation]: Yang Xiufeng; Research on Speech Conversion Algorithms Based on Neural Networks [D]; Xi'an University of Architecture and Technology; 2017



Article ID: 1492873


Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/1492873.html

