Research on the Noise Robustness of Speech Recognition Based on Deep Autoencoder Networks
Posted: 2018-08-22 19:28
[Abstract]: In speech recognition with a conventional radial basis function (RBF) neural network, the centers and radii of the basis functions are initialized randomly. To address this, and motivated by the hierarchical way the human brain processes speech perception, this paper replaces random initialization with unsupervised pretraining, which initializes the network parameters from a large amount of unlabeled data, and uses a deep autoencoder network as the acoustic model for speech recognition. The noise robustness of speaker-independent, small-vocabulary isolated-word recognition is analyzed with two feature types: Mel frequency cepstral coefficients (MFCC) and Gammatone frequency cepstral coefficients (GFCC), which are based on Gammatone auditory filters. Experimental results show that with MFCC features the deep autoencoder network is more robust to noise than the RBF neural network, and that compared with the classic MFCC features, GFCC features raise the average recognition rate of the deep autoencoder network by a relative 1.87%.
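The abstract names unsupervised pretraining of a deep autoencoder as the replacement for random initialization but gives no architecture, loss, or training schedule. The sketch below assumes the classic greedy layer-wise scheme: sigmoid autoencoders with a squared-error reconstruction loss, trained by plain SGD, each new layer trained on the codes produced by the layers beneath it. The function names (`train_autoencoder`, `pretrain_stack`, `encode`), layer sizes, learning rate and toy data are all hypothetical, and the paper's own network may be pretrained differently. It uses only the standard library so that it runs without NumPy.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(layer, x):
    """Activations of one affine + sigmoid layer (W, b) for input vector x."""
    W, b = layer
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]


def train_autoencoder(data, n_hidden, epochs=100, lr=0.5, seed=0):
    """Fit one sigmoid autoencoder to unlabeled vectors in [0, 1] with plain SGD.

    Minimizes 0.5 * ||decode(encode(x)) - x||^2. Returns the encoder (W, b)
    and the mean loss of every epoch; the decoder (V, c) is discarded.
    """
    rng = random.Random(seed)
    n_in = len(data[0])
    # Small random weights only break symmetry; training then adapts them to the data.
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    V = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
    c = [0.0] * n_in
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = encode((W, b), x)
            y = encode((V, c), h)  # the decoder has the same affine + sigmoid form
            total += 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))
            # Error signals at the output and hidden layers (computed before any update).
            dy = [(y[i] - x[i]) * y[i] * (1 - y[i]) for i in range(n_in)]
            dh = [sum(dy[i] * V[i][j] for i in range(n_in)) * h[j] * (1 - h[j])
                  for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    V[i][j] -= lr * dy[i] * h[j]
                c[i] -= lr * dy[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    W[j][k] -= lr * dh[j] * x[k]
                b[j] -= lr * dh[j]
        losses.append(total / len(data))
    return (W, b), losses


def pretrain_stack(data, layer_sizes, **kwargs):
    """Greedy layer-wise pretraining: each new autoencoder is trained on the
    codes produced by the already-trained layers below it."""
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        layer, _ = train_autoencoder(inputs, n_hidden, **kwargs)
        layers.append(layer)
        inputs = [encode(layer, x) for x in inputs]
    return layers


# Toy stand-in for unlabeled acoustic feature frames scaled to [0, 1].
rng = random.Random(1)
frames = [[0.9 if rng.random() < 0.2 else 0.1 for _ in range(8)] for _ in range(20)]
stack = pretrain_stack(frames, [6, 4], epochs=50)
```

After pretraining, the encoder weights in `stack` would serve as the initial hidden-layer parameters of the recognizer in place of random values; a classifier layer on top would then be fine-tuned on labeled isolated-word data, a supervised step not shown here.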
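GFCC differs from MFCC mainly in its filterbank: Gammatone auditory filters with center frequencies spaced uniformly on the ERB-rate scale take the place of triangular mel filters. The sketch below assumes the Glasberg and Moore ERB formula, a 4th-order gammatone impulse response with bandwidth factor 1.019, log compression and a DCT-II. The abstract does not state the filter count, frequency range, frame length or compression the paper used (some GFCC variants use cube-root compression instead of log), so every function name and parameter value here is hypothetical.

```python
import math


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz):
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def center_frequencies(n_filters, f_lo, f_hi):
    """Filter centers spaced uniformly on the ERB-rate scale (denser at low frequencies)."""
    lo, hi = hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi)
    step = (hi - lo) / (n_filters - 1)
    return [erb_rate_to_hz(lo + k * step) for k in range(n_filters)]


def gammatone_ir(fc, fs, n_taps, order=4, b=1.019):
    """Truncated gammatone impulse response, scaled to unit energy:
    g(t) = t^(order-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t)."""
    ir = [(k / fs) ** (order - 1)
          * math.exp(-2 * math.pi * b * erb(fc) * k / fs)
          * math.cos(2 * math.pi * fc * k / fs)
          for k in range(n_taps)]
    norm = math.sqrt(sum(v * v for v in ir))
    return [v / norm for v in ir]


def gfcc(frame, fs, n_filters=16, n_ceps=12, f_lo=100.0, f_hi=3600.0, n_taps=256):
    """GFCC-style cepstrum of one frame: gammatone filterbank energies,
    log compression, then a DCT-II (the same last two steps as MFCC)."""
    log_energies = []
    for fc in center_frequencies(n_filters, f_lo, f_hi):
        h = gammatone_ir(fc, fs, n_taps)
        # Direct-form FIR filtering of the frame (output truncated to the frame length).
        out = [sum(h[k] * frame[n - k] for k in range(min(n + 1, n_taps)))
               for n in range(len(frame))]
        log_energies.append(math.log(sum(v * v for v in out) + 1e-10))
    K = len(log_energies)
    return [sum(log_energies[k] * math.cos(math.pi * q * (k + 0.5) / K) for k in range(K))
            for q in range(n_ceps)]


fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
coeffs = gfcc(tone, fs)
```

Direct FIR convolution is slow but keeps the sketch dependency-free; a practical front end would filter the whole utterance once (for example with an IIR gammatone implementation) and then frame the channel outputs.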
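The GFCC gain is reported as a relative improvement. Assuming the usual definition, with $\bar{A}$ denoting the average recognition rate of the deep autoencoder network under each feature type, the reported figure means:

```latex
\Delta_{\mathrm{rel}}
  = \frac{\bar{A}_{\mathrm{GFCC}} - \bar{A}_{\mathrm{MFCC}}}{\bar{A}_{\mathrm{MFCC}}} \times 100\%
  = 1.87\%
\quad\Longrightarrow\quad
\bar{A}_{\mathrm{GFCC}} = 1.0187\,\bar{A}_{\mathrm{MFCC}}
```

The absolute gain is therefore $0.0187\,\bar{A}_{\mathrm{MFCC}}$ percentage points, which is less than 1.87 points whenever the MFCC average is below 100%.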
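[Author affiliations]: College of Information Engineering, Taiyuan University of Technology; School of Computer Science and Technology, Tianjin University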
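[Funding]: National Natural Science Foundation of China (No. 61371193, No. 61303109); Shanxi Province Merit-Based Funding Program for Returned Overseas Scholars (Shanxi Provincial Department of Human Resources and Social Security letter [2013] No. 68); Natural Science Foundation of Shanxi Province (No. 2014021022-6)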
[CLC number]: TN912.34