Research on Speech Signal Endpoint Detection Algorithms

Published: 2018-04-19 14:36

Topics: speech recognition + speech endpoint detection; Source: Zhengzhou University, master's thesis, 2016

【Abstract】: In the current information technology era, speech recognition (ASR), speech signal coding (ASC), and speech signal enhancement (ASE) technologies [1] will provide strong technical support in security, human-computer interaction, communications, and future consumer electronics [2]. Speech endpoint detection accurately separates the pure speech segments of a signal from its silent segments [3], and it has a decisive influence on the performance of ASR and ASE and on the efficiency of ASC [4]. A complete speech endpoint detection model can be described in three stages. First, preprocessing of the speech signal, including filtering, framing of the speech stream, and windowing [5]. Second, extraction of feature vectors from the whole speech stream; the multi-resolution property of wavelet analysis (WA) makes it very well suited to this task [6]. Third, construction of the endpoint discrimination model [7]. Traditional endpoint detection algorithms include the time-domain double-threshold method, the frequency-domain spectral entropy method, and methods based on cepstral features.

To obtain satisfactory endpoint detection at low signal-to-noise ratios and in complex noise environments, this thesis proposes an endpoint detection model based on an optimized extreme learning machine (ELM), in which the network connection parameters are optimized to compensate for the shortcomings of the algorithm itself.

(1) To optimize the input weights and hidden-layer biases of the ELM neural network, the particle swarm optimization (PSO) algorithm is combined with it to form a PSO-ELM endpoint detection model. Relying on the fast learning ability of the ELM network, the model completes endpoint detection and outputs its predictions almost immediately. This algorithm optimizes the network connection structure to some extent, but it still has certain shortcomings.

(2) To optimize the connection parameters of the ELM network further, an adaptive-step-size fruit fly optimization algorithm (FOAMR) is finally used to optimize the ELM, and the optimized algorithm is applied to the endpoint discrimination model.

Extensive simulation experiments in MATLAB lead to the following conclusions: the plain ELM model is the fastest and has fairly high accuracy; the PSO-ELM model improves accuracy but has the longest training time; and the ELM model optimized by the adaptive fruit fly algorithm achieves the highest accuracy while remaining fast, which meets the requirements of practical application.
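
The three stages named in the abstract (framing and windowing, feature extraction, ELM-based discrimination) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the thesis works in MATLAB, uses wavelet-based features, and tunes the ELM input weights and biases with PSO or FOAMR. Here the input weights stay random (the plain ELM baseline), the features are log energy and zero-crossing rate as simple stand-ins, the "speech" is a synthetic tone burst in white noise, and a small ridge term is added to the output-weight solve for numerical stability. All names and constants (`FS`, `FRAME_LEN`, `HOP`, `make_toy_signal`, `frame_features`, `ELM`) are hypothetical.

```python
import numpy as np

FS = 8000        # sampling rate in Hz (assumed)
FRAME_LEN = 256  # samples per frame (assumed)
HOP = 128        # frame shift in samples (assumed)


def make_toy_signal(seed, n=24000, start=8000, end=16000):
    """Synthetic stand-in for speech: a 200 Hz tone burst in white noise."""
    rng = np.random.default_rng(seed)
    x = 0.01 * rng.standard_normal(n)
    t = np.arange(start, end) / FS
    x[start:end] += 0.5 * np.sin(2 * np.pi * 200 * t)
    # A frame is labeled 1 if it overlaps the burst at all, else 0.
    starts = HOP * np.arange(1 + (n - FRAME_LEN) // HOP)
    labels = ((starts < end) & (starts + FRAME_LEN > start)).astype(float)
    return x, labels


def frame_features(x):
    """Stages 1 and 2: frame, apply a Hamming window, extract two features."""
    n_frames = 1 + (len(x) - FRAME_LEN) // HOP
    idx = np.arange(FRAME_LEN)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(FRAME_LEN)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)
    return np.column_stack([log_energy, zcr])


class ELM:
    """Stage 3: single-hidden-layer ELM with random, untrained input weights."""

    def __init__(self, n_hidden=20, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge  # small regularizer, an assumption of this sketch
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Input weights and biases are drawn once and never updated.
        self.W = self.rng.uniform(-1, 1, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1, 1, size=self.n_hidden)
        H = self._hidden(X)
        # Output weights come from one regularized least-squares solve.
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0.5).astype(int)


x_train, y_train = make_toy_signal(seed=0)
x_test, y_test = make_toy_signal(seed=1)
F_train, F_test = frame_features(x_train), frame_features(x_test)

# Standardize with training statistics so the sigmoids do not saturate.
mu, sigma = F_train.mean(axis=0), F_train.std(axis=0)
model = ELM(n_hidden=20, seed=42).fit((F_train - mu) / sigma, y_train)
pred = model.predict((F_test - mu) / sigma)
accuracy = np.mean(pred == y_test)
print(f"frame-level accuracy on the toy signal: {accuracy:.3f}")
```

Because training reduces to a single linear solve, the plain ELM is fast, which is consistent with the abstract's remark about its speed. The PSO and FOAMR variants described above would instead search over `W` and `b`, repeating this solve for each candidate, which is one plausible reason the optimized models take longer to train.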
【Degree-granting institution】: Zhengzhou University
【Degree level】: Master's
【Year degree awarded】: 2016
【Classification number】: TN912.3

【Similar literature】