Research on Array Speech Recognition Methods in Complex Environments

Published: 2018-06-19 11:55

Topics: microphone array + speech recognition; Source: Liaoning University of Technology, 2014 master's thesis

【Abstract】: Speech recognition lies at the intersection of artificial intelligence and speech processing: its goal is to let machines understand human language and carry out the corresponding operations on command. Single-channel speech recognition has developed rapidly and now achieves good recognition performance. However, it suffers from poor flexibility, requires the speaker to wear a microphone, and restricts the speaker's movement. Microphone arrays overcome these shortcomings, and microphone-array speech recognition has therefore become a research hotspot in recent years.

Building on a survey of speech recognition research in China and abroad, this thesis systematically analyzes the open problems in current speech recognition. It presents the theoretical basis of speech signal preprocessing, including sampling and quantization, framing and windowing, and endpoint detection; analyzes in detail the Mel-frequency cepstral coefficients (MFCCs) commonly used as feature parameters; studies the three basic algorithms of the hidden Markov model (HMM), together with the choice of recognition units and the determination of the number of states in speech recognition; and describes the problems that arise when applying HMMs and their solutions.
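The preprocessing steps named above (framing, windowing, endpoint detection) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the 8 kHz sampling rate, the 25 ms frame length, and the simple energy threshold (10% of the peak frame energy) are all assumptions chosen for illustration, and the input is a synthetic tone rather than real speech.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def short_time_energy(frames, window):
    """Energy of each frame after applying the window."""
    return [sum((x * w) ** 2 for x, w in zip(frame, window)) for frame in frames]

def detect_endpoints(energies, ratio=0.1):
    """Return (first, last) index of frames whose energy exceeds
    ratio * peak energy, or None if every frame is silent.
    The threshold rule is an illustrative assumption."""
    peak = max(energies)
    if peak == 0:
        return None
    active = [i for i, e in enumerate(energies) if e > ratio * peak]
    return active[0], active[-1]

# Synthetic input: silence, a 200 Hz tone standing in for speech, silence.
fs = 8000                                   # assumed sampling rate in Hz
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(1600)]
signal = silence + tone + silence

frame_len, hop = 200, 100                   # 25 ms frames, 12.5 ms hop at 8 kHz
frames = frame_signal(signal, frame_len, hop)
energies = short_time_energy(frames, hamming(frame_len))
start, end = detect_endpoints(energies)
print(f"active region: frames {start}..{end} of {len(frames)}")
```

A practical endpoint detector would usually combine energy with a second cue such as the zero-crossing rate and smooth the decision over several frames; the single threshold here only shows where the step fits in the pipeline.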

To address the unsatisfactory performance of single-channel speech recognition in real environments, the thesis first proposes an array speech recognition method based on multi-channel selection. Targeting real enclosed environments, the method constructs the correlation matrix of the time-delay-compensated array signals and applies a subspace decomposition to it. Within the signal subspace, a channel selection method based on the normalized multichannel cross-correlation coefficient discards the weakly correlated channels and keeps those with the largest cross-correlation coefficients to form a new microphone array; beamforming then produces the output signal, and finally a speech recognizer yields the recognition result. Building on this, and considering that speech recognition is not only a signal processing problem but also a model discrimination problem, the thesis processes array beamforming and speech recognition jointly: information from the speech recognition system is fed back into the front-end array processing, and a conjugate gradient algorithm finds the filter coefficients that maximize the likelihood of the correct hypothesis; the filtered output is then passed to the speech recognizer to obtain the recognition result. Simulation results show that these methods not only reduce the number of array elements and the computational load, but also strengthen the information useful for recognition and raise the recognition rate, giving good robustness in complex acoustic environments.
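The channel selection idea can be sketched in a few lines, under strong simplifying assumptions. This is not the thesis's algorithm: it skips the subspace decomposition entirely, replaces the normalized multichannel cross-correlation coefficient with a plain average of pairwise zero-lag correlation coefficients, assumes the channels are already time-delay compensated, and uses delay-and-sum as the beamformer. All function names and the synthetic data are hypothetical.

```python
import math
import random

def normalized_correlation(x, y):
    """Zero-lag normalized cross-correlation coefficient of two equal-length signals."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den > 0 else 0.0

def channel_scores(channels):
    """Score each channel by its mean correlation with all other channels
    (a simplified stand-in for a multichannel correlation measure)."""
    m = len(channels)
    return [sum(normalized_correlation(channels[i], channels[j])
                for j in range(m) if j != i) / (m - 1)
            for i in range(m)]

def select_channels(channels, keep):
    """Indices of the `keep` highest-scoring channels, in ascending order."""
    scores = channel_scores(channels)
    ranked = sorted(range(len(channels)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])

def delay_and_sum(channels):
    """Average already time-aligned channels sample by sample."""
    n = len(channels[0])
    return [sum(ch[t] for ch in channels) / len(channels) for t in range(n)]

# Synthetic array: one clean tone observed by six microphones,
# two of which (indices 2 and 4) are heavily corrupted by noise.
random.seed(0)
fs = 8000                                   # assumed sampling rate in Hz
clean = [math.sin(2 * math.pi * 300 * n / fs) for n in range(2000)]
noise_levels = [0.2, 0.2, 3.0, 0.2, 3.0, 0.2]
channels = [[s + random.gauss(0, sigma) for s in clean] for sigma in noise_levels]

selected = select_channels(channels, keep=4)
output = delay_and_sum([channels[i] for i in selected])
print("selected channels:", selected)
print("correlation of beamformer output with clean signal:",
      round(normalized_correlation(output, clean), 3))
```

In this toy setup the two noisy microphones score lowest and are dropped, so the beamformer averages only the four cleaner channels. The joint optimization described in the abstract, where filter coefficients are tuned by conjugate gradient to maximize the recognizer's likelihood, is not covered by this sketch.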
【Degree-granting institution】: Liaoning University of Technology
【Degree level】: Master's
【Year degree granted】: 2014
【Classification number】: TN912.34

【References】