Research on a Statistical-Model-Based Speech Recognition System and Its DSP Implementation

Published: 2018-06-25 02:30

Topic: speech recognition + MFCC; Reference: University of Electronic Science and Technology of China, master's thesis, 2012

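【Abstract】: Speech recognition uses features of the human voice to determine the meaning of natural speech or to identify the speaker. As speech recognition systems have developed, the technology has been widely applied in medicine, the military, aviation, the mobile internet, and other fields. In recent years, with continued technical breakthroughs, embedded speech recognition systems have advanced rapidly and now appear in many consumer electronics products, profoundly changing the traditional mode of human-machine interaction. Recognition accuracy and robustness are the key properties of a speech recognition system, and this thesis approaches the topic from these two angles: it studies the basic algorithms of an isolated-word speech recognition system, the implementation of an out-of-vocabulary (OOV) rejection algorithm, and the implementation of the system on a DSP platform. First, the basic principles and implementation techniques of speech recognition systems are described in detail, focusing on front-end processing of the speech signal, chiefly endpoint detection and the extraction of speech feature parameters. The construction and implementation of the speech model are then discussed, with emphasis on HMM initialization and on how template parameters are merged. Second, misrecognition is hard to avoid in the output of a speech recognition system, and it seriously degrades robustness and recognition accuracy, so OOV speech must be rejected. Considering the complexity and cost of implementing the system on an embedded platform, the thesis adopts a rejection algorithm based on posterior-probability features and LVQ, and proposes a set of feature parameters for rejection that capture well how OOV and in-vocabulary (IV) speech differ in posterior probability. Vectors composed of the class label and the feature parameters are fed to the LVQ network for training, giving the network the ability to separate the OOV and IV classes. The rejection capability of the system is then tested with networks trained on different input vectors and with different test sets, and the IV rejection rate and OOV acceptance rate are reported for each case. The results show that the system rejects more than 98% of OOV speech while rejecting about 2.6% of IV speech. Finally, once the relevant algorithms have been implemented on the PC platform, the implementation of the isolated-word speech recognition system on a DSP platform is studied. The processor architecture and memory architecture of the DSP platform, the connections between the chips on the platform, and the configuration of each interface are examined first, with a particularly detailed account of how the audio processing chip is used. The design flow of the system software is then given, together with a description of how the speech recognition algorithms are ported from the PC platform to the DSP platform. Next, system bootloading is studied so that the system can run independently of the emulator and the development environment. The result is a general-purpose DSP-based isolated-word speech recognition system.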
[Abstract]: As speech recognition systems have matured, the technology has been widely applied in medicine, the military, aviation, the mobile internet, and other fields. In recent years, with continued technical breakthroughs, embedded speech recognition systems have developed rapidly and now appear in many consumer electronics products, profoundly changing the traditional mode of human-machine interaction. Recognition accuracy and robustness are the key properties of a speech recognition system.

First, the basic principles and implementation techniques of a speech recognition system are described in detail. The discussion centers on front-end processing of the speech signal, with emphasis on endpoint detection and the extraction of speech feature parameters. The construction and implementation of the speech model are then discussed, in particular HMM initialization and the merging of template parameters.

Second, a speech recognition system can never entirely avoid misrecognition, which seriously degrades its robustness and recognition accuracy, so out-of-vocabulary (OOV) speech must be rejected. Considering the complexity and cost of an embedded implementation, this thesis adopts a rejection algorithm based on posterior-probability features and LVQ: vectors built from the class label and the feature parameters are fed to the LVQ network for training, so that the network learns to distinguish OOV from in-vocabulary (IV) speech. The results show that the system rejects more than 98% of OOV speech while rejecting about 2.6% of IV speech.

Finally, after the system's algorithms are implemented on the PC platform, the implementation of the isolated-word speech recognition system on a DSP platform is studied. The processor architecture and memory architecture of the DSP platform, the connections between the chips on the platform, and the configuration of each interface are examined first. The design flow of the system software is then presented, along with how the speech recognition algorithms are ported from the PC platform to the DSP platform.
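
The rejection stage summarized in the abstract (labelled feature vectors used to train an LVQ network that separates OOV from IV speech) can be illustrated with a minimal LVQ1 sketch. Everything concrete here is an assumption for illustration only: the abstract does not list the thesis's feature parameters, network size, or training schedule, so the two-dimensional toy features, the single prototype per class, the learning rate, and all function names below are hypothetical.

```python
import random

IV, OOV = 0, 1  # class labels: in-vocabulary / out-of-vocabulary


def nearest(prototypes, x):
    """Return the index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda i: sum((p - v) ** 2 for p, v in zip(prototypes[i][0], x)))


def train_lvq1(samples, prototypes, lr=0.1, epochs=20, seed=0):
    """LVQ1: pull the winning prototype toward a same-class sample, push it away otherwise."""
    rng = random.Random(seed)
    protos = [(list(w), c) for w, c in prototypes]  # copy so the inputs are not mutated
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate (an assumption)
        order = samples[:]
        rng.shuffle(order)
        for x, label in order:
            w, c = protos[nearest(protos, x)]
            sign = 1.0 if c == label else -1.0
            for d in range(len(w)):
                w[d] += sign * rate * (x[d] - w[d])
    return protos


def classify(protos, x):
    """Label of the nearest prototype."""
    return protos[nearest(protos, x)][1]


def rejection_rates(protos, samples):
    """Return (IV rejection rate, OOV acceptance rate) on a labelled set."""
    iv = [x for x, c in samples if c == IV]
    oov = [x for x, c in samples if c == OOV]
    iv_rejected = sum(classify(protos, x) == OOV for x in iv)
    oov_accepted = sum(classify(protos, x) == IV for x in oov)
    return iv_rejected / len(iv), oov_accepted / len(oov)


def make_toy_data(n=200, seed=1):
    """Synthetic, well-separated 2-D features standing in for posterior-probability features."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        data.append(([rng.gauss(0.9, 0.03), rng.gauss(0.1, 0.03)], IV))
        data.append(([rng.gauss(0.4, 0.03), rng.gauss(0.6, 0.03)], OOV))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    # Initialize each prototype from the first sample of its class.
    init = [(data[0][0], IV), (data[1][0], OOV)]
    protos = train_lvq1(data, init)
    iv_rej, oov_acc = rejection_rates(protos, data)
    print(f"IV rejection rate: {iv_rej:.3f}, OOV acceptance rate: {oov_acc:.3f}")
```

The two numbers returned by `rejection_rates` correspond to the metrics the abstract reports: the share of IV utterances wrongly rejected and the share of OOV utterances wrongly accepted. On real data the two trade off against each other, which is why the abstract quotes them as a pair.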
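【Degree-granting institution】: University of Electronic Science and Technology of China
【Degree level】: Master's
【Year degree awarded】: 2012
【Classification number】: TN912.34;TP368.1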