Research and Implementation of an FPGA-Based Real-Time Fixed Speech Recognition System

Published: 2018-01-24 12:41

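Keywords: fixed speech recognition; real-time; multiple processing units; parallel processing; AXI4 bus; FPGA. Source: PLA Information Engineering University, master's thesis, 2013. Thesis type: degree thesis.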

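[Abstract (Chinese, translated)]: Fixed speech recognition is the task of finding, in a dynamic speech stream, segments that are identical or nearly identical to template speech in a given template library. As a branch of fixed audio retrieval, it can be applied to spam-call recognition on telecom networks, advertisement broadcast monitoring, copyright management, and related areas. As the number of speech templates keeps growing, however, fixed speech recognition systems face a real-time challenge. Real-time performance here means the number of channels the system can process simultaneously in real time, and it is the key problem that must be solved before such systems can be put into practical use. This thesis originates from a key project of the National 863 Program. Motivated by the demand for real-time multi-channel speech processing on telecom networks, it studies real-time fixed speech recognition with a large template library (8,000 templates) and obtains the following results:
1. A multi-processing-element parallel architecture based on a superscalar organization (Multi-PE Parallel Architecture, MPPA) is proposed. The architecture fully exploits the parallelism of the fixed speech recognition algorithm. Inside each processing element (PE), a superword-parallel structure is used: since speech signals are processed in units of frames, the maximum superword parallelism is 8, so a wide PE with a 256-bit datapath is designed. Across PEs, a superscalar organization is used: each PE can complete a matching task independently, and the architecture scales with the required processing performance.
2. For the MPPA, a data storage and scheduling mechanism is proposed that keeps every PE in the architecture working on a "best-effort" basis at all times. For template storage, a hybrid structure combining centralized shared storage with distributed storage is developed, together with a two-level storage mechanism for template data and a double-buffer (Double Buffer, DB) structure inside each PE, yielding an efficient storage hierarchy. For data scheduling, two template distribution strategies are studied: round-robin (polling) distribution and first-request-first-served distribution.
3. Squaring wide operands directly with the hard multipliers embedded in the FPGA consumes too many IP core resources, so an efficient squaring method is proposed. It first reduces the operand width with a simple, repeatedly iterated logic circuit and then computes the square with the embedded DSP48E slices, effectively reducing the number of DSP48Es used.
4. With the MicroBlaze processor embedded in Xilinx FPGAs as the main processor and an MPPA coprocessor built on the MPPA architecture, a fixed speech recognition SOPC system is implemented on an FPGA platform. Control and data are exchanged between the main processor and the coprocessor through a shared-RAM interface based on the AXI4 bus protocol. Performance tests and resource analysis show that the system processes 22 fixed speech recognition channels in real time with 8,192 templates, achieving real-time multi-channel speech processing with a large template library.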
[Abstract]: Fixed speech recognition is the recognition, within a dynamic speech stream, of speech segments that are the same as or almost the same as template speech in a given template library; it is regarded as a branch of fixed audio retrieval. Fixed speech recognition can be used for spam-speech recognition on telecom networks, advertisement monitoring, copyright management, and other fields. As the number of speech templates increases, however, fixed speech recognition systems are tested on real-time performance, that is, on their ability to process multiple channels in real time, which is the key problem to be solved before such systems can be used in practice. This work comes from a key project of the National 863 Program and addresses the demand for real-time multi-channel speech processing in telecommunication networks. It studies real-time fixed speech recognition with a large-capacity template library (8,000 templates), with the following results:
1. A multi-PE parallel architecture (MPPA) based on a superscalar organization is proposed. It fully exploits the parallelism of the fixed speech recognition algorithm. Each processing element (PE) uses a superword-parallel structure; because speech signals are processed frame by frame, the maximum superword parallelism is 8, and a PE with a 256-bit datapath is designed. The parallel PEs follow a superscalar organization in which each PE can complete matching tasks independently, so the architecture scales with the required processing performance.
2. For the MPPA, a data storage and scheduling mechanism is proposed that keeps every PE in the architecture working on a "best-effort" basis at all times. A hybrid storage structure combining centralized shared storage with distributed storage is studied, and a two-level storage mechanism for template data and a double-buffer (DB) structure inside each PE are designed. For data scheduling, two template distribution strategies are studied: round-robin (polling) distribution and first-request-first-served distribution.
3. Computing the square of a wide operand directly with the hard multipliers embedded in the FPGA consumes too many IP core resources. An efficient squaring method is therefore proposed: a simple iterative logic circuit first reduces the bit width of the operand, and the embedded DSP48E slices then compute the square, effectively reducing the number of DSP48Es used.
4. The MicroBlaze processor embedded in Xilinx FPGAs serves as the main processor and is combined with an MPPA coprocessor based on the MPPA architecture to implement a fixed speech recognition SOPC system on an FPGA platform. Control and data are exchanged between the main processor and the coprocessor through a shared-RAM interface based on the AXI4 bus protocol. Performance testing and resource analysis show that the system processes 22 fixed speech recognition channels in real time with 8,192 templates, achieving real-time multi-channel speech processing with a large template library.
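To make the superword-parallel PE of contribution 1 concrete, here is a minimal Python model. The abstract gives only the 256-bit datapath and the parallelism of 8; the lane format (eight signed 32-bit feature values per superword), the sum-of-squared-differences metric, and the threshold test are assumptions for illustration, and `pack`, `unpack`, `superword_sqdist`, `frame_distance`, and `best_match` are hypothetical names rather than the thesis's design.

```python
LANES = 8                       # superword parallelism given in the abstract
LANE_BITS = 256 // LANES        # assumed: 256-bit word split into 32-bit lanes
MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack 8 signed 32-bit values into one 256-bit superword (lane 0 in the LSBs)."""
    assert len(values) == LANES
    word = 0
    for i, v in enumerate(values):
        word |= (v & MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 256-bit superword back into 8 signed 32-bit lane values."""
    lanes = []
    for i in range(LANES):
        v = (word >> (i * LANE_BITS)) & MASK
        lanes.append(v - (1 << LANE_BITS) if v >> (LANE_BITS - 1) else v)
    return lanes


def superword_sqdist(a_word, b_word):
    """One PE step: squared lane differences summed (all 8 lanes in parallel in hardware)."""
    return sum((x - y) ** 2 for x, y in zip(unpack(a_word), unpack(b_word)))


def frame_distance(frame_words, template_words):
    """Distance between an input frame and a template, one superword per step."""
    return sum(superword_sqdist(a, b) for a, b in zip(frame_words, template_words))


def best_match(frame_words, templates, threshold):
    """Index of the closest template if its distance is within threshold, else None."""
    dists = [frame_distance(frame_words, t) for t in templates]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best if dists[best] <= threshold else None
```

In hardware the eight lane subtractions and squarings of `superword_sqdist` would complete in the same cycle; the Python loop only reproduces the arithmetic, not the timing.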
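The two template distribution strategies of contribution 2 can be compared with a small scheduling model. This sketch assumes each template has a known processing cost (for example, proportional to its length) and measures the makespan, the time until the last PE finishes; the cost model and the function names are assumptions, not details from the thesis. Round-robin (polling) dispatch assigns template i to PE i mod P regardless of load, while first-request-first-served dispatch hands the next template to whichever PE becomes idle first.

```python
import heapq


def round_robin(costs, num_pe):
    """Polling dispatch: template i goes to PE i mod num_pe. Returns the makespan."""
    busy = [0] * num_pe
    for i, cost in enumerate(costs):
        busy[i % num_pe] += cost
    return max(busy)


def first_request_first_serve(costs, num_pe):
    """Each idle PE requests the next template; the earliest request is served first.

    Returns the makespan (time at which the last PE finishes).
    """
    idle_at = [(0, pe) for pe in range(num_pe)]  # (time the PE becomes idle, PE id)
    heapq.heapify(idle_at)
    makespan = 0
    for cost in costs:
        t, pe = heapq.heappop(idle_at)           # PE whose request arrived first
        done = t + cost
        makespan = max(makespan, done)
        heapq.heappush(idle_at, (done, pe))
    return makespan
```

With uneven costs such as `[8, 1, 1, 1, 8, 1, 1, 1]` on two PEs, round-robin finishes at time 18 while first-request-first-served finishes at 11, which illustrates why a request-driven policy helps keep every PE busy; with equal costs the two policies give the same result.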
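The PE-internal double buffer (DB) of contribution 2 overlaps loading of the next template block with computation on the current one. The timing model below assumes a fixed load time and a fixed compute time per block, one loader and one datapath per PE, and that a buffer can be refilled only after the block it holds has been consumed; the function names and cost parameters are hypothetical.

```python
def single_buffer_time(n_blocks, load, compute):
    """One buffer: each block is loaded, then computed, strictly in sequence."""
    return n_blocks * (load + compute)


def double_buffer_time(n_blocks, load, compute):
    """Two buffers (ping-pong): block k+1 is loaded while block k is computed.

    A buffer can be reloaded only after the block it held (k-2) has been computed.
    """
    load_done = [0] * n_blocks
    compute_done = [0] * n_blocks
    for k in range(n_blocks):
        start_load = load_done[k - 1] if k >= 1 else 0         # one loader
        if k >= 2:
            start_load = max(start_load, compute_done[k - 2])  # wait for a free buffer
        load_done[k] = start_load + load
        prev_compute = compute_done[k - 1] if k >= 1 else 0   # one datapath
        compute_done[k] = max(load_done[k], prev_compute) + compute
    return compute_done[-1] if n_blocks else 0
```

For n blocks the double-buffered time works out to load + compute + (n - 1) * max(load, compute), versus n * (load + compute) with a single buffer, so the PE computes without stalls whenever loading is no slower than computing.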
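Contribution 3 does not spell out the width-reduction circuit, so the following is only one plausible shift-and-add scheme, not necessarily the author's. Writing an unsigned operand as x = 2^m + y, where 2^m is its most significant set bit, gives x^2 = 2^(2m) + 2^(m+1)·y + y^2; the first two terms need only shifts and additions, and y is at least one bit narrower than x. Repeating this step until the remainder fits the multiplier leaves a single narrow multiplication for a DSP48E. The 17-bit limit assumes the operand must fit an 18-bit signed DSP48E input port; `wide_square` and `dsp_square` are hypothetical names.

```python
DSP_SQUARE_BITS = 17  # assumed: unsigned operand that fits an 18-bit signed DSP48E input


def dsp_square(y):
    """Stand-in for a single DSP48E multiply; rejects operands wider than the port."""
    assert 0 <= y < (1 << DSP_SQUARE_BITS), "operand too wide for one DSP48E"
    return y * y


def wide_square(x):
    """Square an unsigned integer of any width with one narrow hardware multiply.

    Each iteration strips the most significant set bit: with x = 2**m + y,
    x**2 = 2**(2*m) + 2**(m + 1) * y + y**2. The first two terms are pure
    shifts and adds, so only the final, narrow remainder reaches the DSP.
    """
    assert x >= 0
    acc = 0
    while x.bit_length() > DSP_SQUARE_BITS:
        m = x.bit_length() - 1
        y = x - (1 << m)                          # drop the MSB
        acc += (1 << (2 * m)) + (y << (m + 1))    # shift-and-add terms
        x = y
    return acc + dsp_square(x)
```

The trade-off in this sketch is latency and adder width: an n-bit operand can take up to n - 17 iterations, each adding wide but cheap terms, in exchange for one DSP48E instead of the several that a direct wide multiplication would need.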
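Contribution 4 exchanges control and data between MicroBlaze and the MPPA coprocessor through a shared RAM on AXI4. The simulation below shows one common mailbox-style handshake for such an interface; the word offsets, the START/DONE flags, and the scalar distance used by the coprocessor are all hypothetical and stand in for details the abstract does not give. In a real design the host would access the RAM through memory-mapped AXI4 reads and writes, and the coprocessor would run concurrently rather than being stepped from `host_wait`.

```python
# Hypothetical word offsets in the shared RAM (not taken from the thesis).
CTRL, STATUS, N_TEMPLATES, FRAME, RESULT, TEMPLATE_BASE = 0, 1, 2, 3, 4, 8
RAM_WORDS = 1024


class SharedRam:
    """Word-addressed RAM visible to both masters (models an AXI4 BRAM region)."""

    def __init__(self, words=RAM_WORDS):
        self.mem = [0] * words

    def write(self, addr, value):
        self.mem[addr] = value & 0xFFFFFFFF   # 32-bit words

    def read(self, addr):
        return self.mem[addr]


def host_submit(ram, frame, templates):
    """MicroBlaze side: write the payload first, then raise START last."""
    for i, t in enumerate(templates):
        ram.write(TEMPLATE_BASE + i, t)
    ram.write(N_TEMPLATES, len(templates))
    ram.write(FRAME, frame)
    ram.write(STATUS, 0)
    ram.write(CTRL, 1)                        # START


def coprocessor_poll(ram):
    """Coprocessor side: if START is set, match and report. Returns True if it ran."""
    if ram.read(CTRL) != 1:
        return False
    frame = ram.read(FRAME)
    n = ram.read(N_TEMPLATES)
    dists = [abs(frame - ram.read(TEMPLATE_BASE + i)) for i in range(n)]
    ram.write(RESULT, dists.index(min(dists)))
    ram.write(CTRL, 0)                        # acknowledge the command
    ram.write(STATUS, 1)                      # DONE
    return True


def host_wait(ram, max_polls=100):
    """MicroBlaze side: poll DONE, then read RESULT."""
    for _ in range(max_polls):
        coprocessor_poll(ram)                 # runs concurrently in hardware
        if ram.read(STATUS) == 1:
            return ram.read(RESULT)
    raise TimeoutError("coprocessor did not finish")
```

Writing the payload before raising START, and clearing START before setting DONE, is meant to keep either side from acting on half-updated data; for example, submitting a frame value of 41 against templates `[10, 40, 90]` and then calling `host_wait` returns index 1.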
[Degree-granting institution]: PLA Information Engineering University
[Degree level]: Master's
[Year awarded]: 2013
[CLC classification number]: TN912.34
