Design of a Low-Power Vector Operation Unit and Study of the Reduction Network for the YHFT-Matrix DSP

Published: 2018-02-07 12:12

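Keywords: digital signal processor; vector operation techniques; low power; arithmetic logic unit; divider; reduction network; logic verification　Source: National University of Defense Technology, master's thesis, 2012　Document type: degree thesis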

[Abstract]: A digital signal processor (DSP) is an embedded microprocessor particularly well suited to digital signal processing computation. As DSPs spread through communications, multimedia processing, and other high-end fields, the demands on their performance keep rising, so the research and design of high-performance DSPs has considerable scientific and practical value. This thesis builds on the development of the YHFT-Matrix DSP, which targets software-defined radio, and aims to study and design a vector operation unit and a reduction network that meet the demanding requirements of that processor. It examines the structural characteristics of DSPs and the implementation of vector operation techniques, and surveys international DSPs for 3G and 4G wireless communication that apply such techniques. It then outlines the YHFT-Matrix DSP architecture and the characteristics of its vector operation unit and vector data interaction network, pointing out that the vector operation unit must be designed with low-power techniques and that the vector data interaction network must be flexible and easy to use. Based on developer feedback, it also summarizes the functions of the existing operation units that are worth improving.

Low-power design methodology and RTL-level low-power techniques are applied to the design of the vector operation unit. Clock gating is used to implement a variable-width vector processing unit (VPU). The application requirements and related instructions of the fixed-point SIMD IALU are analyzed, and a low-power fixed-point SIMD IALU is designed and implemented with a carry-select SIMD adder at its core, combined with operand isolation. A fixed-point divider is designed using a radix-4 division algorithm based on a separated radix (分离基数), combined with low-power state assignment. The divider supports signed and unsigned division, has an 8/16/32-bit SISD/SIMD data path, and can operate in either a fixed-cycle or a variable-cycle execution mode; the two modes suit the VPU and the scalar processing unit (SPU), respectively.

Taking matrix multiplication as an example, the thesis compares software and hardware implementations of reduction. The results show that, at the cost of additional area, the hardware implementation accelerates the algorithm significantly. In the design of the fixed-point reduction network, a reduction tree model is introduced to achieve complete, even grouping, and loop programming of the network is made possible by specifying the target VPE through implicit auto-increment. The implementation of floating-point reduction is also studied: because floating-point units carry a very large area overhead, the thesis argues that a floating-point reduction network should combine software and hardware. Based on the grouping modes of the fixed-point reduction network in the YHFT-Matrix DSP, it proposes a reduction network scheme that supports mixed floating-point operations: the SPU configures the type of floating-point reduction, a dedicated shuffle network moves the operands, and the floating-point units inside the vector operation unit perform the computation, which completes the floating-point reduction.

The thesis describes the logic function verification flow of the YHFT-Matrix DSP and develops module-level testbenches for the operation units based on Verilog and Perl scripts. The three implemented operation units are synthesized with the DC synthesis tool in a TSMC 65 nm process, and synthesis results and a performance comparison are given. The results show that all three units meet the design requirement of a 700 MHz operating frequency. Finally, the simulation testing of the 4-core YHFT-QMBase chip and the performance evaluation of a single core are described.
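
The reduction-tree grouping and the implicit auto-increment of the target VPE mentioned in the abstract can be pictured with a small software model. The Python sketch below is only an illustration under stated assumptions, not the thesis's hardware design: the names `tree_reduce` and `ReductionNetwork`, the restriction to power-of-two group sizes, the four-lane example, and the use of addition as the reduction operator are all hypothetical choices made for this example.

```python
from operator import add
from typing import Callable, List


def tree_reduce(lanes: List[int], group_size: int,
                op: Callable[[int, int], int] = add) -> List[int]:
    """Reduce each contiguous group of `group_size` lanes with a binary tree.

    Assumption: group sizes are powers of two that evenly divide the lane
    count, so every group is complete and all groups have the same size.
    """
    n = len(lanes)
    if group_size < 1 or group_size & (group_size - 1) or n % group_size:
        raise ValueError("group_size must be a power of two dividing len(lanes)")
    results = []
    for start in range(0, n, group_size):
        level = list(lanes[start:start + group_size])
        # Each pass models one tree stage: neighbouring pairs are combined.
        while len(level) > 1:
            level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        results.append(level[0])
    return results


class ReductionNetwork:
    """Toy model: each reduction result is written to the next target lane."""

    def __init__(self, num_lanes: int) -> None:
        self.num_lanes = num_lanes
        self.dest = [0] * num_lanes  # hypothetical destination vector register
        self.target = 0              # implicit target lane, advanced automatically

    def reduce(self, lanes: List[int], group_size: int) -> None:
        for value in tree_reduce(lanes, group_size):
            self.dest[self.target] = value
            # Implicit auto-increment: the loop body never names the target.
            self.target = (self.target + 1) % self.num_lanes


def matmul_row(a_row: List[int], b: List[List[int]]) -> List[int]:
    """Compute one row of A*B; every iteration runs the same reduce call."""
    net = ReductionNetwork(len(a_row))
    for j in range(len(a_row)):
        products = [a_row[k] * b[k][j] for k in range(len(a_row))]
        net.reduce(products, len(a_row))
    return net.dest
```

In this model, `tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], 4)` returns `[10, 26]`, one sum per group of four lanes. Because the target lane advances on its own, the loop in `matmul_row` issues an identical reduction in every iteration, which is the property that makes a reduction instruction usable inside a software loop.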
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP332;TN402

[References]