Design and Implementation of the 64-bit SIMD Bit-Processing Unit and Shuffle Unit of X-DSP

Published: 2018-04-04 12:59

Topic: bit processing. Focus: shuffle. Source: master's thesis, National University of Defense Technology, 2013

【Abstract】: Digital signal processing (DSP) is an emerging field that spans many disciplines and is applied across a wide range of areas. Since the start of the 21st century, society has entered the digital age, and DSP is at the core of this digital revolution. X-DSP is an independently designed, high-performance 64-bit SIMD DSP. It uses a VLIW architecture, can issue 11 instructions per cycle, and has a design clock frequency of 1.25 GHz. Based on the performance requirements of X-DSP and an in-depth study of current mainstream DSP architectures and instruction sets, this thesis completes the design and implementation of a 64-bit Bit-Processing (BP) unit and a Shuffle unit. The main contributions are as follows:
1. A 64-bit SIMD BP unit for X-DSP is designed and implemented. As one of the functional units in the execution units of the X-DSP core, it mainly executes shift instructions, bit-processing instructions, and pack/unpack instructions. Its SIMD structure allows two 32-bit data operations to complete in a single cycle, giving full support to data-level parallelism in programs.
2. The 64-bit Shuffle unit is a vector data interconnection network, used mainly to exchange data among the VPEs of the vector processing unit. After studying the shuffle instruction designs of several mainstream chips, the thesis defines its own 64-bit shuffle instructions and shuffle circuit structure. Shuffle patterns are stored in a dedicated SRAM, so that shuffling does not compete with applications for critical system resources such as the register file or memory access bandwidth during execution, which improves execution efficiency.
3. The BP and Shuffle units are verified by simulation at three levels: module level, unit level, and SPE/VPE level. At the module level, simulation is combined with SVA (SystemVerilog Assertions) formal verification to ensure the functional correctness of the design. At the unit level, test stimuli are applied to each unit individually to obtain the coverage of the corresponding modules. The performance of the Shuffle unit is also evaluated: at the same shuffle granularity, the shuffle-pattern representation efficiency of the X-DSP shuffle-pattern memory is 0.88 and 0.75 respectively, the highest among the shuffle units compared. Finally, the BP unit and the Shuffle unit are each synthesized with Synopsys Design Compiler. The BP unit has a total area of 48513.7819 μm², a critical-path delay of 0.42 ns, and a power consumption of 28.1785 mW. The Shuffle unit has a total area of 662016.8 μm², a critical-path delay of 0.44 ns, and a power consumption of 179.6060 mW. Both meet the 1.25 GHz performance target of X-DSP.
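The two units summarized in the abstract can be illustrated with a small behavioral model. The following Python sketch is not taken from the thesis: the function names, the choice of four VPEs, and the contents of the pattern table are all hypothetical, chosen only to show a SIMD shift over two independent 32-bit lanes and a shuffle driven by a pattern looked up in a separate table (standing in for the dedicated pattern SRAM).

```python
MASK32 = 0xFFFFFFFF


def simd_shl32(word64: int, shift: int) -> int:
    """Shift each 32-bit lane of a 64-bit word left independently.

    Bits shifted out of the low lane are discarded and do not
    carry into the high lane.
    """
    lo = ((word64 & MASK32) << shift) & MASK32
    hi = (((word64 >> 32) & MASK32) << shift) & MASK32
    return (hi << 32) | lo


# Hypothetical pattern table standing in for the shuffle-pattern SRAM.
# Entry i of a pattern names the source VPE whose value is delivered
# to destination VPE i. Four VPEs are assumed for illustration only.
PATTERN_MEM = {
    0: (0, 1, 2, 3),  # identity
    1: (3, 2, 1, 0),  # reverse
    2: (1, 0, 3, 2),  # swap neighbouring pairs
    3: (0, 0, 0, 0),  # broadcast VPE 0
}


def shuffle(vpe_values: list[int], pattern_id: int) -> list[int]:
    """Permute per-VPE values according to a stored pattern."""
    pattern = PATTERN_MEM[pattern_id]
    return [vpe_values[src] for src in pattern]
```

In this model a shuffle instruction carries only a small pattern index, while the pattern itself lives in the table, which loosely mirrors the abstract's point about keeping shuffle patterns out of the register file. As a separate sanity check on the synthesis figures, a 1.25 GHz clock has a period of 0.8 ns, so the reported critical-path delays of 0.42 ns and 0.44 ns each fit within one cycle.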
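【Degree-granting institution】: National University of Defense Technology
【Degree level】: Master's
【Year degree conferred】: 2013
【Classification number】: TP332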

【References】