Design and Implementation of a Software-Pipelined Loop Buffer Unit for the Independently Developed X DSP

Published: 2018-07-22 14:25

【Abstract】: DSP algorithms contain a large number of loops, and exploiting instruction-level parallelism across loop iterations is one of the key ways to improve processor performance. Loop scheduling techniques include loop unrolling and software pipelining. Based on the independently developed X DSP, this thesis studies how software pipelining can improve the execution efficiency of loop programs on the X DSP, and designs and implements a software-pipelined loop buffer unit. The thesis analyzes loop unrolling and software pipelining in detail and, based on the requirements and characteristics of the X DSP, designs a loop buffer built on the modulo scheduling algorithm for software pipelining. The unit sits in the instruction dispatch stage of the pipeline, where it stores and dispatches loop-body instructions. This reduces the number of memory accesses made while executing loop programs, and thus the impact of memory access latency on DSP performance. The main work of the thesis is as follows:
1) Based on an analysis of how for loops and while loops execute, designed the overall structure of the loop buffer and completed the detailed design of its control module and its storage and dispatch module.
2) Designed a tracking and comparison mechanism for loop instructions that handles loading, draining, and reloading of loop instructions, so that loop instructions are stored and dispatched accurately.
3) Designed a counter comparison mechanism and an interrupt draining mechanism that together provide precise interrupts for loop programs.
4) Studied simulation-based verification methods, built a simulation verification platform for the loop buffer, and carried out comprehensive system-level verification of the loop buffer.
5) Evaluated the performance of the loop buffer using a matrix multiply-add program and three typical DSP image-processing algorithms. In simulation, the utilization of the loop buffer on these four programs reached 95.34%, 90.61%, 88.85%, and 89.94%, respectively, which greatly reduces the frequency of instruction memory accesses and lowers memory access power consumption.
6) Completed logic synthesis of the loop buffer in a 45 nm process. The unit runs at up to 1 GHz, with an area of 76778.69 square micrometers, dynamic power of 28.99 mW, and static power of 1.83 mW.
The loop buffer can store 112 32-bit loop-body instructions, and it stores and dispatches them under the control of dedicated loop instructions, significantly improving the execution efficiency of loop programs.
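To make the idea of cutting instruction memory accesses concrete, here is a minimal Python sketch of a toy counting model. It is not the thesis's design. It assumes that the first iteration of a loop is fetched from memory while being loaded into the buffer, and that every later iteration is replayed from the buffer. Only the capacity of 112 instructions comes from the abstract. The names `simulate`, `DispatchStats`, and `buffer_fraction` are hypothetical, and `buffer_fraction` is not necessarily the "utilization" metric the thesis reports.

```python
from dataclasses import dataclass

LOOP_BUFFER_CAPACITY = 112  # 32-bit loop-body instructions, per the abstract


@dataclass
class DispatchStats:
    from_memory: int  # instructions that went through the normal fetch path
    from_buffer: int  # instructions replayed out of the loop buffer

    @property
    def buffer_fraction(self) -> float:
        """Share of dispatched instructions served by the buffer (hypothetical metric)."""
        total = self.from_memory + self.from_buffer
        return self.from_buffer / total if total else 0.0


def simulate(body_len: int, iterations: int, other_instrs: int = 0,
             capacity: int = LOOP_BUFFER_CAPACITY) -> DispatchStats:
    """Count where each dispatched instruction comes from in the toy model."""
    if body_len < 0 or iterations < 0 or other_instrs < 0:
        raise ValueError("counts must be non-negative")
    total_loop = body_len * iterations
    if body_len > capacity or iterations == 0:
        # Assumption: a body that does not fit bypasses the buffer entirely.
        return DispatchStats(other_instrs + total_loop, 0)
    # Assumption: iteration 1 is fetched from memory while it is loaded into
    # the buffer; iterations 2..N are dispatched from the buffer.
    return DispatchStats(other_instrs + body_len, total_loop - body_len)
```

Under these assumptions, `simulate(16, 100, other_instrs=40)` fetches 56 instructions from memory and replays 1584 from the buffer, while a 200-instruction body exceeds the 112-entry capacity and gets no benefit in this model.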
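【Degree-granting institution】: National University of Defense Technology
【Degree level】: Master's
【Year degree granted】: 2013
【Classification number】: TP332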

【References】