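Design and Implementation of the BP Unit and Shuffle Unit of the YHFT-Matrix Processor
Published: 2018-01-07 10:20
Source: National University of Defense Technology, master's thesis, 2012. Document type: degree thesis
Keywords: YHFT-Matrix; shifter; bit processing; shuffle; SIMD; packing and unpacking; simulation and verification; synthesis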
[Abstract]: A digital signal processor (DSP) is a processor designed specifically for digital signal processing. It is widely used in wireless communication systems and in many other areas of everyday life. Developing DSP chips with China's own independent intellectual property not only brings substantial economic benefit but also provides a foundation for building secure communication infrastructure.
YHFT-Matrix DSP is a high-performance 32-bit floating-point DSP developed independently by the National University of Defense Technology. It uses VLIW technology and can issue 10 instructions per cycle. Building on an in-depth study of the architectures and instruction sets of current mainstream DSPs, this thesis designs and implements the bit-processing (BP) unit and the shuffle unit of the YHFT-Matrix DSP.
The BP unit is one of the three major functional units of the YHFT-Matrix DSP core. It mainly executes shift, bit-processing, and pack/unpack instructions, and it is implemented with SIMD techniques so that it can fully exploit the data-level parallelism of programs. The shuffle unit exchanges data among the VPEs of the vector processing unit. It stores shuffle patterns in a dedicated SRAM, so that shuffling during program execution is decoupled from critical system resources such as the register file and memory bandwidth, which improves the efficiency of the shuffle unit.
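The two units described above can be pictured with a small software model. This is a minimal sketch and not the thesis's hardware design: the lane count (4), the 32-bit lane width, the 16-bit pack/unpack granularity, the function names, and the contents of `PATTERN_MEMORY` are all assumptions made for illustration.

```python
# Illustrative software model only. Lane count, lane width, and all
# pattern contents below are assumptions, not taken from the thesis.
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1


def simd_shift_left(lanes, amount):
    """Shift every lane left by the same amount, dropping bits above bit 31."""
    return [(value << amount) & LANE_MASK for value in lanes]


def pack16(high, low):
    """Pack two 16-bit values into one 32-bit word (high half first)."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def unpack16(word):
    """Split a 32-bit word into its (high, low) 16-bit halves."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


# Hypothetical pattern memory standing in for the dedicated SRAM.
# Entry i of a pattern names the source lane that feeds destination lane i.
PATTERN_MEMORY = {
    0: [0, 1, 2, 3],  # identity
    1: [1, 2, 3, 0],  # rotate by one lane
    2: [3, 2, 1, 0],  # reverse
    3: [0, 0, 0, 0],  # broadcast lane 0
}


def shuffle(lanes, pattern_id):
    """Permute lanes according to a pattern looked up by its ID."""
    pattern = PATTERN_MEMORY[pattern_id]
    return [lanes[source] for source in pattern]


if __name__ == "__main__":
    data = [10, 20, 30, 40]
    print(shuffle(data, 1))
    print(simd_shift_left(data, 2))
    print(hex(pack16(0x1234, 0xABCD)))
```

In this model a shuffle instruction carries only a small pattern ID, while the permutation itself lives in the pattern table. That is one way to read the abstract's point that keeping patterns in a dedicated SRAM avoids occupying the register file or memory bandwidth.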
The BP unit and the shuffle unit were verified by simulation at every stage of the design. RTL simulation, post-synthesis simulation, and post-layout simulation were carried out in turn, and the NC_Verilog tool was used for coverage verification and back-annotated simulation, ensuring the correctness of the design. A performance evaluation of the shuffle unit shows that it improves application performance by 14.3% to 27.6%, at an additional area cost of only 0.6%.
In addition, the BP unit and the shuffle unit were each synthesized with Synopsys Design Compiler in a TSMC 65 nm process. The results show that the BP unit has a total area of 581856 μm², which is 3.7% of the single-core area, with a critical-path delay of 0.8 ns. The shuffle unit has a total area of 352326 μm², only 2.2% of the single-core area, with a critical-path delay of 1.59 ns. Both meet the 500 MHz target frequency of the YHFT-Matrix DSP.
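The frequency claim can be checked with simple arithmetic, sketched below using only the figures quoted in the abstract. The helper names are hypothetical, and the calculation is deliberately simplified: it compares the critical-path delay directly with the clock period and ignores setup time, clock skew, and other margins that a real timing sign-off would include.

```python
# Back-of-the-envelope check using only numbers quoted in the abstract.
# Simplifying assumption: slack = clock period - critical-path delay,
# with no allowance for setup time, skew, or uncertainty.
TARGET_MHZ = 500

UNITS = {
    "BP": {"area_um2": 581856, "core_fraction": 0.037, "delay_ns": 0.8},
    "shuffle": {"area_um2": 352326, "core_fraction": 0.022, "delay_ns": 1.59},
}


def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(freq_mhz, critical_path_ns):
    """Time left in one clock period after the critical path."""
    return clock_period_ns(freq_mhz) - critical_path_ns


def implied_core_area_um2(unit_area_um2, core_fraction):
    """Single-core area implied by a unit's area and its share of the core."""
    return unit_area_um2 / core_fraction


if __name__ == "__main__":
    for name, unit in UNITS.items():
        slack = slack_ns(TARGET_MHZ, unit["delay_ns"])
        core_mm2 = implied_core_area_um2(unit["area_um2"], unit["core_fraction"]) / 1e6
        print(f"{name}: slack {slack:.2f} ns, implied core area {core_mm2:.1f} mm^2")
```

At 500 MHz the clock period is 2 ns, so under this simplified model the BP unit has about 1.2 ns of slack and the shuffle unit about 0.41 ns. The two area figures imply a single-core area of roughly 15.7 mm² and 16.0 mm² respectively, which agree to within the rounding of the quoted percentages.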
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year of degree]: 2012
[Classification number]: TP332