
Design, Verification and Optimization of the Multiplier Unit for X-DSP

Posted: 2018-12-27 06:43
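[Abstract]: X-DSP is an independently developed (forward-designed) 32-bit high-performance digital signal processor that supports both floating-point and fixed-point operations. It uses a very long instruction word (VLIW) architecture and single-instruction, multiple-data (SIMD) techniques. The multiplier unit is one of the four major functional units of the CPU core. Following the X-DSP design requirements, this thesis develops a high-performance SIMD multiplier unit that supports both fixed-point and floating-point multiplication and meets the DSP's requirements for parallel computation, high precision, and real-time data processing. The main work is as follows:

1. Design of the multiplier unit. The instructions executed by the multiplier unit are analyzed first. Based on this analysis, the fixed-point and floating-point multiplication structures are designed, and the logic design of the SIMD multiplier unit is then implemented with a multi-data-stream multiplication matrix algorithm, a Wallace tree, and carry-lookahead adders.

2. Timing optimization of the multiplier unit. The multiplier unit is first logic-synthesized to identify the critical path, and the functional modules on that path are redesigned. The whole unit is then timing-optimized at the logic-structure and algorithm level and at the code level. After optimization, in a 45 nm CMOS process and with area, power, and other metrics still meeting the design requirements, the critical-path delay is reduced by 190 ps, timing performance improves by 22.4%, and the number of registers is reduced by 18.3%.

3. Functional verification of the multiplier unit. The unit is verified functionally by software simulation and by FPGA-based verification. The key to simulation-based verification is test-vector development: using a functional-coverage approach, test vectors are developed at both the module level and the system level. Module-level vectors are derived from the function each module implements; system-level verification is divided into pipeline verification and arithmetic-function verification. Finally, the multiplier unit is verified on an FPGA.

Place-and-route results in the 45 nm CMOS process show that, under worst-case conditions, the multiplier unit reaches a clock frequency of 1 GHz with 12.6686 mW of dynamic power, 4.5032 mW of static power, and an area of 202,718.88 µm², fully meeting the X-DSP design goals.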
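The design item names three building blocks: a partial-product matrix, a Wallace-tree reduction, and a carry-lookahead final adder. As a rough illustration of how they fit together, here is a minimal Python model of an unsigned n × n multiply. It is a sketch under stated assumptions, not the X-DSP RTL: the column-wise formulation of the Wallace reduction, the 4-bit lookahead groups (carries are looked ahead inside each block and ripple between blocks, where a real design would add a second lookahead level), the small operand widths, and every function name are illustrative choices.

```python
# Software model of an unsigned n x n multiplier (illustrative, not X-DSP RTL):
# AND-gate partial products -> Wallace-style carry-save reduction -> CLA.


def partial_products(a: int, b: int, n: int) -> list[list[int]]:
    """Partial-product matrix as columns: cols[w] holds each a_j & b_i with i + j == w."""
    cols: list[list[int]] = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append((a >> j) & (b >> i) & 1)
    return cols


def wallace_reduce(cols: list[list[int]]) -> list[list[int]]:
    """Apply layers of full/half adders until no column holds more than 2 bits."""
    while any(len(col) > 2 for col in cols):
        nxt: list[list[int]] = [[] for _ in range(len(cols) + 1)]
        for w, col in enumerate(cols):
            k = 0
            while len(col) - k >= 3:                 # full adder: 3 bits -> sum, carry
                x, y, z = col[k:k + 3]
                nxt[w].append(x ^ y ^ z)
                nxt[w + 1].append((x & y) | (x & z) | (y & z))
                k += 3
            if len(col) - k == 2:                    # half adder: 2 bits -> sum, carry
                x, y = col[k:k + 2]
                nxt[w].append(x ^ y)
                nxt[w + 1].append(x & y)
                k += 2
            nxt[w].extend(col[k:])                   # a lone leftover bit passes through
        cols = nxt
    return cols


def two_rows(cols: list[list[int]]) -> tuple[int, int]:
    """Pack the (at most) two surviving bits of each column into two operands."""
    x = y = 0
    for w, col in enumerate(cols):
        if len(col) > 0:
            x |= col[0] << w
        if len(col) > 1:
            y |= col[1] << w
    return x, y


def lookahead_carry(g: list[int], p: list[int], base: int, i: int, c_base: int) -> int:
    """Carry into bit i of the block starting at `base`, in sum-of-products form:
    c_i = g[i-1] | p[i-1]g[i-2] | ... | p[i-1]...p[base]c_base."""
    c = c_base
    for m in range(base, i):
        c &= p[m]
    for k in range(base, i):
        term = g[k]
        for m in range(k + 1, i):
            term &= p[m]
        c |= term
    return c


def cla_add(x: int, y: int, width: int, group: int = 4) -> int:
    """Add two `width`-bit values with `group`-bit lookahead blocks;
    the block carry-out ripples into the next block."""
    g = [(x >> i) & (y >> i) & 1 for i in range(width)]      # generate
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]    # propagate
    total, c_base = 0, 0
    for base in range(0, width, group):
        end = min(base + group, width)
        for i in range(base, end):
            total |= (p[i] ^ lookahead_carry(g, p, base, i, c_base)) << i
        c_base = lookahead_carry(g, p, base, end, c_base)    # block carry-out
    return total | (c_base << width)


def wallace_multiply(a: int, b: int, n: int) -> int:
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    x, y = two_rows(wallace_reduce(partial_products(a, b, n)))
    return cla_add(x, y, 2 * n)


print(hex(wallace_multiply(0xFF, 0xFF, 8)))   # 0xfe01
```

Every full adder maps three bits of weight 2^w to a sum bit of weight 2^w and a carry bit of weight 2^(w+1), so the reduction never propagates a carry along a row. Carry propagation happens only once, in the final two-operand addition, which is where a lookahead adder pays off compared with a ripple-carry chain.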
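The abstract does not describe the multi-data-stream multiplication matrix algorithm itself or the lane widths X-DSP supports. A common technique for letting one partial-product array serve several SIMD modes is to suppress every partial product whose two operand bits fall in different lanes. The sketch below assumes that partitioning scheme, unsigned lanes inside a 32-bit word, and lane widths of 8, 16, or 32 bits; `partitioned_mul` and `simd_mul` are hypothetical names, and the plain integer addition stands in for the compression tree.

```python
# Partitioned SIMD multiply model (assumed scheme, unsigned lanes in a 32-bit word).


def simd_mul(a: int, b: int, lane_bits: int) -> int:
    """Reference: multiply each lane on its own and pack the double-width products."""
    lane_mask = (1 << lane_bits) - 1
    out = 0
    for k in range(32 // lane_bits):
        x = (a >> (k * lane_bits)) & lane_mask
        y = (b >> (k * lane_bits)) & lane_mask
        out |= (x * y) << (2 * k * lane_bits)   # lane k's product at bit 2*k*lane_bits
    return out


def partitioned_mul(a: int, b: int, lane_bits: int) -> int:
    """One shared 32x32 partial-product array: row i (bit i of b) keeps only the
    bits of a that lie in the same lane as bit i; cross-lane terms are zeroed."""
    lane_mask = (1 << lane_bits) - 1
    total = 0
    for i in range(32):
        if (b >> i) & 1:
            lo = (i // lane_bits) * lane_bits      # lowest bit index of that lane
            total += (a & (lane_mask << lo)) << i  # integer add stands in for the tree
    return total


a, b = 0x01020304, 0x05060708                      # four 8-bit lanes per operand
print(hex(partitioned_mul(a, b, 8)))               # 0x5000c00150020
print(partitioned_mul(a, b, 8) == simd_mul(a, b, 8))  # True
```

With the cross-lane terms removed, lane k contributes only A_k·B_k, shifted to bit offset 2kL of the double-width result (L is the lane width), so the lane products occupy disjoint bit fields and can share one adder tree; with L = 32 the same array computes an ordinary 32 × 32 product. Signed lanes would need additional per-lane sign handling, for example Baugh-Wooley correction terms, which this sketch omits.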
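On the floating-point side, a multiply reduces to an XOR of the sign bits, an addition of the biased exponents, an integer product of the significands (the part that can reuse the fixed-point array), normalization, and rounding. The sketch below assumes IEEE 754 binary32 operands and round-to-nearest-even, and it handles only normal inputs that produce a normal result; zeros, subnormals, infinities, NaNs, overflow, and other rounding modes are outside its scope, and the abstract does not say how X-DSP treats them. The helper names are hypothetical.

```python
import struct


def f32_bits(x: float) -> int:
    """Round a Python float to binary32 and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(u: int) -> float:
    """Interpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", u))[0]


def fp32_mul(a: int, b: int) -> int:
    """binary32 multiply on bit patterns: normal operands, normal result, RNE."""
    sign = ((a ^ b) >> 31) & 1                    # sign of product = XOR of signs
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    assert 0 < ea < 255 and 0 < eb < 255, "normal operands only"
    ma = (a & 0x7FFFFF) | (1 << 23)               # restore the hidden leading 1
    mb = (b & 0x7FFFFF) | (1 << 23)
    prod = ma * mb                                # 48-bit product in [2^46, 2^48)
    exp = ea + eb - 127                           # add exponents, remove one bias
    shift = 23
    if prod >> 47:                                # significand product in [2, 4)
        shift, exp = 24, exp + 1                  # normalize by one extra bit
    mant = prod >> shift                          # 24 kept bits, hidden 1 included
    guard = (prod >> (shift - 1)) & 1             # first discarded bit
    sticky = int((prod & ((1 << (shift - 1)) - 1)) != 0)  # OR of all lower bits
    if guard and (sticky or (mant & 1)):          # round to nearest, ties to even
        mant += 1
        if mant >> 24:                            # 1.11...1 rounded up to 10.0
            mant, exp = mant >> 1, exp + 1
    assert 0 < exp < 255, "overflow/underflow not modeled"
    return (sign << 31) | (exp << 23) | (mant & 0x7FFFFF)


print(bits_f32(fp32_mul(f32_bits(1.5), f32_bits(-2.25))))   # -3.375
```

The guard bit is the first bit shifted out during normalization and the sticky bit is the OR of every bit below it; together with the least significant kept bit they decide whether to increment. A convenient cross-check is `f32_bits(bits_f32(a) * bits_f32(b))`: the product of two binary32 values is exact as a Python float, so packing it back to binary32 with `struct` rounds only once.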
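The abstract reports the timing gain both as an absolute delay reduction (190 ps) and as a percentage (22.4%) without stating the baseline delay. The worked arithmetic below shows what the two figures jointly imply under two possible readings of the percentage: (a) a reduction relative to the pre-optimization critical-path delay, or (b) a frequency improvement. Neither absolute delay is stated in the source. The last line assumes the reported power figures were taken at the 1 GHz operating point.

```latex
\[
\begin{aligned}
\text{(a) } \frac{\Delta T}{T_{\text{before}}} = 0.224:\quad
  & T_{\text{before}} = \frac{190\,\text{ps}}{0.224} \approx 848\,\text{ps},\qquad
    T_{\text{after}} \approx 848 - 190 = 658\,\text{ps}\\
\text{(b) } \frac{T_{\text{before}}}{T_{\text{after}}} - 1 = \frac{\Delta T}{T_{\text{after}}} = 0.224:\quad
  & T_{\text{after}} = \frac{190\,\text{ps}}{0.224} \approx 848\,\text{ps},\qquad
    T_{\text{before}} \approx 848 + 190 = 1038\,\text{ps}\\
P_{\text{total}} = 12.6686 + 4.5032 = 17.1718\,\text{mW},\quad
  & E_{\text{cycle}} = \frac{P_{\text{total}}}{f} = \frac{17.1718\,\text{mW}}{1\,\text{GHz}} \approx 17.17\,\text{pJ}
\end{aligned}
\]
```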
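The verification item describes coverage-driven test-vector development. The sketch below illustrates the idea for a signed fixed-point multiply, assuming 16-bit operands for brevity: it defines hypothetical coverage bins for each operand (zero, ±1, the extreme values, and ordinary positive and negative values), runs unconstrained random vectors first, then generates directed vectors for every cross bin the random phase missed. The stand-in `dut_mul` is a software shift-and-add model, not the X-DSP RTL; in a real flow the comparison would run against the RTL in a simulator, with coverage typically collected by SystemVerilog covergroups.

```python
import random

BITS = 16                                   # assumed operand width for the sketch
MIN, MAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1

# Hypothetical coverage bins for one signed operand (every value hits exactly one).
BINS = {
    "zero": lambda v: v == 0,
    "one": lambda v: v == 1,
    "minus_one": lambda v: v == -1,
    "max": lambda v: v == MAX,
    "min": lambda v: v == MIN,
    "pos": lambda v: 1 < v < MAX,
    "neg": lambda v: MIN < v < -1,
}
FIXED = {"zero": 0, "one": 1, "minus_one": -1, "max": MAX, "min": MIN}


def classify(v: int) -> str:
    return next(name for name, hit in BINS.items() if hit(v))


def sample(name: str, rng: random.Random) -> int:
    """Draw an operand that falls in the named bin."""
    if name in FIXED:
        return FIXED[name]
    return rng.randint(2, MAX - 1) if name == "pos" else rng.randint(MIN + 1, -2)


def dut_mul(a: int, b: int) -> int:
    """Stand-in DUT: shift-and-add on operands sign-extended to 2*BITS bits."""
    w = 2 * BITS
    mask = (1 << w) - 1
    ua, ub = a & mask, b & mask             # w-bit two's-complement images of a, b
    acc = 0
    for i in range(w):
        if (ub >> i) & 1:
            acc = (acc + (ua << i)) & mask
    return acc - (1 << w) if acc >> (w - 1) else acc


def run(seed: int = 0, random_vectors: int = 200) -> tuple[float, float]:
    rng = random.Random(seed)
    goals = {(x, y) for x in BINS for y in BINS}   # 7 x 7 cross coverage
    hit: set[tuple[str, str]] = set()

    def check(a: int, b: int) -> None:
        assert dut_mul(a, b) == a * b, f"mismatch: {a} * {b}"
        hit.add((classify(a), classify(b)))

    for _ in range(random_vectors):                 # phase 1: unconstrained random
        check(rng.randint(MIN, MAX), rng.randint(MIN, MAX))
    random_cov = len(hit) / len(goals)
    for xa, xb in sorted(goals - hit):              # phase 2: directed vectors for holes
        check(sample(xa, rng), sample(xb, rng))
    return random_cov, len(hit) / len(goals)


print(run())   # (coverage after random phase, coverage after directed phase)
```

Random stimulus alone is unlikely to reach corner combinations such as zero × minimum, which is the motivation for closing the remaining cross bins with directed vectors rather than simply running more random ones.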
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year awarded]: 2013
[CLC number]: TP332



