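Physical Design and Optimization of a Multiplication Unit in a 40 nm Process
Published: 2018-06-29 20:33
Topic: multiplication unit + floorplan optimization; Source: National University of Defense Technology, master's thesis, 2015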
[Abstract]: The speed of the multiplication unit directly affects the performance of the entire CPU core datapath, and the physical design and implementation of a high-performance, low-power multiplication unit is one of the current difficult problems. Given chip design cost and overall performance, the physical design of the multiplication unit has to be completed within a limited area. This leads to large clock-network skew and high overall placement density, which in turn degrade the timing and power of the design. To address these problems, this thesis takes the performance optimization of the multiplication unit in the X-DSP CPU core as its background and studies the physical design flow in detail from four angles: improving timing, reducing power, timing analysis, and equivalence verification. It also describes the main methods and techniques used. The main work is as follows:
1) Floorplanning is an important step in physical design, and its quality strongly affects design performance. Using a hierarchical physical design method, the top-level CPU datapath is floorplanned first. Based on the connections between modules, two floorplans are derived and compared; the results show that the improved floorplan improves timing by 9%. The floorplan of the multiplication unit is then determined from the CPU datapath floorplan. After a detailed analysis of the unit's hierarchy and several iterations, the improved floorplan improves timing by 5%.
2) For the clock tree, reducing clock latency and clock skew is the primary task of the clock network. The initial clock skew of the multiplication unit is 47.2 ps and the clock latency is 304.7 ps. Optimizing the clock network by controlling the clock driver cells, the maximum fanout, and the maximum number of levels reduces the clock skew to 35.5 ps and the clock latency to 260.6 to 296.1 ps. On that basis, further optimizing the clock network by controlling clock routing reduces the clock skew to 27.9 ps and the clock latency to 232.3 to 260.2 ps.
3) Chip power has become a performance metric as important as chip speed and chip area. Automated insertion of clock gating reduces dynamic power by 62.7%. Adjusting constraints, lowering density, reducing cell drive strength, and swapping in multi-threshold cells reduce static power by 9%.
4) For static timing analysis, because single-mode, single-corner analysis is insufficient, multi-mode multi-corner timing analysis is discussed from three angles: the composition of corners, the analysis modes, and the analysis flow. Multi-mode multi-corner timing analysis is carried out on the multiplication unit, and the ice tool is used to optimize timing until the timing requirements are met.
5) Formal verification methods are studied on the multiplication unit. The Formality tool is used for formal verification, the problems encountered during verification (such as clock gating and scan chains) are resolved, and the verification ultimately passes.
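The clock-tree figures reported in the abstract can be turned into relative improvements with a few lines of arithmetic. The following is only an illustrative sketch: the function name `reduction_pct` and the stage labels are made up for this example, and the percentages are derived here from the reported picosecond values, not quoted from the thesis.

```python
def reduction_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Values reported in the abstract, in picoseconds.
INITIAL_SKEW = 47.2
INITIAL_LATENCY = 304.7

# Stage labels are illustrative names, not terms from the thesis.
stages = {
    "cell/fanout/level control": {"skew": 35.5, "latency": (260.6, 296.1)},
    "plus clock routing control": {"skew": 27.9, "latency": (232.3, 260.2)},
}

for name, result in stages.items():
    skew_cut = reduction_pct(INITIAL_SKEW, result["skew"])
    lat_lo, lat_hi = result["latency"]
    # The latency is reported as a range, so the reduction is a range too:
    # the largest latency gives the smallest reduction and vice versa.
    lat_cut_min = reduction_pct(INITIAL_LATENCY, lat_hi)
    lat_cut_max = reduction_pct(INITIAL_LATENCY, lat_lo)
    print(f"{name}: skew -{skew_cut:.1f}%, "
          f"latency -{lat_cut_min:.1f}% to -{lat_cut_max:.1f}%")
```

Relative to the initial 47.2 ps, the two optimization stages cut the clock skew by roughly 25% and 41% respectively.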
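[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP332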
Article ID: 2083292
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2083292.html