Timing Budgeting and Optimization Methods in Hierarchical Physical Design
Published: 2019-03-23 17:04
【Abstract】: In large-scale, high-frequency chip design, hierarchical design is increasingly common, and the timing budget assigned to each module plays an important role in timing closure. As chip size grows, the impact of on-chip variation becomes more pronounced. In high-frequency designs, clock skew severely limits how quickly timing converges, and data-path delay becomes the most important factor affecting timing closure. Based on the place and route of the core of YHFT-XX, a high-performance multi-core DSP chip in a 40nm process, this thesis studies the key problems that restrict timing closure in its physical design and describes in detail the optimization methods used.

In hierarchical physical design, how reasonable each module's timing budget is determines how quickly the whole design converges. This thesis analyzes two traditional timing budgeting methods, boundary minimization and logic-depth-based budgeting. To address their shortcomings, and taking the characteristics of the core design into account, it proposes two new timing budgeting methods, one that jointly considers distance and logic depth and one that takes the clock into account, and gives the corresponding budgeting formulas. Deriving from these formulas shows how clock skew and the common clock path affect timing. Guided by the timing budget, the floorplan of the core was optimized, reducing the critical path length by 19.77%.

Reducing the impact of on-chip variation on chip timing is increasingly important. Through a detailed analysis of the core's clock structure, combined with the clock characteristics of each sub-module, the clock routing of the core was carefully planned, lengthening the common path by 5120 μm compared with before planning. Clock skew was handled case by case. For the modules reused at the top level, classification reduced the complexity of the problem, and an H-tree-like method was used to optimize clock latency and skew, keeping the skew of the reused modules within 15 ps. For the boundary registers, inserting adjustment points reduced the clock skew to 49 ps, 39.5% lower than the result of the tool's automatic run, which meets the top-level requirement. For hard macros and clock-gating cells, their physical positions in the different modules were considered together with the characteristics of several clock structures to plan their clocks, and manual routing was used to optimize delay, keeping the clock skew among them within 10 ps. For the three modules partitioned out of the top level, dynamic adjustment was used to balance clock skew. This clock planning removed the adverse effects of clock skew on the design and laid the groundwork for later timing optimization.

Data-path delay is the main obstacle to design convergence. This thesis analyzes the characteristics of the critical paths. Adjusting the start point of the reset signal improved the reset signal's hold timing: the worst violation was reduced by 55.7% relative to before optimization, and the total number of violations dropped by 68.7%. For very long cross-module data paths, manually placing the registers at key stages effectively guided the data flow and reduced the delay of the whole data path. These methods proved effective for the problems encountered in the YHFT-XX design, timing closure was achieved, and the chip has been taped out successfully.
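The abstract does not reproduce the thesis's budgeting formulas, so the following is only an illustrative sketch of what a budget that jointly weighs distance and logic depth could look like, not the author's method. The `Segment` type, the `alpha` weight, the module names, and the 1000 ps period are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Portion of one cross-module path that lies inside a single module."""
    module: str            # hypothetical module name
    wire_length_um: float  # estimated routed distance inside the module
    logic_depth: int       # number of logic levels inside the module


def budget_path(segments, period_ps, alpha=0.5):
    """Split a clock period across the segments of one path.

    Each segment's share is a blend of its fraction of total distance
    (weight alpha) and its fraction of total logic depth (weight 1 - alpha).
    This weighting is an assumption for illustration only.
    """
    total_len = sum(s.wire_length_um for s in segments)
    total_depth = sum(s.logic_depth for s in segments)
    if total_len <= 0 or total_depth <= 0:
        raise ValueError("path needs positive total length and logic depth")
    budgets = {}
    for s in segments:
        share = (alpha * s.wire_length_um / total_len
                 + (1 - alpha) * s.logic_depth / total_depth)
        budgets[s.module] = period_ps * share
    return budgets


# Hypothetical path crossing two modules with a 1000 ps clock period.
path = [Segment("mod_a", 300.0, 2), Segment("mod_b", 100.0, 6)]
print(budget_path(path, period_ps=1000.0))
```

With `alpha=1.0` the split depends on distance alone, and with `alpha=0.0` on logic depth alone, so the two traditional viewpoints named in the abstract appear as the two extremes of this sketch.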
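【Degree-granting institution】: National University of Defense Technology
【Degree level】: Master's
【Year degree conferred】: 2015
【Classification number】: TN402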
Article ID: 2446053
Article link: https://www.wllwen.com/kejilunwen/dianzigongchenglunwen/2446053.html