Critical Path Optimization Based on Pulsed Latches

Published: 2018-06-29 14:22

Topics: timing closure + critical path; Source: National University of Defense Technology, 2015 master's thesis

【Abstract】: The rapid development of integrated circuits has brought higher integration density and better performance, but it has also made timing closure harder to achieve. Taking the physical design of the L_unit block in the YHFT-DX chip as its subject, this thesis studies how to optimize critical paths effectively so as to reach timing closure and shorten the chip's time-to-market. YHFT-DX is designed in a 40 nm process and must reach a clock frequency of 1 GHz under worst-case operating conditions. L_unit is one of the key blocks of YHFT-DX and has a relatively complex structure; after many iterations of design optimization, some critical paths with timing violations still remain. To eliminate these paths quickly, the thesis replaces the registers on them with lower-delay pulsed latches.
The thesis first analyzes the layout structure of the registers in the standard cell library and then, following the basic principle of the pulsed latch, designs the pulsed latch needed for the experiments using a full-custom design flow. After verification by post-layout simulation, the pulsed latch shows a 51.9% lower delay than the standard cell of the same function. Next, based on the structural characteristics of the pulsed latch, single-bit pulsed latches are grouped into a multi-bit pulsed latch with a horizontal structure, and a multi-bit pulsed latch with a vertical structure is designed with the aim of reducing voltage drop (IR-drop). Both multi-bit structures reduce delay, and the three-, four-, and five-bit pulsed latches have better unit power consumption and unit area than the standard cells of the same function. Experiments also show that, with an output load of 30 fF and with each of the metal layers M3 to M7 used as the interconnect, a 30 um interconnect adds about 2 ps of delay.
Based on the causes of the timing violations, the thesis proposes an optimization scheme and algorithm that replaces the registers on critical paths with the full-custom single-bit pulsed latch, and turns the algorithm into an automated script. It also analyzes the advantages of performing the replacement at different design stages. The final experimental results show that, when the replacement is carried out at the placement (place) stage, and measured by the number of violating paths and the worst negative slack (WNS) on register-to-register (reg2reg) paths, the number of critical paths falls by 99.45% and the overall timing performance of the circuit improves by about 12%.
Finally, building on the advantages and characteristics of multi-bit pulsed latches, the thesis proposes a scheme that uses them to optimize critical paths, and resolves the problems that arise in the scheme by drawing on the experimental results of the two preceding chapters. This scheme is likewise converted into an optimization algorithm and an automated script, which greatly improves the practicality and efficiency of the algorithm. Experiments show that the best result is obtained by using the horizontal three-bit pulsed latch to replace, at the place stage, the registers on critical paths that lie within a rectangle with 30 um sides. Measured by the violating paths and WNS of reg2reg paths, the number of critical paths falls by 99%, overall timing performance improves by 11.4%, overall power consumption falls by 2.5%, and chip density falls by 4.4%. Repeated experiments demonstrate that pulsed latches can effectively optimize critical paths, accelerate timing closure, and to some extent reduce the overall power consumption and density of the chip.
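The single-bit replacement scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's script: the `TimingPath` record, the instance names, the choice to replace the launching register, and the fixed `gain_ps` timing gain are all assumptions made for illustration. It only shows how the two metrics named in the abstract (number of violating paths and WNS) could be computed before and after a replacement pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingPath:
    """One reg2reg path from a timing report (hypothetical, simplified record)."""
    start_reg: str   # launching register instance name (hypothetical)
    end_reg: str     # capturing register instance name (hypothetical)
    slack_ps: float  # negative slack means a timing violation


def summarize(paths):
    """Return (number of violating paths, worst negative slack in ps)."""
    violating = [p.slack_ps for p in paths if p.slack_ps < 0]
    return len(violating), min(violating, default=0.0)


def pick_registers(paths):
    """List launching registers of violating paths, worst slack first."""
    worst = {}
    for p in paths:
        if p.slack_ps < 0:
            worst[p.start_reg] = min(p.slack_ps, worst.get(p.start_reg, 0.0))
    return sorted(worst, key=lambda reg: worst[reg])


def apply_replacement(paths, replaced, gain_ps):
    """Model the swap as a fixed delay gain on paths launched from a replaced cell.

    Assumption: a real flow would re-run static timing analysis instead.
    """
    return [
        TimingPath(p.start_reg, p.end_reg,
                   p.slack_ps + gain_ps if p.start_reg in replaced else p.slack_ps)
        for p in paths
    ]


if __name__ == "__main__":
    paths = [
        TimingPath("r1", "r2", -30.0),
        TimingPath("r1", "r3", -10.0),
        TimingPath("r4", "r5", 20.0),
        TimingPath("r6", "r7", -50.0),
    ]
    targets = pick_registers(paths)
    after = apply_replacement(paths, set(targets), gain_ps=40.0)  # 40 ps is illustrative
    print("before:", summarize(paths))
    print("replace:", targets)
    print("after:", summarize(after))
```

In this toy model the violating-path count and WNS are recomputed from the adjusted slacks; the percentages reported in the abstract come from the thesis's own experiments and are not reproduced by this sketch.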
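【Degree-granting institution】: National University of Defense Technology
【Degree level】: Master's
【Year degree granted】: 2015
【Classification number】: TN40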
