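Physical Design of the YHFT-DX Level-2 Cache in a 65nm Process
Published: 2018-03-17 23:07
Topic: level-2 Cache. Focus: physical design. Source: National University of Defense Technology, master's thesis, 2012. Document type: degree thesis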
[Abstract]: YHFT-DX is a high-performance DSP (Digital Signal Processor) chip designed in a 65nm process and required to reach a design target of 800MHz under worst-case process conditions. As the central hub of the chip's memory path, the level-2 (L2) Cache is critical to the design. This thesis studies physical design optimization techniques for the L2 Cache in the YHFT-DX prototype chip and production chip. The main contents are as follows:
1) The L2 Cache adopts a banked structure: the 1MB storage array is divided into 16 Banks, each built from four SRAM_CELL basic modules. The circuit design of the SRAM_CELL module is studied; its placement structure, routing method, and decoding circuit are optimized; and the layout of the module is planned accordingly.
2) The physical design of the Bank macro module in the prototype chip is studied. Several methods, including in-place optimization, are used to reach timing closure, and useful skew is applied to optimize the timing of long interconnect paths. The placement of the Banks at the full-chip level is studied, the critical paths and performance trade-offs of different arrangements are analyzed, and an inverted-U arrangement of the Banks is chosen for the prototype chip.
3) The LRU (Least Recently Used) module is implemented as a full-custom design. A 13T storage cell with a reset terminal is used and compared with other storage cells in area, performance, noise margin, and power consumption. Read and write circuits are designed, with the read circuit implemented entirely in combinational logic. Delay and power are simulated: the full-custom design reduces critical-path delay by 218ps, improves timing performance by 29%, and eliminates the LRU-related timing violations in the chip.
4) A hierarchical design methodology is used for the physical design of the L2 Cache in the production chip. The Bank placement is readjusted, and the I/O port positions are optimized according to the interconnections with other modules in the full chip, improving the timing of cross-module path 80. The power network is designed to keep the voltage drop within 3% while ensuring adequate power delivery. Clock tree synthesis uses a balanced binary-tree topology, and several methods are applied to optimize clock skew and noise. The effect of crosstalk on signal delay is analyzed, and methods for crosstalk prevention and repair are studied, effectively improving the noise immunity of the design.
5) In the production chip, the hierarchy of the storage array is reorganized: two Banks are merged into a single Bank2 module, the long interconnects to the far-end storage cells are planned, and a double-width, double-spacing routing rule is adopted. This improves critical-path delay by 30 and effectively resolves the timing violations caused by long interconnects in the L2 Cache.
Finally, compared with the prototype chip, the L2 Cache delay in the production chip is reduced by 90 and performance is improved by 6.7%, meeting the 800MHz design target under 65nm worst-case process conditions.
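The figures quoted in the abstract imply some simple arithmetic, sketched below in Python. This is only an illustration under stated assumptions, not the thesis's own calculation: it assumes the 1MB of storage is split evenly across Banks and SRAM_CELL modules (ignoring any tag or redundancy storage), and it assumes the 29% timing improvement for the LRU module is defined as the saved delay divided by the pre-optimization delay. The function names are hypothetical.

```python
# Sketch only: arithmetic implied by figures quoted in the abstract.
# Assumptions (not stated in the source): an even split of storage, and
# "improvement" defined as saved delay / delay before optimization.

CACHE_BYTES = 1 * 1024 * 1024  # 1MB L2 storage array
NUM_BANKS = 16                 # Banks in the L2 Cache
CELLS_PER_BANK = 4             # SRAM_CELL modules per Bank
TARGET_MHZ = 800               # worst-case frequency target


def partition(cache_bytes, num_banks, cells_per_bank):
    """Return (bytes per Bank, bytes per SRAM_CELL) for an even split."""
    bank_bytes = cache_bytes // num_banks
    cell_bytes = bank_bytes // cells_per_bank
    return bank_bytes, cell_bytes


def clock_period_ps(freq_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1_000_000 / freq_mhz


def baseline_delay_ps(saved_ps, improvement_fraction):
    """Delay before optimization, assuming improvement = saved / baseline."""
    return saved_ps / improvement_fraction


bank_bytes, cell_bytes = partition(CACHE_BYTES, NUM_BANKS, CELLS_PER_BANK)
period_ps = clock_period_ps(TARGET_MHZ)
lru_before_ps = baseline_delay_ps(218, 0.29)

print(f"Bank size: {bank_bytes // 1024} KB, SRAM_CELL size: {cell_bytes // 1024} KB")
print(f"Clock period at {TARGET_MHZ} MHz: {period_ps:.0f} ps")
print(f"Implied LRU critical-path delay before full-custom design: {lru_before_ps:.0f} ps")
```

Under these assumptions each Bank holds 64KB, each SRAM_CELL module holds 16KB, and every register-to-register path must fit in a 1250ps clock period at 800MHz.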
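[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP333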
Article ID: 1626891
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1626891.html