An Arbitrary-Step Data Promotion Technique for NUCA on CMPs Combined with Bank Coherence

Published: 2019-03-11 16:03

[Abstract]: Computers have become indispensable tools in daily life and work, and users' expectations keep rising: higher processing speed, larger storage capacity, and more convenient and friendly interfaces. To make processors faster, manufacturers kept raising the clock frequency, but the accompanying increase in power consumption became a bottleneck for further speedups. The chip multiprocessor (CMP) emerged in response; it integrates multiple processor cores on a single chip to increase computing power. CMPs have become the market mainstream, which makes research on CMP chips necessary. At the same time, integrated-circuit manufacturing has advanced rapidly and on-chip caches have grown ever larger, but as a cache grows, its wire delay grows with it, and the increasing wire delay significantly affects CPU processing speed. Kim C et al. therefore proposed the non-uniform cache architecture (NUCA), which allows different banks of the cache to have different access latencies and so achieves a lower average access latency than the earlier uniform cache architecture (UCA). In dynamic NUCA (DNUCA), the cache supports migration of cache lines (i.e., data blocks): a line that hits can be moved to a bank closer to the accessing processor, reducing the latency when the CPU accesses the same data again. This movement of data within the cache is called data promotion or block migration.

Data promotion requires finding a target bank to hold the promoted data. However, some current promotion techniques ignore the actual state of the target bank and use a fixed promotion step. During promotion they may evict more useful data in the target bank from the cache, or displace it to a bank farther from the CPU, causing cache pollution and preventing promotion from achieving good results. While improving promotion on a CMP, an important issue must also be considered: shared data. With multiple cores on one chip sharing an L2 or L3 cache, some shared data will inevitably be accessed by several cores at the same time. Data promotion moves the data accessed by the current CPU to a bank closer to that CPU so that the next access to the same data is faster. When multiple CPUs access the same shared data, the data is therefore "pulled" toward the middle of the NUCA, which limits the benefit of promotion. For this reason, the improved promotion technique is combined with a bank coherence technique: shared data may have multiple copies in the NUCA, each belonging to a different CPU, and bank coherence keeps the copies consistent. This resolves the contention over shared data and speeds up CPU access to it. Maintaining coherence requires recording the state of the data, and the promotion strategy proposed in this thesis uses exactly these cache-line states to select the target bank for migration. The result is an arbitrary-step data promotion technique for CMPs combined with bank coherence.

The thesis first briefly introduces the research background and related techniques, then surveys several basic simulation tools used in architecture research and describes in detail the simulator used in this work, Simics. It next reviews existing fixed-step data promotion techniques and their problems and introduces the bank coherence technique adopted here. It then describes the proposed technique in detail. Finally, the technique is evaluated with full-system simulation using the NAS Parallel Benchmark (NPB) programs, with satisfactory experimental results. The technique effectively reduces the latency of processor accesses to the shared cache; compared with the design proposed by Kim C et al., it improves IPC by 8.19% on average, reduces the number of promotions, and improves system performance.
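As a rough illustration of the difference between fixed-step promotion and a state-aware choice of target bank, here is a minimal Python sketch. The three line states, the rule "promote to the nearest bank whose candidate victim line is invalid, otherwise do not promote", and all function names are assumptions made for illustration only; the abstract does not spell out the thesis's actual selection rule.

```python
from enum import IntEnum
from typing import Sequence


class LineState(IntEnum):
    """Hypothetical cache-line states (assumed, not taken from the thesis)."""
    INVALID = 0   # slot holds no useful data, so it is free to overwrite
    SHARED = 1    # slot holds a copy that some CPU may still want
    MODIFIED = 2  # slot holds dirty data


def fixed_step_target(hit_bank: int, step: int = 1) -> int:
    """Fixed-step promotion: always move `step` banks closer to the CPU.

    Bank 0 is assumed to be the bank nearest the accessing CPU.
    The state of the line displaced in the target bank is ignored.
    """
    return max(hit_bank - step, 0)


def arbitrary_step_target(hit_bank: int,
                          victim_states: Sequence[LineState]) -> int:
    """State-aware promotion with a variable step (illustrative rule).

    victim_states[b] is the state of the line that would be displaced
    in bank b. Scan from the nearest bank outward and pick the first
    bank whose victim is INVALID; if none exists, leave the line where
    it is, so no useful data is displaced and no promotion occurs.
    """
    for bank in range(hit_bank):
        if victim_states[bank] == LineState.INVALID:
            return bank
    return hit_bank


# A line hits in bank 3; only bank 1 has a free (INVALID) slot.
states = [LineState.MODIFIED, LineState.INVALID,
          LineState.SHARED, LineState.SHARED]
fixed = fixed_step_target(3)                 # one bank closer, displaces a SHARED line
flexible = arbitrary_step_target(3, states)  # two banks closer, into the free slot
```

Under these assumptions the fixed-step policy displaces a possibly useful line in the adjacent bank, while the state-aware policy moves the data by a larger step into a free slot, or skips the promotion entirely when no free slot exists.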
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP332

[References]