Research on Cache Optimization Strategies for Multithreaded Applications and on Parallel Simulation
Published: 2019-06-24 14:55
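[Abstract]: Compared with traditional single-core processors, chip multiprocessors (Chip Multi-Processor, CMP) offer lower complexity, better scalability, and a higher performance-to-cost ratio. Driven by process technology and application demands, CMPs have become the mainstream direction of high-performance microprocessor design. Most of the design complexity and performance bottlenecks of a multi-core processor lie in its on-chip memory system. Raising the cache hit rate and avoiding high-latency off-chip memory accesses are critical to overall system performance, so the on-chip cache hierarchy has become one of the key research topics for multi-core processors. Academia has done a great deal of work on CMP cache optimization, but most of it targets multiprogrammed workloads. For multithreaded applications, whether existing cache optimization techniques improve performance, and if so how, remains an open question. This thesis studies cache performance optimization and parallel simulation for multi-core processors. Its contributions and innovations are as follows:
1. Cache optimization for tiled multi-core processors. In a tiled CMP, both the communication traffic between tiles and the capacity utilization of the L2 caches are unbalanced. To address this, the thesis proposes ARP, an adaptive replication policy for multithreaded applications that combines the advantages of private and shared L2 caches: by periodically weighing the benefits of replicating cache data against its costs, ARP dynamically controls how much data is replicated among the L2 caches. Experiments show that in a 16-core configuration, ARP in the best case reduces network traffic by 52% and raises capacity utilization to 58%; it also reduces the average access distance.
2. Utility-based cache optimization for multithreaded applications. Most traditional cache partitioning schemes target multiprogrammed workloads and ignore the differences between the access patterns of shared and private data in multithreaded workloads, which makes the use of shared data less efficient. Based on the access characteristics of the different data types in multithreaded programs, the thesis proposes UPP, a cache management mechanism for multithreaded programs. UPP monitors the utility of shared and private data in the shared cache, allocates cache space to each thread and to shared data accordingly, and combines this with improved insertion and promotion policies to maximize the overall utility of the cached data and filter out data with low reuse. Experiments show that UPP improves performance by about 4.5% over a purely shared cache with LRU replacement and by about 5.2% over a fairness-based static cache partitioning scheme.
3. Parallel simulation of multi-core processors. As the number of cores in a CMP and the complexity of the interconnect between them grow, multi-core processor simulators become larger, more complex, and slower. To address this, the thesis uses multithreading to develop ParaNSim, a modular, extensible parallel simulation module. ParaNSim can be used as a standalone network-on-chip simulator, extended with other modules into a tiled CMP simulator, or embedded in another simulator as a submodule. Experiments show that ParaNSim achieves maximum speedups of 1.44 and 2.42 with 4 and 8 worker threads, respectively.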
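The abstract does not say how UPP turns its utility measurements into a space allocation. As an illustration only, the sketch below assumes the monitors produce a hit curve per consumer (one for each thread's private data and one for shared data), where `curve[w]` is the number of hits that consumer would get with `w` ways, and it splits the ways of the shared cache with the lookahead greedy allocation known from utility-based cache partitioning (UCP). The function name, the example curves, and the one-way minimum per consumer are hypothetical, and UPP's modified insertion and promotion policies are not modeled.

```python
from typing import Dict, List


def lookahead_partition(hit_curves: Dict[str, List[int]], total_ways: int,
                        min_ways: int = 1) -> Dict[str, int]:
    """Hypothetical sketch: split `total_ways` cache ways among consumers.

    hit_curves[name][w] = hits that consumer would see with w ways
    (w = 0..total_ways), e.g. from per-consumer shadow-tag counters.
    """
    alloc = {name: min_ways for name in hit_curves}
    remaining = total_ways - min_ways * len(hit_curves)
    if remaining < 0:
        raise ValueError("not enough ways for the minimum allocation")
    while remaining > 0:
        best_name, best_k, best_mu = None, 0, -1.0
        for name, curve in hit_curves.items():
            a = alloc[name]
            # Lookahead: best marginal utility per way over any extra chunk of k ways,
            # so a consumer whose curve only rises after several ways is not starved.
            for k in range(1, remaining + 1):
                if a + k >= len(curve):
                    break
                mu = (curve[a + k] - curve[a]) / k
                if mu > best_mu:
                    best_name, best_k, best_mu = name, k, mu
        if best_name is None:  # curves too short to grow any consumer further
            break
        alloc[best_name] += best_k
        remaining -= best_k
    return alloc


# Illustrative (made-up) hit curves for an 8-way shared cache.
curves = {
    "T0": [0, 50, 80, 95, 100, 102, 103, 103, 103],      # thread 0 private data
    "T1": [0, 10, 20, 30, 40, 50, 55, 58, 60],           # thread 1 private data
    "shared": [0, 40, 70, 90, 105, 115, 120, 122, 123],  # data shared by threads
}
print(lookahead_partition(curves, total_ways=8))  # {'T0': 3, 'T1': 1, 'shared': 4}
```

The lookahead step (dividing the gain of `k` extra ways by `k`) is what distinguishes this from plain one-way-at-a-time greedy allocation; for concave hit curves the two give the same result.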
[Abstract]: Compared with traditional single-core processors, chip multiprocessors (Chip Multi-Processor, CMP) have the advantages of low complexity, good scalability, and a high performance-to-cost ratio. Driven by process technology and applications, CMPs have become the development trend of high-performance microprocessors. Most of the design complexity and performance bottlenecks of a multi-core processor are concentrated in the on-chip memory system. Improving the cache hit rate and avoiding long-latency off-chip accesses are very important to the overall performance of the system, so the on-chip cache hierarchy has become one of the research priorities for multi-core processors. Academia has done a lot of work on CMP cache optimization, but most of this work targets multiprogrammed workloads. For multithreaded applications, whether existing cache optimization techniques can improve program performance, and how, is still an open question. This thesis focuses on cache performance optimization and parallel simulation for multi-core processors. Its contributions and innovations are as follows: 1. The cache optimization mechanism of tiled multi-core processors is studied. In a tiled CMP, the communication traffic between tiles and the capacity utilization of the L2 caches are unbalanced. To address this, the thesis proposes ARP, an adaptive replication policy for multithreaded applications, which combines the advantages of private and shared L2 caches and dynamically controls the amount of data replicated among the L2 caches by periodically weighing the benefits and costs of cache data replication. Experimental results show that in a 16-core configuration, ARP can reduce network traffic by up to 52% and increase capacity utilization to 58%.
In addition, it also reduces the average access distance.
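The abstract describes ARP only as periodically weighing the benefits and costs of replication and adjusting the amount of replicated data. The sketch below is a hypothetical epoch-based controller in that spirit, not ARP itself. It assumes the hardware counts, per epoch, the L2 hits served by local replicas and the misses attributable to the capacity those replicas occupy (for example via shadow tags), converts both to cycles using assumed latency penalties, and moves a replication level (here, a cap on replica ways per set) one step toward the cheaper side. All class names, field names, penalties, and bounds are illustrative.

```python
from dataclasses import dataclass


@dataclass
class EpochStats:
    # L2 hits served by a local replica; each one avoided a remote-tile access
    replica_hits: int
    # Misses attributed to capacity occupied by replicas (hypothetical counter,
    # e.g. estimated with shadow tags for blocks evicted to make room for replicas)
    replica_induced_misses: int


@dataclass
class ReplicationController:
    remote_penalty: int   # assumed extra cycles of a remote-tile L2 hit vs. a local hit
    miss_penalty: int     # assumed extra cycles of an off-chip access vs. an on-chip L2 hit
    level: int = 2        # replication level, e.g. max replica ways per L2 set
    min_level: int = 0
    max_level: int = 8

    def end_epoch(self, stats: EpochStats) -> int:
        """Compare this epoch's estimated benefit and cost of replication
        and move the replication level one step toward the better side."""
        benefit = stats.replica_hits * self.remote_penalty
        cost = stats.replica_induced_misses * self.miss_penalty
        if benefit > cost:
            self.level = min(self.level + 1, self.max_level)
        elif benefit < cost:
            self.level = max(self.level - 1, self.min_level)
        # benefit == cost: keep the current level
        return self.level


ctrl = ReplicationController(remote_penalty=20, miss_penalty=200)
print(ctrl.end_epoch(EpochStats(replica_hits=1000, replica_induced_misses=50)))  # 3
print(ctrl.end_epoch(EpochStats(replica_hits=100, replica_induced_misses=50)))   # 2
```

Moving one level per epoch keeps such a controller stable when the counters are noisy; a different design could instead jump straight to the level with the highest estimated net benefit.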
Article ID: 2505146
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2505146.html