Research on Cache Subsystem Optimization for On-Chip Multi-Core Processors
[Abstract]: Modern chip multi-core processors require a large-capacity cache subsystem to bridge the performance gap between fast processor cores and slow off-chip memory. The performance and power consumption of this cache subsystem can be optimized by exploiting the characteristics of on-chip multi-core processors. This thesis studies several mechanisms for optimizing the cache subsystem of chip multi-core processors. Specifically, it covers three topics: 1) designing an efficient multicast routing algorithm to improve the performance of the on-chip network; 2) using emerging non-volatile memory to design a low-power cache system for chip multi-core processors; and 3) exploiting thread progress information to design a more efficient cache coherence protocol. For the first topic, we propose an efficient multicast routing mechanism for the on-chip network. As core counts keep growing, the on-chip network provides an efficient and scalable communication infrastructure for multi-core processors, and multicast (one-to-many) communication patterns are common in such networks. Without support for an effective multicast routing mechanism, conventional unicast-based on-chip networks handle this multicast traffic inefficiently. This thesis presents a network-based multicast routing mechanism called DPM.
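The cost of handling multicast traffic with unicasts can be illustrated with a toy model. The sketch below is a hypothetical illustration, not the thesis's DPM design: on a 2D mesh with XY (dimension-ordered) routing, replicating a packet as independent unicasts pays for shared links once per copy, while a tree-style multicast pays for each shared link only once.

```python
def xy_route(src, dst):
    """Return the list of directed links an XY-routed packet traverses."""
    links = []
    cur = src
    # travel along the X dimension first, then Y (dimension-ordered routing)
    while cur[0] != dst[0]:
        step = 1 if dst[0] > cur[0] else -1
        nxt = (cur[0] + step, cur[1])
        links.append((cur, nxt))
        cur = nxt
    while cur[1] != dst[1]:
        step = 1 if dst[1] > cur[1] else -1
        nxt = (cur[0], cur[1] + step)
        links.append((cur, nxt))
        cur = nxt
    return links

def unicast_link_traversals(src, dsts):
    # one independent copy of the packet per destination
    return sum(len(xy_route(src, d)) for d in dsts)

def multicast_link_traversals(src, dsts):
    # a shared link carries the packet once; the union models tree delivery
    shared = set()
    for d in dsts:
        shared.update(xy_route(src, d))
    return len(shared)

src = (0, 0)
dsts = [(3, 0), (3, 1), (3, 2)]  # three cores in the same mesh column
print(unicast_link_traversals(src, dsts))    # 3 + 4 + 5 = 12 link traversals
print(multicast_link_traversals(src, dsts))  # only 5 distinct links
```

The gap widens with the destination count, which is why the abstract argues that unicast-based networks are inefficient for multicast-heavy coherence traffic.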
DPM effectively reduces the average transmission latency of packets in the network and lowers the power consumption of the on-chip network. In particular, DPM dynamically selects routes according to the load-balance level of the current network and the link-sharing characteristics of the multicast traffic. The second topic of this thesis is to use an emerging non-volatile memory, spin-transfer torque random access memory (STT-RAM), to design a low-power cache for chip multi-core processors. STT-RAM offers fast access speed, high storage density, and negligible leakage power. However, large-scale adoption of STT-RAM as the cache of multi-core processors is hindered by its long write latency and high write energy. Recent studies have shown that reducing the data retention time of the STT-RAM storage cell (the magnetic tunnel junction, MTJ) can effectively improve its write performance. However, STT-RAM with reduced retention time loses data easily, and its storage cells must be refreshed periodically to avoid data loss. When such STT-RAM is used for the last-level cache (LLC) of a multi-core processor, frequent refresh operations increase energy consumption and also degrade system performance. This thesis proposes an efficient refresh scheme, called CCear, that minimizes refresh operations on this kind of STT-RAM. CCear eliminates unnecessary refresh operations by interacting with the cache coherence protocol and the cache management algorithm. Finally, we propose an efficient coherence protocol adjustment mechanism to optimize the performance of parallel programs running on chip multi-core processors. One of the main goals of chip multi-core processors is to keep improving application performance by exploiting thread-level parallelism.
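Why coherence-state awareness cuts refresh traffic in a retention-limited STT-RAM LLC can be shown with a minimal model. The policy below is an illustrative sketch under assumed rules, not CCear's actual algorithm: invalid lines hold nothing worth preserving, clean lines can simply be dropped and re-fetched from memory later, so only dirty lines truly need a refresh.

```python
# Each LLC line is represented by its MESI-style coherence state:
# "M" (modified/dirty), "E" (exclusive clean), "S" (shared clean), "I" (invalid).

def naive_refreshes(lines):
    # a coherence-oblivious controller refreshes every line each period
    return len(lines)

def coherence_aware_refreshes(lines):
    refreshed = 0
    for state in lines:
        if state == "I":
            continue          # invalid: nothing to preserve
        if state in ("S", "E"):
            continue          # clean: invalidate instead; memory has the data
        refreshed += 1        # "M": dirty, must refresh (or write back)
    return refreshed

llc = ["M", "S", "I", "E", "M", "I", "S", "S"]
print(naive_refreshes(llc))             # 8 refreshes per period
print(coherence_aware_refreshes(llc))   # 2 refreshes per period
```

In a real design, dropping clean lines trades a possible re-fetch miss against refresh energy, which is why such a scheme must also cooperate with the cache management (replacement) algorithm, as the abstract notes.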
However, for a multi-threaded program running on such a system, different threads typically exhibit different execution progress due to non-uniform task assignment and contention for shared resources. This progress non-uniformity is one of the major bottlenecks of multi-threaded program performance: at synchronization points such as memory barriers and locks, cores running faster threads must stall and wait for the relatively slow ones. Such idling not only reduces system performance but also wastes energy. This thesis presents a thread-progress-aware coherence adjustment mechanism called TEACA. TEACA dynamically adjusts each thread's coherence requests according to its progress information, with the goal of improving the utilization efficiency of on-chip network bandwidth. Specifically, TEACA divides threads into two categories: leader threads and laggard threads. TEACA then prioritizes each thread's coherence requests based on its category.
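The leader/laggard idea can be sketched as follows. Both the classification rule (a mean-progress threshold) and the scheduling policy (laggards' coherence requests served first) are illustrative assumptions; the abstract does not specify TEACA's internal heuristics.

```python
def classify(progress):
    """progress: dict thread_id -> progress metric (e.g. synchronization
    epochs completed). Threads at or above the mean are leaders; the rest
    are laggards."""
    mean = sum(progress.values()) / len(progress)
    return {tid: ("leader" if p >= mean else "laggard")
            for tid, p in progress.items()}

def schedule(requests, roles):
    """Order pending coherence requests: laggards first so slow threads
    reach the next barrier sooner; FIFO within each class (stable sort)."""
    return sorted(requests, key=lambda r: 0 if roles[r[0]] == "laggard" else 1)

progress = {0: 120, 1: 95, 2: 118, 3: 70}   # thread 3 is far behind
roles = classify(progress)
reqs = [(0, "GetS A"), (3, "GetM B"), (1, "GetS C"), (2, "GetM D")]
print(schedule(reqs, roles))  # thread 3's and 1's requests move to the front
```

Serving laggards first spends the limited on-chip network bandwidth where it shortens the critical path to the next barrier, rather than on leader threads that would only stall there anyway.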
【Degree-granting institution】: University of Science and Technology of China
【Degree level】: Doctoral
【Year conferred】: 2013
【Classification number】: TP332