A Multi-core Processor Cache Coherence Protocol for Big Data Processing

Published: 2018-04-09 11:17

Topic: big data  Focus: MERSI protocol  Source: National University of Defense Technology, master's thesis, 2014

[Abstract]: With the rapid development of new technologies, the volume of data being generated keeps growing in ways that are hard to predict, so microprocessors aimed at big data processing must handle large amounts of heterogeneous data quickly. Big data has low value density, so the amount of data that processor cores need to exchange with one another becomes smaller. Continuing to use a multi-core cache coherence protocol designed for scientific computing would increase the pressure on the whole system and lower the processor frequency. Based on the characteristics of big data, this thesis designs a multi-core processor cache coherence protocol for big data processing, the MERSI protocol. The main work is as follows. (1) To address the overhead of several shared copies responding at the same time, when the system holds multiple shared copies of a block, MERSI keeps exactly one copy in the shared-responder state and all other copies in the "S" state. When a remote processor operates on the data, the copy in the shared-responder state replies and the "S" copies do not. This avoids the system overhead of multiple shared copies replying to the same remote operation. (2) To address unbalanced cache load, after the shared-responder copy replies, it moves to the appropriate other state and the requester's copy becomes the shared-responder copy. The next operation is answered by the copy that has just become the shared responder, which prevents one cache from being overloaded while the other caches starve. (3) To address the cost of the cache ping-pong effect, MERSI uses a hybrid of the write-invalidate and write-update policies: write-update when the system holds exactly two shared copies, and write-invalidate when it holds more than two. The hybrid write policy substantially reduces the system overhead caused by ping-ponging. (4) For performance evaluation, five programs from the SPLASH-2 parallel benchmark suite (LU, Ocean, Radix, FFT, and Water) are used to test the multi-core cache coherence protocol. First, the effects of the directory organization and of the cache block size on system performance are measured. Next, the effects of the number of processor cores and of the network topology on the system are measured and analyzed. Finally, MERSI is compared with the MESI protocol: MERSI improves performance by 3.58% on average and lowers the L1 cache miss rate by 3.18% on average. The design goals of the MERSI protocol are essentially met.
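Points (1) to (3) of the abstract can be illustrated with a toy model of a single cache block. This is only a sketch under stated assumptions, not the thesis's implementation: the abstract does not give the full state-transition table, so the names `State`, `Line`, `read`, and `write` are hypothetical, the letter R is assumed to denote the shared-responder state, a write miss is assumed to fetch the block before the write policy is applied, and write-backs, the directory, and the interconnect are ignored.

```python
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    R = "shared-responder"  # assumed name: the one shared copy that replies
    S = "shared"            # shared copy that stays silent
    I = "invalid"


class Line:
    """Toy model of one cache block across n private caches (hypothetical)."""

    def __init__(self, n_caches):
        self.states = [State.I] * n_caches

    def holders(self):
        """Indices of caches holding a valid copy."""
        return [i for i, s in enumerate(self.states) if s is not State.I]

    def responder(self):
        """Index of the single cache that answers a remote request, if any."""
        for i, s in enumerate(self.states):
            if s in (State.M, State.E, State.R):
                return i
        return None  # no cached copy: memory would answer

    def read(self, requester):
        """Return the index of the cache that supplied the data, else None."""
        if self.states[requester] is not State.I:
            return None  # read hit, nothing to do
        src = self.responder()
        if src is None:
            self.states[requester] = State.E  # first copy in the system
            return None
        # Points (1) and (2): only `src` replies, then the responder role
        # moves to the requester. Demoting an M or E source to S is an
        # assumption of this sketch.
        self.states[src] = State.S
        self.states[requester] = State.R
        return src

    def write(self, requester):
        """Return which policy was applied: 'update', 'invalidate' or 'local'."""
        if self.states[requester] is State.I:
            self.read(requester)  # assumption: a write miss fetches first
        sharers = self.holders()
        if len(sharers) == 2:
            # Point (3): exactly two copies, so the new value is pushed to
            # the other copy and both stay valid (states kept as they are).
            return "update"
        for i in sharers:
            if i != requester:
                self.states[i] = State.I  # more than two copies: invalidate
        self.states[requester] = State.M
        return "invalidate" if len(sharers) > 2 else "local"
```

In this model, a read by cache 1 after cache 0 has loaded the block is served by cache 0 and leaves cache 1 as the only responder; a later write with two copies present takes the update path, and a write with three copies present takes the invalidate path.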
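[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP332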

[References]