Research on Temperature-Aware Task Scheduling Algorithms for Microprocessors

Published: 2018-09-07 06:46

[Abstract]: As integrated-circuit feature sizes shrink and integration density rises, processor power density and temperature continue to climb. Excessive temperature not only reduces chip reliability but also significantly degrades processor performance, and it has become an important factor limiting the continued development of microprocessors. Emerging technologies such as chip multiprocessors (CMP) and 3D stacking further increase chip power density, making the thermal problem even more severe.
Large temporal and spatial variations in chip temperature also pose major challenges for processor packaging and cooling, and traditional heat-dissipation methods are increasingly unable to meet the cooling requirements of today's high-performance microprocessors. To address this, hardware dynamic thermal management (HW DTM) is now widely used to limit processor temperature: once the hardware detects that the temperature has reached a predetermined threshold, it activates a thermal management mechanism (such as lowering the frequency, lowering the voltage, or powering down) to cool the processor and protect it from damage. However, HW DTM increases program running time, reduces system throughput, and ultimately lowers processor performance.
The goal of this thesis is to eliminate unnecessary HW DTM trigger events as far as possible, so that chip performance is not affected by the thermal management mechanism. The thesis systematically studies a series of key techniques for temperature-aware design and, addressing the shortcomings of existing work, proposes three operating-system-level temperature-aware task scheduling techniques to avoid HW DTM. The main work and contributions are as follows:
First, temperature-aware design techniques for microprocessors are analyzed comprehensively and in depth. Existing techniques are summarized from different levels and perspectives, and their advantages and disadvantages are analyzed.
Second, temperature-aware task scheduling requires power and temperature to be computed online. The thesis summarizes existing power estimation models and thermal models, analyzes their characteristics, and discusses methods for obtaining power and computing temperature.
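The abstract does not say which thermal model the thesis adopts. As a rough illustration of what online temperature calculation can look like, the following is a minimal sketch of a single-node lumped RC thermal model, a standard textbook simplification. The function name and the values of `r_th`, `c_th`, and `ambient_c` are assumptions chosen for illustration, not figures from the thesis.

```python
import math


def next_temperature(temp_c, power_w, dt_s, r_th=0.8, c_th=10.0, ambient_c=45.0):
    """Advance a single-node RC thermal model by dt_s seconds.

    r_th (K/W), c_th (J/K) and ambient_c (deg C) are illustrative
    assumptions, not parameters taken from the thesis.
    """
    # Temperature the node would settle at if power_w were held forever.
    steady_state = ambient_c + power_w * r_th
    # Exponential approach to the steady state with time constant r_th * c_th.
    decay = math.exp(-dt_s / (r_th * c_th))
    return steady_state + (temp_c - steady_state) * decay
```

Given a per-task power estimate, a scheduler could call such a function once per time slice to predict whether running that task next would cross the DTM threshold.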
Third, a greedy scheduling algorithm, GSA, is proposed for single-core processors. Provided that the processor temperature does not exceed the threshold, GSA runs hot tasks first, making full use of the processor's temporal "temperature margin" to bring the processor to a relatively high temperature early, so that it dissipates heat more quickly. This reduces the number of DTM triggers and improves system performance. Experimental results show that, compared with the baseline scheduler, GSA reduces DTM by 9.9%~82% (average 47.1%), 8.8%~73.8% (average 41.1%), and 2.9%~58.7% (average 31.7%) for SPEC2K workloads in low-, medium-, and high-temperature environments, and by 5.9%~45.5% (average 31%) for non-SPEC workloads in a medium-temperature environment; the corresponding performance improvements are 4.2%, 5.2%, 4.7%, and 3.7%. GSA also improves performance to varying degrees over the random, priority, MinTemp+, and ThresHot algorithms.
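The "hottest task first, as long as the threshold is not crossed" idea can be sketched as follows. This is not the thesis's GSA implementation, which the abstract does not detail. It is a hypothetical greedy selector that assumes each task has a known average power and that temperature follows a single-node RC model; `predict`, `greedy_pick`, and all parameter values are illustrative assumptions.

```python
import math


def predict(temp_c, power_w, dt_s, r_th, c_th, ambient_c):
    """Predicted temperature after dt_s seconds (single-node RC model, assumed)."""
    steady = ambient_c + power_w * r_th
    return steady + (temp_c - steady) * math.exp(-dt_s / (r_th * c_th))


def greedy_pick(task_powers, temp_c, threshold_c, dt_s=0.1,
                r_th=0.8, c_th=10.0, ambient_c=45.0):
    """Return the name of the task to run in the next time slice.

    task_powers maps task name -> average power in watts (hypothetical input).
    """
    # Try tasks from hottest to coolest and take the first one whose
    # predicted end-of-slice temperature stays at or below the threshold.
    for name, power in sorted(task_powers.items(),
                              key=lambda kv: kv[1], reverse=True):
        if predict(temp_c, power, dt_s, r_th, c_th, ambient_c) <= threshold_c:
            return name
    # No task fits under the threshold: fall back to the coolest task.
    return min(task_powers, key=task_powers.get)
```

With tasks of 60 W, 40 W, and 15 W and an 85 °C threshold, this selector keeps choosing the 60 W task while the processor is cool and switches to cooler tasks only as the predicted temperature approaches the threshold.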
Fourth, the TSTB scheduling algorithm is proposed for 2D CMPs. The naturally clustered structure of a CMP offers a new avenue for thermal management: scheduling a hot task onto a cold core yields a lower peak temperature than scheduling it onto a hot core. TSTB exploits the temporal and spatial temperature variation of the CMP. It reorders the execution of hot and cold tasks to exploit the temporal "temperature margin" within each core, and it assigns hot and cold tasks to cores of suitable temperature to exploit the spatial "temperature margin", thereby reducing thermal emergencies and improving system performance. Experimental results show that, compared with the baseline scheduling algorithm, TSTB reduces TET (Thermal Emergency Time) by 8.3%~91.5% (average 48.3%) and improves performance by 2.47%~6.58% (average 4.62%). TSTB also improves performance to varying degrees over the random, round-robin, balance, and ThresHot scheduling algorithms.
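The spatial half of this idea, placing hot tasks on cold cores, can be illustrated with a very small sketch. It is not the TSTB algorithm itself, whose details the abstract does not give. The function name and the assumption that per-task power and per-core temperature are available as dictionaries are hypothetical.

```python
def match_tasks_to_cores(task_powers, core_temps):
    """Pair the hottest task with the coolest core, and so on down the lists.

    task_powers: task name -> average power in watts (hypothetical input).
    core_temps:  core id -> current temperature in deg C (hypothetical input).
    Returns {core_id: task_name}; surplus tasks or cores are left unassigned.
    """
    tasks_hot_first = sorted(task_powers, key=task_powers.get, reverse=True)
    cores_cold_first = sorted(core_temps, key=core_temps.get)
    return dict(zip(cores_cold_first, tasks_hot_first))
```

For example, with tasks `{"a": 10, "b": 50, "c": 30}` and core temperatures `{0: 70, 1: 55, 2: 80}`, the 50 W task goes to core 1 (the coolest) and the 10 W task to core 2 (the hottest).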
Fifth, the HTBS scheduling algorithm is proposed for 3D CMPs. The thermal characteristics of 3D CMPs are analyzed, and vertically stacked cores are treated as a core stack. Task power is balanced across core stacks, and hot tasks are placed on the cores closest to the heat sink to speed up heat dissipation. When a core overheats, DTM is applied to the most power-dense core in its stack so that the temperature falls quickly. Experimental results show that, compared with the baseline scheduling algorithm, HTBS reduces TET by 8.4%~96.2% (average 54.7%) and achieves a 5.99% performance improvement. HTBS also improves performance to varying degrees over random scheduling, round-robin scheduling, inter-core temperature-balancing scheduling, and inter-stack temperature-balancing scheduling.
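The two placement rules named here, balancing power across core stacks and putting hotter tasks nearer the heat sink, can be sketched with a simple greedy heuristic. This is an illustrative approximation under stated assumptions, not the HTBS algorithm from the thesis: it assumes known per-task power, identical stacks, and that layer 0 is the layer closest to the heat sink. The function name is hypothetical.

```python
def place_on_stacks(task_powers, num_stacks, layers_per_stack):
    """Return {(stack, layer): task_name}.

    Layer 0 is assumed to be the layer nearest the heat sink.
    task_powers maps task name -> average power in watts (hypothetical input).
    """
    stacks = [[] for _ in range(num_stacks)]
    load = [0.0] * num_stacks  # total power assigned to each stack so far
    # Visit tasks hottest first, so the hottest task placed in any stack
    # lands on the lowest free layer, i.e. nearest the heat sink.
    for name in sorted(task_powers, key=task_powers.get, reverse=True):
        free = [s for s in range(num_stacks) if len(stacks[s]) < layers_per_stack]
        if not free:
            break  # more tasks than cores; remaining tasks stay queued
        # Balance power: pick the least-loaded stack that still has a free core.
        target = min(free, key=lambda s: load[s])
        stacks[target].append(name)
        load[target] += task_powers[name]
    return {(s, layer): name
            for s, names in enumerate(stacks)
            for layer, name in enumerate(names)}
```

With four tasks of 40 W, 30 W, 20 W, and 10 W on two stacks of two layers each, both stacks end up with 50 W in total, and each stack's hotter task sits on layer 0.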
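[Degree-granting institution]: National University of Defense Technology
[Degree level]: Doctoral
[Year degree conferred]: 2013
[Classification number]: TP332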

[References]