Research on Task Scheduling for Dynamic Heterogeneous Many-Core Processors

Published: 2018-07-21 21:18

[Abstract]: The demand for high-efficiency on-chip computing and the growing variation in chip manufacturing processes are together driving multi-core processors into the heterogeneous era. The basic design idea of a performance-heterogeneous multi-core processor is to place processor cores of different granularity on one chip: large out-of-order superscalar cores exploit the performance of serial code, while a large number of structurally simple small cores exploit thread-level parallelism. In essence, a performance-heterogeneous multi-core processor improves computational efficiency only when the configuration of its on-chip cores matches the parallel characteristics of the workload. However, the parallel characteristics and resource demands of a workload change dynamically, so a heterogeneous processor must be able to adjust its on-chip computing resource configuration at run time according to workload characteristics. To this end, researchers have in recent years proposed the Dynamic Heterogeneous Chip Multiprocessor (DHCMP) architecture. It places a large number of homogeneous basic cores on the chip and provides microarchitectural support for combining several basic cores into a single logical processor core (logical core for short), so that system software can configure the on-chip computing resources (the basic cores) at run time, on demand, into multiple performance-heterogeneous logical cores.
However, a dynamic heterogeneous processor itself only provides the ability to reconfigure logical cores. Whether the parallel characteristics and resource demands of the system load are judged accurately, and whether DHCMP computing resources are configured sensibly to achieve high-efficiency computing, depends decisively on the task scheduler. Yet research on task scheduling for dynamic heterogeneous many-core processors has barely begun. This thesis aims to build a task scheduling framework that effectively supports fast adjustment of DHCMP logical cores, and to study both a logical-core resource allocation algorithm that exploits the dynamic heterogeneity of DHCMP for high-efficiency computing and a process scheduling algorithm that provides priority-based fairness on DHCMP. The work and results of this thesis fall into the following four areas:
1. The hardware/operating system interface for dynamic heterogeneous processors is studied, and a concise, general logical-core abstraction is presented to the operating system. The logical-core reconfiguration operations of a dynamic heterogeneous processor are distilled into six functionally complete primitives; by invoking combinations of these primitives, the operating system can perform any reconfiguration of logical cores. The relationship between the trigger granularity of process scheduling and the trigger granularity of computing resource adjustment on a dynamic heterogeneous processor is also studied, leading to the conclusion that the process scheduling clock alone satisfies the frequency requirements of both program phase behavior sampling and on-chip computing resource adjustment.
2. A task scheduling framework for dynamic heterogeneous processors is designed. The framework is based on a centralized task queue and efficiently supports fast adjustment of the number and granularity of logical cores. When a logical core is released or created, the task scheduler only needs a dequeue or enqueue operation to update the corresponding data structures. A pipeline-like scheduling mechanism is also proposed to reduce the scheduler's relatively large decision-time overhead on the centralized queue, which makes a scheduling framework based on a centralized queue practical.
3. The relationship between program phase behavior and the commonly used microarchitectural parameters that reflect a program's computation and memory-access characteristics is studied, and an IPC-based algorithm for dynamically identifying program phases is proposed. Building on this, the logical-core resource allocation algorithm PERA is designed: it dynamically detects the execution phase a program is in and, based on the program's execution efficiency, accurately judges the program's demand for computing resources in that phase. PERA is designed as a finite state machine that performs only one state transition each time it is triggered, which gives the algorithm O(1) time complexity.
4. The fairness scheduling algorithm EDP for dynamic heterogeneous processors is designed. EDP guarantees that each process obtains performance proportional to its priority, and that the concurrent execution of multiple processes affects the performance of processes with the same priority equally. Because EDP makes effective use of the dynamic heterogeneity of logical cores, the performance of workloads executed on a dynamic heterogeneous processor under EDP scheduling also improves. Experimental results show that, with the same total amount of on-chip computing resources, DHCMP under EDP scheduling outperforms symmetric multi-core processors and static heterogeneous multi-core processors by 26.2% and 11.8%, respectively, in average task turnaround time, and by 33.6% and 12.5%, respectively, in system throughput.
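The idea behind item 3, an allocator built as a finite state machine that makes one transition per scheduler tick, can be illustrated with a small sketch. The abstract does not give PERA's actual states, thresholds, or transition rules, so everything below (the two states, `phase_threshold`, `gain_threshold`, and the grow-by-one probing policy) is a hypothetical stand-in rather than the thesis's algorithm. It only shows how an IPC-driven state machine keeps each invocation at constant cost.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    STABLE = auto()  # phase unchanged, current allocation is kept
    PROBE = auto()   # a new phase was seen, one extra basic core is being tried


@dataclass
class PhaseAllocator:
    """Hypothetical IPC-driven allocator; not the PERA design from the thesis."""
    cores: int = 1                  # basic cores fused into the task's logical core
    max_cores: int = 4              # assumed upper bound on logical-core size
    phase_threshold: float = 0.2    # assumed relative IPC change that marks a new phase
    gain_threshold: float = 0.1     # assumed minimum IPC gain that justifies an extra core
    state: State = State.STABLE
    ref_ipc: Optional[float] = None  # IPC baseline for the current phase and size

    def step(self, ipc: float) -> int:
        """Run exactly one state transition and return the core count to use next."""
        if self.ref_ipc is None:
            self.ref_ipc = ipc
            return self.cores
        change = (ipc - self.ref_ipc) / max(self.ref_ipc, 1e-9)
        if self.state is State.STABLE:
            if abs(change) > self.phase_threshold:
                # New phase: rebase the IPC reference and try a larger logical core.
                self.ref_ipc = ipc
                if self.cores < self.max_cores:
                    self.cores += 1
                    self.state = State.PROBE
        elif change >= self.gain_threshold:
            # The extra core paid off: keep it and keep probing while cores remain.
            self.ref_ipc = ipc
            if self.cores < self.max_cores:
                self.cores += 1
            else:
                self.state = State.STABLE
        else:
            # The extra core did not help enough: give it back and settle.
            self.cores -= 1
            self.state = State.STABLE
        return self.cores
```

Each call to `step` does a fixed amount of arithmetic and at most one state change, which is what makes the per-trigger cost independent of the number of cores or tasks. This sketch only grows the logical core after a phase change; a fuller policy would also need a way to shrink it.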
The task scheduling framework designed in this thesis can serve as a general supporting platform for subsequent research on scheduling algorithms for dynamic heterogeneous many-core processors. The proposed logical-core resource allocation algorithm PERA, the fairness scheduling algorithm EDP, and the exploration of program phase behavior carried out during algorithm design can serve as a reference for later task scheduling work on heterogeneous multi-core and many-core processors. In addition, the pipeline-like scheduling optimization proposed for the centralized queue can be generalized as a methodology and applied to other many-core architectures.
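As a rough illustration of why a centralized queue keeps logical-core creation and release cheap, the sketch below models one possible reading of the framework. It assumes a single shared ready queue for tasks plus a queue of idle logical-core descriptors. The class name, the method names, and the `(core_id, width)` descriptor are hypothetical, and the pipeline-like optimization mentioned in the abstract is not modeled.

```python
from collections import deque


class CentralScheduler:
    """Hypothetical centralized-queue scheduler; names and layout are assumptions."""

    def __init__(self):
        self.ready = deque()       # runnable tasks, shared by every logical core
        self.idle_cores = deque()  # descriptors of logical cores waiting for work

    def create_core(self, core_id, width):
        # Creating a logical core is a single enqueue of its descriptor.
        self.idle_cores.append((core_id, width))

    def release_core(self):
        # Releasing a logical core is a single dequeue. No per-core run queue
        # has to be drained, because tasks live in the shared ready queue.
        return self.idle_cores.popleft() if self.idle_cores else None

    def submit(self, task):
        self.ready.append(task)

    def dispatch(self):
        """Pair waiting tasks with idle logical cores, FIFO on both sides."""
        pairs = []
        while self.ready and self.idle_cores:
            pairs.append((self.ready.popleft(), self.idle_cores.popleft()))
        return pairs
```

With per-core run queues, dissolving a logical core would require migrating its queued tasks elsewhere. In this layout, changing the number or size of logical cores touches only the core queue.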
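[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Doctoral
[Year degree conferred]: 2013
[Classification number]: TP332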

[References]