Research on Key System Software Technologies for Large-Scale High-Productivity Computing

Published: 2018-05-12 14:33

Topics: high-productivity computing + power management; source: National University of Defense Technology, doctoral dissertation, 2012

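[Abstract]: To meet the ever-growing performance demands of large applications, high-end computing systems keep growing in scale, structural complexity, and computing density. High power consumption, complex management, low reliability, and high cost seriously constrain the further development of large-scale computing. High-end computing has therefore shifted from the single-minded pursuit of high performance to high productivity, which takes overall system throughput into account and aims to lower system cost while improving performance, robustness, and usability. System software is the key to achieving high-productivity computing on large-scale systems. Focusing on high-productivity computing and grounded in system software, this dissertation studies two aspects, power management and the user environment, and makes the following contributions:
1. To keep large-scale computing systems operating safely within their power-supply budget, a node-classification-based peak power management model, PCNC, and a corresponding power control algorithm are proposed; they effectively cap the system's peak operating power at an acceptable cost in management overhead and performance. The innovations are: (A) nodes are classified by power characteristics and role into a statistical source node set, a privileged node set, a candidate node set, and a target node set, which reduces the scale of power sampling and control; (B) two thresholds divide system power into three states (safe, alert, and danger), and, with the job as the basic unit of regulation, power control measures of different intensity and node scope are applied in each state; (C) two kinds of target node selection policies, state-based and change-based, are designed and studied. Experiments show that the power control system cuts control overhead by 76.3% at the cost of at most a 7.4% loss in control effectiveness, making it suitable for large-scale computing systems; the two target node selection policies reduce the cumulative effect of over-budget power by 73% and 66% at system performance losses of 1.4% and 1.1%, respectively, a clear improvement.
2. To address the energy wasted by idle but active nodes that are running no jobs, a power management model for idle nodes in large-scale systems, ASDMIN, and an adaptive algorithm for managing the sleep depth of idle nodes are proposed; they effectively reduce idle energy consumption at a small cost in response speed. The innovations are: (A) multi-level reserve structure: building on hardware support for multiple sleep states, idle nodes are grouped by power state into reserve sets of different sleep levels; (B) hidden state transitions: during resource allocation, nodes are taken first from the highest-level reserve pool, and the next lower pool is drawn on only when the higher-level pool cannot satisfy the application's request; the power states of idle nodes are promoted and demoted dynamically according to application load, and because transitions happen while nodes are idle and on standby, their time overhead does not affect system response speed; (C) adaptive control algorithms: weighing the two conflicting factors of energy consumption and response speed, the dissertation designs resource allocation and reclamation algorithms based on the ASDMIN model, an algorithm for dynamically promoting and demoting sleeping nodes between sleep states, and an algorithm for dynamically and adaptively calibrating the reserve quota thresholds. Experiments show that ASDMIN reduces the power consumption of idle nodes by 84.12% and improves idle power efficiency by 82.71%, at the cost of only an 8.85% increase in the average job response delay rate, a significant improvement.
3. To address the problems of system use, management, and user data security in traditional parallel computing environments that users log in to and share, a high-performance virtual domain technique is proposed that provides users with dedicated virtualized supercomputer environments while preserving high system performance. The innovations include: (A) to meet differing user needs, two user environments are designed, general-purpose virtual domains and dedicated virtual domains, and a two-level file deployment scheme combining local and global storage is adopted to optimize file access performance; (B) based on environment extraction and dynamic translation of file access paths, high-performance computing domains on the compute array are constructed dynamically; (C) shadow system file setup and file-access pass-through link techniques are proposed to achieve secure isolation of the virtual computing environment's file system and dual-partition single-image storage. Experiments show that the virtual computing environment conforms to the LSB and POSIX standards, with performance loss below 3% on the front-end service array and below 0.5% on the back-end compute array, which meets the needs of high-productivity computing.
4. To guarantee users the computing resources they need while preventing any user from consuming so many resources that the system fails, a multi-granularity adaptive quality-of-service (QoS) guarantee mechanism is proposed. The main innovations include: (A) multi-granularity QoS control: resources are managed and their use controlled at several granularities, such as process, process group, user job, and virtual user environment, to meet the QoS needs of different running entities in a virtualized environment; (B) adaptive resource allocation: a dual-threshold adaptive resource limiting mechanism adjusts users' resource quotas according to user demand and system resource status, compensating for inaccurate user resource reservations; (C) termination target selection: several policies for selecting termination targets are proposed and analyzed, so that when a user's resource allocation request cannot be satisfied, some running entities can be terminated sensibly to keep the system running efficiently. Experiments show that the proposed QoS techniques raise system throughput by 17.14% while affecting application performance by no more than 0.65%, which is negligible.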
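5. Because of the isolation imposed by virtual machines, traditional power management techniques cannot operate the hardware directly in a virtualized environment. To solve this problem, a two-level power management model spanning virtual machines and physical hosts, with corresponding power management algorithms, is proposed. The main innovations are: (A) two-level power management: power management is performed both inside the virtual machine and in the virtualization layer; (B) power management facilities and mechanisms in the virtual machine: monitoring and statistics of virtual device power behavior are introduced into the virtual machine; several virtualized power management mechanisms are designed, including dynamic rate adjustment of virtual devices, dynamic sleep of virtual devices, and dynamic sleep of virtual machines; an overlay mapping from virtual power states to physical power states is implemented; and power management interfaces for virtual devices and virtual machines are exposed to users inside the virtual machine. Experiments show that the virtualized power management mechanisms are fully compatible with traditional physical power management; an in-VM power management scheme built on them improves system energy efficiency by 2.75%, while power management virtualization affects application performance by no more than 0.4%.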
[Abstract]: To meet the ever-growing performance demands of large-scale applications, high-end computing systems keep growing in scale, structural complexity, and computing density. High power consumption, complex management, low reliability, and high cost seriously constrain the further development of large-scale computing. High-end computing has therefore shifted from the single-minded pursuit of high performance to high productivity, which takes overall system throughput into account and aims to lower system cost while improving performance, robustness, and usability.
System software is the key to achieving high-productivity computing on large-scale systems. Focusing on high-productivity computing and grounded in system software, this dissertation studies two aspects, power management and the user environment, and makes the following contributions:
1. To keep large-scale computing systems operating safely within their power-supply budget, a node-classification-based peak power management model, PCNC, and a corresponding power control algorithm are proposed; they effectively cap the system's peak operating power at an acceptable cost in management overhead and performance. The innovations are: (A) nodes are classified by power characteristics and role into a statistical source node set, a privileged node set, a candidate node set, and a target node set, which reduces the scale of power sampling and control; (B) two thresholds divide system power into three states (safe, alert, and danger), and, with the job as the basic unit of regulation, power control measures of different intensity and node scope are applied in each state; (C) two kinds of target node selection policies, state-based and change-based, are designed and studied. Experiments show that the power control system cuts control overhead by 76.3% at the cost of at most a 7.4% loss in control effectiveness, making it suitable for large-scale computing systems; the two target node selection policies reduce the cumulative effect of over-budget power by 73% and 66% at system performance losses of 1.4% and 1.1%, respectively, a clear improvement.
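
As a rough illustration of the two-threshold scheme in contribution 1, the following Python sketch classifies total system power into the safe, alert, and danger states and picks jobs to throttle. It is not the PCNC algorithm from the dissertation: the function names, the wattage figures, and the policy (throttle the single most power-hungry job in the alert state and every non-exempt job in the danger state) are assumptions made for illustration only.

```python
from enum import Enum


class PowerState(Enum):
    SAFE = "safe"
    ALERT = "alert"
    DANGER = "danger"


def classify_power(total_watts, alert_threshold, danger_threshold):
    """Map measured system power to one of three states using two thresholds."""
    if alert_threshold >= danger_threshold:
        raise ValueError("alert_threshold must be below danger_threshold")
    if total_watts >= danger_threshold:
        return PowerState.DANGER
    if total_watts >= alert_threshold:
        return PowerState.ALERT
    return PowerState.SAFE


def select_jobs_to_throttle(state, job_watts, exempt_jobs=frozenset()):
    """Hypothetical policy, not taken from the dissertation.

    SAFE:   throttle nothing.
    ALERT:  throttle only the job drawing the most power.
    DANGER: throttle every job that is not exempt.
    Exempt jobs stand in for work running on privileged nodes.
    """
    # Rank non-exempt jobs from highest to lowest power draw.
    candidates = sorted(
        (job for job in job_watts if job not in exempt_jobs),
        key=lambda job: job_watts[job],
        reverse=True,
    )
    if state is PowerState.SAFE:
        return []
    if state is PowerState.ALERT:
        return candidates[:1]
    return candidates
```

Working per job rather than per node mirrors the abstract's choice of the job as the basic unit of regulation; how the real system measures power and lowers it on the selected nodes is outside this sketch.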
2. To address the energy wasted by idle but active nodes that are running no jobs, a power management model for idle nodes in large-scale systems, ASDMIN, and an adaptive algorithm for managing the sleep depth of idle nodes are proposed; they effectively reduce idle energy consumption at a small cost in response speed. The innovations are: (A) multi-level reserve structure: building on hardware support for multiple sleep states, idle nodes are grouped by power state into reserve sets of different sleep levels; (B) hidden state transitions: during resource allocation, nodes are taken first from the highest-level reserve pool, and the next lower pool is drawn on only when the higher-level pool cannot satisfy the application's request; the power states of idle nodes are promoted and demoted dynamically according to application load, and because transitions happen while nodes are idle and on standby, their time overhead does not affect system response speed; (C) adaptive control algorithms: weighing the two conflicting factors of energy consumption and response speed, the dissertation designs resource allocation and reclamation algorithms based on the ASDMIN model, an algorithm for dynamically promoting and demoting sleeping nodes between sleep states, and an algorithm for dynamically and adaptively calibrating the reserve quota thresholds. Experiments show that ASDMIN reduces the power consumption of idle nodes by 84.12% and improves idle power efficiency by 82.71%, at the cost of only an 8.85% increase in the average job response delay rate, a significant improvement.
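
The multi-level reserve idea in contribution 2 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the ASDMIN algorithms themselves: pools are plain lists ordered from the shallowest sleep level (index 0) to the deepest, the function names are hypothetical, and the per-level quotas are passed in as fixed numbers, whereas the dissertation calibrates its reserve thresholds adaptively.

```python
def allocate(pools, needed):
    """Take nodes from the highest-level pool first, then from lower pools."""
    if needed > sum(len(pool) for pool in pools):
        raise ValueError("not enough idle nodes")
    allocated = []
    for pool in pools:
        take = min(needed - len(allocated), len(pool))
        allocated.extend(pool[:take])
        del pool[:take]
        if len(allocated) == needed:
            break
    return allocated


def release(pools, nodes):
    """Freed nodes rejoin the highest-level (shallowest sleep) pool."""
    pools[0].extend(nodes)


def rebalance(pools, quotas):
    """Move idle nodes between sleep levels to match quotas.

    quotas[i] is the desired size of pools[i]; the deepest pool has no
    quota and absorbs whatever is left. Surplus nodes are demoted one
    level; shortfalls are refilled by promoting nodes from deeper pools,
    nearest level first. This runs while the nodes are idle, so the
    transition time is hidden from job start-up.
    """
    for level in range(len(pools) - 1):
        surplus = len(pools[level]) - quotas[level]
        if surplus > 0:
            pools[level + 1].extend(pools[level][-surplus:])
            del pools[level][-surplus:]
        else:
            shortfall = -surplus
            for deeper in pools[level + 1:]:
                take = min(shortfall, len(deeper))
                pools[level].extend(deeper[:take])
                del deeper[:take]
                shortfall -= take
                if shortfall == 0:
                    break
```

With three pools and quotas of one node each for the two shallowest levels, releasing three nodes and calling `rebalance` leaves one node at each of the top two levels and pushes the rest into the deepest pool.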
3. To address the problems of system use, management, and user data security in traditional parallel computing environments that users log in to and share, a high-performance virtual domain technique is proposed that provides users with dedicated virtualized supercomputer environments while preserving high system performance. The innovations include: (A) to meet differing user needs, two user environments are designed, general-purpose virtual domains and dedicated virtual domains, and a two-level file deployment scheme combining local and global storage is adopted to optimize file access performance; (B) based on environment extraction and dynamic translation of file access paths, high-performance computing domains on the compute array are constructed dynamically; (C) shadow system file setup and file-access pass-through link techniques are proposed to achieve secure isolation of the virtual computing environment's file system and dual-partition single-image storage. Experiments show that the virtual computing environment conforms to the LSB and POSIX standards, with performance loss below 3% on the front-end service array and below 0.5% on the back-end compute array, which meets the needs of high-productivity computing.
4. To guarantee users the computing resources they need while preventing any user from consuming so many resources that the system fails, a multi-granularity adaptive quality-of-service (QoS) guarantee mechanism is proposed. The main innovations include: (A) multi-granularity QoS control: resources are managed and their use controlled at several granularities, such as process, process group, user job, and virtual user environment, to meet the QoS needs of different running entities in a virtualized environment; (B) adaptive resource allocation: a dual-threshold adaptive resource limiting mechanism adjusts users' resource quotas according to user demand and system resource status, compensating for inaccurate user resource reservations; (C) termination target selection: several policies for selecting termination targets are proposed and analyzed, so that when a user's resource allocation request cannot be satisfied, some running entities can be terminated sensibly to keep the system running efficiently. Experiments show that the proposed QoS techniques raise system throughput by 17.14% while affecting application performance by no more than 0.65%, which is negligible.
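
The abstract does not spell out how the dual-threshold limit in contribution 4 works, so the Python sketch below shows only one plausible reading. In it, each running entity has a soft limit (its reservation) that may rise when the system has spare capacity and a hard limit that is never exceeded. The `Entity` fields, the `request` rule, and the largest-consumer termination policy are all hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """A running entity: a process, process group, job, or user environment."""
    name: str
    usage: float       # resources currently held
    soft_limit: float  # reserved quota, may be raised adaptively
    hard_limit: float  # absolute cap, never exceeded


def request(entity, amount, system_free):
    """Grant or deny a request for `amount` more resources.

    Assumed rule: deny if the hard limit or the system's free capacity
    would be exceeded; otherwise grant, raising the soft limit when the
    entity outgrows its original reservation.
    """
    new_usage = entity.usage + amount
    if new_usage > entity.hard_limit or amount > system_free:
        return False
    if new_usage > entity.soft_limit:
        entity.soft_limit = new_usage
    entity.usage = new_usage
    return True


def pick_termination_target(entities):
    """One hypothetical selection policy: terminate the largest consumer."""
    return max(entities, key=lambda entity: entity.usage)
```

When `request` returns `False`, a caller could apply a policy such as `pick_termination_target` to decide which entity to stop; the abstract mentions several such policies without naming them.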
5. Because of the isolation imposed by virtual machines, traditional power management techniques cannot operate the hardware directly in a virtualized environment. To solve this problem, a two-level power management model spanning virtual machines and physical hosts, with corresponding power management algorithms, is proposed. The main innovations are: (A) two-level power management: power management is performed both inside the virtual machine and in the virtualization layer; (B) power management facilities and mechanisms in the virtual machine: monitoring and statistics of virtual device power behavior are introduced into the virtual machine; several virtualized power management mechanisms are designed, including dynamic rate adjustment of virtual devices, dynamic sleep of virtual devices, and dynamic sleep of virtual machines; an overlay mapping from virtual power states to physical power states is implemented; and power management interfaces for virtual devices and virtual machines are exposed to users inside the virtual machine. Experiments show that the virtualized power management mechanisms are fully compatible with traditional physical power management; an in-VM power management scheme built on them improves system energy efficiency by 2.75%, while power management virtualization affects application performance by no more than 0.4%.
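
One way to picture the overlay mapping from virtual to physical power states in contribution 5 is the Python sketch below. It assumes, without confirmation from the abstract, that a physical device shared by several virtual machines may sleep only as deeply as the shallowest state any of them has requested. The integer encoding of states (0 is fully active, larger numbers are deeper sleep), the function name, and the default deepest level are hypothetical.

```python
def physical_power_state(virtual_states, deepest=3):
    """Combine per-VM virtual device states into one physical state.

    virtual_states maps a VM name to the sleep level it requested for
    its virtual device (0 = active, larger = deeper sleep). The physical
    device takes the shallowest requested level so that no VM finds the
    device in a deeper sleep than it asked for. With no VMs attached,
    the device may enter the deepest level.
    """
    if not virtual_states:
        return deepest
    return min(virtual_states.values())
```

For example, if one VM requests level 2 and another keeps its virtual device active at level 0, the physical device stays at level 0; once the second VM also requests level 3, the device can drop to level 2.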

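[Degree-granting institution]: National University of Defense Technology
[Degree level]: Doctoral
[Year degree granted]: 2012
[Classification number]: TP38;TP311.5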
