
Research on Energy Consumption Management Techniques and Methods for High-Performance Computer Systems

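Published: 2018-06-28 05:41

  Topics: high-performance computing + energy consumption management; Source: National University of Defense Technology, master's thesis, 2012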


[Abstract]: High-performance computing is the third major tool and method, after theory and experiment, by which humans understand the world. It is widely used in oil exploration data processing, biomedical research and development, engineering design and simulation, new energy and new materials, environmental science, and basic science across many fields. To meet the growing demand for high-performance computing, countries around the world have drawn up development plans for high-performance computers and keep raising their performance, and the energy consumption of these systems has risen accordingly. For example, the peak performance of the current fastest computer has reached 27 Pflops, with a corresponding power consumption of 8.2 MW. Such enormous energy consumption not only imposes high operating costs but also has direct and potential adverse effects on the reliability and availability of high-performance computers. The energy consumption of high-performance computers has become one of the hot topics studied by researchers in China and abroad.
This thesis analyzes the main energy consumers in a high-performance computer and how energy consumption is distributed among them. The analysis shows that the computing subsystem is the largest energy consumer in the whole system. A high-performance computer typically provides a shared, 7×24-hour computing service to many users at once (tens to hundreds), and supercomputing centers usually allocate a fixed amount of computing resources to users by the week or by the month. Because the times at which different users use the machine and the amount of resources they need are uncertain, resource usage tends to be uneven across days of the week and times of day. This leaves some computing resources idle and wastes both those resources and the electricity they consume. The thesis studies how to manage the computing resources of the computing subsystem effectively, with minimal impact on the jobs running in the system, so as to improve both resource utilization and energy efficiency.
Taking the TH-1A system as an example, the thesis first analyzes the energy use of the TH-1A computing subsystem, the energy-saving control mechanisms supported by its compute nodes, and the resource management system SLURM. Experiments with the SLURM energy-saving module reveal its shortcomings, and to address them the thesis proposes an energy management strategy based on an active-resource utilization threshold. This strategy keeps a certain amount of resources idle, so that the system reduces energy consumption while keeping the average job waiting time as low as possible. Next, looking toward future high-performance computer systems, the thesis proposes a hybrid energy management strategy based on load patterns. This strategy exploits regularities in the system's load and combines two energy-saving techniques, node sleep and node shutdown, to reduce energy consumption further. Finally, the thesis designs and implements a simulator of energy management methods for high-performance computers and uses it to evaluate the two strategies. The experimental results show that both strategies effectively reduce system energy consumption.
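The abstract names the two strategies but gives no formulas, so the following is only a minimal sketch of how such policies might look, not the thesis's actual method. It assumes that "active-resource utilization" means busy nodes divided by powered-on nodes, and that the choice between node sleep and node shutdown depends on a predicted idle duration compared with a break-even time. All function and parameter names here are hypothetical.

```python
import math


def target_active_nodes(busy_nodes: int, total_nodes: int, threshold: float) -> int:
    """Smallest number of powered-on nodes that keeps busy/active <= threshold.

    Assumption: utilization of active resources = busy nodes / powered-on nodes.
    Keeping it at or below the threshold leaves some idle nodes ready for new jobs.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    if not 0 <= busy_nodes <= total_nodes:
        raise ValueError("busy_nodes must be between 0 and total_nodes")
    needed = math.ceil(busy_nodes / threshold)
    # The machine cannot power on more nodes than it has.
    return min(total_nodes, needed)


def power_adjustment(busy_nodes: int, active_nodes: int,
                     total_nodes: int, threshold: float) -> int:
    """Positive result: nodes to wake up. Negative: idle nodes to power down."""
    return target_active_nodes(busy_nodes, total_nodes, threshold) - active_nodes


def choose_low_power_state(expected_idle_s: float, shutdown_break_even_s: float) -> str:
    """Hypothetical hybrid rule: shut a node down only for long predicted idle periods.

    Assumption: shutdown saves more power than sleep but takes longer to recover
    from, so it pays off only beyond some break-even idle time.
    """
    return "shutdown" if expected_idle_s >= shutdown_break_even_s else "sleep"
```

With a threshold of 0.75 and 30 busy nodes, this sketch keeps 40 nodes powered on, so 10 idle nodes remain available to absorb newly submitted jobs without a wake-up delay. In a load-pattern-based scheme, the predicted idle duration would come from historical workload data, which the abstract mentions but does not specify.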
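[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP38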




Article ID: 2076956


Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2076956.html

