Communication Scalability of Large-Scale Parallel Computing: Analysis, Optimization, and Simulation

Published: 2018-04-22 20:31

Topic: parallel computing + communication scalability; Reference: National University of Defense Technology, 2013 doctoral dissertation

[Abstract]: As systems grow in scale and the computing power of individual nodes increases, communication has become a major bottleneck limiting the scalability of parallel computing. The communication scalability problem, that is, analyzing which factors affect communication and how large their effect must grow before it limits system scalability, is one of the most challenging theoretical problems in parallel computing. To address it, this dissertation quantifies, for the first time, the communication wall of parallel computing from the perspective of speedup, and establishes a communication scalability model. Based on the conclusions drawn from that model, it proposes a program optimization technique guided by message independence and a multi-job allocation optimization technique, targeting program optimization and task allocation optimization respectively. Finally, it designs and implements a performance prediction simulator for large-scale parallel computing, which can be used to verify the correctness of the communication scalability model and the scalability of related optimization techniques for parallel systems.
Specifically, the main work and contributions of this dissertation are as follows:
1. A communication scalability model is established (Chapter 2).
At present, communication scalability is understood internationally mostly at an intuitive level, and no systematic quantitative study of it has been carried out. This dissertation gives the first quantitative description of the communication wall and presents an existence theorem for the communication wall. On this basis, it establishes a communication scalability model, proposes a system measurement method and a classification of parallel systems based on the model, and quantifies the strength of a system's communication scalability and generalized communication scalability. Finally, through concrete case studies, it analyzes how programs, parallel machine topologies, and common optimization methods affect communication scalability, compares the generalized communication scalability of common supercomputer topologies, and points out directions for optimizing a system's communication scalability and generalized communication scalability.
2. A program optimization technique guided by message independence is proposed (Chapter 3).
Communication hiding based on instruction reordering is one of the main means of optimizing program performance. However, apart from the problems inherent to the technique itself, it can also cause severe contention for network resources among messages. By analyzing the causes of network resource contention, this dissertation proposes the concept of message independence for the first time and studies its precise meaning. Then, for MPI (Message Passing Interface) programs, it establishes a program optimization model based on instruction reordering and guided by message independence. Based on this model, it designs and implements a corresponding program optimization method, which reduces network resource contention among messages while still maximizing communication hiding. Experiments on a parallel CFD (Computational Fluid Dynamics) application show that the method effectively reduces the program's communication overhead and improves its performance.
3. A multi-job allocation optimization technique is proposed (Chapter 4).
Allocating computing resources to multiple jobs sensibly, so as to meet their performance requirements, is very important to users of large-scale parallel computing systems. This dissertation is the first to propose decomposing the multi-job allocation optimization problem into two subproblems: multi-job distribution optimization and single-job task mapping optimization. For multi-job distribution optimization, it proposes, for the first time, the closed minimum graph partitioning model, which transforms the problem into a closed minimum graph partitioning problem. For single-job task mapping optimization, it analyzes the effect of the communication protocol on communication overhead and proposes PaPP, the first protocol-aware process mapping model for MPI programs. Based on these two models, it designs and implements a multi-job allocation optimization method. Experiments show that the method delivers good performance improvements on the NPB (NAS Parallel Benchmarks) suite.
4. VACED-SIM, an execution-driven simulator combining virtual and real simulation, is designed and implemented (Chapter 5).
Discrete event simulation is one of the commonly used performance prediction methods for large-scale parallel computing. Based on an in-depth analysis of discrete event simulation, this dissertation introduces the concepts of virtual simulation and real simulation. By comparing virtual with real simulation and trace-driven with execution-driven methods, it classifies, for the first time, performance prediction methods based on discrete event simulation into four categories along two orthogonal dimensions (simulation mechanism and event-driving method). In view of the characteristics of scalability prediction for large-scale parallel computing, it proposes, for the first time, a model of the fourth category: the virtual-and-real combined execution-driven (VACED) simulation method. Based on this model, it designs and implements VACED-SIM, a lightweight virtual-and-real combined execution-driven simulator. In this simulator, a fine-grained definition of activities and events is proposed and adopted for the first time, which improves simulation accuracy. Experimental results on a Tianhe-1A subsystem show that VACED-SIM is highly accurate and efficient.
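The abstract does not reproduce the dissertation's communication scalability model, so the following is only a toy illustration of what a "communication wall" in terms of speedup can look like. It assumes a hypothetical cost model in which compute time shrinks as work / p while communication time grows as alpha * (p - 1); the function names, the cost terms, and the parameter values are all assumptions made for this sketch and are not taken from the thesis.

```python
def speedup(p, work=1024.0, alpha=0.01):
    """Speedup on p processes under an ASSUMED toy cost model.

    Compute time is work / p; communication time is alpha * (p - 1).
    Both terms are illustrative assumptions, not the thesis's model.
    """
    t_serial = work
    t_parallel = work / p + alpha * (p - 1)
    return t_serial / t_parallel


def find_wall(process_counts, **params):
    """Return the process count at which the toy speedup peaks."""
    return max(process_counts, key=lambda p: speedup(p, **params))


if __name__ == "__main__":
    counts = [2 ** k for k in range(13)]  # 1, 2, 4, ..., 4096
    for p in counts:
        print(f"p={p:5d}  speedup={speedup(p):8.2f}")
    print("speedup peaks at p =", find_wall(counts))
```

Under these assumed parameters the speedup rises, peaks, and then falls as communication cost overtakes the shrinking compute time; adding processes beyond the peak makes the program slower, which is the qualitative behavior the term "communication wall" refers to.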

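[Degree-granting institution]: National University of Defense Technology
[Degree level]: Doctoral
[Year degree granted]: 2013
[Classification number]: TP338.6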
