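Performance Optimization of Non-Equilibrium Ionization Astronomical Numerical Simulations
Published: 2018-05-29 22:10
Topic: astronomical simulation + non-equilibrium ionization; Source: Tianjin University, doctoral dissertation, 2016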
[Abstract]: Modern astronomical numerical simulations are computationally intensive and long-running. Improving simulation performance, reducing the consumption of computing resources, and striking the best balance between accuracy and performance have long been key goals in the design of astronomical simulation software. Meanwhile, building a complete and effective numerical simulation of radiation hydrodynamics under non-equilibrium ionization (NEI) remains one of the hard problems in the field, and the traditional NEI solution procedure based on an Eulerian grid incurs heavy overhead in memory, computation, and network communication. Building on earlier research, this thesis starts from the key factors that limit the performance of NEI simulation and, targeting multi-core heterogeneous architectures and large-scale parallel environments, optimizes NEI simulation at three levels: workflow, framework structure, and numerical solution. First, the thesis analyzes and verifies the performance bottlenecks of the traditional algorithm, introduces tracer particles to decouple the underlying adaptive mesh from the NEI calculation layered on top of it, and then redesigns the parallel NEI solution framework around the MapReduce model. To handle the snapshot generation, storage, and trajectory reconstruction for the large number of particles that the new framework brings with it, serial I/O, parallel I/O, direct I/O, and real-time stream-processing modes are designed so that the framework can adapt to different computing environments and requirements.
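The trajectory-reconstruction step described above can be pictured with a minimal sketch. It is not the thesis's implementation: the snapshot layout, the field names, and the functions `map_snapshot`, `shuffle`, `reduce_trajectory`, and `rebuild_trajectories` are all hypothetical, chosen only to show how a MapReduce-style pass can regroup per-snapshot tracer-particle records into per-particle time series.

```python
from collections import defaultdict


def map_snapshot(snapshot):
    """Map: emit one (particle_id, sample) pair per tracer particle in a snapshot."""
    t = snapshot["time"]
    for pid, temperature, density in snapshot["particles"]:
        yield pid, (t, temperature, density)


def shuffle(pairs):
    """Shuffle: group the emitted samples by particle id."""
    groups = defaultdict(list)
    for pid, sample in pairs:
        groups[pid].append(sample)
    return groups


def reduce_trajectory(samples):
    """Reduce: order one particle's samples by time to rebuild its trajectory."""
    return sorted(samples, key=lambda s: s[0])


def rebuild_trajectories(snapshots):
    """Run map, shuffle, and reduce over all snapshots (in memory, single process)."""
    pairs = (pair for snap in snapshots for pair in map_snapshot(snap))
    return {pid: reduce_trajectory(s) for pid, s in shuffle(pairs).items()}


# Hypothetical snapshots, deliberately out of time order: (id, temperature, density).
snapshots = [
    {"time": 2.0, "particles": [(0, 3.0e6, 1.2), (1, 1.0e6, 0.8)]},
    {"time": 1.0, "particles": [(1, 2.0e6, 0.9), (0, 5.0e6, 1.0)]},
]
trajectories = rebuild_trajectories(snapshots)
```

Because each rebuilt trajectory depends only on its own particle's samples, the ionization equations along it could in principle be integrated independently of the grid and of every other particle, which is the property that makes a map/reduce decomposition plausible here.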
Experiments show that the framework-level optimization overcomes the performance bottleneck of NEI simulation at large parallel scale: in the same experimental environment, using only one quarter of the original computing resources, performance improves by more than a factor of three. Second, to break through the limits that a traditional CPU architecture places on solving large numbers of NEI equations, the thesis proposes an NEI solver for hybrid heterogeneous systems with multiple CPUs and multiple GPUs. In the algorithm design, a task-scheduling strategy based on shared memory and a task queue exploits the respective strengths of the CPUs and GPUs and raises overall resource utilization; in addition, the data structures, task granularity, and memory access patterns of the algorithm are specifically optimized for the CUDA programming model. Tests show that the heterogeneous solver significantly improves the efficiency of solving the NEI equations, reaching a speedup of about 15 with four GPU devices. Finally, visual analysis and computational steering are used to optimize the workflow of astronomical simulation. Based on the hierarchical structure of the adaptive mesh, fast low-precision combined analysis guides the time-consuming high-precision simulation. A parameter classification and adjustment interface, designed around the characteristics of astronomical numerical simulation, helps astronomers follow and control the simulation process efficiently and accurately.
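The shared task-queue scheduling mentioned above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the worker names, the batch layout, and the toy two-state equation df/dt = c(1 - f) - r f stand in for the real NEI system, and plain Python threads stand in for CPU cores and GPU devices. It shows only the scheduling idea, in which idle workers pull the next batch from a common queue so that faster devices naturally take more work.

```python
import queue
import threading


def solve_batch(batch, dt=0.1, steps=50):
    """Toy stand-in for an NEI solve: implicit Euler on df/dt = c*(1 - f) - r*f."""
    fractions = []
    for c, r in batch:
        f = 0.0
        for _ in range(steps):
            # Implicit update: f_new * (1 + dt*(c + r)) = f_old + dt*c
            f = (f + dt * c) / (1.0 + dt * (c + r))
        fractions.append(f)
    return fractions


def worker(name, tasks, results, lock):
    """Pull batches from the shared queue until it is empty."""
    while True:
        try:
            task_id, batch = tasks.get_nowait()
        except queue.Empty:
            return
        fractions = solve_batch(batch)
        with lock:
            results[task_id] = (name, fractions)


def run(batches, worker_names=("cpu0", "cpu1", "gpu0", "gpu1")):
    """Fill the queue first, then let every (hypothetical) device drain it."""
    tasks = queue.Queue()
    for task_id, batch in enumerate(batches):
        tasks.put((task_id, batch))
    results, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(name, tasks, results, lock))
        for name in worker_names
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Each batch is a list of hypothetical (ionization rate, recombination rate) pairs.
results = run([[(1.0, 1.0)], [(2.0, 2.0), (3.0, 1.0)]])
```

In a real heterogeneous solver the GPU workers would launch CUDA kernels on large batches while the CPU workers take smaller ones; the batch size plays the role of the task granularity that the abstract says was tuned.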
The visual steering environment proposed in the thesis effectively accelerates model building, ionization-state analysis, and related stages in the life cycle of an NEI simulation, and supports user decisions on parameter adjustment and numerical error control. All experiments are based on real astronomical numerical simulations, and the accuracy of every experimental result is compared and analyzed in detail. The thesis also analyzes and verifies how well these methods carry over to other problems: the experiments show that they can likewise substantially improve the performance of common astronomical computations such as nucleosynthesis and spectral calculation, and can accelerate the exploration and establishment of stellar wind models.
[Degree-granting institution]: Tianjin University
[Degree level]: Doctoral
[Year degree awarded]: 2016
[Classification number]: P11
Article ID: 1952626
Article link: https://www.wllwen.com/shoufeilunwen/jckxbs/1952626.html