
Research on Hybrid Parallel Molecular Dynamics Computation on Multi-core Clusters

Published: 2018-02-03 23:58

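  Keywords: hybrid programming model; multi-core cluster; molecular dynamics; MPI; OpenMP. Source: University of Electronic Science and Technology of China, 2012 doctoral dissertation. Document type: dissertation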


[Abstract]: With the rapid development of high-performance computers and the growing abundance of computing resources, high-performance computing has become a focus of research worldwide. The mainstream architecture of high-performance computers has shifted from massively parallel processors to multi-core clusters, and systems have moved from a single memory model to a hybrid memory model. Parallel programs designed for these machines must adapt to this shift, which gave rise to hybrid parallel programming models. Molecular dynamics (MD) simulation is an important scientific research method that is widely used in many disciplines. Speeding up MD simulation on multi-core clusters, and thereby advancing research in these fields, has become urgent. However, when designing parallel MD algorithms and other parallel algorithms based on hybrid parallel programming models for multi-core clusters, developers commonly find that introducing multithreaded parallelism carries excessive overhead, so that the hybrid model often performs worse than the original pure message-passing model. How to solve this kind of problem and raise the speed of scientific and engineering programs on multi-core clusters is therefore an important direction of current research.

This thesis systematically reviews the state of research on hybrid parallel programming models and hybrid parallel MD algorithms, identifies their shortcomings, and on that basis proposes a series of optimized or improved algorithms for the related problems.

The main contents and contributions of the thesis are as follows:

(1) It analyzes in depth the basic principles and basic implementation methods of hybrid parallel programming models and parallel MD algorithms suitable for multi-core clusters, laying the foundation for the hybrid parallel MD algorithms for multi-core clusters proposed later.

(2) It demonstrates the scalability problem of the Critical Section algorithm for multithreaded parallel MD computation. Theoretical analysis and experimental results show that the speedup of the Critical Section algorithm drops markedly when the number of processor cores exceeds 8. The thesis then proposes an optimization called the triangle parallel MD algorithm, which statically assigns atom sets so that threads enter the critical section at different times, reducing the idle time of the critical section and speeding up the parallel computation.

(3) It proposes an OpenMP-based parallel MD algorithm, the SPMD-like (Single Program Multiple Data) algorithm. The algorithm adopts the same strategy as an SPMD program: each thread processes its own data and redundantly computes the data relationships that cross region boundaries. Its implementation, however, is close to a simple OpenMP implementation. The internal computation logic of MD does not need to be modified; only a few data structures are changed and one spatial decomposition subroutine is added. The algorithm keeps the simplicity of an OpenMP implementation while achieving parallel performance and scalability close to those of the pure message-passing model.

(4) It proposes a parallel MD algorithm for multi-core clusters based on the hybrid MPI/OpenMP model. Following the principle of keeping modifications as small as possible, the algorithm embeds the SPMD-like algorithm in a pure MPI parallel MD program. The hybrid program uses OpenMP parallelism within each node; it introduces only a small parallel overhead while markedly reducing inter-node communication time, which effectively improves the computing speed and parallel efficiency of the MD program on multi-core clusters.

(5) It proposes a reduction algorithm that completely avoids critical sections, the block-rotation reduction algorithm. The algorithm remains about as simple as the Critical Section algorithm while offering better parallel performance and scalability. Theoretical analysis and experimental tests show that the algorithm performs well with 16 processor cores per node, but with 32 or more cores its performance falls below that of the SPMD-like algorithm. The two algorithms therefore suit different hybrid parallel settings: when a node has few cores, the simpler block-rotation reduction can be chosen; when a node has many cores, the better-performing SPMD-like algorithm can be used.

(6) It proposes a parallel MD algorithm based on the hybrid MPI/TBB model and studies its implementation using LAMMPS as an example. Experimental results show that once the number of nodes participating in the computation grows beyond a certain point, the hybrid model achieves better parallel performance than the pure MPI model, mainly because communication time is reduced.
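The abstract names a block-rotation reduction that avoids critical sections but does not describe its schedule. The sketch below is one plausible reading, offered purely as an illustration and not as the thesis's algorithm: each (simulated) thread holds a private partial array, the shared array is split into as many blocks as there are threads, and at step s thread t adds its private block (t + s) mod T into the shared array, so no two threads write the same block in the same step. The function names `block_bounds` and `block_rotation_reduce` are hypothetical, and the threads are simulated sequentially in Python.

```python
# Illustrative sketch only: an assumed block-rotating reduction schedule,
# simulated sequentially. It is not taken from the thesis.

def block_bounds(n, num_threads):
    """Split range(n) into num_threads contiguous blocks; return (start, end) pairs."""
    base, extra = divmod(n, num_threads)
    bounds, start = [], 0
    for b in range(num_threads):
        end = start + base + (1 if b < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def block_rotation_reduce(private):
    """Sum per-thread partial arrays into one shared array without locking.

    private[t] is the partial result held by simulated thread t.
    """
    num_threads = len(private)
    n = len(private[0])
    bounds = block_bounds(n, num_threads)
    shared = [0.0] * n
    for step in range(num_threads):
        # Block index each thread writes during this step.
        blocks = [(t + step) % num_threads for t in range(num_threads)]
        # The schedule is conflict-free only if all block indices are distinct.
        assert len(set(blocks)) == num_threads
        for t, b in enumerate(blocks):  # these iterations would run concurrently
            start, end = bounds[b]
            for i in range(start, end):
                shared[i] += private[t][i]
        # A real multithreaded version would need a barrier here before rotating.
    return shared


# Four simulated threads, each holding a partial array of length 10.
partials = [[float(t + i) for i in range(10)] for t in range(4)]
result = block_rotation_reduce(partials)
expected = [sum(p[i] for p in partials) for i in range(10)]
```

Under these assumptions every thread does useful work in every step and the only synchronization is the barrier between steps, which is the property the abstract contrasts with a critical-section reduction.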

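[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Doctoral
[Year degree conferred]: 2012
[Classification number]: TP338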



Article ID: 1488715


Link to this article: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1488715.html

