A GPU-Based Parallel Computing Method for Contact-Impact Processes of Vehicle Body Structures

Published: 2018-04-09 18:49

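Topic: graphics processing unit (GPU); Focus: Compute Unified Device Architecture (CUDA); Source: Hunan University, doctoral dissertation, 2013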

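[Summary]: Finite element analysis of contact-impact processes in vehicle body structures is an important part of automotive CAE. It mainly addresses engineering problems such as vehicle crashes and the forming of body panels. Mechanically it involves three kinds of nonlinearity: material nonlinearity, geometric nonlinearity, and the boundary nonlinearity of contact interfaces. Such analyses often suffer from a very large computational load and low efficiency, so the practical demand for parallel computing is strong. Common parallel finite element methods mostly adopt coarse-grained strategies such as domain decomposition and run on networked computer clusters with CPUs as the computing core. Their efficiency depends directly on the number of compute nodes, the workflow is complicated, and expensive hardware is required, so their cost-effectiveness is low. The modern graphics processing unit (GPU) is a highly parallel many-core processor whose floating-point performance far exceeds that of contemporary CPUs. The arrival of programmable shaders gave the GPU the characteristics of a general-purpose processor and brought it into general-purpose computing, opening new approaches for large-scale data processing and numerical simulation. Early general-purpose computing on GPUs (GPGPU) was programmed in high-level shading languages such as Cg and was applied to various finite element computations. However, GPGPU at that stage supported only single precision and transferred data inefficiently, so GPU-parallel finite element computation offered low accuracy and limited speedup, which restricted its engineering use. The Compute Unified Device Architecture (CUDA) brought efficient and intuitive tools for developing GPU parallel programs; CUDA-based GPU parallel computing has low hardware cost and is simple to program. Guided by engineering needs, this dissertation uses CUDA to study high-precision, high-efficiency fine-grained parallel methods for explicit finite element analysis, together with a parallel contact algorithm that executes in a fine-grained manner throughout. The goal is fast parallel computation, on an ordinary personal computer, of two classes of large-scale nonlinear finite element problems: vehicle body crash simulation and sheet metal stamping simulation. The main work and results are as follows:

(1) Given the natural parallelism of nonlinear explicit finite element analysis and the lightweight thread execution model of the GPU, a GPU-based explicit finite element computing platform with independent intellectual property rights was developed (invention patent application No. 201210266435.1). Its main feature is an abstract mapping at three levels (thread to element, thread to node, and thread to degree of freedom), which fits explicit finite element computation closely to GPU threads. Compared with coarse-grained parallel strategies based on mesh partitioning, this fine-grained strategy needs no preprocessing, and on a single graphics card there is no boundary data to handle, so efficiency can be improved substantially. As a result, most steps of the explicit finite element procedure, such as the nodal velocity and displacement updates, can be parallelized efficiently on the GPU with little effort.

(2) Assembling nodal stresses during element computation is hard to parallelize on the GPU and forms a technical bottleneck. A pre-indexed parallel stress assembly strategy is proposed to address it, achieving fine-grained GPU parallelism for two shell elements: the BT quadrilateral element and the EST triangular element. A method based on parallel reduction is proposed for computing single-valued quantities such as the time step on the GPU. The explicit finite element algorithm thus runs entirely on the GPU, which reduces data exchange between GPU and CPU and maximizes computational efficiency. Computations on nonlinear plate and shell problems show that the GPU results agree exactly with those of the original serial CPU algorithm, with a clear efficiency gain over a CPU of the same period and price. For an elastoplastic large-deformation problem with 1.85 million degrees of freedom solved with EST elements on a GTX580 graphics card, a speedup of nearly 37 times is reached.

(3) In contact-impact finite element analysis, the contact algorithm takes more than 70% of the computing time. This dissertation therefore proposes a fine-grained parallel contact algorithm executed entirely on the GPU, comprising a parallel hierarchy-territory contact search algorithm, a parallel defence-node contact force method, and a parallel penalty-function contact force method. The hierarchy-territory algorithm is an efficient search algorithm suited to complex self-contact problems, and the computational independence of contact segments within the same hierarchy level also meets the requirements of fine-grained GPU computing. A one-to-one mapping between threads and contact segments, GPU parallel sorting, and an increased amount of work per GPU thread are proposed to achieve parallel search of test pairs on the GPU. In the contact pair search stage, a mapping between threads and test pairs enables parallel search of contact pairs within the same level, and a compute-then-sort strategy handles data exchange between one level and the next. In the contact force stage, a mapping between threads and contact pairs yields fine-grained parallel computation of penetration and contact force, with atomic operations used to distribute the contact forces. Finally, building on the in-house crash simulation software DYSI3D, the GPU-based parallel crash simulation software CPS-GPU was developed (software copyright registration No. 2011SR001966). For a body-in-white crash computation with 1.77 million degrees of freedom on a GTX580 graphics card, this software achieves a speedup of about 20 times.

(4) This dissertation presents a complete GPU parallel computing method for sheet metal stamping. Because stamping simulation places high demands on modelling material flow, a GPU-parallel element computation technique incorporating complex material constitutive models and a GPU-parallel contact force method that accounts for friction are proposed. A GPU strategy for an integrated contact search algorithm is also proposed: the broad-phase search method used in computer graphics for real-time collision detection is introduced to find test pairs, and, once adjacency information between contact segments has been built, a fine-grained parallel update of contact pairs in the post-contact search is given. Building on the in-house sheet forming simulation software CADEMII, the GPU-based parallel sheet forming software CADEM-GPU was developed (software copyright registration No. 2010SR052426), with an asynchronous data output mode and OpenGL-based real-time display added to further improve computational efficiency and usability. Numerical examples show that the software has good accuracy and efficiency: on a GTX460 graphics card, for simulation models with tens of thousands of mesh elements, it achieves a speedup of more than 20 times, effectively shortening simulation time.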
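The abstract does not spell out how the pre-indexed assembly in contribution (2) is organized. One common way to avoid write conflicts, shown here purely as an assumption, is to build a node-to-element index once and then let each node gather its own contributions. The sketch below emulates that serially in Python, with one loop iteration standing in for one GPU thread; the function names, the scalar per-node value, and the sample mesh are all hypothetical.

```python
# Sketch only: a serial emulation of a gather-style ("pre-indexed") assembly.
# The index layout and all names here are assumptions, not the dissertation's code.

def build_node_index(connectivity, num_nodes):
    """For each node, list the (element, local_slot) pairs that touch it.

    Built once before time stepping, assuming the mesh topology is fixed.
    """
    index = [[] for _ in range(num_nodes)]
    for elem, nodes in enumerate(connectivity):
        for slot, node in enumerate(nodes):
            index[node].append((elem, slot))
    return index


def assemble_by_node(node_index, elem_values):
    """Gather phase: one loop iteration stands in for one GPU thread per node.

    Each node only reads element results and writes its own entry,
    so no two "threads" ever write the same memory location.
    """
    nodal = [0.0] * len(node_index)
    for node, contributions in enumerate(node_index):  # one thread per node
        total = 0.0
        for elem, slot in contributions:
            total += elem_values[elem][slot]
        nodal[node] = total
    return nodal


# Two triangles sharing the edge between nodes 1 and 2.
connectivity = [(0, 1, 2), (1, 3, 2)]
# Hypothetical per-element contributions, one scalar per local node.
elem_values = [(1.0, 2.0, 3.0), (10.0, 20.0, 30.0)]

node_index = build_node_index(connectivity, num_nodes=4)
nodal_values = assemble_by_node(node_index, elem_values)
print(nodal_values)  # [1.0, 12.0, 33.0, 20.0]
```

The trade-off in this kind of layout is extra memory for the index in exchange for conflict-free writes; a scatter-by-element loop would need atomic additions on the shared nodes 1 and 2 instead.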
[Abstract]: Finite element analysis of contact-impact processes in vehicle body structures is an important part of automotive CAE. It mainly addresses vehicle crashes and the forming of body panels.

The modern graphics processing unit (GPU) is a highly parallel many-core processor whose floating-point performance is much higher than that of CPUs of the same period. With programmable shaders, the GPU gained the characteristics of a general-purpose processor and came into use in general-purpose computing. Early GPU-based general-purpose computing (GPGPU) was applied to various finite element calculations.

Guided by engineering needs, this dissertation uses CUDA to study a high-precision, high-efficiency fine-grained parallel method for explicit finite element analysis, together with a contact algorithm that runs in a fine-grained parallel manner throughout, and thereby achieves fast parallel computation of two kinds of large-scale nonlinear finite element problems on an ordinary personal computer. The main work and results are as follows:

(1) Considering the natural parallelism of nonlinear explicit finite element analysis and the lightweight thread execution model of the GPU, a GPU-based explicit finite element computing platform with independent intellectual property rights is developed (patent application number: 201210266435.1). Compared with coarse-grained parallel strategies based on mesh partitioning, the fine-grained strategy requires no preprocessing and can greatly improve computational efficiency.

(2) To overcome the technical bottleneck that nodal stress assembly in the element computation is difficult to parallelize on the GPU, a pre-indexed parallel stress assembly strategy is proposed, achieving fine-grained parallelism on the GPU for two shell elements: the BT quadrilateral element and the EST triangular element.

(3) In contact-impact finite element analysis, the contact algorithm takes more than 70% of the computation time. This dissertation proposes a fine-grained parallel contact algorithm comprising a parallel hierarchy-territory contact search algorithm, a parallel defence-node contact force method, and a parallel penalty-function contact force method. Using the resulting CPS-GPU software for a body-in-white crash calculation with 1.77 million degrees of freedom on a GTX580 graphics card gives a speedup of about 20 times.

(4) A GPU parallel computing method for sheet metal stamping is presented, including a GPU-parallel element computation technique with complex material constitutive models and a GPU-parallel contact force method.
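Contribution (2) of the dissertation also computes single-valued quantities such as the time step by parallel reduction. In explicit finite element codes the global step is typically bounded by the smallest critical step over all elements, so the reduction operator is a minimum. The following is a minimal serial emulation of a tree reduction in Python, assuming that reading; the function name and the sample values are hypothetical, and the inner loop marks the work a GPU would hand to independent threads.

```python
# Sketch only: serial emulation of a tree (pairwise) reduction for the minimum
# time step. Names and the sample data are illustrative assumptions.

def tree_min(values):
    """Reduce a list to its minimum in about log2(n) halving passes.

    Within each pass, every index i does independent work, which is
    what lets a GPU assign one thread to each i.
    """
    buf = list(values)              # work on a copy, like a scratch buffer
    n = len(buf)
    if n == 0:
        raise ValueError("need at least one value")
    while n > 1:
        half = (n + 1) // 2         # round up so odd lengths keep the middle item
        for i in range(n - half):   # each i would be one GPU thread
            buf[i] = min(buf[i], buf[i + half])
        n = half
    return buf[0]


# Hypothetical per-element critical time steps, in seconds.
element_dt = [2.0e-6, 1.5e-6, 3.1e-6, 0.9e-6, 2.7e-6]
dt = tree_min(element_dt)
print(dt)
```

Keeping this reduction on the GPU means only one scalar, rather than the whole per-element array, would have to cross to the CPU each step, which is consistent with the abstract's aim of reducing GPU-CPU data exchange.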

[Degree-granting institution]: Hunan University
[Degree level]: Doctoral
[Year conferred]: 2013
[Classification number]: U467.14

[References]