Research and Implementation of Key Algorithms for GPU Clusters
Topic: LU-SGS + GPU; Source: Hangzhou Dianzi University, master's thesis, 2017
[Abstract]: GPUs are no longer used only for graphics and image processing. In recent years their architecture and growing floating-point throughput have driven progress across numerical computing; for tasks that parallelize easily, computation time can be reduced by a factor of ten or more, and in some cases by several tens. GPUs deliver significant speedups in many fields with heavy numerical workloads. In machine learning, the feature dimension of logistic regression can reach hundreds of millions, so single-machine training, and even simple parallel processing, can no longer meet training requirements. The best solution is to split the high-dimensional feature vector into several smaller vectors and solve on those. Parallel machine learning algorithms have improved this situation: thousands or even tens of thousands of machines train in parallel, which increases running speed. In the energy sector, nuclear power is low-carbon, has high energy density, and is highly sustainable, which gives it development prospects that other energy sources cannot replace. Core fuel management is one of the central concerns of a nuclear power plant and bears directly on the economic return and cost of nuclear power. For a large core, the order of the diffusion equations is very high, the equations are complex, and solving them is extremely time-consuming, so the computation of the diffusion equation is a critical part of the core fuel management workflow.
The main work and contributions of this thesis are as follows:
(1) Taking the reactor diffusion equation as background, the thesis extracts the LU-SGS iteration from an unstructured-grid flow-field computation, partitions one-dimensional and two-dimensional grid blocks into multiple domains, distributes the domains evenly across GPU thread blocks, and uses CUDA and MPI to run the LU-SGS iteration in parallel on a GPU cluster and on CPUs. Experiments show that, compared with the serial program, the GPU greatly improves execution efficiency, which confirms the value of GPUs in numerical computing.
(2) The thesis derives and analyzes the parallel logistic regression algorithm in detail, proposes replacing the DHT algorithm with the Raft algorithm to change the consistency requirements of the parameter server, and analyzes the impact of GPUs on the parameter server.
(3) Both the LU-SGS iteration and parallel logistic regression are algorithms with a low degree of parallelism. On a GPU, threads in different blocks cannot communicate with each other, and one block cannot access another block's shared memory, so the iteration cannot be carried out by exchanging data through "collision". The thesis proposes a "delayed" iteration method in which each iteration performs one fewer forward or backward step. This method reduces the execution time of the LU-SGS algorithm by about 20%.
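The abstract gives no implementation details, so the following Python sketch is only an illustration of the general idea behind contribution (1), not the thesis's algorithm or its "delayed" iteration. It assumes a simple one-dimensional model problem (a tridiagonal system with constant coefficients `diag` and `off`), and all function names are hypothetical. Each domain performs a forward and a backward Gauss-Seidel sweep over its own cells and reads values owned by other domains from a snapshot of the previous iterate, which mimics thread blocks that cannot exchange data in the middle of an iteration.

```python
def residual(diag, off, b, x):
    """Max-norm residual of tridiag(off, diag, off) x = b (zero outside the grid)."""
    n = len(x)
    worst = 0.0
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        worst = max(worst, abs(b[i] - diag * x[i] - off * (left + right)))
    return worst


def sgs_iteration(diag, off, b, x, num_domains):
    """One symmetric Gauss-Seidel iteration with the cells split into domains.

    Assumption for illustration: values outside a domain come from the
    previous iterate, so the domains do not depend on each other within
    an iteration (on a GPU, each domain could map to one thread block).
    """
    n = len(x)
    old = list(x)  # snapshot that other domains "see" during this iteration
    new = list(x)
    for d in range(num_domains):
        lo = d * n // num_domains
        hi = (d + 1) * n // num_domains

        def val(j, lo=lo, hi=hi):
            if j < 0 or j >= n:
                return 0.0  # zero boundary value
            return new[j] if lo <= j < hi else old[j]

        for i in range(lo, hi):  # forward sweep inside the domain
            new[i] = (b[i] - off * (val(i - 1) + val(i + 1))) / diag
        for i in range(hi - 1, lo - 1, -1):  # backward sweep inside the domain
            new[i] = (b[i] - off * (val(i - 1) + val(i + 1))) / diag
    return new


def solve(diag, off, b, num_domains, tol=1e-10, max_iters=1000):
    """Iterate from a zero initial guess until the residual drops below tol."""
    x = [0.0] * len(b)
    for it in range(1, max_iters + 1):
        x = sgs_iteration(diag, off, b, x, num_domains)
        if residual(diag, off, b, x) < tol:
            return x, it
    return x, max_iters
```

Calling `solve(2.5, -1.0, [1.0] * 32, 1)` runs the plain serial sweep, while `solve(2.5, -1.0, [1.0] * 32, 4)` runs the same problem split into four independent domains. For this diagonally dominant test matrix both converge to the same solution; the decomposed version trades some convergence speed per iteration for independence between domains.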
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.41; TP181