面向稀疏矩阵运算的异构并行算法研究
[Abstract]:Heterogeneous high-performance architectures have become more and more popular in the field of high-performance computing, which greatly promote the progress of large-scale computing in science and engineering. Sparse matrix computing is a very important operation in large-scale computing in science and engineering, but sparse matrix computing is a classic one. This is mainly because, on the one hand, the program behavior of sparse matrix operation is determined by the sparse structure of the matrix, and the sparse structure of the sparse matrix corresponding to different applications is different, so it is difficult to discover the locality of the sparse matrix, on the other hand, because of the sparse structure. Compressed storage of matrices leads to a large number of inter-address memory access and random memory access, so the system memory access bandwidth becomes the performance bottleneck of sparse matrix operation. In order to deal with these problems and challenges, this paper studies how to make full use of computing resources and memory access bandwidth in heterogeneous architectures to improve the performance of sparse matrix operation. The main work and innovations are as follows: (1) An efficient parallel optimization algorithm for sparse matrix vector multiplication algorithm based on heterogeneous architecture is proposed. A load-based sparse matrix vector multiplication algorithm with multiple parallel modes is proposed. The sparse matrix is partitioned based on the computing load to reduce the storage overhead caused by the change of the number of non-zero elements; different thread mapping methods are selected according to the computing load of each row to achieve load balance between threads; a load allocation strategy based on warp is designed to reduce the synchronization overhead between threads. Matrix vector multiplication algorithm can effectively improve the performance of the algorithm. (2) An efficient parallel optimization algorithm for sparse matrix transpose vector multiplication is proposed for heterogeneous architecture. A sparse matrix transpose vector multiplication algorithm without matrix transposition is proposed. The row compression scheme is used to reduce the preprocessing overhead, and the outer product calculation method is used to avoid the operation of sparse matrix transposition. The multi-channel parallel merge sort is used to eliminate the atom operation. The experimental results show that the proposed sparse matrix transposition vector multiplication algorithm can achieve effective performance improvement. (3) A heterogeneous architecture oriented sparse architecture is proposed. An efficient parallel optimization algorithm for sparse matrix multiplication is proposed in this paper.A heterogeneous cooperative sparse matrix multiplication algorithm is proposed.The upper bound allocation and the successive increase of the mixed space allocation strategy are used to improve the utilization of the storage space resources.The intermediate results are sorted by multi-parallel mergers and parallel forward scanning operations. The results show that the sparse matrix multiplication proposed in this paper can improve the utilization ratio of computing resources. (4) An efficient parallel algorithm for solving sparse trigonometric equations based on heterogeneous architecture platform is proposed. A parallel algorithm for solving sparse trigonometric equations based on hierarchical parallelism is proposed. A hybrid parallel mode of waterline parallel mode is proposed. The task layer with higher parallelism is processed by group parallel mode and the task layer with lower parallelism is processed by pipeline parallel mode. The parallel algorithm for solving sparse trigonometric equations is constructed. The experimental results show that the heterogeneous parallel algorithm proposed in this paper can effectively mine the computational performance of CPU and GPU, and improve the computational efficiency in heterogeneous architecture.
【学位授予单位】:国防科学技术大学
【学位级别】:博士
【学位授予年份】:2015
【分类号】:O241.6
【相似文献】
相关期刊论文 前10条
1 张兴令;关于高阶稀疏矩阵求解的一点注记[J];甘肃工业大学学报;1988年02期
2 张奠成 ,姚栋义;电子电路机助分析和设计中的稀疏矩阵技术[J];合肥工业大学学报;1981年02期
3 匡云太;一个缩减非对称稀疏矩阵的带宽和外形的算法[J];同济大学学报;1987年03期
4 于继业;稀疏矩阵块对角化的一种方法[J];数学的实践与认识;1988年03期
5 黄东泉;有向图在结构不对称稀疏矩阵重排序中的应用[J];西安交通大学学报;1982年06期
6 陆黎明;陈海强;朱鸿鹗;;稀疏矩阵技术在网络分析中的应用[J];上海师范学院学报(自然科学版);1984年03期
7 郑志镇,李尚健,李志刚;稀疏矩阵带宽减小的一种算法[J];华中理工大学学报;1998年12期
8 秦体恒;李学相;安学庆;;稀疏矩阵存储算法的探讨[J];河南机电高等专科学校学报;2008年01期
9 周永法;稀疏矩阵的并行算法[J];北京航空学院学报;1982年04期
10 郑金华;稀疏矩阵的存储结构和乘法运算[J];湘潭大学自然科学学报;1994年02期
相关会议论文 前3条
1 宋琦;陈璞;;稀疏求解—结构修改的一种新的可能性[A];北京力学会第20届学术年会论文集[C];2014年
2 徐道远;王宝庭;王向东;冯伯林;;求解大型稀疏矩阵的ICCG法[A];第八届全国结构工程学术会议论文集(第Ⅰ卷)[C];1999年
3 苑维然;陈璞;刘凯欣;;非对称线性方程组的快速外存解法[A];中国力学学会学术大会'2005论文摘要集(下)[C];2005年
相关博士学位论文 前1条
1 郭松;面向稀疏矩阵运算的异构并行算法研究[D];国防科学技术大学;2015年
相关硕士学位论文 前10条
1 刘健;基于稀疏矩阵分解的特征基因识别方法研究[D];曲阜师范大学;2015年
2 庄立;稀疏矩阵向量乘及自动调优[D];杭州电子科技大学;2011年
3 王冬;面向差异特征识别的稀疏矩阵分解方法的研究[D];曲阜师范大学;2016年
4 冯广祥;大型稀疏矩阵直接求解算法的研究及实现[D];东北大学;2010年
5 丁玲;低秩与稀疏矩阵恢复问题的若干研究[D];浙江大学;2012年
6 吴超凡;基于UB树的大型稀疏矩阵存储研究[D];云南大学;2013年
7 王亚南;基于FPGA的稀疏矩阵分解实现[D];西安电子科技大学;2009年
8 赵加强;基于OpenCL的稀疏矩阵向量乘优化[D];吉林大学;2012年
9 施浩;基于FPGA的稀疏矩阵向量乘的优化研究与实现[D];南京邮电大学;2011年
10 胡耀国;基于GPU的有限元方法研究[D];华中科技大学;2011年
,本文编号:2179981
本文链接:https://www.wllwen.com/kejilunwen/yysx/2179981.html