Performance Optimization of Sparse Matrix-Vector Multiplication on Intel Xeon Phi

Published: 2019-04-21 19:35

[Abstract]: Sparse matrix-vector multiplication (SpMV) is an important computational kernel in scientific computing, for example in solvers for linear systems. On the Intel Xeon Phi Many Integrated Core (MIC) architecture, traditional SpMV algorithms suffer from low SIMD utilization, high overhead from irregular memory accesses, and load imbalance, so they cannot exploit the processor's computing power. Targeting the architectural features of Intel Xeon Phi, this paper proposes a general parallel SpMV algorithm based on a blocked, compressed storage representation: (1) starting from the ELLPACK storage format, the matrix is partitioned into column blocks and compressed, which increases the density of nonzero elements and improves SIMD utilization; (2) careful data reordering preserves the locality of the matrix's nonzero elements, which raises data reuse and lowers memory-access overhead; (3) after compression, the matrix is divided into blocks of approximately equal size, which are statically assigned in equal numbers to the cores so that the load is balanced across cores. Experimental results show that, compared with the CSR algorithm in the MKL math library on Intel Xeon Phi, the proposed algorithm achieves a higher computation-to-memory-access ratio and runs on average 2.05 times faster than MKL's CSR algorithm.
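To make the starting point of the method concrete, the following is a minimal sketch of plain ELLPACK SpMV with a static, equal split of rows across workers. It assumes CSR input, double-precision values, and a column-major slot layout; the names `Ell`, `ell_from_csr`, `ell_spmv`, and `ell_free` are hypothetical and do not come from the paper. The paper's column blocking, compression, and data reordering are not reproduced here; the sketch only shows the baseline format those steps improve on.

```c
#include <stdlib.h>

/* Hypothetical ELLPACK container: every row is padded to the same width K.
   Column-major layout: slot k of row i lives at index k*n_rows + i, so for a
   fixed slot consecutive rows are contiguous, which suits SIMD over rows. */
typedef struct {
    int n_rows;
    int width;    /* K = max nonzeros in any row */
    int *col;     /* n_rows * width column indices (padding: column 0) */
    double *val;  /* n_rows * width values (padding: 0.0) */
} Ell;

/* Convert CSR (row_ptr, col_idx, values) to ELLPACK. calloc zero-fills, so
   padding slots get column 0 and value 0.0 and the kernel needs no branch.
   Returns 0 on success, -1 on allocation failure. */
int ell_from_csr(int n_rows, const int *row_ptr, const int *col_idx,
                 const double *values, Ell *out) {
    int width = 0;
    for (int i = 0; i < n_rows; ++i) {
        int len = row_ptr[i + 1] - row_ptr[i];
        if (len > width) width = len;
    }
    size_t slots = (size_t)n_rows * (size_t)width;
    out->n_rows = n_rows;
    out->width = width;
    out->col = calloc(slots ? slots : 1, sizeof *out->col);
    out->val = calloc(slots ? slots : 1, sizeof *out->val);
    if (!out->col || !out->val) {
        free(out->col);
        free(out->val);
        return -1;
    }
    for (int i = 0; i < n_rows; ++i)
        for (int p = row_ptr[i], k = 0; p < row_ptr[i + 1]; ++p, ++k) {
            out->col[(size_t)k * n_rows + i] = col_idx[p];
            out->val[(size_t)k * n_rows + i] = values[p];
        }
    return 0;
}

/* y[begin..end) = rows begin..end-1 of A times x. */
static void ell_spmv_rows(const Ell *A, const double *x, double *y,
                          int begin, int end) {
    for (int i = begin; i < end; ++i) y[i] = 0.0;
    for (int k = 0; k < A->width; ++k) {
        const int *c = A->col + (size_t)k * A->n_rows;
        const double *v = A->val + (size_t)k * A->n_rows;
        for (int i = begin; i < end; ++i)  /* unit stride in v and c; x[c[i]] is a gather */
            y[i] += v[i] * x[c[i]];
    }
}

/* Static equal split: every ELLPACK row holds exactly `width` slots, so equal
   row counts give each worker the same number of slots to process.
   Worker t handles rows [t*n/P, (t+1)*n/P). The pragma is ignored when the
   code is compiled without OpenMP, and the loop then runs serially. */
void ell_spmv(const Ell *A, const double *x, double *y, int n_workers) {
    #pragma omp parallel for schedule(static)
    for (int t = 0; t < n_workers; ++t) {
        int begin = (int)((long long)A->n_rows * t / n_workers);
        int end = (int)((long long)A->n_rows * (t + 1) / n_workers);
        ell_spmv_rows(A, x, y, begin, end);
    }
}

void ell_free(Ell *A) {
    free(A->col);
    free(A->val);
    A->col = NULL;
    A->val = NULL;
}
```

The padded slots are where this baseline loses efficiency. When row lengths vary widely, many SIMD lanes multiply zeros, and the gather `x[c[i]]` is the irregular memory access the abstract refers to. Those two costs are what the paper's column blocking and reordering are meant to reduce.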
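[Author affiliation]: School of Computer Science and Technology, University of Science and Technology of China
[Funding]: Supported by the National High Technology Research and Development Program of China ("863" Program) (2012AA010901, 2012AA010902)
[CLC classification]: TP332; O241.6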

[Similar literature]