GPU Parallel Implementation and Optimization of SAR Target Recognition Methods

Published: 2018-04-01 18:32

Topic: GPU　Focus: Jacobi　Source: University of Electronic Science and Technology of China, master's thesis, 2017

[Abstract]: SAR target recognition has become a research hotspot in recent years, and its results are widely used in military and civilian applications. As high-resolution SAR imaging technology develops, both the resolution and the data volume of SAR images are growing rapidly. Target recognition algorithms based on serial CPU computation can no longer meet the real-time processing requirements of high-resolution SAR target recognition software, and their computational cost is too high. General-purpose computing on the GPU (Graphics Processing Unit), which has emerged in recent years, offers powerful computing capability and memory bandwidth, together with low development cost and short development cycles. Research on GPU-based parallel target recognition algorithms is therefore an important step toward building target recognition software systems that process data in real time. This thesis first discusses the GPU architecture and the CUDA programming model and divides the target recognition algorithm into a feature extraction part and a classifier part. It then describes in detail how the computing tasks of each part are decomposed for parallel execution and implemented with CUDA, and finally applies a series of optimizations to the CUDA programs to maximize the speedup. The work is organized as follows:
(1) The CUDA programming model, memory model, and programming language are analyzed. The thesis then studies the basic principles and implementations of three mature feature extraction techniques, namely principal component analysis (PCA), non-negative matrix factorization (NMF), and linear discriminant analysis (LDA), together with the support vector machine (SVM) as the classification method. This provides the theoretical and technical basis for the parallel analysis of the target recognition algorithms.
(2) The computing tasks of the feature extraction methods and the classifier are studied, and the computation is split up and restructured for parallel execution. For the three feature extraction methods, tasks such as matrix multiplication, the Jacobi iteration method for matrix eigenvalues, reduction, and construction of the between-class and within-class scatter matrices are analyzed for parallelism and adapted to the GPU. The computation and parallelism of the SMO algorithm are then analyzed, and the SVM is ported to CUDA. Finally, using the public MSTAR database, experiments measure the running time of the target recognition algorithms on the CPU and on the GPU, and a comparison demonstrates the speedup that GPU parallel computing provides.
(3) Drawing on general evaluation methods and optimization strategies for CUDA programs, the thesis analyzes in depth the factors that limit the running speed of the CUDA programs in the target recognition algorithms and optimizes the algorithms in three respects: communication, memory access, and instruction stream. Experiments show that the GPU-based parallel implementation of the target recognition algorithms achieves a 25- to 30-fold performance improvement after optimization.
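The abstract names the Jacobi iteration method for matrix eigenvalues as one of the tasks that was parallelized. As a rough illustration only, the following is a minimal serial sketch of the classical cyclic Jacobi rotation method for a symmetric matrix, written in plain Python. It is not the thesis's CUDA code: the function name `jacobi_eigenvalues` and the parameters `tol` and `max_sweeps` are hypothetical, and the sketch assumes a small, dense, symmetric input such as a covariance matrix from PCA.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Return the sorted eigenvalues of a symmetric matrix (list of lists).

    Serial reference sketch of the cyclic Jacobi rotation method.
    All names and default values here are illustrative assumptions.
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]  # work on a copy

    for _ in range(max_sweeps):
        # Sum of squared off-diagonal entries: zero means fully diagonal.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break

        # One sweep: visit every pair (p, q) with p < q.
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle chosen so that the rotated entry a[p][q] becomes zero.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)

                # Column update: A <- A * J (touches columns p and q only).
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq

                # Row update: A <- J^T * A (touches rows p and q only).
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk

    # After convergence the diagonal holds the eigenvalues.
    return sorted(a[i][i] for i in range(n))


if __name__ == "__main__":
    print(jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]]))
```

Each rotation only touches rows and columns `p` and `q`, so rotations on disjoint index pairs do not interfere with one another. That independence is what parallel Jacobi variants generally exploit; whether the thesis schedules its rotations this way is not stated in the abstract.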
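[Degree-Granting Institution]: University of Electronic Science and Technology of China
[Degree Level]: Master's
[Year Degree Awarded]: 2017
[Classification Number]: TN958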
