Research and Implementation of GPU Parallelization of Machine Learning Algorithms on the CUDA Platform

Published: 2018-12-21 15:45

[Abstract]: The main task of machine learning today is to learn from and analyze large volumes of user data in order to help managers make decisions. Because both the dimensionality of the data and the number of samples are large, serial processing on the CPU takes too long. Meanwhile, the GPU (Graphics Processing Unit) has developed rapidly and offers powerful parallel processing capability. Because GPUs are efficient, inexpensive, and inherently parallel, researchers have begun to use them for general-purpose computing. CUDA (Compute Unified Device Architecture) is the programming environment that NVIDIA released to exploit the general-purpose computing capability of NVIDIA GPUs. With the CUDA programming model, machine learning algorithms can be parallelized on the GPU simply and effectively. This thesis studies the feasibility of, and implementation methods for, GPU parallelization of basic machine learning algorithms, with the aim of finding a general scheme for porting them from the CPU platform to the CUDA platform. The main work is as follows. For classification algorithms, taking KNN and decision trees as examples, the thesis first analyzes which modules of the original algorithms dominate the running time, then accelerates those modules with CUDA, and finally designs parallelization schemes for KNN and decision trees that suit CUDA. KNN is selected for experiments that compare the algorithm before and after parallelization, and a CUDA-based parallelization scheme for classification algorithms is then summarized. For clustering algorithms, taking k-means and DBScan as examples, the thesis follows the same procedure: it analyzes the modules that dominate the running time, accelerates them with CUDA, and designs parallelization schemes for k-means and DBScan that suit CUDA. k-means is selected for experiments that compare the algorithm before and after parallelization, and a CUDA-based parallelization scheme for clustering algorithms is then summarized. Finally, the CUDA-based machine learning parallelization schemes are successfully applied in a real engineering project.
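[Degree-granting institution]: University of Electronic Science and Technology of China (电子科技大学)
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP181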
