
Research on the Parallelization of Convolutional Neural Networks

Posted: 2018-04-02 06:11

  Topic: convolutional neural network. Focus: parallelization. Source: Zhengzhou University, master's thesis, 2013.


【Abstract】: Convolutional neural network (CNN) algorithms can effectively learn high-order invariant features from raw input and are widely applied in license plate detection, face detection, gesture recognition, speech recognition, image restoration, and semantic analysis. At present, CNN algorithms are mostly implemented serially. Serial implementations have two main defects: (1) they cannot exploit the algorithm's inherent parallelism, so training takes a long time; (2) they scale poorly, so data-intensive problems cannot be processed efficiently.

MapReduce, the parallel programming framework proposed by Google, offers good scalability and fault tolerance and has become the mainstream technology for parallel processing of large-scale data on cloud computing platforms. This thesis uses MapReduce to parallelize CNN training and uses the GPU to accelerate computation, enhancing the parallelism and scalability of the algorithm. The main results are as follows:

1. A method for training convolutional neural networks in parallel with MapReduce (CNN-MR) is proposed and deployed on the Hadoop cloud computing platform. CNN-MR adopts a data-parallel decomposition that partitions the training samples across the compute nodes of the platform. It uses batch updates: after all nodes finish processing their local training samples, the nodes perform one round of communication to obtain the global gradient change of the trainable parameters over the whole training set and update the network. This is iterated until the network converges to a preset threshold or the maximum number of iterations is reached.

2. A method for accelerating CNN-MR with the GPU (CNN-MR-G) is proposed and deployed on the G-Hadoop computing platform. The feature maps, neurons, and weights of each CNN layer are mapped to GPU thread blocks and threads, respectively, so that neurons in the same layer can compute their outputs, output errors, or local weight-gradient changes in parallel.

Experiments on the handwritten digit dataset (MNIST) and a self-built license plate dataset show that CNN-MR achieves good speedup and scalability relative to the serial algorithm, and that CNN-MR-G further accelerates CNN-MR effectively.
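The batch-update scheme of CNN-MR (map: each node computes a gradient over its local shard; reduce: one communication round sums the shards into the full-batch gradient) can be sketched as follows. This is an illustrative sketch only, not the thesis code: a one-parameter linear model stands in for the CNN, and the names `local_gradient` and `cnn_mr_step` are hypothetical.

```python
# Sketch of the CNN-MR data-parallel batch update, assuming a 1-D linear
# model w*x in place of a CNN. Each "map task" computes the gradient over
# its local shard; the "reduce" step sums the shards into the global
# full-batch gradient before a single parameter update.

def local_gradient(w, shard):
    """Gradient of 0.5*(w*x - y)^2 summed over one node's local samples."""
    return sum((w * x - y) * x for x, y in shard)

def cnn_mr_step(w, shards, lr):
    # Map phase: every node processes only its own samples.
    local_grads = [local_gradient(w, s) for s in shards]
    # Reduce phase: one communication round yields the global gradient.
    n = sum(len(s) for s in shards)
    global_grad = sum(local_grads) / n
    return w - lr * global_grad

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
shards = [data[:2], data[2:]]          # two simulated compute nodes
w = cnn_mr_step(0.0, shards, lr=0.05)
```

Because the shard gradients are summed before the update, the result is identical to a serial full-batch step; this is what lets the thesis iterate the map/reduce rounds until convergence without changing the mathematics of batch training.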
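The CNN-MR-G mapping (one GPU thread per same-layer neuron, so all outputs of a feature map are computed concurrently) can be illustrated with a thread pool standing in for the GPU; the thesis itself uses CUDA kernels, and the names `conv_neuron` and `conv_parallel` here are hypothetical.

```python
# Sketch of the CNN-MR-G idea: each output neuron of a convolution layer
# is computed by its own "thread", since same-layer neurons are independent.
# A ThreadPoolExecutor simulates the GPU's thread-per-neuron mapping.
from concurrent.futures import ThreadPoolExecutor

def conv_neuron(image, kernel, i, j):
    """Output of one neuron: dot product of the kernel with one window."""
    k = len(kernel)
    return sum(image[i + a][j + b] * kernel[a][b]
               for a in range(k) for b in range(k))

def conv_parallel(image, kernel):
    h = len(image) - len(kernel) + 1
    w = len(image[0]) - len(kernel) + 1
    with ThreadPoolExecutor() as pool:   # one task per output neuron
        futures = [[pool.submit(conv_neuron, image, kernel, i, j)
                    for j in range(w)] for i in range(h)]
    return [[f.result() for f in row] for row in futures]

image  = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
out = conv_parallel(image, kernel)      # a 2x2 feature map
```

Since no neuron reads another neuron's output within a layer, the same decomposition applies to the backward pass (output errors and local weight-gradient changes), which is exactly the independence the thesis exploits when mapping feature maps to thread blocks and neurons to threads.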
【Degree-granting institution】: Zhengzhou University
【Degree level】: Master's
【Year conferred】: 2013
【CLC number】: TP183; TP338.6





