Research on CPU-GPU Based Parallelization of Conditional Random Fields
Published: 2018-04-14 12:37
Topic: conditional random fields + cooperative parallel model; Source: Master's thesis, Huazhong University of Science and Technology, 2013
【Abstract】: As a machine learning algorithm, the conditional random field (CRF) is widely used in part-of-speech tagging, information extraction, image segmentation, and other fields. In practical applications, a CRF can flexibly generate large numbers of descriptive features to improve learning quality, but the feature dimension easily exceeds the million level, so that computing the gradient and the likelihood function during training consumes a great deal of time, which to some extent limits the ability of CRFs to solve practical problems. Existing solutions parallelize CRFs on multi-core CPUs or on GPUs, but, constrained by the architectural characteristics of the CPU or GPU alone, the resulting speedups are unsatisfactory.

To address these problems, a new method is proposed that accelerates the CRF model through CPU-GPU cooperation. In the CPU-GPU cooperative parallel architecture, the CPU handles the parts of the training optimization process that have high memory complexity and many branch decisions, which resolves the low overall parallel efficiency of traditional GPU-only parallelization caused by the GPU's limited memory and weak branch-handling capability. For GPU parallelization, a two-level parallel method is proposed to maximize parallelism according to the different characteristics of the compute-intensive steps: all elements of the state matrix are computed in parallel at the independent computation layer (Node Level), while the computation over all possible paths through the state matrix is parallelized at the sequence-dependent computation layer (Sentence Level). In addition, the memory layout of the data is optimized for the GPU's memory access characteristics, reducing unnecessary memory traffic and improving parallel efficiency.

Experimental results show that, with model accuracy unchanged, the CPU-GPU cooperative parallel CRF method accelerates training by more than 10x and prediction by more than 15x compared with single-threaded CPU processing; compared with GPU-only parallelization, cooperative parallelization improves acceleration performance by 50%.
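The Node Level idea above, that every element of the per-position state (potential) matrix can be computed independently, can be pictured with a short CUDA sketch. This is a minimal illustration under assumed data layouts, not the thesis's actual kernel; all identifiers (node_level_scores, emission, transition, psi, T, L) are hypothetical.

    // Illustrative sketch only -- identifiers and layouts are assumptions, not the thesis's code.
    #include <cuda_runtime.h>

    // Node Level: one thread fills one element psi[t][i][j] of the state
    // (potential) matrix -- the score of moving from label i to label j at
    // position t. emission (T x L) and transition (L x L) are assumed to be
    // precomputed feature score tables stored contiguously.
    __global__ void node_level_scores(const float* emission,
                                      const float* transition,
                                      float* psi,
                                      int T, int L)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= T * L * L) return;

        int t = idx / (L * L);
        int i = (idx / L) % L;
        int j = idx % L;

        // Every element is independent, so no synchronization is needed.
        psi[idx] = transition[i * L + j] + emission[t * L + j];
    }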
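At the Sentence Level, the forward score of a position depends on the previous position, so positions must be visited in order, while the labels at each position can be handled by parallel threads. The sketch below assumes one thread block per sentence and one thread per label (launched as sentence_level_forward<<<num_sentences, L>>>); identifiers such as psi, alpha, and lengths are hypothetical, and label 0 is treated as a synthetic start state for initialization.

    // Illustrative sketch only -- identifiers and layouts are assumptions, not the thesis's code.
    #include <cuda_runtime.h>
    #include <math.h>

    // Sentence Level: one block per sentence, one thread per label.
    // Positions are visited in order because alpha[t] depends on alpha[t-1];
    // within a position the L labels are updated in parallel.
    __global__ void sentence_level_forward(const float* psi,    // S x Tmax x L x L potentials
                                           float* alpha,        // S x Tmax x L forward log-scores
                                           const int* lengths,  // sentence lengths
                                           int Tmax, int L)
    {
        int s = blockIdx.x;   // sentence handled by this block
        int j = threadIdx.x;  // current label handled by this thread

        const float* p = psi + (size_t)s * Tmax * L * L;
        float*       a = alpha + (size_t)s * Tmax * L;
        int T = lengths[s];

        // Initialization: label 0 plays the role of a start state.
        a[j] = p[0 * L * L + 0 * L + j];
        __syncthreads();

        // Log-sum-exp recursion: alpha[t][j] = logsumexp_i(alpha[t-1][i] + psi[t][i][j]).
        for (int t = 1; t < T; ++t) {
            float m = -INFINITY;
            for (int i = 0; i < L; ++i)
                m = fmaxf(m, a[(t - 1) * L + i] + p[(size_t)t * L * L + i * L + j]);
            float sum = 0.0f;
            for (int i = 0; i < L; ++i)
                sum += expf(a[(t - 1) * L + i] + p[(size_t)t * L * L + i * L + j] - m);
            a[t * L + j] = m + logf(sum);
            __syncthreads();  // all labels finished before the next position
        }
    }

Storing psi and alpha sentence-major, so that neighbouring threads of a block touch contiguous elements, is one plausible reading of the memory-layout optimization mentioned in the abstract; the thesis's actual layout is not specified here.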
【Degree-granting institution】: Huazhong University of Science and Technology
【Degree level】: Master's
【Year of conferral】: 2013
【Classification codes】: TP181; TP332