基于GPU的近似字符串匹配并行算法的研究

发布时间：2018-10-14 14:46

【摘要】：图形处理器（GPU）具有很强的并行处理能力，并且计算成本低，利用GPU加速字符串操作已经成为了当前并行计算领域的研究热点。近似字符串匹配技术在病毒检测、文件检索、计算生物学等很多领域都有着广泛的应用，传统的串行算法运算速度慢，而现存的并行算法都是基于多处理器模式，计算成本很高。因此，在GPU上研究有效的近似字符串匹配并行算法具有重大的实际意义。本文的主要研究内容及贡献如下：首先，对GPU通用计算的编程环境进行了总结，重点研究了NVIDIA CUDA的工作原理，编程模型和存储器模型，以及如何配置CUDA编程环境。其次，对于允许k-mismatch的近似字符串匹配问题，基于CUDA模型，本文提出了三种并行算法，即线程级并行算法，，两级并行算法，以及两级并行优化算法。两级并行优化算法在利用GPU强大并行处理能力的同时，使得各线程负载均衡，并且利用GPU的存储器模型减少了每个线程对全局存储器中数据的访问次数。本文使用真实的DNA序列作为实验数据对算法的性能进行评估，实验结果表明，其中两级并行优化算法在GPU上的执行时间对于传统的CPU端串行算法加速比可达到40-80，加速比变化与理论分析一致，其它两个算法的加速比也可以达到10左右。最后，对于允许k-difference的近似字符串匹配问题，基于动态规划的方法，通过消除编辑距离矩阵中同一行数据间的依赖关系，本文提出了一种有效的并行算法。该算法使用很少的线程同步次数，并且具有很高的并行度。本文分别在GPU和多核CPU上对算法进行了实现，并对两种环境下对算法的加速比进行了分析。实验结果表明，本文的并行算法在GPU上的执行时间对于传统的CPU端串行算法加速比可达到7-12，在双核CPU上算法的加速比可以达到1.3，在24核CPU上的加速比可以达到8-13，并且加速比的变化与理论分析一致。
[Abstract]:Graphics processor (GPU) has strong parallel processing ability and low computing cost. Using GPU to speed up string operation has become a hot research topic in the field of parallel computing. Approximate string matching technology has been widely used in virus detection, file retrieval, computational biology and many other fields. The traditional serial algorithm is slow, while the existing parallel algorithms are based on multiprocessor mode. The calculation cost is very high. Therefore, it is of great practical significance to study the efficient parallel algorithm of approximate string matching on GPU. The main contents and contributions of this paper are as follows: firstly, the programming environment of GPU general computing is summarized, and the working principle, programming model and memory model of NVIDIA CUDA are emphatically studied, as well as how to configure CUDA programming environment. Secondly, for the approximate string matching problem that allows k-mismatch, this paper proposes three parallel algorithms based on CUDA model, namely, thread-level parallel algorithm, two-level parallel algorithm, and two-level parallel optimization algorithm. The two-level parallel optimization algorithm makes use of the powerful parallel processing ability of GPU to balance the load of each thread and reduces the number of times each thread accesses the data in the global memory by using the memory model of GPU. In this paper, real DNA sequences are used as experimental data to evaluate the performance of the algorithm. The experimental results show that, The execution time of the two-stage parallel optimization algorithm on GPU can reach 40-80 for the traditional CPU serial algorithm. The speedup variation is consistent with the theoretical analysis, and the speedup ratio of the other two algorithms can reach about 10. Finally, for the approximate string matching problem with k-difference, an efficient parallel algorithm is proposed by eliminating the dependency between the same row of data in the edit distance matrix based on dynamic programming. The algorithm uses a few thread synchronization times and has a high degree of parallelism. In this paper, the algorithm is implemented on GPU and multi-core CPU, and the speedup ratio of the algorithm is analyzed in the two environments. The experimental results show that, The execution time of the parallel algorithm on GPU can reach 7-12 for the traditional CPU serial algorithm, 1.3 for dual-core CPU and 8-13 for 24-core CPU. It is consistent with theoretical analysis.
【学位授予单位】：黑龙江大学
【学位级别】：硕士
【学位授予年份】：2012
【分类号】：TP338.6

【共引文献】