Research on Migration Efficiency of Heterogeneous Seismic Data Processing Clusters

Published: 2019-02-24 17:21

[Abstract]: Wave-equation-based prestack depth migration can produce high-quality images of geologically complex areas and is an important tool in oil and gas exploration. However, its enormous data volume and computational demand limit its practical use. CPU-GPU heterogeneous clusters offer major advantages in performance, power consumption, cost, and heat dissipation, which creates an opportunity for wider adoption of prestack depth migration. Yet CPU-GPU heterogeneous systems differ greatly from the uniform, serial, and simple traditional CPU model in system composition, architecture, and programming model, so using heterogeneous computing resources efficiently poses many problems and challenges. This thesis first analyzes qualitatively and measures quantitatively the effects of non-uniform access and bus contention. The results show that poorly chosen data paths, together with bus contention and saturation, significantly degrade communication performance and may become the bottleneck of I/O-intensive migration processing. Several strategies for avoiding these bottlenecks are then discussed and tested with numerical methods commonly used in migration processing; the optimized applications improve in both performance and stability. To fully exploit the computing potential of the GPU, the thesis dissects the CUDA model and argues that viewing the GPU as a multithreaded SIMD processor is more helpful for understanding its nature and for developing efficient applications. For the Fermi architecture, microbenchmarks are used to probe some of its microarchitectural characteristics, providing support for deep performance optimization. Given the wide use of the fast Fourier transform (FFT) in migration processing, the thesis then analyzes in depth an already optimized GPU FFT routine on the basis of the Fermi microarchitecture. Data prefetching and instruction adjustment increase instruction-level parallelism, and although the thread count decreases, performance still improves by 12%.
To address the significant performance loss caused by SIMD branch divergence, the thesis proposes two software-level optimization strategies, "aggregation" and "extraction". Tests show that, for suitable branches, "aggregation" increases the proportion of useful results per SIMD execution step and "extraction" shortens the divergent SIMD section, so performance improves. Finally, tests on actual migration processing show that sound data path planning yields the most significant speedup, that deep optimization of hot GPU kernels also brings some improvement, and that SIMD branch optimization contributes relatively little to migration speedup.
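The abstract does not spell out how "aggregation" works. As a rough illustration only, the sketch below assumes it means regrouping work items so that the lanes of one SIMD group (a warp) take the same branch. It is a toy cost model in Python, not the thesis's implementation; the function name `simd_steps`, the warp width of 32, and the per-branch costs are all assumptions made for this example.

```python
WARP = 32  # assumed SIMD width (lanes per warp)

def simd_steps(predicates, then_cost, else_cost, warp=WARP):
    """Count issue steps when a warp must run every path that any lane takes."""
    steps = 0
    for start in range(0, len(predicates), warp):
        lanes = predicates[start:start + warp]
        if any(lanes):          # at least one lane takes the "then" path
            steps += then_cost
        if not all(lanes):      # at least one lane takes the "else" path
            steps += else_cost
    return steps

# Hypothetical workload: the branch predicate alternates from lane to lane,
# so every warp diverges and executes both paths.
interleaved = [i % 2 == 0 for i in range(128)]

# Regroup items so that lanes with the same predicate sit together.
grouped = sorted(interleaved, reverse=True)

print(simd_steps(interleaved, 10, 10))  # every warp pays for both paths
print(simd_steps(grouped, 10, 10))      # each warp pays for one path only
```

In this model the regrouped layout halves the step count, which matches the abstract's claim that a larger share of each SIMD step produces useful results. On real hardware the cost of regrouping the data would have to be weighed against that gain.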
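[Degree-granting institution]: China University of Petroleum (East China)
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: P631.44;TP332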
