A Survey of Parallel In-Memory Sorting Methods on Modern Hardware

Published: 2018-05-14 23:39

Topics: modern hardware processors + sorting algorithms; Source: Chinese Journal of Computers (《计算机学报》), 2017, Issue 09

[Abstract]: This paper surveys parallel in-memory sorting methods on modern hardware, reviewing the state of the art and recent progress. It first briefly describes the strengths and weaknesses of classical sorting algorithms and sorting networks and analyzes how well they lend themselves to parallel optimization. It then presents existing results on sorting methods for modern CPUs (multi-core, with large memory) and for newer processors such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs). Because processor architectures differ, so do the optimization strategies for sorting. Modern CPUs mainly exploit each thread's local storage hierarchy to optimize how data is laid out in memory, reducing both the number of memory accesses and the number of access misses, and use single instruction, multiple data (SIMD) techniques to increase the algorithm's data-level parallelism. GPUs organize threads into thread blocks, rely on shared memory to speed up memory access for each block, and use single instruction, multiple threads (SIMT) execution within a block to improve thread efficiency. FPGAs sit closer to the hardware and are constrained by their own resources; their optimization strategies rely mainly on hardware description languages or high-level synthesis languages to optimize circuit design, improving resource utilization while increasing throughput. Existing results indicate that parallel in-memory sorting on GPUs outperforms parallel in-memory sorting on CPUs. The paper closes with an outlook on future research directions.
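As an illustration of why sorting networks suit SIMD and SIMT hardware, the following is a minimal sketch of a bitonic sorting network in Python. It is not taken from the surveyed paper; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for this example. The point it shows is that the sequence of compare-exchange steps depends only on the input length, never on the data, so all comparisons within one stage are independent of each other.

```python
def bitonic_sort(values):
    """Sort a list with a bitonic sorting network (hypothetical helper).

    Assumption for this sketch: len(values) is a power of two.
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")

    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between the two compared elements
            # Each compare-exchange in this loop touches a disjoint pair,
            # so on real hardware every index i could be handled by its
            # own SIMD lane or GPU thread.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

For example, `bitonic_sort([5, 2, 7, 1, 8, 3, 6, 4])` runs three merge phases (k = 2, 4, 8). The serial `for i` loop is only a stand-in for the parallel execution that a CPU vector unit or a GPU thread block would provide.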
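[Affiliation]: Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education, Renmin University of China; School of Information, Renmin University of China
[Funding]: Supported by the National Natural Science Foundation of China (61532021, 61272137, 61202114) and the Huawei Innovation Research Program (HIRP 20140507)
[Classification number]: TP333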

Article ID: 1890010

Link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1890010.html
