面向众核体系结构的宽度优先搜索算法研究
发布时间:2018-09-15 19:42
【摘要】:宽度优先搜索(Breadth-first Search,简称BFS)是一种基础的图形算法,是众多算法的核心组件,在大量领域中得到广泛应用,如网络安全、医学信息、数据挖掘、社交网络、语义网等等。宽度优先搜索算法还是一种典型的数据密集型应用,Graph 500即使用其对超级计算机处理数据密集型应用的能力进行排名。近年来,得益于较低的功耗和较高的性价比,众核体系结构的加速器在高性能计算领域得到广泛应用。MIC(Many Integreated Core)作为最新的众核体系结构协处理器,相对于其它加速器来说,具有和传统并行编程模型兼容的优势。随着采用CPU+MIC的“天河二号”登上Top 500榜首,MIC在高性能计算领域得到广泛重视。宽度优先搜索算法的并行实现具有数据竞争较严重、问题不规则和访存局部性差等特点,而MIC具有大量线程和宽向量处理能力且每线程的平均缓存较小,因而要充分发挥MIC硬件的优势高效实现宽度优先搜索,将面临线程间竞争开销较大,向量部件利用率不高,缓存利用率差等问题。本文即面向这些问题,研究利用MIC高效实现宽度优先搜索算法,主要取得了如下成果:1)设计并实现了自上向下和自下向上相结合的混合BFS算法。该算法根据图形的特点,在不同的搜索层使用不同的搜索策略,能够结合两种搜索策略的优势,性能分别为自上向下和自下向上策略的3.21和2.15倍。2)提出了一种面向MIC优化的多线程BFS算法。该算法以混合BFS算法为基础,通过减少数据竞争,消除原子操作,并采用动静相结合的线程调度方式,能够很好地利用MIC提供的大量线程处理能力。3)提出了一种使用宽向量部件进一步加速BFS的方法。该方法在自上向下搜索部分采用SIMD指令同时扫描顶点的邻居,在自下向上搜索部分采用SIMD指令并行查找未访问的顶点,能够进一步加速BFS算法,最高加速比达到1.85。4)设计并实现了CPU和MIC协同计算的异构混合BFS算法。该算法以混合BFS算法为基础,在搜索层中任务较多时采用CPU和MIC协同计算,通过比例可调的任务划分方法以及重叠计算的通信设计,解决了协同计算中任务不均衡和通信开销较大的问题,相对于CPU的加速比达到1.4倍左右。实验结果表明,本文在MIC中实现的BFS算法性能约为GPU的5.31倍;CPU+MIC异构混合算法的最高加速比达到1.46倍。
[Abstract]:Width first search (Breadth-first Search,) is a basic graphic algorithm, which is the core component of many algorithms. It is widely used in many fields, such as network security, medical information, data mining, social network, semantic web and so on. The width-first search algorithm is also a typical data-intensive application named Graph 500 even though it is used to rank the supercomputer's ability to process data-intensive applications. In recent years, due to the low power consumption and high performance-price ratio, the multi-core accelerator has been widely used in the field of high-performance computing, as the latest multi-core architecture coprocessor, compared with other accelerators. It has the advantage of compatibility with traditional parallel programming model. With the use of CPU MIC "Tianhe 2" to the top of the Top 500 Mics in the field of high performance computing has received widespread attention. The parallel implementation of the width-first search algorithm is characterized by serious data competition, irregular problems and poor memory access locality, while MIC has a large number of threads and wide vector processing capabilities, and the average cache per thread is small. Therefore, in order to give full play to the advantages of MIC hardware and efficiently implement breadth-first search, we will face the problems of high competition overhead between threads, low utilization of vector components and poor cache utilization. In this paper, aiming at these problems, we study how to implement the breadth-first search algorithm efficiently by using MIC. The main achievements are as follows: 1) A hybrid BFS algorithm combining top-down and bottom-up is designed and implemented. According to the characteristics of graphics, the algorithm uses different search strategies in different search layers, which can combine the advantages of two search strategies. The performance is 3.21 and 2.15 times higher than that of top-down and bottom-up strategies, respectively. A multi-threaded BFS algorithm for MIC optimization is proposed. Based on the hybrid BFS algorithm, the algorithm reduces the data competition, eliminates the atomic operation, and adopts the thread scheduling method which combines dynamic and static. In this paper, we propose a method to speed up BFS further by using wide vector components, which can make good use of a large number of thread processing capabilities provided by MIC. In this method, SIMD instructions are used to scan vertex neighbors in the top-down search part, and SIMD instructions are used to parallel search the unvisited vertices in the bottom-up search part, which can further accelerate the BFS algorithm. The maximum speedup is 1.85.4). A heterogeneous hybrid BFS algorithm based on CPU and MIC is designed and implemented. The algorithm is based on the hybrid BFS algorithm. When there are more tasks in the search layer, CPU and MIC are used to work together. The method of task partitioning with adjustable scale and the communication design of overlapping computation are adopted. The problem of task imbalance and communication overhead in cooperative computing is solved, and the speedup ratio of CPU is about 1.4 times. The experimental results show that the performance of the BFS algorithm implemented in this paper is about 5.31 times of that of GPU, and the maximum speedup of the MIC heterogeneous hybrid algorithm is 1.46 times that of GPU.
【学位授予单位】:国防科学技术大学
【学位级别】:硕士
【学位授予年份】:2013
【分类号】:TP332;TP301.6
,
本文编号:2244312
[Abstract]:Width first search (Breadth-first Search,) is a basic graphic algorithm, which is the core component of many algorithms. It is widely used in many fields, such as network security, medical information, data mining, social network, semantic web and so on. The width-first search algorithm is also a typical data-intensive application named Graph 500 even though it is used to rank the supercomputer's ability to process data-intensive applications. In recent years, due to the low power consumption and high performance-price ratio, the multi-core accelerator has been widely used in the field of high-performance computing, as the latest multi-core architecture coprocessor, compared with other accelerators. It has the advantage of compatibility with traditional parallel programming model. With the use of CPU MIC "Tianhe 2" to the top of the Top 500 Mics in the field of high performance computing has received widespread attention. The parallel implementation of the width-first search algorithm is characterized by serious data competition, irregular problems and poor memory access locality, while MIC has a large number of threads and wide vector processing capabilities, and the average cache per thread is small. Therefore, in order to give full play to the advantages of MIC hardware and efficiently implement breadth-first search, we will face the problems of high competition overhead between threads, low utilization of vector components and poor cache utilization. In this paper, aiming at these problems, we study how to implement the breadth-first search algorithm efficiently by using MIC. The main achievements are as follows: 1) A hybrid BFS algorithm combining top-down and bottom-up is designed and implemented. According to the characteristics of graphics, the algorithm uses different search strategies in different search layers, which can combine the advantages of two search strategies. The performance is 3.21 and 2.15 times higher than that of top-down and bottom-up strategies, respectively. A multi-threaded BFS algorithm for MIC optimization is proposed. Based on the hybrid BFS algorithm, the algorithm reduces the data competition, eliminates the atomic operation, and adopts the thread scheduling method which combines dynamic and static. In this paper, we propose a method to speed up BFS further by using wide vector components, which can make good use of a large number of thread processing capabilities provided by MIC. In this method, SIMD instructions are used to scan vertex neighbors in the top-down search part, and SIMD instructions are used to parallel search the unvisited vertices in the bottom-up search part, which can further accelerate the BFS algorithm. The maximum speedup is 1.85.4). A heterogeneous hybrid BFS algorithm based on CPU and MIC is designed and implemented. The algorithm is based on the hybrid BFS algorithm. When there are more tasks in the search layer, CPU and MIC are used to work together. The method of task partitioning with adjustable scale and the communication design of overlapping computation are adopted. The problem of task imbalance and communication overhead in cooperative computing is solved, and the speedup ratio of CPU is about 1.4 times. The experimental results show that the performance of the BFS algorithm implemented in this paper is about 5.31 times of that of GPU, and the maximum speedup of the MIC heterogeneous hybrid algorithm is 1.46 times that of GPU.
【学位授予单位】:国防科学技术大学
【学位级别】:硕士
【学位授予年份】:2013
【分类号】:TP332;TP301.6
,
本文编号:2244312
本文链接:https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2244312.html