Accelerated Implementation of Clustering Algorithms on CUDA Graphics Processors

Published: 2018-07-27 19:03

[Abstract]: In recent years, rapid advances in information processing and communication technology have greatly increased the capacity of many fields to produce and acquire data, leaving us with massive volumes of data awaiting processing. Datasets on the order of gigabytes or terabytes are now commonplace. Traditional serial data mining techniques can no longer process such data effectively and are giving way to parallel data mining. Multi-core CPUs (Central Processing Units) are already a common computing platform for parallel data mining. However, as data volumes grow and the algorithms involved in data mining become more complex, the resulting compute-intensive workloads consume a large amount of processor time, which hurts both the performance and the power consumption of the whole system. To relieve the system of this load, we need a processing unit better suited to this kind of computation. The graphics processing unit (GPU), by virtue of its particular architecture, is well suited to parallel computation over large, dense datasets. From the outset, GPU designers equipped it with a large number of arithmetic units and high memory bandwidth to meet the increasingly demanding requirements of graphics processing and video games. In particular, since NVIDIA introduced GPU products based on CUDA (Compute Unified Device Architecture) together with the accompanying development environment in 2007, developing for the GPU has become very similar to developing for the CPU, so developers can get started quickly on CUDA GPUs. Developers in many fields, such as scientific computing, financial engineering, and data mining, are now trying CUDA GPUs to increase the computing capacity of their systems. Beyond GPU computing in the traditional single-machine setting, there is also a growing body of research on using GPUs in distributed environments.

This thesis selects two clustering algorithms widely used in data mining, k-means clustering and single-linkage agglomerative hierarchical clustering, and implements parallel versions of both on an NVIDIA GTX260-series CUDA GPU, demonstrating that GPU acceleration of clustering algorithms is both effective and feasible. Finally, motivated by the practical requirements of an enterprise customer relationship management (CRM) system, the thesis implements GPU clustering under the Hadoop framework.

Clustering is a very common operation in data mining. Its main purpose is to group scattered objects into one or more clusters according to some agreed measure of similarity or relatedness. The k-means algorithm implemented here assigns each of N objects (also called nodes) to the nearest cluster by Euclidean distance and, after a number of iterations, yields K clusters. As a classic clustering algorithm, it is widely used in data mining, bioinformatics, image recognition, artificial intelligence, and other fields; the well-known Apache Mahout, for example, uses k-means in its clustering operations. The other algorithm implemented in the experiments, agglomerative hierarchical clustering, first initializes the N data objects as N separate sub-clusters, then uses Euclidean distance as the measure of similarity between sub-clusters and merges the closest pair. After repeated iterations, the algorithm terminates when all data objects in the dataset belong to a single cluster. Although its operations and final result differ from those of k-means, both algorithms involve large-scale numerical computation whose individual calculations are largely independent of one another. This shared property is what allows the clustering algorithms to be parallelized on multi-core CPUs and GPUs, substantially reducing running time and increasing data throughput.

Because the enterprise CRM must run in a distributed computing environment, the open-source software Hadoop was chosen. The Hadoop framework lets users assemble commodity computers into a cluster for distributed computation over large-scale data, while providing strong fault tolerance and reliability. Companies such as Google and Facebook currently use distributed computing based on the Hadoop framework.

In the experiments, three sets of input data of different sizes, each with its own target number of clusters, were designed for k-means. Because of hardware limitations, smaller data sizes were used for hierarchical clustering than for k-means. Under the CUDA programming model, the program is split into two parts: the parts that are not compute-intensive run on the CPU, while the large-scale floating-point arithmetic involved in clustering runs on the GPU. This model is also known as CPU+GPU heterogeneous computing. The same experiments were also run on a multi-core CPU. Comparing the running times of the two implementations gives the speedup of the GPU implementation. Finally, the thesis implements GPU clustering on the Hadoop framework; the implementation method and results can serve as a prototype and reference for enterprise CRM design.
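To make the two algorithms named in the abstract concrete, the following is a minimal serial sketch in Python, not the thesis's CUDA implementation. The function names (`assign_points`, `update_centroids`, `kmeans`, `single_linkage`), the convergence test, and the handling of empty clusters are illustrative assumptions. The comments mark the loops whose iterations are independent of one another, which is the property the abstract cites as the basis for multi-core CPU and GPU parallelization.

```python
import math

def assign_points(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean)."""
    labels = []
    for p in points:
        # Each point's nearest-centroid search is independent of all others,
        # so a GPU version could plausibly give one thread to each point.
        nearest = min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
        labels.append(nearest)
    return labels

def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    dim = len(points[0])
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            updated.append(tuple(old))  # assumption: an empty cluster stays put
        else:
            updated.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(dim)))
    return updated

def kmeans(points, centroids, max_iters=100):
    """Alternate assignment and update until the labels stop changing."""
    centroids = [tuple(c) for c in centroids]
    labels = assign_points(points, centroids)
    for _ in range(max_iters):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_points(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

def single_linkage(points):
    """Merge the two closest clusters until one remains; return the merges."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Every pairwise cluster distance can be computed independently.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```

Calling `kmeans(points, initial_centroids)` returns the final labels and centroids, and `single_linkage(points)` returns the merge history as `(cluster_a, cluster_b, distance)` tuples of point indices. The naive pairwise search in `single_linkage` is chosen for readability; it is far more expensive than what a tuned implementation would use.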
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP311.13

[Related literature]