Interactive Data Mining and Visualization System Based on Parallel Computing

Published: 2018-09-07 12:34
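【Abstract】: With advances in information technology, data volumes are growing explosively, and traditional CPU-based data mining can no longer process such large volumes efficiently. Moreover, the human brain recognizes colors and geometric shapes more readily than dry numbers, so data visualization, which presents mining results more naturally and intuitively in the user interface, better meets users' needs. However, the conventional visualization tools most commonly used in data mining can only draw 2D or 3D graphics and offer little interactivity. To address these problems, this thesis proposes an interactive data mining and visualization system based on parallel computing.

The thesis uses GPU (Graphics Processing Unit) programming to optimize classic data-stream mining algorithms. Traditional CPU-based data mining processes data serially and cannot use multiple computing resources simultaneously; when the data volume is large, it needs many iterations and a lot of memory, so processing is slow and inefficient. GPU programming instead processes data in parallel, with many independent threads running concurrently, which gives high computational throughput and suits large data volumes. Addressing the two cases of data independence and data dependence in big data, the thesis uses GPU programming to optimize, respectively, the K-Means clustering algorithm and the Connected Component Labeling (CCL) algorithm, enabling more effective mining and analysis of big data.

The thesis also proposes an interactive data visualization method. Using the DirectX software development kit (SDK), the raw data set or the mining results are converted into vertices, lines, faces, colors, and other graphical information; multidimensional models are then built with the graphics functions the SDK provides, and the final visualization is rendered. In addition, a graphical user interface (GUI) lets users change the clustering parameters to obtain visualizations that suit their needs.

Based on these algorithms, experiments were run on energy-consumption data produced by air-conditioner operation. Optimizing the traditional algorithms with GPU programming not only accomplished cluster analysis of the data; the experimental results also show that the system runs substantially faster and more efficiently on very large data volumes. Finally, the DirectX SDK is used to render the abstract mining results as concrete four-dimensional graphics in 3D space, and users can change the viewing angle and the number of clusters K from the keyboard to obtain the results they want.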
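The abstract treats K-Means as the data-independent case. The following is a minimal CPU-side sketch in Python, not the thesis's GPU code, showing why: in the assignment step each point's nearest centroid depends only on that point and the shared centroids, so every call to the hypothetical helper `assign_point` could run as its own GPU thread. The centroid update is a per-cluster reduction. The initialization (first k points) and the convergence test are illustrative assumptions.

```python
import math


def assign_point(p, centroids):
    """Return the index of the nearest centroid to point p.

    Reads only p and the shared centroids, so calls for different points
    are independent (the step a one-thread-per-point GPU kernel could do).
    """
    best, best_d = 0, math.inf
    for j, c in enumerate(centroids):
        d = sum((a - b) ** 2 for a, b in zip(p, c))  # squared Euclidean distance
        if d < best_d:
            best, best_d = j, d
    return best


def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of its assigned points (a reduction)."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for i, x in enumerate(p):
            sums[lab][i] += x
    # An empty cluster keeps its previous centroid.
    return [tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
            for j in range(k)]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd-style K-Means; init = first k points (an assumption)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: an independent map over all points.
        labels = [assign_point(p, centroids) for p in points]
        new_centroids = update_centroids(points, labels, centroids)
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` returns labels `[0, 0, 1, 1]` with centroids `[(0.0, 0.5), (10.0, 10.5)]`. Changing `k` and rerunning corresponds to the interactive K adjustment the abstract describes.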
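CCL is the data-dependent case: a cell's final label depends on labels that propagate across its whole component. One common GPU-friendly scheme, assumed here because the abstract does not say which variant the thesis uses, is iterative label propagation. Every foreground cell starts with a unique label, and in each pass every cell takes the minimum label among itself and its 4-connected neighbors, computed only from the previous pass's labels. Cells within one pass are independent (one thread per cell), but passes must repeat until nothing changes. The sketch below is sequential Python and the function name `ccl_label_propagation` is hypothetical.

```python
def ccl_label_propagation(grid):
    """Label 4-connected components of nonzero cells in a 2D grid.

    Background cells (0) keep label 0. Each component ends with the smallest
    1-based raster index among its cells. Every pass reads only the previous
    pass's labels, so all cells in a pass could be updated in parallel.
    """
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    # Initial labels: unique 1-based raster index for each foreground cell.
    labels = [[r * cols + c + 1 if grid[r][c] else 0 for c in range(cols)]
              for r in range(rows)]
    changed = True
    while changed:  # repeat passes until labels stabilize
        changed = False
        new = [row[:] for row in labels]
        for r in range(rows):
            for c in range(cols):
                if not labels[r][c]:
                    continue
                best = labels[r][c]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc]:
                        best = min(best, labels[nr][nc])
                if best != labels[r][c]:
                    new[r][c] = best
                    changed = True
        labels = new
    return labels
```

On the grid `[[1, 1, 0], [0, 1, 0], [1, 0, 1]]` this yields `[[1, 1, 0], [0, 1, 0], [7, 0, 9]]`, i.e. three components. The number of passes grows with the longest path a label must travel, which is why practical GPU CCL implementations usually add union-find or pointer-jumping steps to cut the iteration count.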
【Degree-granting institution】: North China University of Technology
【Degree】: Master's
【Year conferred】: 2017
【CLC number】: TP311.13
