Research on Parallelization of the Extreme Learning Machine Algorithm Based on Spark
Published: 2018-10-22 17:46
[Abstract]: Although the extreme learning machine (ELM) trains quickly, it involves a large number of matrix operations, so it still processes large data volumes slowly. Building on a study of how Spark's distributed datasets (RDDs) carry out parallel computation, we design a parallel computing scheme for matrix multiplication, the core step of the algorithm, and then design and implement a Spark-based parallel ELM algorithm. For performance comparison, a parallel ELM based on Hadoop MapReduce is implemented as well. Experimental results show that the Spark-based parallel ELM runs markedly faster than the Hadoop MapReduce version, and that Spark's efficiency advantage grows as the volume of processed data increases.
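The abstract identifies matrix multiplication as the core step to parallelize. The minimal pure-Python sketch below illustrates why that step splits cleanly across partitions, assuming the standard ELM formulation (random hidden-layer weights, output weights from a least-squares solve) and a partition-wise map/reduce of the products H^T H and H^T T. Plain Python lists stand in for Spark RDD partitions; all function names (`partition_gram`, `add_pairs`, `train_elm`, etc.) are hypothetical, and the ridge term `reg` is swapped in for the Moore-Penrose pseudo-inverse so that a simple linear solve suffices. It is not the authors' published scheme.

```python
import math
import random
from functools import reduce


def sigmoid(z):
    """Logistic activation g(z) used for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_row(x, weights, biases):
    """Hidden-layer output for one sample: h_j = g(w_j . x + b_j)."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, biases)]


def partition_gram(rows):
    """'Map' step on one partition of (h, t) pairs: returns (H_p^T H_p, H_p^T T_p)."""
    n_hidden, n_out = len(rows[0][0]), len(rows[0][1])
    hth = [[0.0] * n_hidden for _ in range(n_hidden)]
    htt = [[0.0] * n_out for _ in range(n_hidden)]
    for h, t in rows:
        for i in range(n_hidden):
            for j in range(n_hidden):
                hth[i][j] += h[i] * h[j]
            for k in range(n_out):
                htt[i][k] += h[i] * t[k]
    return hth, htt


def add_pairs(a, b):
    """'Reduce' step: element-wise sum of two (H^T H, H^T T) pairs."""
    return ([[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a[0], b[0])],
            [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a[1], b[1])])


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def train_elm(X, T, n_hidden, n_partitions=4, reg=1e-3, seed=0):
    """Random hidden layer, then beta = (H^T H + reg*I)^(-1) H^T T."""
    rng = random.Random(seed)
    d = len(X[0])
    weights = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    rows = [(hidden_row(x, weights, biases), t) for x, t in zip(X, T)]
    # Round-robin split stands in for RDD partitions; empty ones are dropped.
    parts = [p for p in (rows[i::n_partitions] for i in range(n_partitions)) if p]
    # Per-partition products are independent, so they can be summed in any
    # order (up to floating-point rounding).
    hth, htt = reduce(add_pairs, map(partition_gram, parts))
    for i in range(n_hidden):
        hth[i][i] += reg
    beta = solve(hth, htt)
    return weights, biases, beta


def predict(model, x):
    """Network output for one sample: h(x) . beta."""
    weights, biases, beta = model
    h = hidden_row(x, weights, biases)
    return [sum(h[i] * beta[i][k] for i in range(len(h))) for k in range(len(beta[0]))]
```

The design choice is that only the small L x L and L x m products (L hidden nodes, m outputs) travel between workers, never the full N x L matrix H. On a Spark cluster the same pattern would plausibly map onto RDD `mapPartitions` followed by `reduce`, but the abstract does not say whether the authors used this Gram-matrix decomposition or a block-wise matrix product, so treat the sketch as illustrative only.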
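[Author affiliations]: Internet of Things (Perception Mine) Research Center, China University of Mining and Technology; National and Local Joint Engineering Laboratory of Internet Application Technology on Mine; School of Information and Control Engineering, China University of Mining and Technology; PASA Big Data Laboratory, Department of Computer Science, Nanjing University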
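[Funding]: Supported by the National Key R&D Program of China, "Research and Development of Key Technologies and Equipment for the Internet of Things in Mine Safety Production" (2017YFC0804400, 2017YFC0804401), and the National Natural Science Foundation of China (61471361, 41302203)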
[CLC number]: TP18
Article ID: 2287874
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2287874.html