A Parallel Multi-Label Nearest Neighbor Algorithm on Spark
[Abstract]: With the arrival of the big data era, large-scale multi-label data mining methods have received extensive attention. The multi-label k-nearest neighbor algorithm (ML-KNN) is a simple, efficient and widely used multi-label classification method, and in many applications its classification accuracy is higher than that of other common multi-label learning methods. However, as the scale of the data to be processed grows, the traditional serial ML-KNN algorithm struggles to meet the time and storage constraints of big data applications. Drawing on Spark's parallel mechanism and its in-memory iterative computation, this paper proposes SML-KNN, an ML-KNN algorithm based on the Spark parallel framework. In the Map phase, the K nearest neighbors within each partition are found; in the Reduce phase, the final K nearest neighbors are determined from the per-partition nearest neighbor sets. The label sets of the nearest neighbors are then aggregated in parallel, and the target label set of each sample to be predicted is output by the maximum a posteriori criterion. Experimental results in serial and parallel environments show that, while preserving classification accuracy, the performance of SML-KNN scales approximately linearly with computing resources, which improves the ability of ML-KNN to process large-scale multi-label data.
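The Map and Reduce steps described in the abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the paper's SML-KNN implementation: ordinary lists stand in for Spark partitions, the function names (`local_knn`, `global_knn`, `train`, `predict`) are hypothetical, squared Euclidean distance is assumed, and the prior and posterior counts follow the commonly used ML-KNN formulation with a smoothing constant `s`, which the abstract does not spell out.

```python
import heapq
from collections import Counter

def dist2(a, b):
    # Squared Euclidean distance (an assumed metric).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def local_knn(partition, query, k, skip_id=None):
    # "Map" step: the k nearest candidates inside a single partition.
    cands = [(dist2(x, query), i, y) for i, x, y in partition if i != skip_id]
    return heapq.nsmallest(k, cands, key=lambda c: (c[0], c[1]))

def global_knn(partitions, query, k, skip_id=None):
    # "Reduce" step: merge the per-partition candidates into the final k.
    merged = [c for p in partitions for c in local_knn(p, query, k, skip_id)]
    return heapq.nsmallest(k, merged, key=lambda c: (c[0], c[1]))

def label_counts(neighbors):
    # Aggregate the label sets of the neighbors into per-label counts.
    cnt = Counter()
    for _, _, labels in neighbors:
        cnt.update(labels)
    return cnt

def train(partitions, label_space, k, s=1.0):
    data = [row for p in partitions for row in p]
    m = len(data)
    # Smoothed prior probability that a sample carries each label.
    prior = {l: (s + sum(l in y for _, _, y in data)) / (2 * s + m)
             for l in label_space}
    # c1[l][j] / c0[l][j]: number of training samples with / without label l
    # whose k neighbors contain exactly j samples carrying l.
    c1 = {l: [0] * (k + 1) for l in label_space}
    c0 = {l: [0] * (k + 1) for l in label_space}
    for i, x, y in data:
        cnt = label_counts(global_knn(partitions, x, k, skip_id=i))
        for l in label_space:
            (c1 if l in y else c0)[l][cnt[l]] += 1
    return prior, c1, c0

def predict(partitions, model, query, label_space, k, s=1.0):
    prior, c1, c0 = model
    cnt = label_counts(global_knn(partitions, query, k))
    out = set()
    for l in label_space:
        j = cnt[l]
        p1 = prior[l] * (s + c1[l][j]) / (s * (k + 1) + sum(c1[l]))
        p0 = (1 - prior[l]) * (s + c0[l][j]) / (s * (k + 1) + sum(c0[l]))
        if p1 > p0:  # maximum a posteriori decision for label l
            out.add(l)
    return out

# Toy data: (id, features, label set), split across two "partitions".
partitions = [
    [(0, (0, 0), {"a"}), (3, (10, 10), {"b"}), (4, (10, 11), {"b"})],
    [(1, (0, 1), {"a"}), (2, (1, 0), {"a"}), (5, (11, 10), {"b"})],
]
labels = ["a", "b"]
model = train(partitions, labels, k=2)
print(predict(partitions, model, (0.5, 0.5), labels, k=2))
```

Because each partition returns its own K best candidates, the merged pool always contains the true global K nearest neighbors, so the Reduce step only has to select from at most K times the number of partitions candidates. In an actual Spark job the per-partition step would typically run inside something like `mapPartitions`, with the merge done by a reduce or aggregate operation; that mapping is an assumption here, since the abstract does not name the operators used.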
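[Author affiliation]: Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications
[Funding]: Chongqing Basic and Frontier Research Program (csts2014jcyjA40001, cstc2014jcyjA40022); Science and Technology Research Project of the Chongqing Municipal Education Commission (Natural Science) (KJ1400436)
[Classification number]: TP181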
Article ID: 2470875
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2470875.html