Parallelizing the Condensed Nearest Neighbor Algorithm with MapReduce
Published: 2018-06-08 07:11
Topics: condensed nearest neighbor + K-nearest neighbor ; Source: 《小型微型计算机系统》 (Journal of Chinese Computer Systems), 2017, No. 12
[Abstract]: Condensed Nearest Neighbor (CNN) is a sample-selection algorithm proposed by Hart for the K-Nearest Neighbors (K-NN) classifier, aimed at reducing the memory requirement and computational burden of K-NN. In the worst case, however, the time complexity of the CNN algorithm is O(n³), where n is the number of samples in the training set. When CNN is applied in a big-data environment, this high time complexity becomes a bottleneck. To address this problem, this paper proposes a MapReduce-based parallelized condensed nearest neighbor algorithm. The parallel CNN was implemented in a Hadoop environment and compared experimentally with the original CNN algorithm on six datasets. The results show that the proposed algorithm is effective and solves the above problem.
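To make the sequential baseline concrete, the following is a minimal sketch of Hart's CNN selection rule (not the paper's parallel version): starting from one retained sample, repeatedly add any training sample that the current subset misclassifies with 1-NN, until a full pass makes no changes. The function name and NumPy-based implementation are illustrative assumptions, not code from the paper.

```python
import numpy as np

def condensed_nearest_neighbor(X, y):
    """Hart's CNN rule (illustrative sketch): select a subset of (X, y)
    that classifies the whole training set correctly with 1-NN."""
    keep = [0]          # seed the condensed set with the first sample
    changed = True
    while changed:      # repeat passes until no sample is added
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            # 1-NN prediction of sample i using the current subset
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            nearest = keep[int(np.argmin(d))]
            if y[nearest] != y[i]:
                keep.append(i)   # misclassified: retain this sample
                changed = True
    return np.array(keep)
```

The repeated passes over all n samples, each scanning a subset that can itself grow to size n, are what give the worst-case O(n³) cost cited in the abstract, and why partitioning the work across MapReduce tasks pays off at scale.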
[Affiliations]: Hebei Province Key Laboratory of Machine Learning and Computational Intelligence, College of Mathematics and Information Science, Hebei University; College of Mathematics, Physics and Information Engineering, Zhejiang Normal University
[Funding]: Supported by the National Natural Science Foundation of China (71371063), the Natural Science Foundation of Hebei Province (F2017201026), and the Zhejiang Provincial Key Discipline of Computer Science and Technology (Zhejiang Normal University)
[Classification]: TP311.13
Article ID: 1995065
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1995065.html