Research and Design of a MapReduce-Based kNN-join Algorithm
Published: 2018-06-06 10:22
Topic: MapReduce + kNN join operation; Reference: Heilongjiang University, 2016 master's thesis
[Abstract]: The continued growth of the Internet industry has produced very large volumes of data, and extracting valuable knowledge from that data has become a focus of attention. Among data mining algorithms, the kNN algorithm can be used for classification. As kNN came into wide use, the kNN-join algorithm was proposed, and it is now applied in both the data preprocessing stage and the mining stage of data mining. As data volumes keep increasing and users expect higher efficiency, traditional methods can no longer meet these requirements, which has led to kNN-join operations based on MapReduce. This thesis studies each stage of the MapReduce-based kNN-join operation. First, the data is preprocessed, and an existing data partitioning algorithm is improved so that the data is partitioned evenly. Second, to reduce the overhead of the join, a seed set is found for each partition so that the k nearest neighbors of every element in that partition lie within a single set. Finally, to balance resource utilization against algorithm accuracy, the partitions are divided into groups. The algorithm is evaluated on a combination of real and synthetic data to verify its effectiveness, and the experimental results show that the proposed algorithm outperforms existing algorithms.
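For readers unfamiliar with the operation, the following is a minimal sketch of what a kNN join computes: for every point in a set R, the k closest points in a set S. It is a single-machine brute-force baseline, not the MapReduce algorithm proposed in the thesis; the function name `knn_join`, the use of Euclidean distance, and the sample points are assumptions made for illustration only.

```python
import heapq
import math


def knn_join(R, S, k):
    """Brute-force kNN join: map each index of R to its k nearest points in S.

    Assumes Euclidean distance and points given as equal-length tuples.
    """
    result = {}
    for i, r in enumerate(R):
        # nsmallest returns the k points of S with the smallest distance to r,
        # ordered from nearest to farthest.
        result[i] = heapq.nsmallest(k, S, key=lambda s: math.dist(r, s))
    return result


# Hypothetical sample data.
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 5.0), (9.0, 9.0)]
neighbors = knn_join(R, S, k=2)
```

This baseline compares every point of R with every point of S, so its cost grows with |R| × |S|. That quadratic cost is the reason distributed approaches partition the data and try to keep each partition's candidate neighbors together.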
[Degree-granting institution]: Heilongjiang University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.13
Article ID: 1986197
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1986197.html