Research and Application of the K-means Clustering Algorithm

Published: 2018-06-18 19:06

Topics: data mining + cluster analysis; Source: Changsha University of Science and Technology, 2016 master's thesis

[Abstract]: Data mining is an interdisciplinary field that draws on databases, machine learning, artificial intelligence, and many other areas; it extracts the information we need from large, unordered, and noisy data sets. Cluster analysis is one of the most important techniques in data mining and has produced a rich body of theoretical and methodological results. Clustering has been studied extensively for many years, with most work concentrating on distance-based algorithms, of which K-means is the classic example. K-means is regarded as the most important unsupervised machine learning method in clustering. It is a partitional algorithm: it divides the data into k mutually dissimilar subclasses and iterates so as to minimize the distance from each data object to the center of its subclass. Because it is simple to implement and efficient, K-means is widely used in data compression, image segmentation, marketing, anomaly analysis, statistics, and other fields. It still has limitations, however. In particular, it is extremely sensitive to the initial cluster centers: if they are poorly chosen, the algorithm easily converges to a local optimum rather than the global optimum.

This thesis studies and analyzes the classic K-means clustering algorithm in depth and summarizes its strengths and weaknesses. Given that K-means is simple and fast, the thesis applies it to video object tracking. To address the algorithm's dependence on the choice of initial centers, we propose a new method for selecting them, and use extensive experiments to verify how random selection of initial values affects the clustering result. The specific research content and results are as follows:

(1) K-means is applied to video object tracking. A sample model is first built for the background pixels of the video images; clustering is then used to partition the samples so as to model the relevant motion characteristics of objects. Using the background-pixel sample model, the pixels in each video frame are examined to identify the background points in the frame. The sample model is then updated according to the class to which each pixel is assigned in the model, which improves the effectiveness of background detection.

(2) The initial centers are optimized by combining the fast local convergence of mean shift with the global partitioning provided by region division. To some extent this reduces the overall number of iterations, lowers the algorithm's complexity, and improves its global behavior and stability. Experimental results show that the improved algorithm makes the results more stable and improves the accuracy of the clustering.
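The abstract's two central claims about K-means, that it iterates to reduce point-to-center distances and that a poor choice of initial centers can trap it in a local optimum, can be illustrated with a small sketch. This is not the thesis's code, which is not reproduced on this page. It is a plain Lloyd-style iteration, assuming squared Euclidean distance; the function names `squared_distance` and `kmeans` and the four-point data set are hypothetical and chosen only for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, max_iter=100):
    """Lloyd-style k-means from caller-supplied initial centers (illustrative).

    Returns the final centers and the sum of squared distances (SSE)
    from each point to its nearest center.
    """
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(old))))
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    sse = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, sse


# Four corners of a wide rectangle (assumed toy data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initial centers on the left and right: splits the data into two tight pairs.
good_centers, good_sse = kmeans(points, [(0, 0), (4, 0)])
# Initial centers on the bottom and top edges: already a fixed point, but a poor one.
bad_centers, bad_sse = kmeans(points, [(2, 0), (2, 1)])

print(good_sse)  # 1.0
print(bad_sse)   # 16.0
```

Both runs converge, but the second stops at a partition whose SSE is sixteen times larger, which is the local-optimum behaviour the abstract describes. The sketch only demonstrates that sensitivity; it does not implement the mean-shift-based initialization the thesis proposes.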
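[Degree-granting institution]: Changsha University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP311.13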
