Research on a Multi-view K-means Clustering Algorithm for Large-scale Data

Published: 2018-03-17 04:09

Topic: large-scale multi-view data; Focus: multi-view clustering; Source: Zhengzhou University, 2017 master's thesis; Type: degree thesis

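【Abstract】: Over the past few decades, more and more multi-source or multi-view data have been collected, in which each view represents the data with a different set of features. Traditional clustering algorithms have shown their shortcomings on such clustering-analysis problems, and multi-view clustering methods emerged in response. Existing multi-view clustering algorithms fall into three classes: collaborative clustering algorithms, multi-kernel clustering algorithms, and subspace-based multi-view clustering algorithms. However, with the explosive growth of data volume, more and more large-scale multi-view data are emerging and urgently need to be mined and processed. Existing methods for large-scale data fall into four main classes: sampling-based methods, methods based on clustering feature selection, semi-supervised clustering algorithms based on constraint information, and clustering algorithms based on distributed platforms. All of these target large-scale single-view data and cannot be applied directly to large-scale multi-view clustering. This thesis studies this gap in multi-view clustering. Its main work and contributions are as follows: 1. It summarizes single-view clustering methods for large-scale data and existing multi-view clustering algorithms, states their principles and scopes of application, and in particular identifies the shortcomings of existing multi-view clustering algorithms on large-scale data. 2. To address these shortcomings, it proposes a multi-view K-means clustering algorithm for large-scale data (Multi-view K-means Clustering Algorithm on Large Data, LKMC). The algorithm optimizes its objective function using the l_{1,2} structured sparsity-inducing norm. It partitions the data uniformly into blocks, performs multi-view clustering on each block and computes the cluster centers, then performs multi-view clustering on those centers to obtain the final result. The algorithm is insensitive to the choice of initial points and can handle large-scale datasets; experiments verify its effectiveness.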
[Abstract]: In the past few decades, more and more multi-source or multi-view data have been collected, in which each view describes the data with different features. Traditional clustering algorithms have shown their shortcomings in clustering analysis of such data, and multi-view clustering methods have emerged in response. Existing multi-view clustering algorithms are divided into three classes: collaborative clustering algorithms, multi-kernel clustering algorithms, and subspace-based multi-view clustering algorithms. However, with the explosive growth of data, more and more large-scale multi-view data are emerging and urgently need to be mined. Existing methods for large-scale data fall into four main categories: sampling-based methods, methods based on clustering feature selection, semi-supervised clustering algorithms based on constraint information, and clustering algorithms based on distributed platforms. These are all clustering algorithms for large-scale single-view data and cannot be used directly to solve large-scale multi-view clustering problems. This thesis studies that problem. The main work and contributions are as follows: 1. Single-view clustering methods for large-scale data and existing multi-view clustering algorithms are summarized, together with their principles and scopes of application. The thesis points out in particular the shortcomings of existing multi-view clustering algorithms on large-scale data. 2. To address these shortcomings, a multi-view K-means clustering algorithm for large-scale data (Multi-view K-means Clustering Algorithm on Large Data, LKMC) is proposed. The algorithm uses the l_{1,2} structured sparsity-inducing norm to optimize the objective function. It divides the data uniformly into blocks, performs multi-view clustering on each block, and computes the cluster centers. The centers are then clustered again with multi-view clustering to obtain the final result. The algorithm is insensitive to the selection of initial points and can handle large-scale datasets. Experiments verify its effectiveness.
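The abstract describes LKMC only at the level of its pipeline. The data are split uniformly into blocks, multi-view clustering runs inside each block, the block centers are collected, and multi-view clustering runs once more on those centers. The Python/NumPy code below is a minimal sketch of that two-stage structure. It is not the thesis's implementation, and it makes several assumptions that do not come from the source:

- The per-view loss is plain squared Euclidean distance, not the l_{1,2} structured sparsity-inducing norm.
- View weights are learned with a hypothetical exponent `gamma > 1`, using the closed-form minimizer of sum_v alpha_v^gamma * E_v subject to sum_v alpha_v = 1.
- Seeds are chosen by farthest-first traversal.
- Every point inherits the final label of its block center.

The function names `multiview_kmeans` and `lkmc_sketch` are hypothetical.

```python
import numpy as np


def _farthest_first(views, k, rng):
    # Seeding (an assumption of this sketch): one random start point, then
    # repeatedly the point farthest (summed over views) from all chosen seeds.
    n = views[0].shape[0]
    chosen = [int(rng.integers(n))]
    dist = sum(((X - X[chosen[0]]) ** 2).sum(axis=1) for X in views)
    for _ in range(k - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, sum(((X - X[nxt]) ** 2).sum(axis=1) for X in views))
    return np.array(chosen)


def multiview_kmeans(views, k, gamma=2.0, n_iter=100, seed=0):
    """Hypothetical multi-view k-means: one shared assignment, per-view centroids,
    learned view weights alpha_v. Uses squared Euclidean loss, not the
    structured sparse norm used in the thesis."""
    views = [np.asarray(X, dtype=float) for X in views]
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    init = _farthest_first(views, k, rng)
    centers = [X[init].copy() for X in views]
    alpha = np.full(len(views), 1.0 / len(views))
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Assignment step: weighted sum over views of squared distances, shape (n, k).
        cost = sum(a ** gamma * ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
                   for a, X, C in zip(alpha, views, centers))
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Centroid step, per view; an emptied cluster keeps its previous centroid.
        for X, C in zip(views, centers):
            for j in range(k):
                members = X[labels == j]
                if len(members):
                    C[j] = members.mean(axis=0)
        # Weight step: alpha_v proportional to (gamma * E_v) ** (1 / (1 - gamma)).
        err = np.array([((X - C[labels]) ** 2).sum() for X, C in zip(views, centers)])
        w = (gamma * np.maximum(err, 1e-12)) ** (1.0 / (1.0 - gamma))
        alpha = w / w.sum()
    return labels, centers, alpha


def lkmc_sketch(views, k, n_blocks, seed=0):
    """Two-stage, block-wise scheme as described in the abstract (sketch only)."""
    views = [np.asarray(X, dtype=float) for X in views]
    n = views[0].shape[0]
    blocks = np.array_split(np.arange(n), n_blocks)  # uniform partition of the rows
    stacked = [[] for _ in views]                     # per-view list of block centroids
    point_to_center = np.empty(n, dtype=int)
    for b, idx in enumerate(blocks):
        # Stage 1: multi-view clustering inside each block (needs >= k rows per block).
        labels_b, centers_b, _ = multiview_kmeans([X[idx] for X in views], k, seed=seed + b)
        point_to_center[idx] = b * k + labels_b
        for v, C in enumerate(centers_b):
            stacked[v].append(C)
    # Stage 2: treat the n_blocks * k centroids as a small multi-view dataset.
    center_views = [np.vstack(cs) for cs in stacked]
    center_labels, _, _ = multiview_kmeans(center_views, k, seed=seed)
    # Each point inherits the final label of its block centroid.
    return center_labels[point_to_center]
```

A call such as `lkmc_sketch([X1, X2], k=3, n_blocks=10)` clusters two views of the same rows. The second stage sees only `n_blocks * k` rows, so in this sketch the heavy work is the per-block clustering. For a fixed block size, that work grows roughly linearly with the number of points, and the blocks could be processed independently.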

【Degree-granting institution】: Zhengzhou University
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: TP311.13

【Similar literature】