Research on Clustering Techniques Based on a Boundary Degree Model

Published: 2018-06-12 17:13

Topic: cluster boundary + clustering algorithm; Source: Zhengzhou University, 2017 master's thesis

[Abstract]: Clustering is the technique of assigning similar data points to the same cluster and dissimilar data points to different clusters. In data analysis, clustering can be used to analyze the structure of the data in a data set, the relationships between clusters, and so on, and it plays an important role in fields such as pattern recognition, biological monitoring, drug development, and information security monitoring. However, because data in high-dimensional space are sparse, existing clustering techniques have difficulty discovering clusters in high-dimensional space and achieve low clustering accuracy. Unlike the traditional approach to clustering, this thesis proposes new clustering algorithms that first locate the cluster boundaries and then search from the boundaries toward the cluster centers to find the clusters. The contributions are as follows:

(1) A new clustering algorithm suited to high-dimensional data, CASB (A Clustering Algorithm With Affine Space Based Boundary Detection), is proposed. The algorithm first builds a cluster boundary model using the invariance of topological structure under affine transformation of the space and uses it to find the cluster boundaries. It then constructs a connection matrix based on the boundary points and forms clusters by searching from the cluster boundaries toward the cluster interiors. Experiments show that the algorithm can cluster high-dimensional data containing clusters of different densities, sizes, and shapes, achieves higher accuracy than comparable algorithms, and has parameters that are simple to choose.

(2) A clustering algorithm based on skewness boundary detection, C-USB (A Clustering Algorithm Using Skewness-based Boundary Detection), is proposed. The algorithm first puts forward a skewness hypothesis: points located at the edge of a cluster, together with their nearest neighbors, are skewed in their spatial distribution. It then computes the degree of skew of each data point to obtain that point's boundary degree and finds the cluster boundaries. Finally, based on the boundary points, it prunes the nearest-neighbor relations of the data points, builds a connection matrix, and forms clusters. Experiments show that the algorithm can cluster complex high-dimensional data sets while maintaining high accuracy, and in particular that it still achieves good clustering results on large-scale data sets.

(3) A new clustering algorithm for complex data, CUSBD (Clustering Based On Skew-based Boundary Detection), is proposed. This algorithm likewise puts forward a hypothesis about the distribution of boundary points: points at the edge of a cluster, together with their nearest neighbors, follow a skewed distribution in space (a gamma distribution is used). On the basis of this hypothesis, it computes the skewness of the distribution of each data point and its nearest neighbors as that point's boundary degree and finds the cluster boundaries, then builds a connection matrix based on the boundary points and forms clusters. Experiments show that the algorithm can effectively control clustering accuracy across data sets of different densities, sizes, shapes, and scales, and that it is convenient to compute.
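The abstract does not give the boundary-degree formula or the exact construction of the connection matrix, so the following is only a rough sketch of the boundary-first idea, not an implementation of CASB, C-USB, or CUSBD. It assumes a simple stand-in for the skew measure (the length of the mean offset from a point to its k nearest neighbors, scaled by the mean neighbor distance), marks a fixed fraction of the highest-scoring points as boundary points, prunes neighbor links that touch a boundary point, and takes connected components of the remaining links as clusters. The function names and the parameters `k` and `boundary_fraction` are hypothetical and chosen for illustration.

```python
import math


def knn_indices(points, k):
    """Return, for each point, the indices of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        result.append(order[:k])
    return result


def boundary_degree(points, neighbors):
    """Assumed skew score, not the thesis's formula: length of the mean
    offset to the neighbors divided by the mean neighbor distance. It is
    near 0 when neighbors surround the point evenly and grows when they
    lie mostly on one side."""
    degrees = []
    for p, nbrs in zip(points, neighbors):
        mean_offset = [
            sum(points[j][d] - p[d] for j in nbrs) / len(nbrs)
            for d in range(len(p))
        ]
        mean_dist = sum(math.dist(p, points[j]) for j in nbrs) / len(nbrs)
        degrees.append(math.hypot(*mean_offset) / mean_dist if mean_dist > 0 else 0.0)
    return degrees


def cluster_by_boundary(points, k=8, boundary_fraction=0.3):
    """Return (labels, is_boundary) for a list of coordinate tuples."""
    n = len(points)
    neighbors = knn_indices(points, k)
    degrees = boundary_degree(points, neighbors)

    # Mark the points with the highest boundary degree as boundary points.
    n_boundary = int(boundary_fraction * n)
    ranked = sorted(range(n), key=lambda i: degrees[i], reverse=True)
    is_boundary = [False] * n
    for i in ranked[:n_boundary]:
        is_boundary[i] = True

    # Union-find over interior points: keep only neighbor links whose two
    # endpoints are both interior (links touching the boundary are pruned).
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        if is_boundary[i]:
            continue
        for j in neighbors[i]:
            if not is_boundary[j]:
                parent[find(i)] = find(j)

    # Attach each boundary point to the cluster of its nearest interior point.
    interior = [i for i in range(n) if not is_boundary[i]]
    roots = []
    for i in range(n):
        if is_boundary[i]:
            nearest = min(interior, key=lambda j: math.dist(points[i], points[j]))
            roots.append(find(nearest))
        else:
            roots.append(find(i))

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    label_of = {}
    labels = [label_of.setdefault(r, len(label_of)) for r in roots]
    return labels, is_boundary


def grid(x0, y0, size=6):
    """A size-by-size block of unit-spaced 2-D points starting at (x0, y0)."""
    return [(x0 + x, y0 + y) for x in range(size) for y in range(size)]


# Two well-separated square blocks of points as toy input.
points = grid(0, 0) + grid(20, 0)
labels, is_boundary = cluster_by_boundary(points)
print("clusters found:", len(set(labels)))
print("boundary points:", sum(is_boundary))
```

On this toy input the points on the perimeter of each block receive a positive score because all of their neighbors lie on the inward side, while points inside a block score zero. The brute-force neighbor search is quadratic in the number of points, so a spatial index would be needed for anything beyond small examples.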
[Degree-granting institution]: Zhengzhou University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13

[Similar literature]