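Research on Clustering Techniques Based on a Boundary-Degree Model
Topic: cluster boundary + clustering algorithm; Source: Zhengzhou University, 2017 master's thesis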
[Abstract]: Clustering assigns similar data points to the same cluster and dissimilar data points to different clusters. In data analysis it is used to examine the structure of a data set and the relationships between clusters, and it plays an important role in pattern recognition, biological monitoring, drug development, information security monitoring, and other fields. Because data in high-dimensional space is sparse, however, existing clustering techniques have difficulty discovering clusters in such data and achieve low accuracy. Departing from the traditional approach, this thesis proposes new clustering algorithms that first locate the cluster boundaries and then search from the boundaries toward the cluster centers. The contributions are as follows.

First, a new clustering algorithm suited to high-dimensional data is proposed: CASB (A Clustering Algorithm With Affine Space Based Boundary Detection). The algorithm builds a cluster-boundary model from the invariance of topological structure under affine transformation of the space, and uses the model to find cluster boundaries. It then constructs a connection matrix based on the boundary points and forms clusters by searching from the cluster boundaries toward the cluster interiors. Experiments show that CASB can cluster high-dimensional data containing clusters of different densities, sizes, and shapes, that it is more accurate than comparable algorithms, and that its parameters are simple to choose.

Second, a clustering algorithm based on skewed-boundary detection is proposed: C-USB (A Clustering Algorithm Using Skewness-based Boundary Detection). The algorithm starts from a skewness hypothesis: points at the edge of a cluster, together with their nearest neighbors, are skewed in their spatial distribution. It computes each data point's boundary degree from its degree of skew and uses that value to find the cluster boundaries. Finally, it prunes the nearest-neighbor relations of the data points based on the boundary points, builds a connection matrix, and forms clusters. Experiments show that C-USB can cluster complex high-dimensional data sets while maintaining high accuracy, and in particular that it still clusters well on large-scale data sets.

Third, a new clustering algorithm for complex data is proposed: CUSBD (Clustering Based On Skew-based Boundary Detection). This algorithm likewise rests on a hypothesis about the distribution of boundary points: points at the edge of a cluster and their nearest neighbors follow a skewed spatial distribution (a gamma distribution is used). Under this hypothesis, the skew of the distribution of a data point and its nearest neighbors is computed and taken as that point's boundary degree, which is used to find the cluster boundaries. A connection matrix is then built from the boundary points to form clusters. Experiments show that the algorithm effectively maintains clustering accuracy across data sets of different densities, sizes, shapes, and scales, and that it is convenient to compute.
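The boundary-first idea described in the abstract (score each point with a boundary degree, separate boundary points from interior points, then form clusters through a connection structure) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The skew measure used here (the offset between a point and the centroid of its k nearest neighbors), the `boundary_fraction` threshold, and all function names are assumptions made for illustration only; the abstract does not give the actual formulas of CASB, C-USB, or CUSBD.

```python
import math


def k_nearest(points, k):
    """Brute-force k-nearest-neighbor indices for every point."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors


def boundary_degrees(points, neighbors):
    """Assumed skew proxy (not the thesis's formula): distance from a point to
    the centroid of its neighbors, scaled by the mean neighbor distance."""
    degrees = []
    for i, p in enumerate(points):
        nbrs = [points[j] for j in neighbors[i]]
        centroid = [sum(axis) / len(nbrs) for axis in zip(*nbrs)]
        mean_dist = sum(math.dist(p, q) for q in nbrs) / len(nbrs)
        degrees.append(math.dist(p, centroid) / mean_dist if mean_dist > 0 else 0.0)
    return degrees


def boundary_first_clusters(points, k=4, boundary_fraction=0.3):
    """Label points by treating the most skewed ones as boundary points.
    Needs at least k + 1 points and boundary_fraction < 1."""
    n = len(points)
    neighbors = k_nearest(points, k)
    degrees = boundary_degrees(points, neighbors)

    # Hypothetical threshold: the top fraction by boundary degree is "boundary".
    ranked = sorted(range(n), key=lambda i: degrees[i], reverse=True)
    boundary = set(ranked[:int(n * boundary_fraction)])

    # Connection structure: symmetric kNN links between interior points only.
    adjacency = {i: set() for i in range(n) if i not in boundary}
    for i in adjacency:
        for j in neighbors[i]:
            if j in adjacency:
                adjacency[i].add(j)
                adjacency[j].add(i)

    # Each connected component of interior points becomes one cluster.
    labels = [-1] * n
    next_label = 0
    for start in adjacency:
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adjacency[u]:
                if labels[v] == -1:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1

    # Attach every boundary point to the cluster of its nearest interior point.
    for i in boundary:
        nearest = min(adjacency, key=lambda j: math.dist(points[i], points[j]))
        labels[i] = labels[nearest]
    return labels
```

Calling `boundary_first_clusters(points)` on two well-separated groups of points, for example two 3x3 grids placed far apart, should give every point in one group the same label and the two groups different labels. The brute-force neighbor search is quadratic in the number of points, so the sketch is only suitable for small examples.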
[Degree-granting institution]: Zhengzhou University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13
Article ID: 2010427
Article link: https://www.wllwen.com/shoufeilunwen/xixikjs/2010427.html