
Design of a Multidimensional Cluster Mining Algorithm Based on Grid-Density Partitioning

Published: 2018-03-19 10:28

Topic: clustering algorithms; Focus: grids; Source: Xi'an University of Finance and Economics, master's thesis, 2014; Type: degree thesis


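[Abstract]: Cluster analysis is an important component of data mining and one of its core analytical activities. The clustering algorithm is the heart of any cluster analysis and determines the quality of its results. Improving clustering efficiency and reducing the cost and burden on users, while keeping the algorithm stable and effective, is therefore a highly meaningful research direction.

Because traditional clustering algorithms place heavy demands on computer hardware and take a long time to cluster massive data sets, this thesis proposes a new clustering algorithm based on grids and density. Grid-based clustering is generally fast and efficient, but its clustering quality is mediocre; density-based clustering can discover clusters of arbitrary shape, but its time complexity is high on high-dimensional data. Because the two approaches complement each other, partitioning the sample space with a combined grid-and-density strategy can greatly improve clustering efficiency. The algorithm works as follows. First, a grid is created and the data space is divided into initial grid cells. Second, the sample space is partitioned: using the grid density threshold, the grid cells are split into high-density and low-density regions; all high-density cells are sorted by density, the densest cell is located, and the first high-density cluster is found using the nearest low-density cells surrounding it. The points of this first cluster are then removed, the remaining high-density cells are re-sorted, and the process repeats until the final partition of the space is obtained. Finally, the centroid of each sub-cluster is computed, and clusters whose centroids are closest are merged into a new cluster with a new centroid; merging continues until the number of clusters equals the specified number, which gives the final clustering result.

The thesis first describes the algorithm theoretically, verifying that its design is reasonable and scientifically sound. An empirical analysis on several groups of data randomly generated in Matlab then shows that the algorithm achieves a between-group sum of squared deviations close to that of the classic K-means algorithm while substantially reducing running time.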
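The steps above can be illustrated with a small Python sketch. It is one plausible reading of the abstract, not the thesis's own implementation (the original experiments used Matlab). Every concrete choice here is a hypothetical assumption: a uniform grid with `cells_per_dim` intervals per axis; a density threshold equal to the mean occupancy of non-empty cells; a sub-cluster that grows from the densest remaining high-density cell through adjacent (including diagonal) high-density cells and stops where only low-density or empty cells surround it; merging by repeatedly joining the two closest point-count-weighted centroids; and a final step that labels every point, including those in low-density cells, with its nearest centroid. The function name `grid_density_cluster` is illustrative.

```python
import itertools
import math
from collections import defaultdict


def _centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coord) / len(pts) for coord in zip(*pts)]


def grid_density_cluster(points, k, cells_per_dim=10):
    """Hypothetical sketch: grid partition -> density split -> cluster growth -> centroid merging."""
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]

    # Step 1: map each point to a cell of a uniform grid (cells_per_dim intervals per axis).
    def cell_of(p):
        idx = []
        for d in range(dim):
            span = (hi[d] - lo[d]) or 1.0
            i = int((p[d] - lo[d]) / span * cells_per_dim)
            idx.append(min(i, cells_per_dim - 1))  # the maximum value lands in the last cell
        return tuple(idx)

    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p)].append(p)

    # Step 2: split cells into high/low density.
    # Assumed threshold: mean occupancy of the non-empty cells.
    threshold = len(points) / len(cells)
    remaining = {c for c, pts in cells.items() if len(pts) >= threshold}

    # Step 3: take the densest remaining high-density cell and grow a sub-cluster
    # through adjacent high-density cells; growth stops at low-density or empty
    # cells. Claimed cells are removed before the next, re-sorted round.
    offsets = [o for o in itertools.product((-1, 0, 1), repeat=dim) if any(o)]
    subclusters = []
    while remaining:
        seed = max(remaining, key=lambda c: len(cells[c]))
        remaining.discard(seed)
        members, stack = [seed], [seed]
        while stack:
            c = stack.pop()
            for o in offsets:
                nb = tuple(a + b for a, b in zip(c, o))
                if nb in remaining:
                    remaining.discard(nb)
                    members.append(nb)
                    stack.append(nb)
        subclusters.append([p for c in members for p in cells[c]])

    # Step 4: merge the two closest centroids (weighted by point count) until k remain.
    # If fewer than k sub-clusters were found, fewer than k centroids are returned.
    groups = [(_centroid(pts), len(pts)) for pts in subclusters]
    while len(groups) > k:
        i, j = min(itertools.combinations(range(len(groups)), 2),
                   key=lambda ij: math.dist(groups[ij[0]][0], groups[ij[1]][0]))
        (ci, ni), (cj, nj) = groups[i], groups[j]
        merged = ([(a * ni + b * nj) / (ni + nj) for a, b in zip(ci, cj)], ni + nj)
        groups = [g for t, g in enumerate(groups) if t not in (i, j)] + [merged]

    # Assumption: every point, including those in low-density cells,
    # receives the label of its nearest final centroid.
    centers = [c for c, _ in groups]
    labels = [min(range(len(centers)), key=lambda t: math.dist(p, centers[t])) for p in points]
    return centers, labels
```

For example, `grid_density_cluster(points, k=2)` returns the final centroids and one label per point. Enumerating all 3^d - 1 neighbor offsets is cheap in two or three dimensions but grows quickly with d, so a genuinely high-dimensional version would need a sparser adjacency test.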
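The evaluation criterion in the abstract, the between-group sum of squared deviations, is commonly computed as BSS = Σ_j n_j ||μ_j - μ||², where n_j and μ_j are the size and mean of group j and μ is the overall mean. The Python sketch below computes it alongside a minimal Lloyd-style K-means baseline. The function names, the random seed and the iteration cap are assumptions for illustration; the thesis's Matlab experiments and data are not reproduced here.

```python
import math
import random


def between_group_ss(points, labels):
    """Between-group sum of squares: sum_j n_j * ||mean_j - overall_mean||^2."""
    dim = len(points[0])
    overall = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    total = 0.0
    for members in groups.values():
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += len(members) * sum((m - o) ** 2 for m, o in zip(mean, overall))
    return total


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd iteration used as a baseline (hypothetical helper)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(k), key=lambda t: math.dist(p, centers[t])) for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for t in range(k):
            members = [p for p, lab in zip(points, labels) if lab == t]
            if members:
                new_centers.append([sum(x) / len(members) for x in zip(*members)])
            else:
                new_centers.append(centers[t])  # keep an empty cluster's old center
        if new_centers == centers:
            break  # converged: assignments no longer move the centers
        centers = new_centers
    return centers, labels
```

To mirror the comparison described in the abstract, one could run both methods on the same data, time each call with `time.perf_counter()`, and compare `between_group_ss` for the two label sets.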
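[Degree-granting institution]: Xi'an University of Finance and Economics
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: C81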
