Research on Fast Dimensionality Reduction and Clustering Algorithms for High-Dimensional Data Streams
[Abstract]: With the explosive growth of data, it has become increasingly difficult to find valuable information in data and turn it into organized knowledge, and data mining emerged in response to this need. Clustering analysis, one of the important research methods of data mining, is widely used in many fields. With the continuous development of information technology, the data stream has emerged as a new data type and has gradually become mainstream, so research on clustering algorithms for data streams has become an active and meaningful topic. Clustering of high-dimensional data streams consists of two parts: dimensionality reduction and clustering. Aiming at the shortcomings of existing dimensionality reduction and clustering algorithms, this thesis proposes improved algorithms and demonstrates their advantages experimentally. Building on prior work, the thesis first addresses two limitations of the existing carrier-space dimensionality reduction algorithm for high-dimensional data: it cannot automatically adjust its reduction results as the data stream changes dynamically, and it must scan the data stream many times. To address these, an adaptive carrier-space dimensionality reduction algorithm for high-dimensional data based on a structure tree is proposed. The algorithm uses an improved relative entropy to find the correlated dimensions of each region, builds the corresponding subspace, and performs clustering within that subspace, so that different regions correspond to different subspaces. Using relative entropy to find a region's correlated dimensions is simpler and more natural than the approach of Sun Yufen's GSCDS algorithm. In addition, a structure tree stores the relevant information from the partitioning process, and, combined with the idea of backtracking, this gives the carrier-space clustering algorithm for high-dimensional data its adaptive capability.
This adaptive mechanism avoids having to rerun the subspace algorithm every time new data arrive, and the use of an attenuation factor prevents old data from having an excessive influence on the clustering results. Experimental results show that the algorithm achieves high clustering quality with low time complexity. A grid-based clustering algorithm is then applied to the dimensionality-reduced results. This preserves the efficiency and strong adaptability of grid algorithms, but partitioning the space into grid cells lowers precision at cluster edges, which affects clustering quality. To address the low cluster-edge accuracy of grid-based data stream clustering and its need to scan the grid multiple times to form clusters, this thesis proposes an improved data stream clustering algorithm with two main improvements. First, in the initial clustering stage, clusters are grown from the inside outward and from points to surfaces, so that clustering is completed in a single scan of the grid; this removes the inefficiency caused by repeated grid scans in the original algorithm. Second, a maximal density-connected set is found to separate noise points from useful points in edge regions as fully as possible, which solves the original algorithm's problem of missing edge points. Experimental results show that the improved algorithm effectively improves cluster edge accuracy and adapts well to different data distributions.
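The abstract states that an improved relative entropy is used to find each region's correlated dimensions, but it does not define the improvement. As a minimal sketch under that gap, the Python below assumes the plain Kullback-Leibler divergence of a region's binned values from a uniform distribution over equal-width bins; the function names, the bin count, and the `threshold` cutoff are hypothetical. A dimension whose values are concentrated (far from uniform) scores high and is kept as a correlated dimension of the region's subspace.

```python
import math
from typing import List, Sequence, Tuple


def relative_entropy_to_uniform(values: Sequence[float], lo: float, hi: float,
                                bins: int = 10) -> float:
    """KL divergence D(P || U) of the binned values from a uniform distribution on [lo, hi]."""
    if hi <= lo:
        raise ValueError("upper bound must exceed lower bound")
    if not values:
        return 0.0
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp into [0, bins - 1] so that v == hi lands in the last bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    n = len(values)
    uniform = 1.0 / bins
    kl = 0.0
    for c in counts:
        if c > 0:  # empty bins contribute 0 (0 * log 0 is taken as 0)
            p = c / n
            kl += p * math.log(p / uniform)
    return kl


def correlated_dimensions(region: List[List[float]], bounds: List[Tuple[float, float]],
                          bins: int = 10, threshold: float = 0.5) -> List[int]:
    """Indices of dimensions whose distribution in `region` is far from uniform (hypothetical rule)."""
    selected = []
    for d, (lo, hi) in enumerate(bounds):
        column = [point[d] for point in region]
        if relative_entropy_to_uniform(column, lo, hi, bins) > threshold:
            selected.append(d)
    return selected


# Dimension 0 is concentrated near 0.15; dimension 1 is spread evenly over [0, 1].
region = [[0.15, (i + 0.5) / 10] for i in range(10)]
print(correlated_dimensions(region, [(0.0, 1.0), (0.0, 1.0)]))  # → [0]
```

Here dimension 0 scores log 10 ≈ 2.30 because all ten points fall into one bin, while dimension 1, with one point per bin, scores 0 and is dropped from the subspace.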
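The abstract also mentions an attenuation factor that limits the influence of old data, without giving its formula. A common choice in grid-based stream clustering (for example D-Stream) is exponential decay: a cell's density is multiplied by λ^Δt before the new point's weight of 1 is added. The sketch below assumes that rule; `DecayedGrid`, `cell_size`, and `lam` are hypothetical names, not the thesis's implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Key = Tuple[int, ...]


@dataclass
class GridCell:
    density: float = 0.0
    last_update: int = 0


class DecayedGrid:
    """Hypothetical grid whose cell densities decay by the factor `lam` per time step."""

    def __init__(self, cell_size: float, lam: float = 0.9):
        if not 0.0 < lam < 1.0:
            raise ValueError("decay factor must lie in (0, 1)")
        self.cell_size = cell_size
        self.lam = lam
        self.cells: Dict[Key, GridCell] = {}

    def key(self, point: Sequence[float]) -> Key:
        # Map a point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def insert(self, point: Sequence[float], t: int) -> None:
        cell = self.cells.setdefault(self.key(point), GridCell(0.0, t))
        # Decay the stored density from its last update to time t, then add weight 1.
        cell.density = cell.density * self.lam ** (t - cell.last_update) + 1.0
        cell.last_update = t

    def density_at(self, key: Key, t: int) -> float:
        # Read-only decay: the stored value is not modified.
        cell = self.cells.get(key)
        if cell is None:
            return 0.0
        return cell.density * self.lam ** (t - cell.last_update)


grid = DecayedGrid(cell_size=1.0, lam=0.5)
grid.insert((0.2, 0.3), t=0)
grid.insert((0.7, 0.9), t=1)
print(grid.density_at((0, 0), t=3))  # → 0.375
```

With `lam=0.5`, the cell holds 1.5 after the second point at t=1 and reads 1.5 × 0.5² = 0.375 at t=3. Storing only the last update time lets each cell decay lazily instead of touching every cell at every step.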
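The single-scan, inside-out initial clustering stage can be pictured as breadth-first growth over dense grid cells. The sketch below assumes that cells are marked dense by a fixed `dense_threshold`, that clusters connect through face-adjacent cells, and that seeds are taken in decreasing order of density so each cluster grows outward from its core; these choices and all names are assumptions for illustration. Each dense cell is labeled and enqueued exactly once, so no cell is revisited in a later pass.

```python
from collections import deque
from typing import Dict, Iterator, Tuple

Cell = Tuple[int, ...]


def neighbors(cell: Cell) -> Iterator[Cell]:
    # Face-adjacent neighbors: change exactly one coordinate by +1 or -1.
    for d in range(len(cell)):
        for step in (-1, 1):
            n = list(cell)
            n[d] += step
            yield tuple(n)


def one_pass_grid_clusters(density: Dict[Cell, float],
                           dense_threshold: float) -> Dict[Cell, int]:
    """Label connected dense cells; each dense cell is enqueued exactly once."""
    dense = {c for c, v in density.items() if v >= dense_threshold}
    labels: Dict[Cell, int] = {}
    next_label = 0
    # Seeds in decreasing density, so each cluster grows from its core outward.
    for seed in sorted(dense, key=lambda c: density[c], reverse=True):
        if seed in labels:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            cell = queue.popleft()
            for n in neighbors(cell):
                if n in dense and n not in labels:
                    labels[n] = next_label
                    queue.append(n)
        next_label += 1
    return labels


density = {(0, 0): 5, (0, 1): 4, (1, 1): 3, (5, 5): 6, (5, 6): 2, (9, 9): 0.5}
labels = one_pass_grid_clusters(density, dense_threshold=2)
print(len(set(labels.values())))  # → 2
```

In this example the cells (0, 0), (0, 1), (1, 1) and (5, 5), (5, 6) form two clusters, while the sparse cell (9, 9) is left unlabeled.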
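For the edge regions, the abstract speaks of finding a maximal density-connected set to separate noise points from useful points. The sketch below swaps in the standard DBSCAN-style density-reachability expansion: starting from a cluster's points, only core points (at least `min_pts` points, counting themselves, within distance `eps`) extend the set, and edge points that are never reached are treated as noise. `eps`, `min_pts`, and the function names are assumptions, and the brute-force neighbor search is O(n²), which suits only small edge regions.

```python
import math
from collections import deque
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    # Brute-force eps-neighborhood (includes i itself).
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def density_connected_edge_points(cluster_pts: Sequence[Point], edge_pts: Sequence[Point],
                                  eps: float, min_pts: int) -> Tuple[Set[int], Set[int]]:
    """Split edge_pts (by index) into (kept, noise) via density-reachability from cluster_pts."""
    points = list(cluster_pts) + list(edge_pts)
    n_cluster = len(cluster_pts)
    reached = set(range(n_cluster))  # the cluster's own points start in the set
    queue = deque(range(n_cluster))
    while queue:
        i = queue.popleft()
        nbrs = region_query(points, i, eps)
        if len(nbrs) < min_pts:
            continue  # not a core point, so it cannot extend the set
        for j in nbrs:
            if j not in reached:
                reached.add(j)
                queue.append(j)
    kept = {j - n_cluster for j in reached if j >= n_cluster}
    noise = set(range(len(edge_pts))) - kept
    return kept, noise


cluster = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
edge = [(1.5, 0.0), (2.0, 0.0), (10.0, 10.0)]
print(density_connected_edge_points(cluster, edge, eps=0.6, min_pts=2))  # → ({0, 1}, {2})
```

With `eps=0.6` and `min_pts=2`, the edge points at x = 1.5 and x = 2 are reached through a chain of core points, while the distant point (10, 10) is reported as noise. Raising `min_pts` to 4 leaves no core points, so all three edge points become noise.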
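[Degree-granting institution]: Changsha University of Science and Technology
[Degree level]: Master's
[Year awarded]: 2016
[Classification number]: TP311.13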