
Density-Based Clustering Ensemble

Published: 2018-04-21 13:29

  Topics: clustering ensemble + semi-supervised clustering ensemble; Source: Southwest Jiaotong University, 2017 master's thesis


[Abstract]: With the rapid progress of Internet technology, society has entered the era of big data, and daily life generates large volumes of data. In every field, more and more decisions will come to depend on data analysis. How to analyze large amounts of data efficiently and extract the valuable information behind them has become a new focus of attention. Clustering ensembles combine two techniques: clustering and ensemble learning. Such models can improve the accuracy, robustness, and stability of the final result. Adding semi-supervised information during the ensemble process yields a new model, the semi-supervised clustering ensemble, which under certain conditions may produce better clustering results than an unsupervised clustering ensemble. This thesis uses the affinity propagation (AP) algorithm as the base clusterer and runs it repeatedly with different input parameters to obtain diverse base clustering results. The correlation between the base clustering results is then computed with an improved maximal information coefficient (Rapid computation of the maximal information coefficient, RapidMic) and expressed as a similarity matrix, which is used to represent the density relationships in the sample data set. Dimensionality reduction with isometric feature mapping (Isomap) is used to illustrate that the density relationships of the sample data can be revealed through the base clustering results. By improving the density peaks (DP) algorithm, the thesis designs k_DP, which automatically selects the several points with the largest density peaks as cluster centers. Building on k_DP, a new clustering ensemble algorithm, KDPE, is obtained. Experiments show that KDPE achieves better clustering ensemble results than several classical models.
Finally, the thesis attempts to add semi-supervised information to the new model in order to improve the ensemble result. DP is modified to obtain semi_DP, and on that basis a new semi-supervised clustering ensemble algorithm, SDPE, is designed. Comparative experiments show that, at certain proportions of semi-supervised information, SDPE improves the clustering result and outperforms KDPE to some extent.
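The density-peaks step described in the abstract can be illustrated with a small sketch. The abstract does not spell out how k_DP ranks candidate centers or how RapidMic is computed, so the code below rests on two assumptions that are not taken from the thesis: the similarity between samples is a plain co-association matrix (a stand-in for the RapidMic-based similarity), and the centers are the k points with the largest product of local density and distance to the nearest denser point, a common heuristic for density peaks clustering. The function names and the cutoff `d_c` are illustrative only.

```python
import math


def co_association(labelings):
    """Fraction of base clusterings in which each pair of samples shares a cluster.

    Assumption: used here as a simple stand-in for the RapidMic-based similarity.
    """
    n = len(labelings[0])
    m = len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
            for i in range(n)]


def density_peaks_top_k(dist, k, d_c):
    """Density peaks clustering with the k highest-scoring points as centers.

    Assumption: the score is gamma = rho * delta; the thesis's own rule may differ.
    """
    n = len(dist)
    # Local density rho: Gaussian-kernel count of neighbours within scale d_c.
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = max(dist[i])
            continue
        # Nearest point among those visited earlier (equal or higher density).
        j = min(order[:pos], key=lambda p: dist[i][p])
        delta[i] = dist[i][j]
        parent[i] = j
    gamma = [rho[i] * delta[i] for i in range(n)]
    # The densest point always has the largest gamma, so it is always a center.
    centers = sorted(range(n), key=lambda i: (-gamma[i], i))[:k]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Every other point inherits the label of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels


# Toy ensemble: three base clusterings of six samples (hypothetical data).
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, with labels swapped
    [0, 0, 1, 1, 2, 2],
]
similarity = co_association(base_clusterings)
distance = [[1.0 - s for s in row] for row in similarity]
centers, labels = density_peaks_top_k(distance, k=2, d_c=0.5)
print(centers, labels)
```

In this toy ensemble, samples 0 to 2 end up in one cluster and samples 3 to 5 in the other. Swapping the label names in the second base clustering has no effect, because co-association depends only on which samples share a cluster.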
[Degree-granting institution]: Southwest Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13

[Similar literature]

Related master's theses (1 entry)

1 Chu Ruihong (褚睿鸿); Density-Based Clustering Ensemble [D]; Southwest Jiaotong University; 2017



Article ID: 1782635


Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1782635.html

