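Research and Application of Co-clustering Algorithms
Published: 2018-03-18 18:34
Topic: co-clustering. Approach: non-negative matrix factorization. Source: Zhejiang University, master's thesis, 2012. Document type: degree thesis.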
[Abstract]: Cluster analysis groups objects with similar patterns in a data set into distinct classes, based on the similarity between the objects. Over the years, researchers in China and abroad have studied cluster analysis in depth and proposed many effective methods, and this data mining technique has developed considerably. In recent years, with rapid advances in computer technology and the growth of the Internet industry, data have become richer and larger in scale. It has become apparent that traditional clustering techniques built on a single data type scale poorly and lack the ability to handle multi-type data, so they increasingly fail to meet users' needs. Against this background, co-clustering techniques for two-type and even multi-type data emerged.

Multi-type co-clustering has attracted growing attention in recent years. It is widely applicable and useful in fields such as gene analysis, search engines, and e-commerce, but it is still limited and immature. This thesis studies the topic and makes four contributions. (1) It briefly introduces the historical background and significance of cluster analysis and the state of research in China and abroad, analyzes the development of existing clustering techniques, and examines their strengths and weaknesses. (2) Building on this analysis, it establishes a non-negative matrix factorization (Tri-NMF) model based on EM iterative updates. The model combines the strengths of complex spectral graph partitioning and of criterion-based partitioning, and adds a weight adjustment factor so that it keeps the advantages of both while remaining tunable for different data. (3) On the theoretical basis of this model, it builds a family of co-clustering algorithms based on the Tri-NMF model, covering both hard and soft analysis methods for co-clustering two-type and multi-type data. (4) To verify the effectiveness and practicality of the system, it runs detailed experiments on two standard data sets. The results show that, on accuracy (AC) and normalized mutual information (NMI), two classic and widely used measures of clustering quality, the proposed family of co-clustering methods outperforms several other existing clustering techniques. These results support the effectiveness and correctness of the proposed Tri-NMF-based family of co-clustering algorithms, as well as its good scalability, and therefore its practical value and application prospects.
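The abstract does not give the thesis's update rules, so the following is only a minimal sketch of the general idea, assuming "Tri-NMF" refers to the common three-factor form X ≈ F S Gᵀ fitted with standard multiplicative updates for the squared Frobenius error. It omits the weight adjustment factor and the EM formulation mentioned above. The function names `tri_nmf` and `nmi`, the toy matrix, and all parameters are illustrative assumptions, and the NMI shown uses geometric-mean normalization, which is one of several common conventions.

```python
import numpy as np

def tri_nmf(X, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (n x m) as F @ S @ G.T.

    F (n x k_rows) holds row-cluster memberships, G (m x k_cols) holds
    column-cluster memberships, and S links row clusters to column clusters.
    Plain multiplicative updates; not the thesis's algorithm.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k_rows)) + eps
    S = rng.random((k_rows, k_cols)) + eps
    G = rng.random((m, k_cols)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is scaled by (negative gradient part) / (positive part),
        # which keeps every entry non-negative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors

def nmi(labels_a, labels_b):
    """Normalized mutual information, I(A;B) / sqrt(H(A) * H(B))."""
    _, ai = np.unique(np.asarray(labels_a).ravel(), return_inverse=True)
    _, bi = np.unique(np.asarray(labels_b).ravel(), return_inverse=True)
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1.0)          # contingency table
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz]))
    ha = -np.sum(pa * np.log(pa))
    hb = -np.sum(pb * np.log(pb))
    if ha == 0 or hb == 0:                   # a single-cluster labeling
        return 1.0 if ha == hb else 0.0
    return float(mi / np.sqrt(ha * hb))

# Toy two-type data: 10 "rows" and 8 "columns" with two diagonal blocks.
demo_rng = np.random.default_rng(1)
X = np.kron(np.eye(2), np.ones((5, 4))) + 0.05 * demo_rng.random((10, 8))
F, S, G, errors = tri_nmf(X, k_rows=2, k_cols=2)

# Hard co-clustering: assign each row and column to its strongest cluster.
row_labels = F.argmax(axis=1)
col_labels = G.argmax(axis=1)
print("final reconstruction error:", errors[-1])
print("row NMI vs. planted blocks:", nmi(row_labels, [0] * 5 + [1] * 5))
```

Taking the argmax of F and G gives a hard assignment; normalizing their rows instead would give a soft one, which loosely mirrors the hard and soft variants the abstract mentions.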
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP311.13
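[Citing literature]
Related master's theses (top 1)
1 Zhang Xiuxiu (张秀秀); Design and Implementation of an Image-Based Clothing Retrieval System [D]; University of Electronic Science and Technology of China; 2013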
Article ID: 1630800
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1630800.html