Text Clustering Based on the Fusion of the CTM Model and the K-means Algorithm

Published: 2018-03-01 05:13

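Keywords: K-means algorithm; CTM model; fusion; text clustering. Source: Information Studies: Theory & Application (《情报理论与实践》), 2017, No. 11. Article type: journal article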

[Abstract]: [Purpose/Significance] To address the shortcomings of the correlated topic model (CTM) and the K-means algorithm in text clustering, this paper proposes CKM, an algorithm that fuses the two. CKM uses the CTM model to determine the number of clusters and the initial cluster centers, then uses the K-means algorithm to perform the clustering. [Method/Process] Using Web of Science as the data source, the authors build a multidisciplinary document collection covering 6 topics. With the F-measure as the evaluation metric, they compare the clustering performance of the K-means algorithm, the CTM model, and the CKM algorithm, using the k-means function and the topicmodels package in R. [Results/Conclusions] Compared with using the K-means algorithm or the CTM model alone, the CKM algorithm clusters more accurately and more stably.
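[Author affiliation]: School of Communication, Qufu Normal University
[Funding]: An outcome of the National Social Science Fund of China project "Research on Knowledge Mining and Discovery Integrating Document Content Analysis and Citation Analysis", grant no. 16BTQ074
[Classification number]: G254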
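
The two-stage idea in the abstract (a topic model supplies the number of clusters and the initial centers, and K-means then does the clustering) can be illustrated with a minimal Python sketch. This is not the authors' R implementation. It assumes the document-topic proportions have already been produced by some topic model, and it assumes, purely for illustration, that each initial center is the mean vector of the documents whose dominant topic is that topic. The abstract does not say how CKM derives its centers. The names `centers_from_topics` and `kmeans_seeded` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def centers_from_topics(
    doc_vectors: Sequence[Sequence[float]],
    doc_topic: Sequence[Sequence[float]],
) -> List[Vector]:
    """Derive initial centers from document-topic proportions.

    ASSUMPTION (not from the paper): each document is assigned to its
    highest-probability topic, and the center for a topic is the mean of
    the vectors assigned to it. Topics with no documents are dropped, so
    the number of clusters equals the number of non-empty topics.
    """
    k = len(doc_topic[0])
    groups: List[List[Sequence[float]]] = [[] for _ in range(k)]
    for vec, probs in zip(doc_vectors, doc_topic):
        dominant = max(range(k), key=lambda t: probs[t])
        groups[dominant].append(vec)
    return [mean_vector(g) for g in groups if g]


def kmeans_seeded(
    doc_vectors: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
    max_iter: int = 100,
) -> Tuple[List[int], List[Vector]]:
    """Standard K-means (Lloyd's iterations) started from the given centers."""
    current = [list(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest center.
        new_labels = [
            min(range(len(current)), key=lambda j: squared_distance(v, current[j]))
            for v in doc_vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(current)):
            members = [v for v, lab in zip(doc_vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                current[j] = mean_vector(members)
    return labels, current
```

A call would look like `labels, centers = kmeans_seeded(docs, centers_from_topics(docs, doc_topic))`, where `docs` holds one feature vector per document (for example, term weights) and `doc_topic` holds one row of topic proportions per document. Because the seeds are fixed rather than drawn at random, repeated runs give the same result, which fits the stability the abstract reports but does not by itself demonstrate it.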
