An Improved K-medoids Algorithm Based on the SOM Algorithm and Its Study

Published: 2018-04-23 08:57

Topic: cluster analysis + SOM algorithm; Source: Taiyuan University of Technology, 2017 master's thesis

[Abstract]: The rapid development of science and technology has caused an explosive growth of information, which poses major challenges for computer storage and industry databases. As data volumes grow exponentially, their dimensionality keeps increasing and data types become ever more complex. For such ultra-high-dimensional data, data mining techniques are needed to uncover the information hidden in the data. That information can then support scientific and well-founded predictions and decisions. Common methods for processing high-dimensional data include dimensionality reduction, cluster analysis, and regression analysis.

This thesis introduces the traditional self-organizing map (SOM) neural network and the K-medoids algorithm. Each has a weakness:

- With the traditional SOM algorithm, some sample points lie far from their corresponding weight vectors, which lowers clustering accuracy.
- The K-medoids algorithm requires the number of clusters and the initial medoids to be chosen manually before clustering, and different choices lead to different clustering results.

To compensate for both shortcomings, this thesis proposes the improved SOM-K algorithm, which combines the SOM neural network with the K-medoids algorithm.

The thesis is organized as follows:

- Chapter 1 describes in detail the research significance of clustering and dimensionality-reduction algorithms in the context of big data.
- Chapter 2 presents the definitions of distance on which clustering algorithms are based.
- Chapter 3 explains the traditional K-medoids and SOM algorithms.
- Chapter 4 describes the proposed clustering algorithm based on SOM and K-medoids. It also compares the clustering results of the traditional K-medoids algorithm, the SOM algorithm, and the SOM-K algorithm on the Iris data set, showing that SOM-K outperforms both traditional algorithms.
- Chapter 5 applies the SOM-K algorithm to a cluster analysis of the distribution of water resources across China and presents detailed conclusions drawn from the results.
- Chapter 6 gives a summary and outlook, setting out the strengths and weaknesses of the improved algorithm as a basis for further study.
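The abstract notes that K-medoids needs the number of clusters and the initial medoids fixed in advance. The following is a minimal sketch of that dependency. It assumes Euclidean distance and the simple alternating (Voronoi-iteration) variant of K-medoids rather than the full PAM swap search. The abstract does not say which variant the thesis uses, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance (an assumption; Chapter 2 of the thesis covers distance definitions)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(data: List[Point], medoids: List[int]) -> List[int]:
    """Label each point with the position (0..k-1) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: euclidean(p, data[medoids[j]]))
            for p in data]


def k_medoids(data: List[Point], init_medoids: List[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating K-medoids. K and the starting medoids come from the caller,
    which is the manual input the abstract points out."""
    medoids = list(init_medoids)
    for _ in range(max_iter):
        labels = assign(data, medoids)
        new_medoids = []
        for j in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                new_medoids.append(medoids[j])  # keep the medoid of an empty cluster
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            best = min(members, key=lambda c: sum(euclidean(data[c], data[i]) for i in members))
            new_medoids.append(best)
        if new_medoids == medoids:  # converged: no medoid moved
            break
        medoids = new_medoids
    return medoids, assign(data, medoids)


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
medoids, labels = k_medoids(data, init_medoids=[0, 3])
print(medoids, labels)  # [0, 3] [0, 0, 0, 1, 1, 1]
```

Here the starting medoids sit in different groups, so the two groups separate cleanly. A different K or different starting indices can converge to a different partition, which is the sensitivity the thesis sets out to remove.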
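The abstract does not specify how SOM-K couples SOM with K-medoids. One plausible reading, sketched here purely as an assumption, is to let a trained SOM supply what K-medoids otherwise needs by hand. Under that reading, every map node that is the best-matching unit (BMU) for at least one sample counts as a cluster, and the data point nearest each such node becomes an initial medoid. The following are all illustrative choices, not the thesis's documented method:

- the one-dimensional lattice;
- the Gaussian neighborhood;
- the linear decay schedules;
- the hypothetical names `train_som` and `som_seeds`.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (sufficient for nearest-neighbor comparisons)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmu(x: Point, weights: List[List[float]]) -> int:
    """Best-matching unit: index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: sq_dist(x, weights[j]))


def train_som(data: List[Point], n_nodes: int = 5, epochs: int = 50,
              lr0: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Train a 1-D SOM, i.e. a chain of n_nodes weight vectors (hypothetical setup)."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]  # start nodes at random samples
    sigma0 = max(n_nodes / 2.0, 1.0)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
            x = data[i]
            winner = bmu(x, weights)
            for j in range(n_nodes):
                # Gaussian neighborhood over the lattice distance |j - winner|.
                h = math.exp(-((j - winner) ** 2) / (2.0 * sigma ** 2))
                weights[j] = [w + lr * h * (xv - w) for w, xv in zip(weights[j], x)]
            step += 1
    return weights


def som_seeds(data: List[Point], weights: List[List[float]]) -> List[int]:
    """Hypothetical rule: one initial medoid per node that wins at least one sample,
    taken as the data point nearest that node's weight vector."""
    hits = [0] * len(weights)
    for x in data:
        hits[bmu(x, weights)] += 1
    seeds: List[int] = []
    for j, w in enumerate(weights):
        if hits[j] == 0:
            continue  # node wins no sample: it sits in an empty region of the data
        nearest = min(range(len(data)), key=lambda i: sq_dist(data[i], w))
        if nearest not in seeds:  # two nodes may share the same nearest sample
            seeds.append(nearest)
    return seeds


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
weights = train_som(data, n_nodes=5)
seeds = som_seeds(data, weights)
k = len(seeds)  # number of clusters suggested by the map
print(k, seeds)
```

The returned indices, with their count as K, would then be handed to a K-medoids routine. Under this rule the map size still bounds K from above, so the manual choice is moved into the SOM configuration rather than eliminated.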
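[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year conferred]: 2017
[Chinese Library Classification number]: TP311.13; TP183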
