An Improved Genetic Algorithm Applied to K-Means Cluster Analysis

Posted: 2018-04-07 14:52

Topic: classification; Focus: cluster analysis; Source: master's thesis, North China Electric Power University, 2017

[Abstract]: When studying certain problems, we often need tools that first sort objects into categories so that each category can then be analyzed. Some examples follow. In geological prospecting, samples are classified by geophysical and drilling indicators. In archaeobiology, excavated bones are classified by shape and size. In meteorological satellite systems, the monitoring data are so large and complex that they must first be organized by different indicators and then analyzed in depth, so that accurate forecasts can be made. Cluster analysis arose from needs like these. It is the discipline of dividing concrete or abstract objects into categories with no predefined classes, and it is also an important human activity. Computing and information technology have developed rapidly, and data volumes are growing fast as a result. As an important data mining technique, cluster analysis is therefore attracting more and more attention. K-Means is a partition-based algorithm. It is simple to run and its principle is easy to understand, so it has been widely applied and studied and is counted among the ten classic data mining algorithms. K-Means also has weaknesses, however. The number of clusters k is hard to determine, the initial centers can only be chosen at random, and the algorithm easily falls into a local optimum, all of which make it unstable. This thesis therefore introduces a genetic algorithm into K-Means clustering. A genetic algorithm searches for an optimal solution by imitating biological evolution and has good global search ability.
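The K-Means weaknesses listed above, random initial centers and convergence to a local optimum, are easier to see in code. The following is a minimal sketch of standard Lloyd-style K-Means in plain Python, with the initial centers drawn at random from the data. It is not code from the thesis. The function names and the `seed` parameter are illustrative assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]

def sse(points, centers, labels):
    """Within-cluster sum of squared errors, the objective K-Means minimizes."""
    return sum(sq_dist(p, centers[c]) for p, c in zip(points, labels))

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd-style K-Means; the k initial centers are random data points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = assign(points, centers)
    for _ in range(max_iter):
        # Update step: move each center to the mean of its member points.
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        # Assignment step: reassign every point to its nearest center.
        new_labels = assign(points, centers)
        if new_labels == labels:  # stable: a (possibly only local) optimum
            break
        labels = new_labels
    return centers, labels, sse(points, centers, labels)
```

On data with several clusters, calling `kmeans` with different `seed` values and comparing the returned SSE shows the initial-center sensitivity described above. Some seeds place two centers inside one true cluster and stop at a worse local optimum.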
Based on the characteristics of the two algorithms, this thesis proposes a hybrid algorithm that applies an improved genetic algorithm to K-Means clustering. The algorithm is evaluated in simulation experiments on sample data sets, and the experiments show that it achieves good clustering results. The work has two main parts. 1) The first part briefly introduces the basic concepts of cluster analysis, the K-Means algorithm, and genetic algorithms. It focuses on the basic ideas of K-Means and genetic algorithms, describes their components, basic elements, and workflow, and closes with their applications. 2) The second part presents an improved K-Means clustering algorithm based on a genetic algorithm. It describes in full the chromosome encoding, the choice of fitness function, the design and improvement of the selection, crossover, and mutation operators, and the way K-Means and the genetic algorithm are combined. Finally, test experiments were run to verify the effectiveness of the proposed algorithm, and the two methods were compared on the experimental results. The comparison confirms the feasibility of the proposed method and its good clustering performance.
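The abstract names the components of the hybrid (chromosome encoding, fitness function, selection, crossover and mutation operators, and the K-Means combination step) but not their improved forms. What follows is therefore only a minimal sketch of one common way to combine a genetic algorithm with K-Means, not the thesis's algorithm. It makes these assumptions:

- Each chromosome is a real-valued list of k centers.
- Fitness is 1/(1+SSE).
- Selection is by tournament.
- Crossover is single-point, cut at center boundaries.
- Mutation is Gaussian.
- Every offspring is refined with one K-Means iteration.
- The best chromosome is kept each generation (elitism).

All function names and parameter values (`pop_size`, `generations`, `rate`, `sigma`) are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))

def sse(points, centers):
    """K-Means objective: squared distance of each point to its nearest center."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)

def fitness(points, chrom):
    # Higher is better; 1/(1+SSE) is an assumed, commonly used choice.
    return 1.0 / (1.0 + sse(points, chrom))

def kmeans_step(points, chrom):
    """One Lloyd iteration; an empty cluster keeps its old center."""
    groups = [[] for _ in chrom]
    for p in points:
        groups[nearest(p, chrom)].append(p)
    return [[sum(c) / len(g) for c in zip(*g)] if g else list(chrom[j])
            for j, g in enumerate(groups)]

def tournament(pop, fits, rng, size=3):
    """Tournament selection: the fittest of `size` randomly drawn individuals."""
    return pop[max(rng.sample(range(len(pop)), size), key=lambda i: fits[i])]

def crossover(a, b, rng):
    """Single-point crossover at a center boundary, so a child has k whole centers."""
    cut = rng.randrange(1, len(a)) if len(a) > 1 else 0
    return [list(c) for c in a[:cut] + b[cut:]]

def mutate(chrom, rng, rate=0.1, sigma=0.5):
    """Gaussian mutation: perturb each coordinate with probability `rate`."""
    return [[x + rng.gauss(0, sigma) if rng.random() < rate else x for x in c]
            for c in chrom]

def ga_kmeans(points, k, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    # Encoding: each chromosome is k centers seeded from random data points.
    pop = [[list(p) for p in rng.sample(points, k)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(points, c) for c in pop]
        children = [pop[max(range(pop_size), key=lambda i: fits[i])]]  # elitism
        while len(children) < pop_size:
            child = crossover(tournament(pop, fits, rng), tournament(pop, fits, rng), rng)
            # Hybrid step: refine each mutated offspring with one K-Means iteration.
            children.append(kmeans_step(points, mutate(child, rng)))
        pop = children
    best = max(pop, key=lambda c: fitness(points, c))
    return best, sse(points, best)
```

Two limitations of this sketch are worth noting. First, `k` is still supplied by the caller, so the sketch does not address the problem of choosing k that the abstract raises. Second, crossing centers from two parents can pair up centers that do not correspond (the label-ordering problem of centroid encodings). The K-Means refinement step only partly compensates for this, and it is one place where an improved operator design would matter.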

[Degree-granting institution]: North China Electric Power University
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP18; TP311.13
