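Application of an Improved Genetic Algorithm to K-Means Cluster Analysis
Topic: classification  Focus: cluster analysis  Source: North China Electric Power University, master's thesis, 2017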
[Abstract]: Many problems are easier to study once the objects involved have been sorted into groups. In geological resource exploration, samples are classified by geophysical and drilling indicators; in archaeobiological research, excavated bones are classified by shape and size; in meteorological satellite systems, the monitored data are so complex and voluminous that they must be grouped and organized by various indicators before in-depth analysis can yield accurate forecasts. Cluster analysis emerged to meet this need. It is the discipline of partitioning concrete or abstract objects into categories without any predefined classification, and it is also an important human activity. With the rapid development of computer and information technology, the volume of data is growing quickly, and cluster analysis, as an important data mining technique, is attracting increasing attention.

K-Means is a partition-based algorithm. Because it is simple to run and its principle is easy to understand, it has been widely applied and studied and is counted among the ten classic data mining algorithms. It also has shortcomings: the value of k is hard to determine, the initial centers can only be chosen at random, and the algorithm easily falls into a local optimum, all of which make its results unstable. This thesis therefore introduces a genetic algorithm on top of K-Means clustering. A genetic algorithm searches for an optimal solution by imitating biological evolution and has good global search ability. Based on the characteristics of the two algorithms, the thesis proposes a hybrid algorithm that applies an improved genetic algorithm to K-Means clustering and runs simulation experiments on sample data sets; the experiments show that the proposed algorithm achieves good clustering results.

The work is divided into two parts:
1) The first part briefly introduces the basic concepts of cluster analysis, the K-Means algorithm, and genetic algorithms. It focuses on the basic ideas of K-Means and genetic algorithms, describes their components, basic elements, and workflow, and closes with their applications.
2) The second part presents an improved K-Means clustering algorithm based on a genetic algorithm and describes it in full: chromosome encoding, choice of fitness function, design and improvement of the selection, crossover, and mutation operators, and the way K-Means is combined with the genetic algorithm. Finally, test experiments were carried out to verify the effectiveness of the proposed algorithm; a comparative analysis of the two methods based on the experimental results confirms the feasibility and good clustering performance of the proposed method.
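The abstract names the building blocks of the hybrid (chromosome encoding, fitness function, selection, crossover, mutation, and a K-Means step combined with the genetic loop) but does not define them. The Python sketch below is one common way to assemble such a hybrid, not the thesis's own operators. Everything specific in it is an assumption made for illustration: a chromosome is a list of k real-valued centers, fitness is 1 / (1 + SSE), selection is a two-way tournament, crossover is single-point, mutation replaces a center with a random data point, and the function name `ga_kmeans` and all parameter defaults are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sse(points, centers):
    """Within-cluster sum of squared errors (lower is better)."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def fitness(points, centers):
    """Assumed fitness: larger when the SSE is smaller."""
    return 1.0 / (1.0 + sse(points, centers))

def kmeans_step(points, centers):
    """One K-Means iteration: reassign points, then recompute centroids."""
    k = len(centers)
    labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            new_centers.append(old)  # an empty cluster keeps its old center
    return new_centers

def ga_kmeans(points, k, pop_size=20, generations=30,
              p_cross=0.8, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    score = lambda c: fitness(points, c)
    # Chromosome = k centers, initialised from distinct random data points.
    pop = [rng.sample(points, k) for _ in range(pop_size)]
    best = max(pop, key=score)

    def tournament():
        a, b = rng.sample(pop, 2)
        return max(a, b, key=score)

    for _ in range(generations):
        children = [best]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            child = list(p1)
            if k > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, k)  # single-point crossover on centers
                child = list(p1[:cut]) + list(p2[cut:])
            for j in range(k):
                if rng.random() < p_mut:  # mutation: jump to a data point
                    child[j] = rng.choice(points)
            children.append(kmeans_step(points, child))  # local refinement
        pop = children
        best = max(pop, key=score)
    return best, sse(points, best)

# Toy data: three tight squares of four points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
          for dx, dy in offsets]
centers, err = ga_kmeans(points, k=3)
print(centers, err)
```

The genetic operators explore different sets of starting centers, while the K-Means step pulls each candidate toward a nearby local optimum; elitism ensures the best SSE found never gets worse from one generation to the next. The sketch fixes k in advance, so it addresses only the random-initialization and local-optimum weaknesses listed in the abstract, not the choice of k.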
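[Degree-granting institution]: North China Electric Power University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP18;TP311.13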
Article number: 1719614
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1719614.html