Research on a Hybrid Clustering Algorithm Based on Attribute Weights

Published: 2018-03-24 17:14

Topic: FCM algorithm; Focus: particle swarm optimization; Source: Southwest University, master's thesis, 2017

【Abstract】: Clustering analysis groups data by similarity so that differences between clusters are as large as possible and differences within a cluster are as small as possible. The choice of algorithm depends on the type of data, the purpose of the clustering, and the application; examples include k-means, BIRCH, CURE, DBSCAN, and COBWEB, and different algorithms may partition the same dataset differently. FCM (fuzzy C-means) is one of the most widely used clustering algorithms. The traditional FCM algorithm has two shortcomings. First, it starts from the sample points and computes each sample's degree of membership in each cluster center by optimizing an objective function, thereby classifying the data automatically; if the initial values are poorly chosen, the algorithm converges to a local minimum. Second, the attributes (dimensions) of the data samples contribute unequally to the clustering, yet FCM uses the standard Euclidean distance and so ignores the effect of attribute weights on the result. In essence, therefore, FCM is a local-search optimization algorithm. Based on this analysis, the thesis proposes a hybrid clustering algorithm based on attribute weights. The main research content is as follows: (1) A "particle evolution" strategy combined with "particle grouping and regrouping" is introduced into particle swarm optimization, yielding an improved particle swarm optimization algorithm that provides the algorithmic basis for computing the attribute weights. (2) Attribute-weight learning: in the improved particle swarm optimization algorithm, each particle's position vector serves as an attribute-weight vector, cross-entropy is chosen as the evaluation function for the attribute weights, gradient descent is used to minimize this function, and iteration yields a set of optimal attribute weights. (3) Hybrid clustering: a genetic algorithm is combined with simulated annealing and applied to FCM clustering. The initial cluster centers are encoded as chromosomes, and the objective function serves as the fitness function of the genetic algorithm. After selection, crossover, and mutation, FCM computes the cluster centers, memberships, and individual fitness values; simulated annealing accepts new individuals with a certain probability; and iteration finally yields the global optimum.
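The second shortcoming named in the abstract, that the plain Euclidean distance ignores attribute weights, can be illustrated with a minimal sketch of FCM using an attribute-weighted squared distance. This is not the thesis's implementation: the function name `weighted_fcm`, its parameters, the random membership initialization, and the fixed weight vector `w` are all assumptions made for illustration. In the thesis the weights are learned with the improved particle swarm algorithm and the initial centers come from the genetic/simulated-annealing search, neither of which is shown here.

```python
import numpy as np


def weighted_fcm(X, c, w, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy C-means with a fixed attribute-weight vector (hypothetical helper).

    X: (n, d) data, c: number of clusters, w: (d,) non-negative weights,
    m: fuzzifier (> 1). Returns memberships U (n, c) and centers V (c, d).
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Random initial membership matrix; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the samples.
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Attribute-weighted squared distance: sum_j w_j * (x_ij - v_kj)^2.
        diff = X[:, None, :] - V[None, :, :]
        D2 = np.einsum("ncd,d->nc", diff ** 2, w)
        D2 = np.maximum(D2, 1e-12)  # guard against division by zero
        # Standard FCM membership update, applied to the weighted distance.
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V


# Usage: two groups separated on attribute 0, with attribute 1 as noise.
rng = np.random.default_rng(42)
group_a = np.column_stack([rng.normal(0.0, 0.1, 30), rng.normal(0.0, 1.0, 30)])
group_b = np.column_stack([rng.normal(5.0, 0.1, 30), rng.normal(0.0, 1.0, 30)])
data = np.vstack([group_a, group_b])
U_demo, V_demo = weighted_fcm(data, c=2, w=np.array([0.9, 0.1]))
print(np.round(V_demo, 2))
```

Setting every entry of `w` to 1 recovers the standard Euclidean FCM that the abstract criticizes; lowering the weight on a noisy attribute reduces its influence on both the memberships and the resulting partition.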
【Degree-granting institution】: Southwest University
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: TP311.13

【References】