Research on Reference-Point-Based Evolutionary Clustering Algorithms

Published: 2018-03-11 21:01

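Topic: evolutionary clustering　Focus: temporal smoothness　Source: University of Science and Technology of China, master's thesis, 2017　Document type: degree thesis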

[Abstract]: With the rapid growth of the Internet and of our capacity to collect data, more and more real-world data change over time; we call such data evolutionary data. In recent years, the problem of clustering evolutionary data has attracted considerable research interest. In general, evolutionary clustering has two requirements: 1) the cluster structure found at each time step should partition the current snapshot of the data as well as possible; 2) the cluster structure found at each time step should preserve temporal smoothness as far as possible, that is, it should not change too much relative to the structure found at the previous time step. Evolutionary data clustering has a wide range of applications, and its study is of considerable importance. This thesis studies evolutionary clustering from the perspective of core nodes (leader nodes) and reference points. Its main content covers three aspects.

1) Inspired by the static community detection algorithm Top Leaders, we propose EvoLeaders, an evolutionary community detection algorithm based on leader nodes. First, we obtain the initial leader nodes at each time step through an update strategy that incorporates temporal information. By keeping the set of initial leader nodes temporally smooth with respect to the leader set of the previous time step, we ensure that the communities found from these leaders stay as smooth as possible relative to the earlier structure. Then, a set of split and merge operations improves community quality. Experiments on two real-world datasets show that EvoLeaders outperforms Top Leaders. This work demonstrates the feasibility of evolutionary community detection from the perspective of leader nodes.

2) The main drawback of Top Leaders is that the number of communities must be supplied manually. Based on the relationship between the degree of each node and the degrees of its neighbors, and on the extent to which the common neighbors of nodes overlap, we improve Top Leaders and propose AutoLeaders, which determines the number of communities automatically. Experiments on three classic datasets show that AutoLeaders finds both a reasonable number of communities and a reasonable community structure. Building on two temporal smoothness strategies, we further propose EvoAutoLeaders, a new solution for community detection in dynamic networks. Results on two real-world datasets show that EvoAutoLeaders performs well.

3) We address evolutionary clustering from the perspective of reference points. First, we introduce three different kinds of reference points, together with corresponding strategies for computing the distance from an individual to a reference point. Then, based on the r-dominance relation and a multi-objective evolutionary algorithm, we propose an evolutionary clustering algorithm, rEvoC. Experimental results show that, compared with classical algorithms, rEvoC is better suited to clustering evolutionary data and achieves better results.

Overall, we address the clustering of evolutionary data from the perspective of leader nodes and reference points, and experiments confirm that the approach is effective and outperforms classical algorithms. This work offers a useful reference for research on evolutionary community detection and evolutionary data clustering methods.
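The abstract names the r-dominance relation but does not spell it out. As a rough illustration only, the sketch below follows the commonly cited form of r-dominance: ordinary Pareto dominance, refined for mutually non-dominated solutions by a normalized weighted distance to a reference point. Everything specific here is an assumption rather than the thesis's method: the function names, the choice of two minimized objectives (for example a snapshot cost and a temporal cost), the weights, and the threshold `delta`. The three kinds of reference points and the distance strategies used by rEvoC are not described in the abstract and are not reproduced here.

```python
from math import sqrt


def pareto_dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def distance_to_reference(f, ref, weights, lows, highs):
    """Weighted Euclidean distance from f to the reference point,
    with each objective scaled by its range in the population."""
    total = 0.0
    for fi, gi, w, lo, hi in zip(f, ref, weights, lows, highs):
        span = (hi - lo) or 1.0  # guard against a zero range
        total += w * ((fi - gi) / span) ** 2
    return sqrt(total)


def r_dominates(a, b, ref, population, weights, delta):
    """True if a r-dominates b.

    a r-dominates b when a Pareto-dominates b, or when neither dominates
    the other and a is closer to the reference point than b by more than
    `delta` (distance difference normalized by the population's spread).
    `delta` in [0, 1] is a hypothetical tuning parameter.
    """
    if pareto_dominates(a, b):
        return True
    if pareto_dominates(b, a):
        return False
    lows = [min(col) for col in zip(*population)]
    highs = [max(col) for col in zip(*population)]

    def dist(p):
        return distance_to_reference(p, ref, weights, lows, highs)

    all_dists = [dist(p) for p in population]
    spread = max(all_dists) - min(all_dists)
    if spread == 0:
        return False  # every solution is equally far from the reference point
    return (dist(a) - dist(b)) / spread < -delta


# Hypothetical objective vectors: (snapshot cost, temporal cost), both minimized.
population = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
reference = (0.4, 0.4)
print(r_dominates(population[1], population[0], reference, population, (0.5, 0.5), 0.1))
```

In this toy population no solution Pareto-dominates another, so the reference point alone decides the ordering; that is how a reference point can steer a multi-objective search toward a preferred trade-off between snapshot quality and temporal smoothness.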
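[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13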
