Research and Application of Multi-Relational Clustering Algorithms Based on Membrane Systems

Posted: 2018-10-12 10:56

[Abstract]: The membrane system is a young branch of natural computing. It is a distributed parallel computing model abstracted from the way chemical substances are processed in organs, tissues, cells, and other biological structures. Owing to its strong parallelism, fault tolerance, and distributed nature, the membrane system has been widely applied in many fields and has been used to solve many practical problems. Traditional clustering methods usually assume that data objects are independent of one another. However, most application data today are stored in relational databases in multi-relational form, so traditional clustering methods can no longer meet the requirements of such data. This thesis studies in depth the poor clustering quality and low clustering efficiency of multi-relational clustering. Taking the membrane system as the underlying model, it first proposes an initial-center selection method that improves the K-means clustering algorithm. It then builds on this method to propose two efficient multi-relational clustering algorithms, and finally applies the proposed algorithms to a collaborative filtering recommender system:
(1) K-means algorithm based on optimized initial clustering centers (OIK-means). The algorithm first computes the density of each object from similarity. It then screens candidate centers by computing each object's minimum distance to any object of higher density, excludes outliers using the average density, and finally determines the K initial centers. OIK-means was tested on synthetic datasets and UCI datasets and compared with the traditional K-means algorithm in terms of the accuracy of initial-center selection.
(2) Multi-relational clustering algorithm based on comprehensive similarity (ISMC). Following the idea of tuple ID propagation, the algorithm assigns a weight to each table in the relational database and improves the traditional similarity computation. It combines each object's intra-class similarity and inter-class similarity into a comprehensive similarity according to these weights, and then clusters the objects in the target table with OIK-means on the basis of that similarity. ISMC was tested on the UCI Movie dataset and compared with the TPC, ReCOM, and LinkClus algorithms.
(3) Genetic K-means multi-relational clustering algorithm based on the membrane system (GKM). The algorithm approaches multi-relational clustering from the new angle of combining it with the membrane system. It designs an evolution-communication tissue-like P system composed of three cells and uses a different genetic evolution mechanism in each cell. This hybrid genetic mechanism improves the convergence of the algorithm and enhances the diversity of objects, yielding an accurate clustering of multi-relational datasets. GKM was tested on the UCI Movie dataset and compared with the ReCOM, LinkClus, and ISMC algorithms.
(4) Collaborative filtering based on membrane systems and multi-relational clustering (MCMCF). The thesis applies membrane-system-based multi-relational clustering to collaborative filtering and proposes an efficient recommendation method built on both. The method takes full advantage of the maximal parallelism (Max) and distributed execution of the membrane system. The comprehensive similarity computation effectively addresses data sparsity, and multi-relational clustering effectively reduces the nearest-neighbor search space, improving both recommendation quality and running efficiency.
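The OIK-means initialization in item (1) resembles density-peak seeding. The Python sketch below is one possible reading of it, not the thesis's code. It rests on several assumptions. It uses Euclidean distance and a Gaussian-kernel density with a hypothetical cutoff parameter `dc`. It treats objects whose density falls below the average density as outliers. From the remaining objects, it picks the K that lie farthest from any denser object, and then it runs standard K-means from those seeds. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_initial_centers(points: List[Point], k: int, dc: float) -> List[Point]:
    """Density-based seeding in the spirit of OIK-means (details assumed)."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]
    # Density: Gaussian-kernel sum over all other objects (assumed definition).
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # delta[i]: minimum distance from object i to any denser object (ties broken
    # by index); the single densest object gets its maximum distance instead.
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta.append(min(denser) if denser else max(dist[i]))
    # Outlier exclusion: drop objects whose density is below the average density.
    mean_rho = sum(rho) / n
    candidates = [i for i in range(n) if rho[i] >= mean_rho]
    if len(candidates) < k:
        raise ValueError("fewer than k objects survive outlier exclusion")
    # Keep the k candidates lying farthest from any denser object.
    candidates.sort(key=lambda i: delta[i], reverse=True)
    return [points[i] for i in candidates[:k]]


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda c: euclidean(p, centers[c]))
                  for p in points]
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous center.
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members))
                               if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (-1, 0), (0, -1),
            (10, 10), (10, 11), (11, 10), (9, 10), (10, 9),
            (5, -8)]
    seeds = select_initial_centers(data, k=2, dc=1.0)
    labels, centers = kmeans(data, seeds)
    print(seeds, labels)
```

On this toy data the two blob centers (0, 0) and (10, 10) are chosen as seeds. The isolated point (5, -8) is excluded from seeding because its density is far below the average, but the subsequent K-means pass still assigns it to the nearer cluster.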
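Item (2) combines a similarity on the target table with similarities derived from related tables through per-table weights. The following is a minimal sketch under assumptions that are not the thesis's definitions. Each related table is reduced, by a simplified form of tuple ID propagation, to a mapping from target tuple IDs to the set of values reached through the join. Per-table similarity is the Jaccard coefficient of those sets, and target-table similarity is the fraction of matching categorical attributes. The caller supplies the weights. The names `propagate_ids` and `comprehensive_similarity` and the toy movie schema are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def propagate_ids(link_rows: List[Tuple[int, Hashable]]) -> Dict[int, Set[Hashable]]:
    """Simplified tuple-ID propagation over one join: target ID -> values reached."""
    reached: Dict[int, Set[Hashable]] = defaultdict(set)
    for target_id, value in link_rows:
        reached[target_id].add(value)
    return dict(reached)


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attribute_similarity(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Fraction of target-table attributes on which two tuples agree."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


def comprehensive_similarity(i, j, target, related, weights):
    """weights[0] weighs the target table; weights[1:] weigh the related tables in order."""
    if len(weights) != len(related) + 1:
        raise ValueError("need one weight per table")
    score = weights[0] * attribute_similarity(target[i], target[j])
    for w, reached in zip(weights[1:], related):
        score += w * jaccard(reached.get(i, set()), reached.get(j, set()))
    # Normalize so the result stays in [0, 1] whatever the weight scale.
    return score / sum(weights)


if __name__ == "__main__":
    movies = {1: ("Drama", "1990s"), 2: ("Drama", "2000s"), 3: ("Comedy", "2000s")}
    directors = propagate_ids([(1, "D1"), (2, "D1"), (3, "D2")])
    actors = propagate_ids([(1, "A1"), (1, "A2"), (2, "A2"), (3, "A3")])
    print(comprehensive_similarity(1, 2, movies, [directors, actors], [0.5, 0.3, 0.2]))
```

A similarity like this is not a Euclidean distance. To use it with a K-means-style procedure such as OIK-means, one would cluster on a dissimilarity such as 1 minus the similarity. The abstract does not say how the thesis performs that conversion.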
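Item (3) describes three cells, each running its own genetic mechanism, that exchange individuals. The sketch below illustrates only that control structure, on one-dimensional data, and every detail is an assumption. An individual is a list of K centroid values, and every offspring is refined by one K-means (Lloyd) step, as in genetic K-means. Fitness is the sum of squared errors (SSE). The three cells use arithmetic crossover, Gaussian mutation, and replacement of a centroid by a random data point, respectively; these operators are illustrative choices, not the thesis's. A communication rule copies each cell's best individual to the next cell in a ring, where it replaces that cell's worst. The cells are simulated sequentially rather than with maximal parallelism. The name `tissue_gkm` is hypothetical.

```python
import random
from typing import Callable, List

Individual = List[float]  # K centroid positions on a 1-D line (assumed encoding)


def sse(data: List[float], ind: Individual) -> float:
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min((x - c) ** 2 for c in ind) for x in data)


def kmeans_step(data: List[float], ind: Individual) -> Individual:
    """One Lloyd iteration; a centroid that attracts no points keeps its position."""
    groups: List[List[float]] = [[] for _ in ind]
    for x in data:
        nearest = min(range(len(ind)), key=lambda i: (x - ind[i]) ** 2)
        groups[nearest].append(x)
    return [sum(g) / len(g) if g else c for g, c in zip(groups, ind)]


def arithmetic_crossover(pop, data, rng):
    a, b = rng.sample(pop, 2)
    w = rng.random()
    return [w * x + (1 - w) * y for x, y in zip(a, b)]


def gaussian_mutation(pop, data, rng):
    return [c + rng.gauss(0.0, 1.0) for c in rng.choice(pop)]


def reseed_mutation(pop, data, rng):
    child = list(rng.choice(pop))
    child[rng.randrange(len(child))] = rng.choice(data)
    return child


# One (hypothetical) evolution mechanism per cell.
CELL_OPERATORS: List[Callable] = [arithmetic_crossover, gaussian_mutation, reseed_mutation]


def tissue_gkm(data, k, pop_size=6, generations=20, seed=0):
    if pop_size < 2:
        raise ValueError("pop_size must be at least 2")
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    cells = [[kmeans_step(data, [rng.uniform(lo, hi) for _ in range(k)])
              for _ in range(pop_size)] for _ in CELL_OPERATORS]
    for _ in range(generations):
        # Evolution rules: each cell breeds with its own operator and keeps its best pop_size.
        for idx, op in enumerate(CELL_OPERATORS):
            offspring = [kmeans_step(data, op(cells[idx], data, rng)) for _ in range(pop_size)]
            cells[idx] = sorted(cells[idx] + offspring, key=lambda ind: sse(data, ind))[:pop_size]
        # Communication rules: copy each cell's best into the next cell, replacing its worst.
        bests = [list(pop[0]) for pop in cells]
        for idx, best in enumerate(bests):
            cells[(idx + 1) % len(cells)][-1] = best
    return min((ind for pop in cells for ind in pop), key=lambda ind: sse(data, ind))


if __name__ == "__main__":
    print(sorted(tissue_gkm([-0.1, 0.0, 0.1, 9.9, 10.0, 10.1], k=2)))
```

Each cell keeps its best individuals, and the best individual in a cell is only ever copied, never overwritten. As a result, the best SSE across all cells never increases from one generation to the next.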
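Item (4) credits multi-relational clustering with shrinking the nearest-neighbor search in collaborative filtering. The sketch below shows only that idea for user-based filtering. It takes precomputed cluster labels and searches for neighbors only inside the active user's cluster. It then predicts a rating as a similarity-weighted average of the neighbors' ratings. As an assumption, plain cosine similarity over co-rated items stands in for the thesis's comprehensive similarity. The names `predict` and `cluster_of` and the toy ratings are hypothetical.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity restricted to the items both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(user: str, item: str, ratings: Ratings,
            cluster_of: Dict[str, int], n_neighbors: int = 2) -> Optional[float]:
    # Candidate neighbours come only from the user's own cluster, not from all users.
    peers = [v for v in ratings
             if v != user and cluster_of[v] == cluster_of[user] and item in ratings[v]]
    top = sorted(((cosine(ratings[user], ratings[v]), v) for v in peers),
                 reverse=True)[:n_neighbors]
    num = sum(s * ratings[v][item] for s, v in top)
    den = sum(abs(s) for s, _ in top)
    return num / den if den else None


if __name__ == "__main__":
    ratings = {
        "u1": {"m1": 5, "m2": 4, "m3": 1},
        "u2": {"m1": 5, "m2": 5, "m3": 2, "m4": 4},
        "u3": {"m1": 4, "m2": 4, "m4": 5},
        "u4": {"m1": 1, "m3": 5, "m4": 1},
    }
    cluster_of = {"u1": 0, "u2": 0, "u3": 0, "u4": 1}
    print(predict("u1", "m4", ratings, cluster_of))  # about 4.50
```

Restricting peers to one cluster means the similarity loop touches only that cluster's members. A user whose cluster contains no one else who rated the item gets no prediction (`None` here), so a real system would need a fallback for that case.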
[Degree-Granting Institution]: Shandong Normal University
[Degree Level]: Master's
[Year Degree Conferred]: 2017
[Classification Number]: TP311.13
