Research on a Semi-Supervised Manifold Regularization Classification Learning Framework Based on Adaptive Graphs

Published: 2019-05-12 18:30

[Abstract]: Semi-supervised classification is an important research area in machine learning, and a large number of semi-supervised classification algorithms have been proposed. In real learning tasks, however, it is hard for researchers to decide which method to choose, and to the best of our knowledge there is no theoretical or empirical guidance on this choice. In addition, manifold regularization (MR) provides a powerful framework for semi-supervised classification, but traditional MR methods have two problems: 1) They usually construct the manifold structure graph in advance and keep it fixed during learning. Graph construction is therefore independent of classifier learning, and the resulting graph does not necessarily benefit the subsequent classification. 2) Graph construction involves several tunable parameters, yet semi-supervised learning still lacks effective methods for parameter selection, which hinders the construction of the manifold structure graph.
This thesis therefore consists of two main parts. First, to give empirical guidance on choosing a semi-supervised classification method, typical methods are compared. Because existing methods can be grouped by the data distribution assumption they adopt, the thesis takes the least squares (LS) method as the baseline and compares the classification performance of four methods: the Transductive Support Vector Machine (TSVM), which is based on the cluster assumption; Laplacian Regularized Least Squares Classification (LapRLSC), which is based on the manifold assumption; SemiBoost, which uses both assumptions; and Implicitly Constrained Semi-supervised Least Squares (ICLS), which uses neither. The following conclusions are obtained: 1) when the data distribution is known, methods that exploit the corresponding distribution assumption can ensure relatively good classification performance; 2) when there is no prior knowledge of the data distribution and the number of samples is limited, TSVM can achieve relatively high classification accuracy; 3) when class labels are hard to obtain and classification safety is emphasized, ICLS is the preferred choice, and LapRLSC is also a good option.
Second, a semi-supervised manifold regularization classification learning framework based on an adaptive graph (AGMR for short) is proposed, which performs graph construction and classifier learning simultaneously. In this framework, graph construction and classifier learning are unified and reinforce each other; in addition, the parameters of the manifold structure graph are adjusted as learning proceeds and need not be specified in advance. For the graph weight constraint, an entropy constraint and a sparsity constraint are adopted, yielding the entropy-constrained AGMR (AGMR_entropy) and sparsity-constrained AGMR (AGMR_sparse) methods. Experimental results show that the new methods effectively improve the learning performance of the traditional manifold regularization framework.
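To make the "traditional" fixed-graph setting concrete, the following is a minimal sketch of Laplacian regularized least squares in Python with NumPy. It is not the thesis's code and does not implement AGMR. The function names, the Gaussian kernel, the k-nearest-neighbour graph, and the default values of `k`, `sigma`, `gamma_a` and `gamma_i` are all assumptions chosen for illustration; the closed-form solve follows the standard LapRLS expression. Note that the graph is built once from `k` and `sigma` before training and never revisited, which is exactly the limitation the abstract describes.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def knn_graph(X, k, sigma):
    """Symmetric k-NN graph with Gaussian edge weights.

    Assumption: k and sigma are picked by hand and fixed before training.
    """
    K = rbf_kernel(X, X, sigma)
    np.fill_diagonal(K, 0.0)                  # no self-loops
    idx = np.argsort(-K, axis=1)[:, :k]       # k most similar points per row
    rows = np.arange(X.shape[0])[:, None]
    W = np.zeros_like(K)
    W[rows, idx] = K[rows, idx]
    return np.maximum(W, W.T)                 # symmetrize


def lap_rls_fit(X, y_labeled, k=5, sigma=1.0, gamma_a=1e-2, gamma_i=1e-1):
    """Fit LapRLS expansion coefficients.

    X holds all n points with the l labelled points in the first l rows;
    y_labeled holds their labels in {-1, +1}.
    """
    n, l = X.shape[0], y_labeled.shape[0]
    W = knn_graph(X, k, sigma)
    L = np.diag(W.sum(1)) - W                 # unnormalized graph Laplacian
    K = rbf_kernel(X, X, sigma)
    J = np.zeros((n, n))
    J[:l, :l] = np.eye(l)                     # selects the labelled points
    y = np.zeros(n)
    y[:l] = y_labeled
    # alpha = (J K + gamma_a * l * I + gamma_i * l / n^2 * L K)^{-1} y
    A = J @ K + gamma_a * l * np.eye(n) + (gamma_i * l / n ** 2) * (L @ K)
    return np.linalg.solve(A, y)


def lap_rls_predict(X_new, X_train, alpha, sigma=1.0):
    """Real-valued scores f(x) = sum_i alpha_i k(x_i, x); take the sign to classify."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha
```

A typical call would stack the labelled rows first, run `alpha = lap_rls_fit(X, y_labeled)`, and classify with `np.sign(lap_rls_predict(X_new, X, alpha))`. An adaptive-graph method such as the one the abstract proposes would, by contrast, update the graph weights during training rather than compute `W` once up front.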
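[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP181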
