Research on a Sparse Bayesian Classification Algorithm Based on the Graph Laplacian

Published: 2018-01-07 06:06

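Keywords: Research on a Sparse Bayesian Classification Algorithm Based on the Graph Laplacian　Source: University of Science and Technology of China, 2017 master's thesis　Document type: degree thesis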

[Abstract]: Sparse Bayesian learning is a mainstream family of machine learning algorithms. It makes full use of prior information about the data and, combined with a known sample set, builds a sound mathematical model; the trained model is sparse and produces probabilistic outputs. It has become a hot topic in machine learning research and is attracting growing attention. On manifold data sets, however, existing sparse Bayesian algorithms such as the relevance vector machine and the probabilistic classification vector machine ignore the special structural information inside the data set. The resulting models do not represent the real data well, are overly complex, and classify poorly. Building on existing sparse Bayesian algorithms and the manifold regularization framework, this thesis proposes a sparse Bayesian algorithm that can fully exploit the intrinsic manifold information of the data: a sparse Bayesian classification algorithm based on the graph Laplacian. The algorithm introduces a sparse manifold prior on the prior distribution of the weights of the sparse Bayesian model, so that the manifold structure of the data set itself is used as prior knowledge. This trains a model that better matches the data distribution and improves the classification performance of the sparse Bayesian algorithm. We analyze the model theoretically and verify the classification accuracy of the algorithm. We then run several groups of comparative experiments on different data sets. The results show that the proposed algorithm achieves ideal classification performance on data sets with manifold structure and also performs well on ordinary data sets. The main work of this thesis is summarized as follows:
(1) Based on traditional sparse Bayesian learning and the manifold regularization framework, the proposed algorithm defines a manifold prior that incorporates the manifold information of the data into the prior of the model. This information is reused throughout training to constrain the classification function, which yields a classification function with high accuracy and strong generalization.
(2) Starting from the manifold prior on the model weights, the posterior distribution of the weights is approximated by a Gaussian distribution using the Laplace method (Laplace approximation), and the mean of this distribution is found with iteratively reweighted least squares. This gives the sparse Bayesian classification algorithm based on the graph Laplacian. The algorithm has good sparsity, a short training time, and fast convergence, and its probabilistic output measures the uncertainty of the predictions.
(3) The marginal likelihood function is used to automatically optimize the hyperparameter λ, which controls how much the model exploits the manifold information of the sample data, and the hyperparameter α. The algorithm therefore has few parameters and is simple and efficient.
(4) In the experimental part, a large number of experiments are carried out on artificial data sets, standard data sets, and manifold data sets. Comparison with traditional supervised learning algorithms verifies that the proposed algorithm has good classification accuracy, good model sparsity, and fast convergence.
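The Laplace approximation and iteratively reweighted least squares step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not give the exact form of the manifold prior, so the sketch assumes a Gaussian prior on the weights with precision diag(α) + λ ΦᵀLΦ, where L is the unnormalized Laplacian of a symmetrized k-nearest-neighbour graph with binary edge weights. The function names `knn_graph_laplacian` and `laplace_irls` are hypothetical, and α and λ are held fixed rather than optimized through the marginal likelihood.

```python
import numpy as np


def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetrized k-NN graph (assumed graph choice)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                    # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]            # indices of the k nearest points
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                          # symmetrize the adjacency matrix
    return np.diag(W.sum(axis=1)) - W


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def laplace_irls(Phi, t, L, alpha, lam, n_iter=100, tol=1e-8):
    """Posterior mode and Laplace covariance for a logistic likelihood.

    Assumed prior: w ~ N(0, P^{-1}) with P = diag(alpha) + lam * Phi^T L Phi.
    """
    P = np.diag(alpha) + lam * Phi.T @ L @ Phi      # prior precision (assumption)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)
        r = y * (1.0 - y)                           # IRLS weights
        H = Phi.T @ (r[:, None] * Phi) + P          # Hessian of the negative log posterior
        g = Phi.T @ (y - t) + P @ w                 # gradient of the negative log posterior
        step = np.linalg.solve(H, g)
        w = w - step                                # Newton (IRLS) update
        if np.linalg.norm(step) < tol:
            break
    y = sigmoid(Phi @ w)
    H = Phi.T @ ((y * (1.0 - y))[:, None] * Phi) + P
    return w, np.linalg.inv(H)                      # Gaussian approximation N(w, H^{-1})


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
    t = np.r_[np.zeros(20), np.ones(20)]
    Phi = np.hstack([np.ones((40, 1)), X])          # bias plus raw features as basis
    w, Sigma = laplace_irls(Phi, t, knn_graph_laplacian(X), np.ones(3), lam=0.1)
    print("posterior mean:", w)
```

In this sketch the term λ ΦᵀLΦ penalizes weight vectors whose predictions differ across neighbouring points in the graph, which is one common way to encode a manifold smoothness assumption. A full sparse Bayesian method would additionally re-estimate α and λ between IRLS runs, which is omitted here.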
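[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP181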

[Similar literature]