Supervised Dirichlet Process Mixture Principal Component Analysis Based on Variational Inference
Published: 2018-07-07 23:33
Topics: Dirichlet process + mixture model; Reference: Sun Yat-sen University, master's thesis, 2015
[Abstract]: Compared with traditional finite mixture models, the Dirichlet process mixture model (DPM) can handle an unknown number of clusters and adaptively adjusts the number of clusters as the data grow, and it has therefore been widely used in recent years. The supervised Dirichlet process mixture model (SDPM) combines DPM with a supervised learning model, so that the joint distribution of covariates and responses is modeled nonparametrically through a Dirichlet process and a local expert model is learned in each cluster. When there is more than one cluster, the linear supervised model becomes globally nonlinear, which extends the learning capacity of linear models and makes the model more flexible. However, because these models are trained directly on the covariates, they suffer from the curse of dimensionality when the feature dimension is high. To address this, the thesis introduces probabilistic principal component analysis (PPCA) into SDPM, yielding the supervised Dirichlet process mixture principal component analysis model (SDPM-PCA). PPCA is a widely used dimensionality reduction method; by projecting high-dimensional data into a low-dimensional latent space, it can speed up training and help avoid overfitting. SDPM-PCA assumes that the covariates and the response are generated independently from the PPCA latent variables in the low-dimensional latent space, and models them nonparametrically with a Dirichlet process. By jointly learning clustering, supervised learning, and dimensionality reduction, SDPM-PCA performs local dimensionality reduction in each cluster and then trains a local supervised model in the low-dimensional latent space, which avoids the curse of dimensionality while improving both the dimensionality reduction and the model's predictive performance in the low-dimensional space. The thesis solves SDPM-PCA approximately with variational inference, which, compared with sampling algorithms based on Monte Carlo simulation, trains faster and yields a deterministic approximate solution, making the model feasible for high-dimensional data. Finally, the thesis instantiates SDPM-PCA for regression with a Bayesian linear regression model, evaluates it on several real-world data sets, and compares it with SDPM and other commonly used regression algorithms. The experimental results show that, with a suitably chosen latent dimension, SDPM-PCA gives better dimensionality reduction and usually achieves better and more stable predictive performance on high-dimensional regression problems.
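The generative assumptions described in the abstract can be illustrated with a small sampling sketch. This is not the thesis's implementation and does not perform variational inference; it only draws data from a model of the kind the abstract describes. Everything concrete here is an assumption made for illustration: the truncation of the Dirichlet process to `K` stick-breaking atoms, the Gaussian draws for the per-cluster parameters, the isotropic noise level, and all function and variable names.

```python
import random

def stick_breaking(alpha, K, rng):
    """Truncated stick-breaking weights for a Dirichlet process (K atoms)."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, alpha)   # v_k ~ Beta(1, alpha)
        weights.append(remaining * v)     # pi_k = v_k * prod_{j<k} (1 - v_j)
        remaining *= 1.0 - v
    weights.append(remaining)             # last atom takes the leftover mass
    return weights

def sample_sdpm_pca(n, d, q, K=10, alpha=1.0, noise=0.1, seed=0):
    """Draw n (x, y, cluster) triples from an illustrative truncated model.

    Each cluster has its own PPCA loading matrix and its own linear
    regression on the q-dimensional latent variable z, so x and y are
    conditionally independent given z and the cluster (assumed form).
    """
    rng = random.Random(seed)
    pi = stick_breaking(alpha, K, rng)
    clusters = []
    for _ in range(K):
        W = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(d)]  # d x q loadings
        mu = [rng.gauss(0, 3) for _ in range(d)]                     # cluster mean
        w = [rng.gauss(0, 1) for _ in range(q)]                      # regression weights
        b = rng.gauss(0, 1)                                          # intercept
        clusters.append((W, mu, w, b))
    data = []
    for _ in range(n):
        k = rng.choices(range(K), weights=pi)[0]      # cluster assignment
        W, mu, w, b = clusters[k]
        z = [rng.gauss(0, 1) for _ in range(q)]       # low-dimensional latent variable
        x = [sum(W[i][j] * z[j] for j in range(q)) + mu[i] + rng.gauss(0, noise)
             for i in range(d)]                       # covariates generated from z
        y = sum(w[j] * z[j] for j in range(q)) + b + rng.gauss(0, noise)  # response from z
        data.append((x, y, k))
    return pi, data
```

Calling `sample_sdpm_pca(n=200, d=20, q=2)` returns the mixture weights and 200 samples whose 20-dimensional covariates lie near cluster-specific 2-dimensional subspaces. Because each cluster carries its own linear map from `z` to `y`, the overall relationship between `x` and `y` is nonlinear whenever more than one cluster receives data, which is the flexibility argument made in the abstract.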
[Degree-granting institution]: Sun Yat-sen University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP301.6; O212.1
[Similar literature]
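Related master's theses (top 2)
1 Li Kang; Supervised Dirichlet Process Mixture Principal Component Analysis Based on Variational Inference [D]; Sun Yat-sen University; 2015
2 Liang Zhenfeng; Research on a Semi-supervised Classification Model Based on the Dirichlet Mixture Process [D]; Sun Yat-sen University; 2013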
Document ID: 2106644
Link to this article: https://www.wllwen.com/kejilunwen/yysx/2106644.html