Nonlinear Dimensionality Reduction Based on a Wasserstein-Distance Probability Distribution Model
Published: 2019-07-11 20:06
[Abstract]: Dimensionality reduction is a core problem in big data analysis and visualization. Dimensionality reduction algorithms based on probability distribution models achieve the reduction by optimizing a cost function between a model of the high-dimensional data and a model of the low-dimensional data. The core of this strategy is to construct the probability distribution model that best captures the characteristics of the data. On this basis, the Wasserstein distance is introduced into dimensionality reduction, and W-map, a nonlinear dimensionality reduction algorithm based on a Wasserstein-distance probability distribution model, is proposed. The W-map model establishes similar Wasserstein flows in the high-dimensional data space and in its corresponding low-dimensional data space, which turns dimensionality reduction into a minimum transport problem. While solving the Wasserstein distance minimization problem, it searches for the best-matching low-dimensional projection of the data, following the principle that the Wasserstein flow model of the data in the high-dimensional space should be the same as in the low-dimensional space. Experimental results on three groups of different data sets show that, compared with traditional probability distribution models, W-map produces visualizations of reduced high-dimensional data with high correctness and good robustness.
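The abstract gives no formulas, so the following is only a minimal illustrative sketch of the general idea of scoring a low-dimensional embedding by a Wasserstein distance between a high-dimensional and a low-dimensional distribution. It is not the W-map algorithm. The choice of comparing the distributions of pairwise Euclidean distances, and the names `wasserstein_1d`, `pairwise_distances`, and `embedding_cost`, are assumptions made for this example. It relies on the standard fact that the 1-Wasserstein distance between two equal-size, uniformly weighted 1-D samples is the mean absolute difference of their sorted values.

```python
import math
from itertools import combinations


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D samples.

    Both samples are treated as uniform empirical distributions, in which
    case the optimal transport plan matches sorted values one to one.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    matched = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in matched) / len(xs)


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def embedding_cost(high, low):
    """Hypothetical cost: Wasserstein distance between the pairwise-distance
    distributions of the high- and low-dimensional point sets.
    This is an assumption for illustration, not the paper's cost function."""
    return wasserstein_1d(pairwise_distances(high), pairwise_distances(low))


# Three points lying in a plane of 3-D space, and two candidate 2-D embeddings.
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faithful = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # preserves all distances
shrunken = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]   # collapses the distances

print(embedding_cost(high, faithful))
print(embedding_cost(high, shrunken))
```

Under this toy cost, the embedding that preserves the pairwise distances scores lower than the one that shrinks them. A dimensionality reduction method in this spirit would search over low-dimensional coordinates to minimize such a cost.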
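[Author affiliation]: School of Physics and Information Technology, Shaanxi Normal University
[Funding]: Supported by the National Natural Science Foundation of China (11374199, 11574192)
[CLC number]: O21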
Article ID: 2513448
Article link: https://www.wllwen.com/kejilunwen/yysx/2513448.html