Variational Auto-Encoder Based on Gaussian Mixture Models
Published: 2018-11-28 18:12
[Abstract]: Unsupervised learning, as a way of learning about the real world from unlabeled data, can free humans from having to label data by hand. As Feynman put it, "What I cannot create, I do not understand." Among the many ways to evaluate unsupervised learning, generation is the most direct: only when we can generate, that is, create, our real world can we claim to understand it completely. Generative models have therefore become some of the most popular unsupervised learning algorithms in recent years. This thesis studies the variational auto-encoder (VAE), one of the most popular generative models for unsupervised learning of complex distributions. A VAE maps a high-dimensional, complex image distribution down to a low-dimensional, simple distribution, and then generates images automatically by sampling from that simple distribution. In most current VAEs, the approximate posterior of the latent variable z is a single simple distribution, such as a Gaussian, which makes the low-dimensional representation overly simple. The real world, however, contains many non-Gaussian distributions; for highly skewed, multimodal distributions in particular, a single Gaussian approximation is often insufficient, and the latent space of a dataset may itself be arbitrarily complex. Against this background, this thesis makes the following contributions.

First, to make the approximate posterior more flexible, we replace it with a Gaussian mixture model (GMM); this change substantially improves the marginal likelihood achieved by the VAE on the datasets considered. Second, to increase the flexibility of the posterior further, we introduce normalizing flows into the VAE and combine them with the GMM posterior. Normalizing flows can specify arbitrarily complex, flexible, and scalable approximate posteriors: a simple initial density is transformed into the desired complex distribution by applying a sequence of invertible mappings. Finally, we re-derive the variational lower bound of the VAE under the GMM posterior and obtain the corresponding optimization algorithm. Because of the normalizing flows, each Gaussian component of the mixture can approximate a full covariance matrix, i.e., all covariance matrices of the mixture are effectively non-diagonal; the resulting model is therefore called the non-diagonal Gaussian mixture variational auto-encoder (NDGMVAE). NDGMVAE allows the latent variable z to match the true latent space more faithfully.

Furthermore, to sharpen the images the VAE generates, we improve the encoder and decoder architectures, using recent convolutional neural networks (CNNs) and networks with a gating mechanism, and we compare the variational lower bounds obtained by VAEs with different architectures. To show that the new posterior is more flexible and matches the latent space more faithfully, we run experiments on the MNIST, OMNIGLOT, and Histopathology datasets, focusing on the variational lower bound on the log-likelihood for each dataset; we also provide visualizations on MNIST, OMNIGLOT, and Freyfaces, and compare the latent distributions learned on MNIST. In addition, we report experiments varying the number of mixture components, the mixture coefficients, and the length of the normalizing flow. In summary, the improved VAE based on a Gaussian mixture model shows a clear gain in performance and across applications of variational inference, and it has theoretical advantages as well.
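For reference, the flow-augmented bound described above has a standard form in the normalizing-flows literature (Rezende & Mohamed, 2015). The exact bound derived in the thesis for the mixture posterior may differ in detail, so the following is a sketch under standard assumptions rather than a reproduction of the thesis's derivation. With a GMM initial posterior

$$q_0(z_0 \mid x) = \sum_{c=1}^{C} \pi_c(x)\, \mathcal{N}\!\left(z_0;\, \mu_c(x),\, \operatorname{diag}\sigma_c^2(x)\right)$$

and a chain of invertible maps $z_K = f_K \circ \cdots \circ f_1(z_0)$, the variational lower bound becomes

$$\ln p_\theta(x) \;\ge\; \mathbb{E}_{q_0(z_0 \mid x)}\!\left[ \ln p_\theta(x \mid z_K) + \ln p(z_K) - \ln q_0(z_0 \mid x) + \sum_{k=1}^{K} \ln \left| \det \frac{\partial f_k}{\partial z_{k-1}} \right| \right].$$

The accumulated log-determinant terms are what allow each initially diagonal Gaussian component to acquire full, non-diagonal covariance structure after the flow, which is the source of the "non-diagonal" in NDGMVAE.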
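To make the construction concrete, here is a minimal, self-contained PyTorch sketch of the two posterior ingredients the abstract combines: a Gaussian-mixture approximate posterior and a planar normalizing flow. This is an illustration written for this page, not the thesis's implementation; all names (PlanarFlow, GMMEncoder, the 256-unit hidden layer, and so on) are our own choices.

```python
# Minimal sketch (not the thesis code) of a GMM approximate posterior
# q0(z0|x) plus one planar normalizing-flow layer on top of it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlanarFlow(nn.Module):
    """One planar transform f(z) = z + u * tanh(w^T z + b) (Rezende & Mohamed, 2015)."""
    def __init__(self, dim):
        super().__init__()
        self.u = nn.Parameter(torch.randn(dim) * 0.01)
        self.w = nn.Parameter(torch.randn(dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        # Constrain u so the transform stays invertible: w^T u_hat >= -1.
        wu = self.w @ self.u
        u_hat = self.u + ((-1 + F.softplus(wu)) - wu) * self.w / (self.w @ self.w)
        lin = z @ self.w + self.b                              # (batch,)
        f_z = z + u_hat * torch.tanh(lin).unsqueeze(-1)
        # log|det df/dz| = log|1 + u_hat^T psi|, with psi = (1 - tanh^2) * w.
        psi = (1 - torch.tanh(lin) ** 2).unsqueeze(-1) * self.w
        log_det = torch.log(torch.abs(1 + psi @ u_hat) + 1e-8)
        return f_z, log_det

class GMMEncoder(nn.Module):
    """Encoder producing a K-component diagonal-Gaussian mixture q0(z0|x)."""
    def __init__(self, x_dim, z_dim, K):
        super().__init__()
        self.K, self.z_dim = K, z_dim
        self.net = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU())
        self.pi = nn.Linear(256, K)             # mixture logits
        self.mu = nn.Linear(256, K * z_dim)     # component means
        self.log_var = nn.Linear(256, K * z_dim)

    def forward(self, x):
        h = self.net(x)
        logits = self.pi(h)
        mu = self.mu(h).view(-1, self.K, self.z_dim)
        log_var = self.log_var(h).view(-1, self.K, self.z_dim)
        # Sample one component per data point. A plain multinomial draw does
        # not propagate gradients into the mixture weights; a Gumbel-Softmax
        # relaxation would, and is the usual refinement of this sketch.
        comp = torch.multinomial(F.softmax(logits, -1), 1).squeeze(-1)
        idx = comp.view(-1, 1, 1).expand(-1, 1, self.z_dim)
        m = mu.gather(1, idx).squeeze(1)
        lv = log_var.gather(1, idx).squeeze(1)
        z0 = m + torch.randn_like(m) * torch.exp(0.5 * lv)  # reparameterize
        # log q0(z0|x): full mixture density via log-sum-exp over components,
        # so the mixture weights still receive gradient through the density.
        log_pi = F.log_softmax(logits, -1)
        normal = torch.distributions.Normal(mu, torch.exp(0.5 * log_var))
        log_comp = normal.log_prob(z0.unsqueeze(1)).sum(-1)    # (batch, K)
        log_q0 = torch.logsumexp(log_pi + log_comp, dim=-1)
        return z0, log_q0
```

A training step would chain these pieces in the obvious way: draw z0 and log q0(z0|x) from the encoder, push z0 through the flow layers while accumulating their log-determinants, decode z_K, and average log p(x|z_K) + log p(z_K) - log q0(z0|x) + the summed log-determinants as a single-sample estimate of the bound.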
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year conferred]: 2017
[CLC number]: TP391.41