Variational Autoencoder Basics

The Probabilistic Model Formulation

The variational autoencoder, first proposed by Kingma and Welling (2014) and Rezende, Mohamed, and Wierstra (2014), was originally formulated as a deep Bayesian network with a latent variable \(\mathbf{z}\) and an observed variable \(\mathbf{x}\): \[ p(\mathbf{x};\theta,\lambda) = \int_{\mathcal{Z}} p_{\theta}(\mathbf{x}|\mathbf{z})\,p_{\lambda}(\mathbf{z})\,d\mathbf{z} \] where \(p_{\lambda}(\mathbf{z})\) is the prior over \(\mathbf{z}\), either fixed or derived by a neural network with parameters \(\lambda\), and \(p_{\theta}(\mathbf{x}|\mathbf{z})\) is derived by a neural network with parameters \(\theta\). Some literature, including the original papers, writes \(p_{\lambda}(\mathbf{z})\) as \(p_{\theta}(\mathbf{z})\).
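As a rough illustration of the integral above, the sketch below estimates \(p(\mathbf{x})\) by averaging \(p_{\theta}(\mathbf{x}|\mathbf{z})\) over samples from the prior. Everything specific here is an assumption made for the example: a scalar latent variable, a fixed standard normal prior, and a linear-Gaussian "decoder" with hypothetical constants `A`, `B`, `S` standing in for a neural network. This toy case is chosen only because its marginal has a closed form to compare against.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical toy decoder: p(x|z) = N(x; A*z + B, S^2), a stand-in for a neural network.
A, B, S = 2.0, 0.5, 1.0

def marginal_mc(x, n_samples=100_000, seed=0):
    """Monte Carlo estimate of p(x) = E_{z ~ p(z)}[p(x|z)] with prior p(z) = N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)               # sample from the prior
        total += normal_pdf(x, A * z + B, S)  # evaluate the decoder likelihood
    return total / n_samples

def marginal_exact(x):
    """Closed form for this linear-Gaussian case: p(x) = N(x; B, A^2 + S^2)."""
    return normal_pdf(x, B, math.sqrt(A * A + S * S))

x = 1.0
print("Monte Carlo estimate:", marginal_mc(x))
print("Closed form         :", marginal_exact(x))
```

With a neural-network decoder no closed form is available, and sampling from the prior in this way generally becomes inefficient in higher dimensions.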

Variational Approximation of the Posterior

The defining design choice of the variational autoencoder, which distinguishes it from other generative models, is the use of variational inference to approximate the intractable posterior \(p_{\theta}(\mathbf{z}|\mathbf{x}) = \frac{p_{\theta}(\mathbf{x}|\mathbf{z})\,p_{\lambda}(\mathbf{z})}{\int_{\mathcal{Z}} p_{\theta}(\mathbf{x}|\mathbf{w})\,p_{\lambda}(\mathbf{w})\,d\mathbf{w}}\) with a separately learned \(q_{\phi}(\mathbf{z}|\mathbf{x})\), derived by a neural network with parameters \(\phi\). The evidence lower bound (ELBO) can be used to jointly train these components: \[ \begin{aligned} \log p(\mathbf{x};\theta,\lambda) &\geq \log p(\mathbf{x};\theta,\lambda) - D_{\mathrm{KL}}\left( q_{\phi}(\mathbf{z}|\mathbf{x}) \| p_{\theta}(\mathbf{z}|\mathbf{x}) \right) \\ &= \mathbb{E}_{\mathbf{z} \sim q_{\phi}(\mathbf{z}|\mathbf{x})}\big[ \log p(\mathbf{x};\theta,\lambda) + \log p_{\theta}(\mathbf{z}|\mathbf{x}) - \log q_{\phi}(\mathbf{z}|\mathbf{x}) \big] \\ &= \mathbb{E}_{\mathbf{z} \sim q_{\phi}(\mathbf{z}|\mathbf{x})}\big[ \log p_{\theta}(\mathbf{x}|\mathbf{z}) + \log p_{\lambda}(\mathbf{z}) - \log q_{\phi}(\mathbf{z}|\mathbf{x}) \big] \\ &= \mathbb{E}_{\mathbf{z} \sim q_{\phi}(\mathbf{z}|\mathbf{x})}\big[\log p_{\theta}(\mathbf{x}|\mathbf{z})\big] - D_{\mathrm{KL}}\left( q_{\phi}(\mathbf{z}|\mathbf{x}) \| p_{\lambda}(\mathbf{z}) \right) \\ &= \mathcal{L}(\mathbf{x};\theta,\lambda,\phi) \end{aligned} \] The inequality holds because the KL divergence is non-negative, and it becomes an equality exactly when \(q_{\phi}(\mathbf{z}|\mathbf{x})\) matches the true posterior.
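The bound can be checked numerically in a case where the true posterior is known. The sketch below assumes a toy linear-Gaussian model (scalar \(\mathbf{z}\), prior \(\mathcal{N}(0,1)\), hypothetical decoder constants `A`, `B`, `S`) and a Gaussian \(q\) with mean `m` and standard deviation `v`. None of these choices come from the original papers. It estimates the reconstruction term by sampling and uses the closed-form Gaussian KL for the second term.

```python
import math
import random

# Hypothetical toy model: prior p(z) = N(0, 1), decoder p(x|z) = N(x; A*z + B, S^2).
A, B, S = 2.0, 0.5, 1.0

def log_normal_pdf(x, mean, std):
    """Log-density of a univariate Gaussian N(mean, std^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((x - mean) / std) ** 2

def kl_to_standard_normal(m, v):
    """Closed-form KL( N(m, v^2) || N(0, 1) )."""
    return 0.5 * (m * m + v * v - 1.0) - math.log(v)

def elbo(x, m, v, n_samples=50_000, seed=0):
    """ELBO = E_q[log p(x|z)] - KL(q || prior), with q(z|x) = N(m, v^2)."""
    rng = random.Random(seed)
    recon = 0.0
    for _ in range(n_samples):
        z = rng.gauss(m, v)                       # z ~ q(z|x)
        recon += log_normal_pdf(x, A * z + B, S)  # log p(x|z)
    return recon / n_samples - kl_to_standard_normal(m, v)

def log_evidence(x):
    """Exact log p(x) for this model: p(x) = N(x; B, A^2 + S^2)."""
    return log_normal_pdf(x, B, math.sqrt(A * A + S * S))

def true_posterior(x):
    """Mean and standard deviation of the exact Gaussian posterior p(z|x)."""
    precision = 1.0 + A * A / (S * S)
    mean = (A * (x - B) / (S * S)) / precision
    return mean, math.sqrt(1.0 / precision)

x = 1.0
m_star, v_star = true_posterior(x)
print("log p(x)            :", log_evidence(x))
print("ELBO, q = posterior :", elbo(x, m_star, v_star))
print("ELBO, q = N(0, 1)   :", elbo(x, 0.0, 1.0))
```

Up to Monte Carlo noise, the ELBO evaluated at the exact posterior should agree with \(\log p(\mathbf{x})\), while a mismatched \(q\) should fall below it by the corresponding KL gap.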

Kingma and Welling (2014) proposed optimizing the ELBO with the SGVB gradient estimator, which requires \(q_{\phi}(\mathbf{z}|\mathbf{x})\) to be reparameterized. Only some continuous distributions can be reparameterized. For non-reparameterizable continuous distributions and for discrete distributions, other gradient estimators may be adopted; these are reviewed in variational inference.
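To make the reparameterization requirement concrete, the sketch below shows the idea for a Gaussian: a sample is written as \(z = \mu + \sigma\epsilon\) with \(\epsilon \sim \mathcal{N}(0,1)\), so gradients with respect to \(\mu\) and \(\sigma\) can pass through the sample. It is not the full SGVB estimator of the ELBO. The integrand \(f(z) = z^2\) is a hypothetical test function chosen because \(\mathbb{E}[z^2] = \mu^2 + \sigma^2\) gives exact gradients \(2\mu\) and \(2\sigma\) to compare against, and the derivatives are written by hand rather than obtained by automatic differentiation.

```python
import random

def pathwise_gradients(mu, sigma, n_samples=100_000, seed=0):
    """Estimate d/dmu and d/dsigma of E_{z ~ N(mu, sigma^2)}[z^2] by reparameterization."""
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise that does not depend on (mu, sigma)
        z = mu + sigma * eps        # reparameterized sample from N(mu, sigma^2)
        df_dz = 2.0 * z             # derivative of the test function f(z) = z^2
        grad_mu += df_dz * 1.0      # chain rule: dz/dmu = 1
        grad_sigma += df_dz * eps   # chain rule: dz/dsigma = eps
    return grad_mu / n_samples, grad_sigma / n_samples

g_mu, g_sigma = pathwise_gradients(mu=1.0, sigma=0.5)
print("estimated gradients:", g_mu, g_sigma)
print("exact gradients    :", 2.0 * 1.0, 2.0 * 0.5)
```

In practice an automatic differentiation framework would compute `df_dz` and the chain rule, with \(f\) replaced by the integrand of the ELBO.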

The Auto-Encoding Structure

The pair of \(q_{\phi}(\mathbf{z}|\mathbf{x})\) and \(p_{\theta}(\mathbf{x}|\mathbf{z})\) resembles an autoencoder, where \(q_{\phi}(\mathbf{z}|\mathbf{x})\) is the encoder and \(p_{\theta}(\mathbf{x}|\mathbf{z})\) is the decoder. From this perspective, \(D_{\mathrm{KL}}\left( q_{\phi}(\mathbf{z}|\mathbf{x}) \| p_{\lambda}(\mathbf{z}) \right)\) acts as a regularization term that encourages a meaningful latent code, an idea further discussed in \(\beta\)-VAE (Higgins et al. 2017; Burgess et al. 2018; Mathieu et al. 2018) and other work.
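Read as a training loss, the encoder/decoder view amounts to a reconstruction cost plus a weighted KL penalty. The sketch below is one minimal way to write that, assuming a diagonal Gaussian encoder, a standard normal prior, and a scalar weight `beta` on the KL term as in \(\beta\)-VAE. The function names and the numeric inputs are hypothetical and serve only to illustrate the shape of the objective.

```python
import math

def kl_diag_gaussian_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return sum(0.5 * (m * m + math.exp(lv) - 1.0 - lv) for m, lv in zip(mean, log_var))

def beta_vae_loss(recon_nll, mean, log_var, beta=1.0):
    """Negative ELBO with the KL term scaled by beta; beta = 1 gives the unweighted ELBO."""
    return recon_nll + beta * kl_diag_gaussian_to_standard_normal(mean, log_var)

# Hypothetical encoder outputs for a 2-dimensional latent code,
# and a made-up reconstruction cost -log p(x|z) for a single data point.
mean, log_var = [0.5, -1.0], [0.0, -0.5]
recon_nll = 3.2

for beta in (1.0, 4.0):
    print("beta =", beta, "loss =", beta_vae_loss(recon_nll, mean, log_var, beta))
```

A larger `beta` penalizes deviation of the encoder distribution from the prior more heavily, at the expense of reconstruction quality.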

The auto-encoding structure is better known and more widely used than the probabilistic formulation of the variational autoencoder. Because of this, the term variational autoencoder has been generalized to refer to a family of generative models that learn stochastic encoders and infer latent variables by variational inference, rather than just the original model.

Advanced Model Architectures

Some more advanced model architectures, which are composed of more than just one latent variable \(\mathbf{z}\) and one observed variable \(\mathbf{x}\), are reviewed in this section.

Training Variational Autoencoder


Burgess, Christopher P., Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. 2018. “Understanding Disentangling in \(\beta\)-VAE.” arXiv:1804.03599 [Cs, Stat], April. http://arxiv.org/abs/1804.03599.

Higgins, Irina, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2017. “Beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework.” In International Conference on Learning Representations. Vol. 3.

Kingma, Diederik P, and Max Welling. 2014. “Auto-Encoding Variational Bayes.” In Proceedings of the International Conference on Learning Representations.

Mathieu, Emile, Tom Rainforth, N. Siddharth, and Yee Whye Teh. 2018. “Disentangling Disentanglement in Variational Autoencoders.” arXiv:1812.02833 [Cs, Stat], December. http://arxiv.org/abs/1812.02833.

Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. 2014. “Stochastic Backpropagation and Approximate Inference in Deep Generative Models.” In Proceedings of the 31st International Conference on Machine Learning - Volume 32, II–1278–II–1286. ICML’14. Beijing, China: JMLR.org.