This post is a note I took while reading Blei et al. 2018.
Goal:
- Motivation of variational inference
- Understand the derivation of the ELBO and its intuition
- Walk through the derivation, some of which was skipped in the original paper
- Implementation of CAVI
ELBO
The goal is to find \(q(z)\) that approximates the posterior \(p(z|x)\).
The KL divergence between them is
$$ \begin{equation} \begin{aligned} KL[q(z)||p(z | x)] &= -\int_z{q(z)\log{\frac{p(z|x)}{q(z)}} dz} \end{aligned} \end{equation} $$
However, this quantity is intractable to compute, because evaluating \(p(z|x) = p(z, x)/p(x)\) requires the intractable evidence \(p(x)\); hence we cannot minimize it directly. Expanding it instead:
$$ \begin{equation} \begin{aligned} KL[q(z)||p(z | x)] &= - \int_z{q(z)\log{\frac{p(z|x)}{q(z)}} dz} \\ &= -\int_z{ q(z) \log { \frac{p(z, x)}{q(z) p(x)} } dz}\\ &= -\int_z{q(z)[\log{\frac{p(z,x)}{q(z)}} - \log p(x)]dz} \\ &= -\int_z{ q(z) \log \frac{p(z, x)}{q(z)}dz } + \int_z{q(z)\log p(x) dz} \\ & =: -\texttt{ELBO}[q] + \log p(x) \\ \iff \texttt{ELBO}[q] &= -KL(q||p) + \log p(x) \end{aligned} \end{equation} $$
Because \(\log p(x)\) is a constant with respect to \(q\), maximizing \(\text{ELBO}[q]\) minimizes \(KL(q||p)\) by proxy. Rewriting the ELBO:
$$ \begin{equation} \begin{aligned} \texttt{ELBO}(q) &= \int_z{q(z)\log \frac{p(z, x)}{q(z)} dz} \\ &= \mathbb{E}_{z\sim q}[\log p(z, x)] - \mathbb{E}_{z\sim q}[\log q(z)] \end{aligned} \end{equation} $$
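As a quick sanity check of both identities above, here is a minimal sketch (my own, not from the paper) on a toy conjugate model \(z \sim \mathcal{N}(0, 1)\), \(x_i | z \sim \mathcal{N}(z, 1)\), where the posterior and the evidence are available in closed form. The sample sizes, seeds, and function names are illustrative choices:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(0)

# Toy conjugate model: z ~ N(0, 1),  x_i | z ~ N(z, 1)
n = 20
x = rng.normal(2.0, 1.0, size=n)

def elbo(m, s, num_samples=100_000):
    """Monte Carlo estimate of E_q[log p(z, x)] - E_q[log q(z)] for q = N(m, s^2)."""
    z = rng.normal(m, s, size=num_samples)
    log_joint = norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x[:, None], z, 1.0).sum(axis=0)
    log_q = norm.logpdf(z, m, s)
    return np.mean(log_joint - log_q)

# Exact posterior p(z | x) = N(sum(x) / (n + 1), 1 / (n + 1)); exact evidence
# log p(x), since x is jointly Gaussian with covariance I + 11^T.
post_mean, post_var = x.sum() / (n + 1), 1.0 / (n + 1)
log_evidence = multivariate_normal.logpdf(
    x, mean=np.zeros(n), cov=np.eye(n) + np.ones((n, n))
)

print(elbo(post_mean, np.sqrt(post_var)))  # ~ log p(x): KL(q || p) = 0 at the posterior
print(log_evidence)
print(elbo(0.0, 1.0))                      # strictly smaller: a poor q pays the KL gap
```

When \(q\) equals the exact posterior the Monte Carlo ELBO matches \(\log p(x)\); any other \(q\) gives a smaller value, with the gap equal to \(KL(q||p)\).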
Mean-Field Variational Family
The mean-field variational family makes a strong assumption of independence between its latent variables:
$$ q(\mathbf{z}) = \prod_{j} {q_j(z_j)} $$
Coordinate ascent variational inference (CAVI) is a common method for solving the mean-field variational inference problem. Holding the other latent variables fixed, the optimal \(j^{th}\) factor is given by:
$$ q^*_{j}(z_j) \propto \exp\{\mathbb{E}_{-j}[\log p(z_j | z_{-j}, \mathbf{x})]\} \propto \exp\{\mathbb{E}_{-j} [\log p(z_j, z_{-j}, \mathbf{x})]\} $$
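This suggests a simple algorithm: sweep over the factors, setting each \(q_j\) to the expression above while the others are held fixed, and stop when the ELBO stops improving (each sweep can only increase it). A minimal skeleton of that loop is below; the function names and signatures are placeholders of my own, not an API from the paper:

```python
import numpy as np

def cavi(init_factors, update_fns, elbo_fn, max_iters=100, tol=1e-6):
    """Generic CAVI loop: cycle through coordinate updates until the ELBO stalls.

    init_factors : list of variational parameters, one entry per factor q_j
    update_fns   : update_fns[j](factors) returns new parameters for q_j, i.e. the
                   normalized form of exp(E_{-j}[log p(z_j, z_{-j}, x)])
    elbo_fn      : elbo_fn(factors) evaluates (or estimates) the ELBO
    """
    factors = list(init_factors)
    elbo_old = -np.inf
    for _ in range(max_iters):
        for j, update in enumerate(update_fns):
            factors[j] = update(factors)   # optimal q_j with the other factors fixed
        elbo_new = elbo_fn(factors)        # non-decreasing across sweeps
        if elbo_new - elbo_old < tol:
            break
        elbo_old = elbo_new
    return factors
```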
Proof
$$ \begin{equation} \begin{aligned} q^*_j(z_j) &= \arg\max_{q_j(z_j)} \quad{\texttt{ELBO}(q)} \\ &= \arg\max_{q_j(z_j)} \quad \mathbb{E}_q[\log p(z_j, z_{-j}, x)] - \mathbb{E}_q[\log q(z_j, z_{-j})] \\ &= \arg\max_{q_j(z_j)} \quad \mathbb{E}_j[\mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]] - \mathbb{E}_j[\mathbb{E}_{-j}[\log q_j(z_j) + \log q_{-j}(z_{-j})]] \\ &= \arg\max_{q_j(z_j)} \quad \mathbb{E}_j[\mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]] - \mathbb{E}_j[\log q_j(z_j)] + \text{const} \\ &= \arg\max_{q_j(z_j)} \quad \mathbb{E}_j[\mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]] - \mathbb{E}_j[\log q_j(z_j)] \end{aligned} \end{equation} $$
We need to find the function \(q_j(z_j)\) that maximizes \(\text{ELBO}(q)\). Write a candidate as a perturbation of the optimum, \(q_j(z_j) = \epsilon \eta(z_j) + q^*_j(z_j)\), where \(\eta\) is an arbitrary perturbation function, and write \(A(z_j) := \mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]\) for brevity. Viewing the objective as a function \(K\) of \(\epsilon\):
$$ \begin{aligned} K(\epsilon) &= \mathbb{E}_j[\mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]] - \mathbb{E}_j[\log q_j(z_j)] \\ &= \int_{z_j} q_j(z_j) A(z_j)\, dz_j - \int_{z_j}q_j(z_j)\log q_j(z_j)\, dz_j \\ &= \int_{z_j} [\epsilon \eta(z_j) + q^*_j(z_j)] A(z_j)\, dz_j - \int_{z_j}[\epsilon \eta(z_j) + q^*_j(z_j)] \log [\epsilon \eta(z_j) + q^*_j(z_j)]\, dz_j \end{aligned} $$
Setting the derivative of \(K\) with respect to \(\epsilon\), evaluated at \(\epsilon = 0\), to zero, we have:
$$ \begin{aligned} & \frac{\partial}{\partial \epsilon}K \bigg\vert_{\epsilon=0} = 0 \\ \iff & \int_{z_j} {\eta(z_j) A(z_j)\, dz_j} - \int_{z_j} \left[ \eta(z_j) \log [\epsilon \eta(z_j) + q^*_j(z_j)] + [\epsilon \eta(z_j) + q^*_j(z_j)] \frac{\eta(z_j)}{\epsilon \eta(z_j) + q^*_j(z_j)} \right] dz_j \,\bigg\vert_{\epsilon=0} = 0\\ \iff & \int_{z_j} {\eta(z_j) [A(z_j) - \log q^*_j(z_j) - 1]\, dz_j} = 0 \quad \forall \eta(z_j) \\ \iff & \log q^*_j(z_j) = A(z_j)-1 = \mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)] - 1 \\ \iff & q^*_j(z_j) \propto \exp\{\mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]\} \end{aligned} $$
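To make the result concrete, here is a small sketch of my own (a toy target, not an example from the paper) that applies the update to a correlated bivariate Gaussian \(p(z_1, z_2)\). Because each complete conditional is Gaussian, each optimal factor \(q^*_j\) is Gaussian with a fixed variance \(1/\Lambda_{jj}\), so CAVI only has to iterate on the factor means:

```python
import numpy as np

# Toy target "posterior": a correlated bivariate Gaussian p(z1, z2)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)              # precision matrix

m = np.zeros(2)                          # variational means of q1, q2
for _ in range(50):
    # q*_1(z1): complete the square in E_{q_2}[log p(z1, z2)]
    m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
    # q*_2(z2): the same update with the indices swapped
    m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])

print("variational means:    ", m)                  # converges to mu
print("variational variances:", 1 / np.diag(Lam))   # smaller than the true marginals
```

The factor means converge to the true mean, while the factor variances \(1/\Lambda_{jj}\) are smaller than the true marginal variances, illustrating the well-known tendency of mean-field approximations to underestimate posterior uncertainty.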
Complete example of Bayesian Gaussian Mixture
TBD
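In the meantime, here is a minimal sketch of the CAVI updates for the simple Bayesian mixture of Gaussians considered in the paper: unit-variance components, a \(\mathcal{N}(0, \sigma^2)\) prior on the component means, and a uniform prior on the assignments. The data, hyperparameters, and function names below are illustrative assumptions of mine, not the post's finished example:

```python
import numpy as np

def cavi_gmm(x, K, sigma2=10.0, max_iters=100, seed=0):
    """CAVI for a Bayesian mixture of unit-variance 1-D Gaussians.

    Model:  mu_k ~ N(0, sigma2),  c_i ~ Categorical(1/K),  x_i | c_i, mu ~ N(mu_{c_i}, 1)
    Family: q(mu_k) = N(m_k, s2_k),  q(c_i) = Categorical(phi_i)
    """
    rng = np.random.default_rng(seed)
    m = rng.normal(0.0, 1.0, size=K)       # variational means of the component means
    s2 = np.ones(K)                         # variational variances
    phi = np.full((len(x), K), 1.0 / K)     # responsibilities

    for _ in range(max_iters):
        # Update q(c_i): phi_ik ∝ exp( E[mu_k] x_i - E[mu_k^2] / 2 )
        log_phi = np.outer(x, m) - 0.5 * (s2 + m**2)
        phi = np.exp(log_phi - log_phi.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)

        # Update q(mu_k): Gaussian, combining the prior with the weighted data
        denom = 1.0 / sigma2 + phi.sum(axis=0)
        m = (phi * x[:, None]).sum(axis=0) / denom
        s2 = 1.0 / denom

    return m, s2, phi

# Synthetic data from three well-separated clusters
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(c, 1.0, 200) for c in (-5.0, 0.0, 5.0)])
m, s2, phi = cavi_gmm(x, K=3)
print(np.sort(m))   # typically close to [-5, 0, 5]; CAVI can hit local optima
```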