This post is a note I took while reading Blei et al. 2018.
Goal:
- Motivation of variational inference
- Understand the derivation of the ELBO and its intuition
- Walk through the derivation, some of which was skipped in the original paper
- Implementation of CAVI
ELBO
The goal is to find \(q(z)\) that approximates the posterior \(p(z|x)\).
The KL divergence between them is
$$ \begin{equation} \begin{aligned} KL[q(z)||p(z | x)] &= -\int_z{q(z)\log{\frac{p(z|x)}{q(z)}} dz} \end{aligned} \end{equation} $$
However, this quantity is intractable to compute, because evaluating \(p(z|x) = p(z, x)/p(x)\) requires the intractable evidence \(p(x)\); hence we cannot minimize it directly. Expanding it instead:
$$ \begin{equation} \begin{aligned} KL[q(z)||p(z | x)] &= - \int_z{q(z)\log{\frac{p(z|x)}{q(z)}} dz} \\ &= -\int_z{ q(z) \log { \frac{p(z, x)}{q(z) p(x)} } dz}\\ &= -\int_z{q(z)[\log{\frac{p(z,x)}{q(z)}} - \log p(x)]dz} \\ &= -\int_z{ q(z) \log \frac{p(z, x)}{q(z)}dz } + \int_z{q(z)\log p(x) dz} \\ & =: -\texttt{ELBO}[q] + \log p(x) \\ \iff \texttt{ELBO}[q] &= -KL(q||p) + \log p(x) \end{aligned} \end{equation} $$
Because \(\log p(x)\) is a constant with respect to \(q\), maximizing \(\text{ELBO}[q]\) minimizes \(KL(q||p)\) by proxy. Rewriting the ELBO:
$$ \begin{equation} \begin{aligned} \texttt{ELBO}(q) &= \int_z{q(z)\log \frac{p(z, x)}{q(z)} dz} \\ &= \mathbb{E}_{z\sim q}[\log p(z, x)] - \mathbb{E}_{z\sim q}[\log q(z)] \end{aligned} \end{equation} $$
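As a quick sanity check of both identities above, here is a minimal sketch (my own, not from the paper) on a toy conjugate model \(z \sim \mathcal{N}(0, 1)\), \(x_i | z \sim \mathcal{N}(z, 1)\), where the posterior and the evidence are available in closed form. The sample sizes, seeds, and function names are illustrative choices:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(0)

# Toy conjugate model: z ~ N(0, 1),  x_i | z ~ N(z, 1)
n = 20
x = rng.normal(2.0, 1.0, size=n)

def elbo(m, s, num_samples=100_000):
    """Monte Carlo estimate of E_q[log p(z, x)] - E_q[log q(z)] for q = N(m, s^2)."""
    z = rng.normal(m, s, size=num_samples)
    log_joint = norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x[:, None], z, 1.0).sum(axis=0)
    log_q = norm.logpdf(z, m, s)
    return np.mean(log_joint - log_q)

# Exact posterior p(z | x) = N(sum(x) / (n + 1), 1 / (n + 1)); exact evidence
# log p(x), since x is jointly Gaussian with covariance I + 11^T.
post_mean, post_var = x.sum() / (n + 1), 1.0 / (n + 1)
log_evidence = multivariate_normal.logpdf(
    x, mean=np.zeros(n), cov=np.eye(n) + np.ones((n, n))
)

print(elbo(post_mean, np.sqrt(post_var)))  # ~ log p(x): KL(q || p) = 0 at the posterior
print(log_evidence)
print(elbo(0.0, 1.0))                      # strictly smaller: a poor q pays the KL gap
```

When \(q\) equals the exact posterior the Monte Carlo ELBO matches \(\log p(x)\); any other \(q\) gives a smaller value, with the gap equal to \(KL(q||p)\).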
Mean-Field Variational Family
The mean-field variational family makes a strong assumption of independence between its latent variables:
$$ q(\mathbf{z}) = \prod_{j} {q_j(z_j)} $$
Coordinate ascent variational inference (CAVI) is a common method for solving the mean-field variational inference problem. Holding the other latent variables fixed, the optimal \(j^{th}\) factor is given by:
$$ q^*_{j}(z_j) \propto \exp\{\mathbb{E}_{-j}[\log p(z_j | z_{-j}, \mathbf{x})]\} \propto \exp\{\mathbb{E}_{-j} [\log p(z_j, z_{-j}, \mathbf{x})]\} $$
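This suggests a simple algorithm: sweep over the factors, setting each \(q_j\) to the expression above while the others are held fixed, and stop when the ELBO stops improving (each sweep can only increase it). A minimal skeleton of that loop is below; the function names and signatures are placeholders of my own, not an API from the paper:

```python
import numpy as np

def cavi(init_factors, update_fns, elbo_fn, max_iters=100, tol=1e-6):
    """Generic CAVI loop: cycle through coordinate updates until the ELBO stalls.

    init_factors : list of variational parameters, one entry per factor q_j
    update_fns   : update_fns[j](factors) returns new parameters for q_j, i.e. the
                   normalized form of exp(E_{-j}[log p(z_j, z_{-j}, x)])
    elbo_fn      : elbo_fn(factors) evaluates (or estimates) the ELBO
    """
    factors = list(init_factors)
    elbo_old = -np.inf
    for _ in range(max_iters):
        for j, update in enumerate(update_fns):
            factors[j] = update(factors)   # optimal q_j with the other factors fixed
        elbo_new = elbo_fn(factors)        # non-decreasing across sweeps
        if elbo_new - elbo_old < tol:
            break
        elbo_old = elbo_new
    return factors
```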
Proof
$$ \begin{equation} \begin{aligned} q^*_j(z_j) &= \arg\max_{q_j(z_j)} \quad{\texttt{ELBO}(q)} \\ &= \arg\max_{q_j(z_j)} \quad \mathbb{E}_q[\log p(z_j, z_{-j}, x)] - \mathbb{E}_q[\log q(z_j, z_{-j})] \\ &= \arg\max_{q_j(z_j)} \quad \mathbb{E}_j[\mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]] - \mathbb{E}_j[\mathbb{E}_{-j}[\log q_j(z_j) + \log q_{-j}(z_{-j})]] \\ &= \arg\max_{q_j(z_j)} \quad \mathbb{E}_j[\mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]] - \mathbb{E}_j[\log q_j(z_j)] + \text{const} \\ &= \arg\max_{q_j(z_j)} \quad \mathbb{E}_j[\mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]] - \mathbb{E}_j[\log q_j(z_j)] \end{aligned} \end{equation} $$
We need to find the function \(q_j(z_j)\) that maximizes \(\text{ELBO}(q)\). Write a candidate as a perturbation of the optimum, \(q_j(z_j) = \epsilon \eta(z_j) + q^*_j(z_j)\), where \(\eta\) is an arbitrary perturbation function, and write \(A(z_j) := \mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]\) for brevity. Viewing the objective as a function \(K\) of \(\epsilon\):
$$ \begin{aligned} K(\epsilon) &= \mathbb{E}_j[\mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]] - \mathbb{E}_j[\log q_j(z_j)] \\ &= \int_{z_j} q_j(z_j) A(z_j)\, dz_j - \int_{z_j}q_j(z_j)\log q_j(z_j)\, dz_j \\ &= \int_{z_j} [\epsilon \eta(z_j) + q^*_j(z_j)] A(z_j)\, dz_j - \int_{z_j}[\epsilon \eta(z_j) + q^*_j(z_j)] \log [\epsilon \eta(z_j) + q^*_j(z_j)]\, dz_j \end{aligned} $$
Setting the derivative of \(K\) with respect to \(\epsilon\), evaluated at \(\epsilon = 0\), to zero, we have:
$$ \begin{aligned} & \frac{\partial}{\partial \epsilon}K \bigg\vert_{\epsilon=0} = 0 \\ \iff & \int_{z_j} {\eta(z_j) A(z_j)\, dz_j} - \int_{z_j} \left[ \eta(z_j) \log [\epsilon \eta(z_j) + q^*_j(z_j)] + [\epsilon \eta(z_j) + q^*_j(z_j)] \frac{\eta(z_j)}{\epsilon \eta(z_j) + q^*_j(z_j)} \right] dz_j \,\bigg\vert_{\epsilon=0} = 0\\ \iff & \int_{z_j} {\eta(z_j) [A(z_j) - \log q^*_j(z_j) - 1]\, dz_j} = 0 \quad \forall \eta(z_j) \\ \iff & \log q^*_j(z_j) = A(z_j)-1 = \mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)] - 1 \\ \iff & q^*_j(z_j) \propto \exp\{\mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]\} \end{aligned} $$
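To make the result concrete, here is a small sketch of my own (a toy target, not an example from the paper) that applies the update to a correlated bivariate Gaussian \(p(z_1, z_2)\). Because each complete conditional is Gaussian, each optimal factor \(q^*_j\) is Gaussian with a fixed variance \(1/\Lambda_{jj}\), so CAVI only has to iterate on the factor means:

```python
import numpy as np

# Toy target "posterior": a correlated bivariate Gaussian p(z1, z2)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)              # precision matrix

m = np.zeros(2)                          # variational means of q1, q2
for _ in range(50):
    # q*_1(z1): complete the square in E_{q_2}[log p(z1, z2)]
    m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
    # q*_2(z2): the same update with the indices swapped
    m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])

print("variational means:    ", m)                  # converges to mu
print("variational variances:", 1 / np.diag(Lam))   # smaller than the true marginals
```

The factor means converge to the true mean, while the factor variances \(1/\Lambda_{jj}\) are smaller than the true marginal variances, illustrating the well-known tendency of mean-field approximations to underestimate posterior uncertainty.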
Complete example of Bayesian Gaussian Mixture
TBD
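In the meantime, here is a minimal sketch of the CAVI updates for the simple Bayesian mixture of Gaussians considered in the paper: unit-variance components, a \(\mathcal{N}(0, \sigma^2)\) prior on the component means, and a uniform prior on the assignments. The data, hyperparameters, and function names below are illustrative assumptions of mine, not the post's finished example:

```python
import numpy as np

def cavi_gmm(x, K, sigma2=10.0, max_iters=100, seed=0):
    """CAVI for a Bayesian mixture of unit-variance 1-D Gaussians.

    Model:  mu_k ~ N(0, sigma2),  c_i ~ Categorical(1/K),  x_i | c_i, mu ~ N(mu_{c_i}, 1)
    Family: q(mu_k) = N(m_k, s2_k),  q(c_i) = Categorical(phi_i)
    """
    rng = np.random.default_rng(seed)
    m = rng.normal(0.0, 1.0, size=K)       # variational means of the component means
    s2 = np.ones(K)                         # variational variances
    phi = np.full((len(x), K), 1.0 / K)     # responsibilities

    for _ in range(max_iters):
        # Update q(c_i): phi_ik ∝ exp( E[mu_k] x_i - E[mu_k^2] / 2 )
        log_phi = np.outer(x, m) - 0.5 * (s2 + m**2)
        phi = np.exp(log_phi - log_phi.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)

        # Update q(mu_k): Gaussian, combining the prior with the weighted data
        denom = 1.0 / sigma2 + phi.sum(axis=0)
        m = (phi * x[:, None]).sum(axis=0) / denom
        s2 = 1.0 / denom

    return m, s2, phi

# Synthetic data from three well-separated clusters
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(c, 1.0, 200) for c in (-5.0, 0.0, 5.0)])
m, s2, phi = cavi_gmm(x, K=3)
print(np.sort(m))   # typically close to [-5, 0, 5]; CAVI can hit local optima
```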