# Langevin Dynamics

## Problem Statement

To sample from $$p(\mathbf{x})$$, where $$p(\mathbf{x})$$ is not easy to sample from, but the gradient $$\nabla_\mathbf{x} \log p(\mathbf{x})$$ is tractable.

## Langevin Dynamics

The transition kernel $$T$$ of Langevin dynamics is given by the following equation: \begin{align} &\mathbf{x}^{(t+1)} = \mathbf{x}^{(t)} + \frac{\epsilon^2}{2}\cdot \nabla_{\mathbf{x}} \log p(\mathbf{x}^{(t)}) + \epsilon\cdot \mathbf{z}^{(t)} \\ &\text{where}\;\,\mathbf{z}^{(t)} \sim \mathcal{N}(\mathbf{0},\mathbf{I}) \end{align} and then Metropolis-Hastings algorithm is adopted to determine whether or not the new sample $$\mathbf{x}^{(t+1)}$$ should be accepted. The acceptance rate is given by: \begin{align} \rho(\mathbf{x}^{(t)}, \mathbf{x}^{(t+1)}) &= \min\bigg\{ 1, \exp\bigg( -\log p(\mathbf{x}^{(t)}) + \log p(\mathbf{x}^{(t+1)}) \\ &\qquad\qquad\qquad\qquad+\frac{1}{2} \left\| \mathbf{z}^{(t)} \right\|^2_2 -\frac{1}{2} \left\| \mathbf{z}^{(t+1)} \right\|^2_2 \bigg) \bigg\} \\ \mathbf{z}^{(t+1)} &= -\mathbf{z}^{(t)} - \frac{\epsilon}{2}\cdot \left[ \nabla_{\mathbf{x}} \log p(\mathbf{x}^{(t)}) + \nabla_{\mathbf{x}} \log p(\mathbf{x}^{(t+1)}) \right] \end{align}

### Langevin Dynamics as Metropolis-Hastings Algorithm

One may also substitute out $$\mathbf{z}^{(t)}$$ and $$\mathbf{z}^{(t+1)}$$ using $$\mathbf{x}^{(t)}$$ and $$\mathbf{x}^{(t+1)}$$, such that the whole Langevin dynamics can be viewed as a strict Metropolis-Hastings transition kernel, as follows: \begin{align} \mathbf{z}^{(t)} &= \frac{1}{\epsilon}\left[ \mathbf{x}^{(t+1)} - \mathbf{x}^{(t)} - \frac{\epsilon^2}{2} \nabla_{\mathbf{x}} \log p(\mathbf{x}^{(t)}) \right] \\ \mathbf{z}^{(t+1)} &= \frac{1}{\epsilon}\left[ \mathbf{x}^{(t)} - \mathbf{x}^{(t+1)} - \frac{\epsilon^2}{2} \nabla_{\mathbf{x}} \log p(\mathbf{x}^{(t+1)}) \right] \end{align} Also, we can write $$\rho(\mathbf{x}^{(t)}, \mathbf{x}^{(t+1)})$$ as: $\rho(\mathbf{x}^{(t)}, \mathbf{x}^{(t+1)}) = \min\left\{ 1, \frac{\exp\left( \log p(\mathbf{x}^{(t+1)}) \right)}{\exp\left( \log p(\mathbf{x}^{(t)}) \right)} \cdot \frac{\exp\left( -\frac{1}{2}\left\| \mathbf{z}^{(t+1)} \right\|^2_2 \right)}{\exp\left( -\frac{1}{2}\left\| \mathbf{z}^{(t)} \right\|^2_2 \right)} \right\}$ where: \begin{align} \exp\left( -\frac{1}{2}\left\| \mathbf{z}^{(t+1)} \right\|^2_2 \right) &= \exp\left( -\frac{1}{2\epsilon^2} \left\| \mathbf{x}^{(t)} - \left( \mathbf{x}^{(t+1)} + \frac{\epsilon^2}{2} \nabla_{\mathbf{x}} \log p(\mathbf{x}^{(t+1)}) \right) \right\|^2_2 \right) \\ \exp\left( -\frac{1}{2}\left\| \mathbf{z}^{(t)} \right\|^2_2 \right) &= \exp\left( -\frac{1}{2\epsilon^2} \left\| \mathbf{x}^{(t+1)} - \left( \mathbf{x}^{(t)} + \frac{\epsilon^2}{2} \nabla_{\mathbf{x}} \log p(\mathbf{x}^{(t)}) \right) \right\|^2_2 \right) \end{align} The ratio $$\exp\left( -\frac{1}{2}\left\| \mathbf{z}^{(t+1)} \right\|^2_2 \right) / \exp\left( -\frac{1}{2}\left\| \mathbf{z}^{(t)} \right\|^2_2 \right)$$ can be viewed as $$q(\mathbf{x}^{(t)}|\mathbf{x}^{(t+1)}) / q(\mathbf{x}^{(t+1)}|\mathbf{x}^{(t)})$$, where: \begin{align} q(\mathbf{x}^{(t)}|\mathbf{x}^{(t+1)}) &= \mathcal{N}\left(\mathbf{x}^{(t)}\,\bigg|\,\mathbf{x}^{(t+1)} + \frac{\epsilon^2}{2} \nabla_{\mathbf{x}} \log p(\mathbf{x}^{(t+1)}), \epsilon^2\right) \\ q(\mathbf{x}^{(t+1)}|\mathbf{x}^{(t)}) &= \mathcal{N}\left(\mathbf{x}^{(t+1)}\,\bigg|\,\mathbf{x}^{(t)} + \frac{\epsilon^2}{2} \nabla_{\mathbf{x}} \log p(\mathbf{x}^{(t)}), \epsilon^2\right) \end{align} which are two Normal distributions with mean in the form of $$\mathbf{x}+\frac{\epsilon^2}{2}\nabla_{\mathbf{x}} \log p(\mathbf{x})$$ and variance being $$\epsilon^2$$. Also, since $$p(\mathbf{x}) = \exp\left( \log p(\mathbf{x}) \right)$$, we finally obtain: $\rho(\mathbf{x}^{(t)}, \mathbf{x}^{(t+1)}) = \min\left\{ 1, \frac{p(\mathbf{x}^{(t+1)})}{p(\mathbf{x}^{(t)})} \cdot \frac{q(\mathbf{x}^{(t)}|\mathbf{x}^{(t+1)})}{q(\mathbf{x}^{(t+1)}|\mathbf{x}^{(t)})} \right\}$ which is exactly the acceptance rate of Metropolis-Hastings algorithm, with target distribution $$p(\mathbf{x})$$ and proposal distribution $$q(\mathbf{x}^{\star}|\mathbf{x})$$.

### Relationship with Hamiltonian Monte Carlo

Langevin dyanmics is actually a Hamiltonian dynamics system, with the potential and kinetic energy chosen as: \begin{align} U(q) &= -\log \pi(q) \\ K(p|q) &= K(p) = -\log \mathcal{N}(p|0,I) = \frac{1}{2} p^\top p + \text{const} = \frac{1}{2}\left\| p \right\|^2_2 + \text{const} \end{align} and the integrator for the Hamiltonian equations chosen as one-step LeapFrog method.

The Hamiltonian equations, under this assumption, are given by: \begin{align} \frac{dq}{dt} &= \frac{\partial K}{\partial p} = p \\ \frac{dp}{dt} &= -\frac{\partial U}{\partial q} = \nabla \log \pi(q) \end{align}

Using one-step LeapFrog integrator, we obtain: \begin{align} p_{0.5} &= p_0 - \frac{\epsilon}{2}\cdot \frac{\partial U}{\partial q}(q_0) \\ q_1 &= q_0 + \epsilon \cdot \frac{\partial K}{\partial p}(p_{0.5}) = q_0 - \frac{\epsilon^2}{2} \cdot \frac{\partial U}{\partial q}(q_0) + \epsilon \cdot p_0 \\ p_1 &= p_{0.5} - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_1) = p_0 - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_0) - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_1) \end{align} Negate $$p_1$$, we obtain the candidate state $$(q_1,-p_1)$$. The Metropolis-Hastings acceptance rate is then given by:

\begin{align} \rho((q_0,p_0),(q_1,-p_1)) &= \min\left\{ 1, \frac{\pi(q_1,-p_1)}{\pi(q_0,p_0)} \right\} \\ &= \min\left\{ 1, \frac{\exp\left( -H(q_1,-p_1) \right)}{\exp\left( -H(q_0,p_0) \right)} \right\} \\ &= \min\left\{ 1, \exp\left( U(q_0) - U(q_1) + K(p_0) - K(-p_1) \right) \right\} \\ &= \min\left\{ 1, \exp\left( -\log \pi(q_0) + \log \pi(q_1) + \frac{1}{2} \left\| p_0 \right\|^2_2 - \frac{1}{2} \left\| -p_1 \right\|^2_2 \right) \right\} \end{align} Substitute $$q_0 = \mathbf{x}^{(t)}$$, $$q_1 = \mathbf{x}^{(t+1)}$$ and $$p_0=\mathbf{z}^{(t)}$$, $$-p_1=\mathbf{z}^{(t+1)}$$, we obtain the transition kernel and the acceptance rate for Langevin dynamics in the previous section.