# Hamiltonian Dynamics

## Problem Statement

We wish to sample from $$\pi(q)$$, a distribution that is hard to sample from directly, but whose density is easy to evaluate at any given point $$q$$ — or at least whose unnormalized density $$\tilde{\pi}(q)$$ is easy to compute.

Note: following the convention of the Hamiltonian Monte Carlo literature, vectors and matrices are not written in bold.

## Hamiltonian Dynamics

### Basic Elements

The Hamiltonian Monte Carlo method uses a Hamiltonian dynamical system to derive the transition kernel $$T$$. It augments the state $$q$$ with a momentum $$p$$, forming the extended phase space $$(q, p)$$. A Hamiltonian system is then constructed on this phase space, with Hamiltonian function $$H(q,p)$$, i.e., the total energy of the system.

The density on the phase space, $$\pi(q,p)$$, is then given by: \begin{align} \pi(q, p) &= \exp\left( -H(q,p) \right) \end{align} satisfying $$\int \pi(q,p)\,dp = \pi(q)$$.

Since $$H(q,p)$$ is the energy function of a Hamiltonian system, it can further be decomposed into kinetic and potential energies: \begin{align} H(q,p) &= -\log \pi(q,p) \\ &= -\log \pi(p|q) - \log \pi(q) \\ &= K(p,q) + U(q) \end{align} where $$K(p,q) = -\log \pi(p|q)$$ is the kinetic energy, and $$U(q) = -\log \pi(q)$$ is the potential energy.

### Transition Kernel

The evolution of a Hamiltonian system over time is given by the following differential equations: \begin{align} \frac{d q}{dt} &= \frac{\partial H}{\partial p} = \frac{\partial K}{\partial p} \\ \frac{d p}{dt} &= -\frac{\partial H}{\partial q} = -\frac{\partial K}{\partial q} - \frac{\partial U}{\partial q} \end{align} Letting $$z=(q,p)$$, the Hamiltonian equations can also be rewritten in the following compact form: \begin{align} \frac{dz}{dt} = J \,\nabla H(z) \end{align} where $$\nabla H$$ is the gradient of $$H$$, and: $J = \begin{bmatrix} 0 & I \\ -I & 0 \\ \end{bmatrix}$
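The compact form can be checked numerically. Below is a minimal sketch, assuming a 1-D standard Gaussian target so that $$H(q,p) = q^2/2 + p^2/2$$ and $$\nabla H(z) = z$$:

```python
import numpy as np

# Sketch of dz/dt = J ∇H(z), assuming the 1-D quadratic Hamiltonian
# H(q, p) = q²/2 + p²/2 (so ∇H(z) = z).
d = 1
J = np.block([[np.zeros((d, d)), np.eye(d)],
              [-np.eye(d), np.zeros((d, d))]])

def grad_H(z):
    # Gradient of the quadratic Hamiltonian above.
    return z

z = np.array([1.0, 0.5])      # z = (q, p)
dz_dt = J @ grad_H(z)         # equals (p, -q) = (dq/dt, dp/dt)
```

Multiplying by $$J$$ simply swaps the roles of the two gradient blocks and flips one sign, reproducing the two Hamiltonian equations at once.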

The transition kernel $$T$$ is then given by the following procedure:

1. Lift $$q^{(t)}$$ to $$(q^{(t)},p^{(t)})$$ by sampling $$p^{(t)} \sim \pi(p|q^{(t)})$$.
2. Simulate the Hamiltonian system, by numerically integrating the Hamiltonian differential equations, to obtain a candidate state $$(q^{\star}, p^{\star})$$.
3. For reversibility, negate $$p^{\star}$$, so that the candidate state becomes $$(q^{\star},-p^{\star})$$ (see below).
4. To correct numerical errors, apply a Metropolis-Hastings accept/reject step to the candidate state $$(q^{\star}, -p^{\star})$$, obtaining the final state $$(q^{(t+1)},p^{(t+1)})$$.
5. Discard $$p^{(t+1)}$$ and use $$q^{(t+1)}$$ as the next sample of the Markov chain.
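The five steps above can be sketched end-to-end for a 1-D standard Gaussian target. This is a minimal illustration, not a production implementation: the integrator is the leapfrog scheme discussed later in this note, and the step size and step count are arbitrary choices.

```python
import numpy as np

# One full HMC transition for π(q) ∝ exp(-q²/2), with K(p) = p²/2
# (identity mass matrix). eps and n are illustrative choices.
rng = np.random.default_rng(0)

def U(q):  return 0.5 * q**2      # potential: -log π(q)
def dU(q): return q               # ∂U/∂q
def K(p):  return 0.5 * p**2      # kinetic energy

def hmc_step(q, eps=0.1, n=20):
    p = rng.standard_normal()     # 1. lift: p ~ N(0, 1)
    q_new, p_new = q, p
    p_new -= 0.5 * eps * dU(q_new)  # 2. simulate (leapfrog)
    for _ in range(n - 1):
        q_new += eps * p_new
        p_new -= eps * dU(q_new)
    q_new += eps * p_new
    p_new -= 0.5 * eps * dU(q_new)
    p_new = -p_new                # 3. negate the momentum
    # 4. Metropolis-Hastings accept/reject on the candidate
    dH = (U(q) + K(p)) - (U(q_new) + K(p_new))
    if np.log(rng.uniform()) < dH:
        q = q_new
    return q                      # 5. discard the momentum

samples = []
q = 0.0
for _ in range(2000):
    q = hmc_step(q)
    samples.append(q)
```

Under these settings the chain should reproduce the target's moments (mean near 0, variance near 1) after a couple of thousand iterations.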

### Choices for $$K(p,q)$$ (or equivalently, $$\pi(p|q)$$)

1. Euclidean-Gaussian Kinetic Energy

This is perhaps the most commonly used kinetic energy, given by:

\begin{align} K(p,q) &= \frac{1}{2} p^T M^{-1} p + \frac{1}{2} \log \left| M \right| + \text{const} \\ \pi(p|q) &\propto \exp\left( -\frac{1}{2} p^T M^{-1} p \right) \end{align}

Note $$\pi(p|q) = \mathcal{N}(p\,|\,0,M)$$ is a Gaussian distribution independent of $$q$$.
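A minimal sketch of this choice, using an arbitrary illustrative 2-D mass matrix $$M$$: the momentum is drawn from $$\mathcal{N}(0, M)$$ and the kinetic energy is evaluated up to its additive constant.

```python
import numpy as np

# Euclidean-Gaussian kinetic energy: π(p|q) = N(p | 0, M),
# independent of q. M here is an arbitrary SPD example.
rng = np.random.default_rng(1)
M = np.array([[2.0, 0.3],
              [0.3, 1.0]])
M_inv = np.linalg.inv(M)

# Sample the momentum (the "lift" step of the transition kernel).
p = rng.multivariate_normal(np.zeros(2), M)

# K(p) = ½ pᵀ M⁻¹ p + ½ log|M|, up to an additive constant.
K = 0.5 * p @ M_inv @ p + 0.5 * np.log(np.linalg.det(M))
```

Since $$M$$ is symmetric positive definite, the quadratic term is nonnegative, and here $$\tfrac{1}{2}\log|M| > 0$$ as well.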

2. Riemannian-Gaussian Kinetic Energy \begin{align} K(p,q) &= \frac{1}{2} p^T \Sigma^{-1}(q)\, p + \frac{1}{2} \log \left| \Sigma(q) \right| + \text{const} \\ \pi(p|q) &\propto \exp\left( -\frac{1}{2} p^T \Sigma^{-1}(q)\, p \right) \end{align}

Note $$\pi(p|q) = \mathcal{N}(p|0,\Sigma(q))$$ is a Gaussian distribution dependent on $$q$$. Such dependence can reflect the local geometry of the target distribution, and can therefore perform better than the Euclidean-Gaussian kinetic energy.

### Numerical Integrator for the Hamiltonian Equations

Numerical integrators are used to simulate a Hamiltonian system. Given the simulation step size (denoted $$\epsilon$$) and the total number of simulation steps (denoted $$n$$), a numerical integrator transforms an initial state $$(q(t),p(t))$$ into $$(q(t+n\epsilon), p(t+n\epsilon))$$.

For simplicity, in this section, the initial state fed into the numerical integrator is denoted as $$(q_0, p_0)$$ (which is $$(q(t),p(t))$$), while the transformed state after the $$k$$-th simulation step is denoted as $$(q_k, p_k)$$ (which is $$(q(t+k\epsilon),p(t+k\epsilon))$$).

1. Euler's Method

Each step of Euler's method is given by: \begin{align} p_{k+1} &= p_k + \epsilon \cdot \frac{dp}{dt}(t + k\epsilon) = p_k - \epsilon\cdot\left[ \frac{\partial K}{\partial q}(p_k, q_k) + \frac{\partial U}{\partial q}(q_k) \right] \\ q_{k+1} &= q_k + \epsilon \cdot \frac{dq}{dt}(t+k\epsilon) = q_k + \epsilon\cdot\frac{\partial K}{\partial p}(p_k,q_k) \end{align}

However, Euler's method is not symplectic, and is prone to divergence.
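The divergence is easy to demonstrate. Below is a minimal sketch on the harmonic oscillator $$H(q,p) = q^2/2 + p^2/2$$ (an illustrative choice): Euler's update multiplies the energy by $$1+\epsilon^2$$ at every step, so $$H$$ grows instead of being conserved.

```python
import numpy as np

# Euler's method on H(q, p) = q²/2 + p²/2, where dU/dq = q and
# dK/dp = p. Both updates use the *old* state (q_k, p_k).
eps, n = 0.1, 200
q, p = 1.0, 0.0
H0 = 0.5 * q**2 + 0.5 * p**2      # initial energy = 0.5
for _ in range(n):
    p_new = p - eps * q
    q_new = q + eps * p
    q, p = q_new, p_new
H_end = 0.5 * q**2 + 0.5 * p**2   # energy has grown substantially
```

After 200 steps the energy has inflated by a factor of $$(1+\epsilon^2)^{200} \approx 7.3$$, which is exactly the kind of drift the Metropolis-Hastings correction would have to reject.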

2. Leapfrog Method

When $$K(p,q)$$ is independent of $$q$$, i.e., $$K(p,q) = K(p)$$ and $$\pi(p|q) = \pi(p)$$, the leapfrog method can be used, which is symplectic and more stable than Euler's method. We first state the Hamiltonian equations under this assumption: \begin{align} \frac{d q}{dt} &= \frac{\partial H}{\partial p} = \frac{\partial K}{\partial p} \\ \frac{d p}{dt} &= -\frac{\partial H}{\partial q} = -\frac{\partial U}{\partial q} \end{align} Each step of the leapfrog method is then given by: \begin{align} p_{k+0.5} &= p_k + \frac{\epsilon}{2} \cdot \frac{dp}{dt}(t+k\epsilon) = p_k - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_k) \\ q_{k+1} &= q_k + \epsilon \cdot \frac{dq}{dt}(t+(k+0.5)\epsilon) = q_k + \epsilon \cdot \frac{\partial K}{\partial p}(p_{k+0.5}) \\ p_{k+1} &= p_{k+0.5} + \frac{\epsilon}{2} \cdot \frac{dp}{dt}(t+(k+1)\epsilon) = p_{k+0.5} - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_{k+1}) \end{align} Note that in a practical implementation, when the desired number of steps $$n$$ is larger than 1, the intermediate momenta $$p_k$$ can be omitted: consecutive half steps merge into full steps, so only the $$p_{k+0.5}$$ are actually computed. The whole leapfrog method with $$n$$ simulation steps is given by:

• $$p_{0.5} = p_0 - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_0)$$
• For each $$k = 0 \dots n - 2$$
• $$q_{k+1} = q_k + \epsilon \cdot \frac{\partial K}{\partial p}(p_{k+0.5})$$
• $$p_{k+1.5} = p_{k+0.5} - \epsilon \cdot \frac{\partial U}{\partial q}(q_{k+1})$$
• $$q_n = q_{n-1} + \epsilon \cdot \frac{\partial K}{\partial p}(p_{n-0.5})$$
• $$p_n = p_{n-0.5} - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_n)$$
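The scheme above translates directly into code. Below is a minimal sketch, with $$\partial U/\partial q$$ and $$\partial K/\partial p$$ passed in as functions (`dU` and `dK` are illustrative names), tested on the harmonic oscillator where symplecticness shows up as near-exact energy conservation.

```python
import numpy as np

# n-step leapfrog, following the merged-half-step scheme above.
def leapfrog(q0, p0, dU, dK, eps, n):
    q, p = q0, p0
    p -= 0.5 * eps * dU(q)      # initial half step for p
    for _ in range(n - 1):
        q += eps * dK(p)        # full step for q
        p -= eps * dU(q)        # merged full step for p
    q += eps * dK(p)            # final full step for q
    p -= 0.5 * eps * dU(q)      # final half step for p
    return q, p

# Harmonic oscillator: U(q) = q²/2, K(p) = p²/2.
q1, p1 = leapfrog(1.0, 0.0, dU=lambda q: q, dK=lambda p: p,
                  eps=0.1, n=10)
H0 = 0.5 * 1.0**2                  # initial energy
H1 = 0.5 * q1**2 + 0.5 * p1**2     # energy after 10 steps
```

Unlike Euler's method, the energy error here stays bounded and $$O(\epsilon^2)$$ over the trajectory, which is what keeps the Metropolis-Hastings acceptance rate high.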

### Metropolis-Hastings Acceptance Rate

After a reversible numerical integrator has proposed the candidate $$(q^{(t)},p^{(t)}) \to (q^{\star},-p^{\star})$$, the Metropolis-Hastings acceptance rate is calculated to determine whether to accept the new state. The rate is given by: \begin{align} \rho((q^{(t)}, p^{(t)}), (q^{\star}, -p^{\star})) &= \min\left\{ 1, \frac{\pi(q^{\star},-p^{\star})}{\pi(q^{(t)},p^{(t)})} \frac{Q(q^{(t)},p^{(t)}|q^{\star},-p^{\star})}{Q(q^{\star},-p^{\star}|q^{(t)},p^{(t)})} \right\} \\ &= \min\left\{ 1, \frac{\pi(q^{\star},-p^{\star})}{\pi(q^{(t)},p^{(t)})} \right\} \\ &= \min\left\{ 1, \frac{\exp\left(-H(q^{\star},-p^{\star}) \right)}{\exp\left(-H(q^{(t)},p^{(t)}) \right)} \right\} \\ &= \min\left\{ 1, \exp\left( H(q^{(t)},p^{(t)}) - H(q^{\star},-p^{\star}) \right) \right\} \end{align}

The proposal ratio $$Q(q^{(t)},p^{(t)}|q^{\star},-p^{\star}) \,/\, Q(q^{\star},-p^{\star}|q^{(t)},p^{(t)}) = 1$$, since the integrator composed with momentum negation is deterministic, reversible, and volume-preserving.
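The final expression reduces to a one-liner. A minimal sketch, using hypothetical endpoint energies of a simulated trajectory:

```python
import numpy as np

# Acceptance probability min{1, exp(H_current - H_candidate)}.
def accept_prob(H_current, H_candidate):
    return min(1.0, float(np.exp(H_current - H_candidate)))

# With an exact integrator H is conserved, so every proposal
# would be accepted.
p_exact = accept_prob(1.0, 1.0)   # = 1.0
# Numerical error that raises the energy lowers the acceptance rate.
p_drift = accept_prob(1.0, 1.3)   # = exp(-0.3) ≈ 0.74
```

This makes the role of the integrator concrete: the smaller the energy error $$H(q^{\star},-p^{\star}) - H(q^{(t)},p^{(t)})$$, the closer the acceptance rate is to 1.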