## Problem Statement

The goal is to sample from a target distribution \(\pi(q)\) that is hard to sample from directly, but whose density is easy to evaluate at any given point \(q\), or at least whose *unnormalized density* \(\tilde{\pi}(q)\) is easy to calculate (as with a Bayesian posterior known only up to its normalizing constant).

*Note: following the convention of the Hamiltonian Monte Carlo literature, vectors and matrices are not written in bold.*

## Hamiltonian Dynamics

### Basic Elements

The Hamiltonian Monte Carlo method uses a Hamiltonian dynamical system to derive the transition kernel \(T\). It augments the state \(q\) with a momentum \(p\), forming the extended *phase space* \((q, p)\). A Hamiltonian system is then constructed on this phase space, with Hamiltonian function \(H(q,p)\), *i.e.*, the total energy of the system.

The density on the phase space, \(\pi(q,p)\), is then given by: \[ \begin{align} \pi(q, p) &= \exp\left( -H(q,p) \right) \end{align} \] satisfying the marginalization condition \(\int \pi(q,p)\,dp = \pi(q)\).

Since \(H(q,p)\) is the energy function of a Hamiltonian system, it can be further decomposed into kinetic and potential energies: \[ \begin{align} H(q,p) &= -\log \pi(q,p) \\ &= -\log \pi(p|q) - \log \pi(q) \\ &= K(p,q) + U(q) \end{align} \] where \(K(p,q) = -\log \pi(p|q)\) is the kinetic energy, and \(U(q) = -\log \pi(q)\) is the potential energy.
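As a concrete example (for intuition only): for a one-dimensional standard normal target \(\pi(q) \propto \exp(-q^2/2)\) with the common choice \(\pi(p|q) = \mathcal{N}(p\,|\,0,1)\), we get, up to additive constants, \[ U(q) = \frac{q^2}{2}, \quad K(p) = \frac{p^2}{2}, \quad H(q,p) = \frac{q^2 + p^2}{2}, \] so the exact Hamiltonian dynamics trace circles of constant energy in phase space.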

### Transition Kernel

The evolution of a Hamiltonian system over time is governed by the following differential equations: \[ \begin{align} \frac{d q}{dt} &= \frac{\partial H}{\partial p} = \frac{\partial K}{\partial p} \\ \frac{d p}{dt} &= -\frac{\partial H}{\partial q} = -\frac{\partial K}{\partial q} - \frac{\partial U}{\partial q} \end{align} \] Letting \(z=(q,p)\), the Hamiltonian equations can also be written in the compact form: \[ \begin{align} \frac{dz}{dt} = J \,\nabla H(z) \end{align} \] where \(\nabla H\) is the gradient of \(H\), and: \[ J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix} \]
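This compact form makes energy conservation immediate: since \(J\) is antisymmetric, along an exact trajectory \[ \frac{dH}{dt} = \nabla H(z)^T \frac{dz}{dt} = \nabla H(z)^T J \,\nabla H(z) = 0, \] which is why, with a perfect integrator, the Metropolis-Hastings correction below would always accept.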

The transition kernel \(T\) is then given by the following procedure:

- Lift \(q^{(t)}\) to \((q^{(t)},p^{(t)})\) by sampling \(p^{(t)} \sim \pi(p|q^{(t)})\).
- Simulate the Hamiltonian system by integrating the Hamiltonian differential equations, obtaining the candidate state \((q^{\star}, p^{\star})\).
- For reversibility, negate the momentum, so that the candidate state becomes \((q^{\star},-p^{\star})\) (see below).
- To correct numerical errors, apply a Metropolis-Hastings acceptance test to the candidate state \((q^{\star}, -p^{\star})\), obtaining the next state \((q^{(t+1)},p^{(t+1)})\).
- Discard \(p^{(t+1)}\) and take \(q^{(t+1)}\) as the next sample of the Markov chain.
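The following is a minimal sketch of this transition kernel in Python with NumPy, assuming the Euclidean-Gaussian kinetic energy \(K(p) = \frac{1}{2} p^T p\) with identity mass matrix, and using the leapfrog integrator described later in this section. The names `U`, `grad_U`, `eps`, and `n_steps` are illustrative, not from the references:

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    # Leapfrog integration assuming K(p) = 1/2 p^T p (see the integrator section)
    q, p = q.copy(), p.copy()
    p -= 0.5 * eps * grad_U(q)            # initial momentum half step
    for _ in range(n_steps - 1):
        q += eps * p                       # full position step
        p -= eps * grad_U(q)               # fused momentum full step
    q += eps * p
    p -= 0.5 * eps * grad_U(q)             # final momentum half step
    return q, p

def hmc_step(q, U, grad_U, eps=0.1, n_steps=20, rng=None):
    # One transition q^{(t)} -> q^{(t+1)} of the kernel T described above
    rng = np.random.default_rng() if rng is None else rng
    p = rng.standard_normal(q.shape)                # lift: p ~ N(0, I)
    H_old = U(q) + 0.5 * p @ p
    q_star, p_star = leapfrog(q, p, grad_U, eps, n_steps)
    p_star = -p_star                                # negate for reversibility
    H_new = U(q_star) + 0.5 * p_star @ p_star       # note K(-p) = K(p)
    if np.log(rng.uniform()) < H_old - H_new:       # Metropolis-Hastings test
        return q_star                               # accept (momentum discarded)
    return q                                        # reject: stay at q^{(t)}
```

For example, with \(U(q) = \frac{1}{2}\|q\|^2\) and \(\nabla U(q) = q\) (a standard normal target), iterating `hmc_step` produces a Markov chain whose samples are approximately standard normal.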

### Choices for \(K(p,q)\) (or equivalently, \(\pi(p|q)\))

**1. Euclidean-Gaussian Kinetic Energy**

This is perhaps the most commonly used kinetic energy, given by:

\[ \begin{align} K(p,q) &= \frac{1}{2} p^T M^{-1} p + \frac{1}{2} \log \left| M \right| + \text{const} \\ \pi(p|q) &\propto \exp\left( -\frac{1}{2} p^T M^{-1} p \right) \end{align} \]

Note \(\pi(p|q) = \mathcal{N}(p\,|\,0,M)\) is a Gaussian distribution independent of \(q\).
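As a small illustration (a sketch assuming a fixed, positive-definite mass matrix \(M\); the helper names are hypothetical), sampling the momentum and evaluating the kinetic energy might look like:

```python
import numpy as np

def sample_momentum(L, rng):
    # p ~ N(0, M), where L is the Cholesky factor of M (M = L L^T)
    return L @ rng.standard_normal(L.shape[0])

def euclidean_kinetic(p, M_inv):
    # K(p) = 1/2 p^T M^{-1} p, dropping constants that do not depend on p
    return 0.5 * p @ M_inv @ p
```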

**2. Riemannian-Gaussian Kinetic Energy**

\[ \begin{align} K(p,q) &= \frac{1}{2} p^T \Sigma^{-1}(q)\, p + \frac{1}{2} \log \left| \Sigma(q) \right| + \text{const} \\ \pi(p|q) &\propto \exp\left( -\frac{1}{2} p^T \Sigma^{-1}(q)\, p \right) \end{align} \]

Note \(\pi(p|q) = \mathcal{N}(p\,|\,0,\Sigma(q))\) is a Gaussian distribution dependent on \(q\). Such dependence can reflect the local geometry of the target distribution, and can therefore perform better than the Euclidean-Gaussian kinetic energy. Note, however, that a \(q\)-dependent kinetic energy makes the Hamiltonian non-separable, so the plain leapfrog integrator below does not apply directly and a generalized (implicit) integrator is required.
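Evaluating this kinetic energy is straightforward; below is a sketch with a hypothetical `Sigma_fn` mapping \(q\) to a positive-definite matrix \(\Sigma(q)\) (simulating the resulting non-separable dynamics is a separate matter, as noted above):

```python
import numpy as np

def riemannian_kinetic(q, p, Sigma_fn):
    # K(p, q) = 1/2 p^T Sigma(q)^{-1} p + 1/2 log|Sigma(q)| + const
    Sigma = Sigma_fn(q)
    quad = 0.5 * p @ np.linalg.solve(Sigma, p)
    _, logdet = np.linalg.slogdet(Sigma)
    return quad + 0.5 * logdet
```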

### Numerical Integrator for the Hamiltonian Equations

In practice, the Hamiltonian dynamics are simulated with a numerical integrator. Given the step size (denoted \(\epsilon\)) and the total number of simulation steps (denoted \(n\)), a numerical integrator transforms an initial state \((q(t),p(t))\) into \((q(t+n\epsilon), p(t+n\epsilon))\).

For simplicity, in this section the initial state fed into the numerical integrator is denoted \((q_0, p_0)\) (i.e., \((q(t),p(t))\)), and the state after the \(k\)-th simulation step is denoted \((q_k, p_k)\) (i.e., \((q(t+k\epsilon),p(t+k\epsilon))\)).

**1. Euler's Method**

Each step of Euler's method is given by: \[ \begin{align} p_{k+1} &= p_k + \epsilon \cdot \frac{dp}{dt}(t + k\epsilon) = p_k - \epsilon\cdot\left[ \frac{\partial K}{\partial q}(p_k, q_k) + \frac{\partial U}{\partial q}(q_k) \right] \\ q_{k+1} &= q_k + \epsilon \cdot \frac{dq}{dt}(t+k\epsilon) = q_k + \epsilon\cdot\frac{\partial K}{\partial p}(p_k,q_k) \end{align} \]
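In code, one Euler step is a direct transcription of the update above (a sketch; `grad_K_q` and `grad_K_p` denote \(\partial K/\partial q\) and \(\partial K/\partial p\), and are hypothetical helpers):

```python
def euler_step(q, p, grad_U, grad_K_p, grad_K_q, eps):
    # Both updates are evaluated at the old state (q_k, p_k)
    p_new = p - eps * (grad_K_q(p, q) + grad_U(q))
    q_new = q + eps * grad_K_p(p, q)
    return q_new, p_new
```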

However, Euler's method is not *symplectic*, and is prone to divergence.

**2. Leapfrog Method**

When \(K(p,q)\) is independent of \(q\), *i.e.*, \(K(p,q) = K(p)\) and \(\pi(p|q) = \pi(p)\), the leapfrog method can be used, which is symplectic and considerably more stable than Euler's method. We first state the Hamiltonian equations under this new assumption: \[ \begin{align} \frac{d q}{dt} &= \frac{\partial H(q,p)}{\partial p} = \frac{\partial K}{\partial p} \\ \frac{d p}{dt} &= -\frac{\partial H}{\partial q} = -\frac{\partial U}{\partial q} \end{align} \] Each step of the leapfrog method is then given by: \[ \begin{align} p_{k+0.5} &= p_k + \frac{\epsilon}{2} \cdot \frac{dp}{dt}(t+k\epsilon) = p_k - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_k) \\ q_{k+1} &= q_k + \epsilon \cdot \frac{dq}{dt}(t+(k+0.5)\epsilon) = q_k + \epsilon \cdot \frac{\partial K}{\partial p}(p_{k+0.5}) \\ p_{k+1} &= p_{k+0.5} + \frac{\epsilon}{2} \cdot \frac{dp}{dt}(t+(k+1)\epsilon) = p_{k+0.5} - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_{k+1}) \end{align} \] Note that in a practical implementation, when the desired number of steps \(n\) is larger than 1, consecutive momentum half-steps can be fused: the intermediate states \(p_k\) need not be formed, and only the half-step momenta \(p_{k+0.5}\) are actually computed. The whole leapfrog method with \(n\) simulation steps is then given by:

- \(p_{0.5} = p_0 - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_0)\)
- For each \(k = 0 \dots n - 2\):
  - \(q_{k+1} = q_k + \epsilon \cdot \frac{\partial K}{\partial p}(p_{k+0.5})\)
  - \(p_{k+1.5} = p_{k+0.5} - \epsilon \cdot \frac{\partial U}{\partial q}(q_{k+1})\)
- \(q_n = q_{n-1} + \epsilon \cdot \frac{\partial K}{\partial p}(p_{n-0.5})\)
- \(p_n = p_{n-0.5} - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_n)\)
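A direct transcription of this scheme in Python follows (a sketch; `grad_U` and `grad_K` stand for \(\partial U/\partial q\) and \(\partial K/\partial p\), e.g. `grad_K = lambda p: M_inv @ p` for the Euclidean-Gaussian case). It generalizes the identity-mass leapfrog used in the transition-kernel sketch earlier:

```python
import numpy as np

def leapfrog(q0, p0, grad_U, grad_K, eps, n):
    q, p = q0.copy(), p0.copy()
    p = p - 0.5 * eps * grad_U(q)      # p_{0.5}
    for _ in range(n - 1):
        q = q + eps * grad_K(p)        # q_{k+1}
        p = p - eps * grad_U(q)        # p_{k+1.5}: two fused half steps
    q = q + eps * grad_K(p)            # q_n
    p = p - 0.5 * eps * grad_U(q)      # p_n
    return q, p
```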

### Metropolis-Hastings Acceptance Probability

After applying a *reversible* numerical integrator to propose the candidate \((q^{(t)},p^{(t)}) \to (q^{\star},-p^{\star})\), the Metropolis-Hastings acceptance probability is computed to decide whether to accept the new state. It is given by: \[ \begin{align} \rho\left((q^{(t)}, p^{(t)}), (q^{\star}, -p^{\star})\right) &= \min\left\{ 1, \frac{\pi(q^{\star},-p^{\star})}{\pi(q^{(t)},p^{(t)})} \frac{Q(q^{(t)},p^{(t)}\,|\,q^{\star},-p^{\star})}{Q(q^{\star},-p^{\star}\,|\,q^{(t)},p^{(t)})} \right\} \\ &= \min\left\{ 1, \frac{\pi(q^{\star},-p^{\star})}{\pi(q^{(t)},p^{(t)})} \right\} \\ &= \min\left\{ 1, \frac{\exp\left(-H(q^{\star},-p^{\star}) \right)}{\exp\left(-H(q^{(t)},p^{(t)}) \right)} \right\} \\ &= \min\left\{ 1, \exp\left( H(q^{(t)},p^{(t)}) - H(q^{\star},-p^{\star}) \right) \right\} \end{align} \]

Here the proposal is deterministic, and since the integrator (with the momentum negation) is reversible, \(Q(q^{(t)},p^{(t)}\,|\,q^{\star},-p^{\star}) = Q(q^{\star},-p^{\star}\,|\,q^{(t)},p^{(t)})\), so the proposal ratio cancels.
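In code, the test is best done in the log domain to avoid overflow (a sketch; `H_old` and `H_new` denote \(H(q^{(t)},p^{(t)})\) and \(H(q^{\star},-p^{\star})\)):

```python
import numpy as np

def mh_accept(H_old, H_new, rng):
    # Accept with probability min(1, exp(H_old - H_new))
    return np.log(rng.uniform()) < H_old - H_new
```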

### Further Reading Materials

See Betancourt (2017), Neal (2011), Carroll (2019) and Young (n.d.).

# References

Betancourt, Michael. 2017. “A Conceptual Introduction to Hamiltonian Monte Carlo.” *arXiv:1701.02434 [Stat]*, January. http://arxiv.org/abs/1701.02434.

Carroll, Colin. 2019. “Hamiltonian Monte Carlo from Scratch.” *Colin Carroll*. https://colindcarroll.com/2019/04/11/hamiltonian-monte-carlo-from-scratch/.

Neal, Radford M. 2011. “MCMC Using Hamiltonian Dynamics.” *Handbook of Markov Chain Monte Carlo* 2 (11).

Young, Peter. n.d. “The Leapfrog Method and Other ‘Symplectic’ Algorithms for Integrating Newton’s Laws of Motion,” 15.