Hamiltonian Dynamics

Problem Statement

We wish to sample from a distribution \(\pi(q)\) that is hard to sample from directly, but whose density is easy to evaluate at any given point \(q\), or at least whose unnormalized density \(\tilde{\pi}(q)\) is easy to compute.

Note: following the convention of the Hamiltonian Monte Carlo literature, vectors and matrices are not written in bold.

Hamiltonian Dynamics

Basic Elements

The Hamiltonian Monte Carlo method uses a Hamiltonian dynamical system to derive the transition kernel \(T\). It augments the state \(q\) with a momentum \(p\), forming the extended phase space \((q, p)\). A Hamiltonian system is then constructed on this phase space, with Hamiltonian function \(H(q,p)\), i.e., the total energy of the system.

The joint density on the phase space, \(\pi(q,p)\), is then given by: \[ \begin{align} \pi(q, p) &= \exp\left( -H(q,p) \right) \end{align} \] satisfying \(\int \pi(q,p)\,dp = \pi(q)\).

Since \(H(q,p)\) is the energy function of a Hamiltonian system, it can further be decomposed into kinetic and potential energies: \[ \begin{align} H(q,p) &= -\log \pi(q,p) \\ &= -\log \pi(p|q) - \log \pi(q) \\ &= K(p,q) + U(q) \end{align} \] where \(K(p,q) = -\log \pi(p|q)\) is the kinetic energy, and \(U(q) = -\log \pi(q)\) is the potential energy.
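As a concrete illustration, consider a standard Gaussian target with an identity-mass Gaussian momentum, so that \(U(q) = \frac{1}{2} q^T q\) and \(K(p) = \frac{1}{2} p^T p\) up to additive constants. The following minimal Python sketch (the function names and the NumPy dependency are our own choices, not from the references) evaluates these energies and the unnormalized joint density:

```python
import numpy as np

def U(q):
    """Potential energy -log pi(q) for a standard Gaussian target (up to a constant)."""
    return 0.5 * np.dot(q, q)

def K(p):
    """Kinetic energy -log pi(p|q) for an identity-mass Gaussian momentum (up to a constant)."""
    return 0.5 * np.dot(p, p)

def H(q, p):
    """Total energy H(q, p) = K(p) + U(q)."""
    return K(p) + U(q)

q = np.array([1.0, -0.5])
p = np.array([0.3, 0.2])
print(np.exp(-H(q, p)))  # unnormalized joint density pi(q, p)
```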

Transition Kernel

The evolution of a Hamiltonian system over time is given by the following differential equations: \[ \begin{align} \frac{d q}{dt} &= \frac{\partial H}{\partial p} = \frac{\partial K}{\partial p} \\ \frac{d p}{dt} &= -\frac{\partial H}{\partial q} = -\frac{\partial K}{\partial q} - \frac{\partial U}{\partial q} \end{align} \] Let \(z=(q,p)\); the Hamiltonian equations can then be rewritten in the following form: \[ \begin{align} \frac{dz}{dt} = J \,\nabla H(z) \end{align} \] where \(\nabla H\) is the gradient of \(H\), and: \[ J = \begin{bmatrix} 0 & I \\ -I & 0 \\ \end{bmatrix} \]
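For intuition, here is a small sketch (our own construction) that evaluates the vector field \(J\,\nabla H(z)\) for the two-dimensional phase space of a one-dimensional standard Gaussian, where \(H(q,p) = \frac{1}{2}q^2 + \frac{1}{2}p^2\):

```python
import numpy as np

# J is the canonical symplectic matrix for a 1-D position and momentum.
J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

def grad_H(z):
    """Gradient of H(q, p) = q^2/2 + p^2/2 at z = (q, p)."""
    return z  # dH/dq = q, dH/dp = p

z = np.array([1.0, 0.5])  # (q, p)
print(J @ grad_H(z))      # dz/dt = (dq/dt, dp/dt) = (p, -q)
```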

The transition kernel \(T\) is then given by the following procedure (a code sketch follows the list):

  1. Lift \(q^{(t)}\) to \((q^{(t)},p^{(t)})\), by sampling \(p^{(t)} \sim \pi(p|q^{(t)})\).
  2. Simulate the Hamiltonian system, by integrating the Hamiltonian differential equations, to obtain candidate state \((q^*, p^*)\).
  3. For reversibility, negate \(p^{\star}\), such that the candidate state becomes \((q^{\star},-p^{\star})\) (see below).
  4. To correct for numerical integration error, apply a Metropolis-Hastings accept/reject step to the candidate state \((q^{\star}, -p^{\star})\), to obtain the final state \((q^{(t+1)},p^{(t+1)})\).
  5. Discard \(p^{(t+1)}\) and use \(q^{(t+1)}\) as the final sample of the Markov chain.
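Putting these steps together, one transition of the kernel might look like the following sketch. The helpers `sample_momentum`, `leapfrog`, and `accept_prob` are hypothetical placeholders (concrete versions appear later in this section), and `rng` is assumed to be a NumPy random Generator:

```python
def hmc_transition(q, sample_momentum, leapfrog, accept_prob, rng):
    """One transition q^(t) -> q^(t+1) of the HMC kernel (high-level sketch)."""
    p = sample_momentum(q, rng)            # 1. lift q by sampling p ~ pi(p|q)
    q_star, p_star = leapfrog(q, p)        # 2. integrate the Hamiltonian equations
    p_star = -p_star                       # 3. negate the momentum for reversibility
    if rng.uniform() < accept_prob(q, p, q_star, p_star):
        return q_star                      # 4.-5. accept: keep q*, discard the momentum
    return q                               # reject: keep the current position
```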

Choices for \(K(p,q)\) (or equivalently, \(\pi(p|q)\))

1. Euclidean-Gaussian Kinetic Energy

This is perhaps the most commonly used kinetic energy, given by:

\[ \begin{align} K(p,q) &= \frac{1}{2} p^T M^{-1} p + \frac{1}{2} \log \left| M \right| + \text{const} \\ \pi(p|q) &\propto \exp\left( -\frac{1}{2} p^T M^{-1} p \right) \end{align} \]

Since \(M\) does not depend on \(q\), the \(\frac{1}{2}\log|M|\) term is constant and can be dropped in practice.

Note \(\pi(p|q) = \mathcal{N}(p\,|\,0,M)\) is a Gaussian distribution independent of \(q\).
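In practice, sampling \(p \sim \mathcal{N}(0, M)\) and evaluating \(K\) can be done via a Cholesky factor of \(M\). A minimal sketch (our own, with NumPy; the mass matrix shown is an arbitrary example):

```python
import numpy as np

M = np.array([[2.0, 0.3],
              [0.3, 1.0]])      # mass matrix (positive definite)
L = np.linalg.cholesky(M)       # M = L L^T

def sample_momentum(rng):
    """Draw p ~ N(0, M) by transforming standard normal noise."""
    return L @ rng.standard_normal(M.shape[0])

def kinetic(p):
    """K(p) = 1/2 p^T M^{-1} p, up to additive constants."""
    return 0.5 * p @ np.linalg.solve(M, p)

rng = np.random.default_rng(0)
p = sample_momentum(rng)
print(kinetic(p))
```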

2. Riemannian-Gaussian Kinetic Energy

This kinetic energy lets the covariance of the momentum vary with position, and is given by:

\[ \begin{align} K(p,q) &= \frac{1}{2} p^T \Sigma^{-1}(q)\, p + \frac{1}{2} \log \left| \Sigma(q) \right| + \text{const} \\ \pi(p|q) &\propto \exp\left( -\frac{1}{2} p^T \Sigma^{-1}(q)\, p \right) \end{align} \]

Note \(\pi(p|q) = \mathcal{N}(p|0,\Sigma(q))\) is a Gaussian distribution that depends on \(q\). This dependence can reflect the local geometry of the target distribution, which can make it more effective than the Euclidean-Gaussian kinetic energy.
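As a toy illustration (our own construction, not a recommended metric; practical choices of \(\Sigma(q)\) are more involved), take a diagonal covariance whose scale varies with \(q\); the momentum is then drawn from a position-dependent Gaussian:

```python
import numpy as np

def Sigma(q):
    """A toy position-dependent covariance: diagonal, with scale 1 + q_i^2."""
    return np.diag(1.0 + q**2)

def sample_momentum(q, rng):
    """Draw p ~ N(0, Sigma(q))."""
    L = np.linalg.cholesky(Sigma(q))
    return L @ rng.standard_normal(q.shape[0])

def kinetic(p, q):
    """K(p, q) = 1/2 p^T Sigma(q)^{-1} p + 1/2 log|Sigma(q)|, up to a constant."""
    S = Sigma(q)
    return 0.5 * p @ np.linalg.solve(S, p) + 0.5 * np.linalg.slogdet(S)[1]

rng = np.random.default_rng(1)
q = np.array([0.5, -1.2])
p = sample_momentum(q, rng)
print(kinetic(p, q))
```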

Numerical Integrator for the Hamiltonian Equations

Numerical integrators are used to simulate the Hamiltonian dynamics. Given the simulation step size (denoted \(\epsilon\)) and the total number of simulation steps (denoted \(n\)), a numerical integrator transforms an initial state \((q(t),p(t))\) into \((q(t+n\epsilon), p(t+n\epsilon))\).

For simplicity, in this section, the initial state fed into the numerical integrator is denoted as \((q_0, p_0)\) (which is \((q(t),p(t))\)), while the transformed state after the \(k\)-th simulation step is denoted as \((q_k, p_k)\) (which is \((q(t+k\epsilon),p(t+k\epsilon))\)).

1. Euler's Method

Each step of Euler's method is given by: \[ \begin{align} p_{k+1} &= p_k + \epsilon \cdot \frac{dp}{dt}(t + k\epsilon) = p_k - \epsilon\cdot\left[ \frac{\partial K}{\partial q}(p_k, q_k) + \frac{\partial U}{\partial q}(q_k) \right] \\ q_{k+1} &= q_k + \epsilon \cdot \frac{dq}{dt}(t+k\epsilon) = q_k + \epsilon\cdot\frac{\partial K}{\partial p}(p_k,q_k) \end{align} \]

However, Euler's method is not symplectic: its numerical error accumulates over the trajectory, making it prone to divergence.
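A minimal sketch of a single Euler step (assuming, for illustration, a \(q\)-independent identity-mass kinetic energy, so that \(\partial K/\partial p = p\) and \(\partial K/\partial q = 0\); the function names are our own):

```python
def euler_step(q, p, grad_U, eps):
    """One Euler step; both updates use the state at the start of the step.
    Assumes K(p) = 1/2 p^T p, so dq/dt = p and dK/dq = 0."""
    p_new = p - eps * grad_U(q)
    q_new = q + eps * p   # uses the old momentum p, not p_new
    return q_new, p_new
```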

2. Leapfrog Method

When \(K(p,q)\) is independent of \(q\), i.e., \(K(p,q) = K(p)\) and \(\pi(p|q) = \pi(p)\), the leapfrog method can be used, which is symplectic and superior to Euler's method. We first state the Hamiltonian equations under this assumption: \[ \begin{align} \frac{d q}{dt} &= \frac{\partial H}{\partial p} = \frac{\partial K}{\partial p} \\ \frac{d p}{dt} &= -\frac{\partial H}{\partial q} = -\frac{\partial U}{\partial q} \end{align} \] Each step of the leapfrog method is then given by: \[ \begin{align} p_{k+0.5} &= p_k + \frac{\epsilon}{2} \cdot \frac{dp}{dt}(t+k\epsilon) = p_k - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_k) \\ q_{k+1} &= q_k + \epsilon \cdot \frac{dq}{dt}(t+(k+0.5)\epsilon) = q_k + \epsilon \cdot \frac{\partial K}{\partial p}(p_{k+0.5}) \\ p_{k+1} &= p_{k+0.5} + \frac{\epsilon}{2} \cdot \frac{dp}{dt}(t+(k+1)\epsilon) = p_{k+0.5} - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_{k+1}) \end{align} \] Note that in a practical implementation with \(n > 1\) steps, the intermediate full-step momenta \(p_k\) (for \(0 < k < n\)) need not be computed: two consecutive momentum half-steps merge into a single full step, so only the half-step momenta \(p_{k+0.5}\) are maintained. The whole leapfrog method with \(n\) simulation steps is given below (a Python sketch follows the list):

  • \(p_{0.5} = p_0 - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_0)\)
  • For each \(k = 0 \dots n - 2\)
    • \(q_{k+1} = q_k + \epsilon \cdot \frac{\partial K}{\partial p}(p_{k+0.5})\)
    • \(p_{k+1.5} = p_{k+0.5} - \epsilon \cdot \frac{\partial U}{\partial q}(q_{k+1})\)
  • \(q_n = q_{n-1} + \epsilon \cdot \frac{\partial K}{\partial p}(p_{n-0.5})\)
  • \(p_n = p_{n-0.5} - \frac{\epsilon}{2} \cdot \frac{\partial U}{\partial q}(q_n)\)
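A minimal Python sketch of this scheme, assuming an identity-mass Gaussian kinetic energy so that \(\partial K/\partial p = p\) (the function names are our own):

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n):
    """Simulate n leapfrog steps of size eps, assuming K(p) = 1/2 p^T p."""
    q, p = np.copy(q), np.copy(p)
    p -= 0.5 * eps * grad_U(q)      # initial momentum half-step
    for _ in range(n - 1):
        q += eps * p                # full position step
        p -= eps * grad_U(q)        # two merged momentum half-steps
    q += eps * p                    # final position step
    p -= 0.5 * eps * grad_U(q)      # final momentum half-step
    return q, p
```

For the standard Gaussian target used earlier, one would pass `grad_U = lambda q: q`.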

Metropolis-Hastings Acceptance Probability

After applying a reversible numerical integrator to propose the candidate state \((q^{(t)},p^{(t)}) \to (q^{\star},-p^{\star})\), the Metropolis-Hastings acceptance probability is computed to decide whether to accept the new state. It is given by: \[ \begin{align} \rho((q^{(t)}, p^{(t)}), (q^{\star}, -p^{\star})) &= \min\left\{ 1, \frac{\pi(q^{\star},-p^{\star})}{\pi(q^{(t)},p^{(t)})} \frac{Q(q^{(t)},p^{(t)}|q^{\star},-p^{\star})}{Q(q^{\star},-p^{\star}|q^{(t)},p^{(t)})} \right\} \\ &= \min\left\{ 1, \frac{\pi(q^{\star},-p^{\star})}{\pi(q^{(t)},p^{(t)})} \right\} \\ &= \min\left\{ 1, \frac{\exp\left(-H(q^{\star},-p^{\star}) \right)}{\exp\left(-H(q^{(t)},p^{(t)}) \right)} \right\} \\ &= \min\left\{ 1, \exp\left( H(q^{(t)},p^{(t)}) - H(q^{\star},-p^{\star}) \right) \right\} \end{align} \]

The proposal ratio equals 1 because the proposal is deterministic and reversible: the integrator followed by momentum negation maps \((q^{(t)},p^{(t)})\) to \((q^{\star},-p^{\star})\) and, applied again, maps \((q^{\star},-p^{\star})\) back to \((q^{(t)},p^{(t)})\), so \(Q(q^{(t)},p^{(t)}|q^{\star},-p^{\star}) = Q(q^{\star},-p^{\star}|q^{(t)},p^{(t)})\) and the two factors cancel.
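Combining the pieces, a complete HMC transition for the identity-mass Gaussian momentum might look like the following sketch. It reuses the `leapfrog` function sketched above; none of this code is taken verbatim from the references:

```python
import numpy as np

def hmc_step(q, U, grad_U, eps, n, rng):
    """One full HMC transition, assuming K(p) = 1/2 p^T p."""
    p = rng.standard_normal(q.shape[0])          # p ~ N(0, I)
    q_star, p_star = leapfrog(q, p, grad_U, eps, n)
    p_star = -p_star                             # negate momentum for reversibility
    # H = U(q) + K(p); note K is invariant under negation, K(-p) = K(p).
    dH = (U(q) + 0.5 * p @ p) - (U(q_star) + 0.5 * p_star @ p_star)
    if np.log(rng.uniform()) < min(0.0, dH):     # accept with prob min{1, exp(dH)}
        return q_star
    return q
```

For example, with `U = lambda q: 0.5 * q @ q` and `grad_U = lambda q: q`, repeated calls to `hmc_step` produce a Markov chain targeting the standard Gaussian.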

Further Reading Materials

See Betancourt (2017), Neal (2011), Carroll (2019) and Young (n.d.).

References

Betancourt, Michael. 2017. “A Conceptual Introduction to Hamiltonian Monte Carlo.” arXiv:1701.02434 [Stat], January. http://arxiv.org/abs/1701.02434.

Carroll, Colin. 2019. “Hamiltonian Monte Carlo from Scratch.” Colin Carroll. https://colindcarroll.com/2019/04/11/hamiltonian-monte-carlo-from-scratch/.

Neal, Radford M. 2011. “MCMC Using Hamiltonian Dynamics.” Handbook of Markov Chain Monte Carlo 2 (11).

Young, Peter. n.d. “The Leapfrog Method and Other ‘Symplectic’ Algorithms for Integrating Newton’s Laws of Motion,” 15.