Metropolis-Hastings Algorithm

Problem Statement

To sample from \(p(x)\), where \(p(x)\) is not easy to sample from, but given a sample \(x\), the density of \(p(x)\) is easy to evaluate, or at least the unnormalized density \(\tilde{p}(x)\) is easy to calculate.

Metropolis-Hastings Algorithm

Given \(x^{(t)}\), \(X^{(t+1)}\) is generated by the following procedure, with a proposal distribution \(q(y|x)\):

\[ \begin{align} Y^{(t)} &\sim q(y|x^{(t)}) \\ X^{(t+1)} &= \begin{cases} Y^{(t)} & \text{with probability} & \rho(x^{(t)}, Y^{(t)}) \\ x^{(t)} & \text{with probability} & 1 - \rho(x^{(t)}, Y^{(t)}) \end{cases} \end{align} \] where \[ \rho(x,y) = \min\left\{ 1, \frac{p(y)}{p(x)} \frac{q(x|y)}{q(y|x)} \right\} \] Note the transition kernel of such a chain can be formulated as: \[ \begin{align} T(x,y) &= \rho(x,y)\,q(y|x) + (1-r(x))\,\delta_x(y) \\ r(x) &= \int \rho(x,y)\,q(y|x)\,dy \end{align} \] where \(\delta_x\) is the Dirac mass in \(x\). It is easy to verify that \(T\) is reversible.

Metropolis-Hastings algorithm is easy to implement in multiple situations, in any one of the following situations:

  1. When \(q\) is reversible, i.e., \(q(x|y) = q(y|x)\), and \(p(x)\) or \(\tilde{p}(x)\) is easy to calculate.
  2. When \(p(y) / q(y|x)\) is independent of \(x\), and easy to calculate.