# Metropolis-Hastings Algorithm

## Problem Statement

To sample from $$p(x)$$, where $$p(x)$$ is not easy to sample from, but given a sample $$x$$, the density of $$p(x)$$ is easy to evaluate, or at least the unnormalized density $$\tilde{p}(x)$$ is easy to calculate.

## Metropolis-Hastings Algorithm

Given $$x^{(t)}$$, $$X^{(t+1)}$$ is generated by the following procedure, with a proposal distribution $$q(y|x)$$:

\begin{align} Y^{(t)} &\sim q(y|x^{(t)}) \\ X^{(t+1)} &= \begin{cases} Y^{(t)} & \text{with probability} & \rho(x^{(t)}, Y^{(t)}) \\ x^{(t)} & \text{with probability} & 1 - \rho(x^{(t)}, Y^{(t)}) \end{cases} \end{align} where $\rho(x,y) = \min\left\{ 1, \frac{p(y)}{p(x)} \frac{q(x|y)}{q(y|x)} \right\}$ Note the transition kernel of such a chain can be formulated as: \begin{align} T(x,y) &= \rho(x,y)\,q(y|x) + (1-r(x))\,\delta_x(y) \\ r(x) &= \int \rho(x,y)\,q(y|x)\,dy \end{align} where $$\delta_x$$ is the Dirac mass in $$x$$. It is easy to verify that $$T$$ is reversible.

Metropolis-Hastings algorithm is easy to implement in multiple situations, in any one of the following situations:

1. When $$q$$ is reversible, i.e., $$q(x|y) = q(y|x)$$, and $$p(x)$$ or $$\tilde{p}(x)$$ is easy to calculate.
2. When $$p(y) / q(y|x)$$ is independent of $$x$$, and easy to calculate.