# Density ratio trick

In machine learning, the ratio $$p(x)/q(x)$$ is sometimes the only quantity that needs to be estimated. When this is the case, it can be sufficient to learn the ratio $$r(x) = p(x)/q(x)$$ directly, instead of estimating the two densities $$p(x)$$ and $$q(x)$$ separately.

## Estimating the KL divergence

### Classifier Approach

The KL divergence $$\mathrm{KL}\left[p(x)\|q(x)\right]$$ is defined by:

$\mathrm{KL}\left[p(x)\|q(x)\right] = \int p(x) \log \frac{p(x)}{q(x)} \,\mathrm{d}x$

Suppose we sample $$x$$ with equal probability from $$p(x)$$ and $$q(x)$$, and we train a classifier $$p(x\in p|x) = \frac{1}{1+\exp(-r(x))}$$ to distinguish whether $$x$$ was sampled from $$p(x)$$ or from $$q(x)$$. Then the optimal classifier $$p^\star(x\in p|x)$$ is:

$p^\star(x\in p|x) = \frac{p(x)}{p(x)+q(x)}$
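
This follows from Bayes' rule, using the fact that both classes have prior probability $$\frac{1}{2}$$:

$p^\star(x\in p|x) = \frac{p(x)\cdot\frac{1}{2}}{p(x)\cdot\frac{1}{2} + q(x)\cdot\frac{1}{2}} = \frac{p(x)}{p(x)+q(x)}$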

That is, the optimal logit $$r^\star(x)$$ is:

$r^\star(x) = \log \frac{p(x)}{q(x)}$

Thus, as long as we can sample $$x$$ from both $$p(x)$$ and $$q(x)$$, we can train this classifier and estimate the KL divergence as a Monte Carlo average of $$r^\star(x)$$ over samples from $$p(x)$$:

$\mathrm{KL}\left[p(x)\|q(x)\right] \approx \frac{1}{N} \sum_{i=1}^{N} r^\star(x_i), \quad x_i \sim p(x)$
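
A minimal sketch of this procedure, assuming scikit-learn for the classifier and two unit-variance Gaussians $$p = \mathcal{N}(0, 1)$$ and $$q = \mathcal{N}(1, 1)$$ as an example, since their KL divergence is known in closed form ($$\frac{(\mu_p-\mu_q)^2}{2\sigma^2} = 0.5$$):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Example densities: p = N(0, 1), q = N(1, 1); true KL[p||q] = 0.5.
mu_p, mu_q, sigma = 0.0, 1.0, 1.0
n = 50_000
xp = rng.normal(mu_p, sigma, n)  # samples from p(x)
xq = rng.normal(mu_q, sigma, n)  # samples from q(x)

# Label p-samples 1 and q-samples 0 (equal class priors),
# then fit a logistic classifier to distinguish them.
X = np.concatenate([xp, xq]).reshape(-1, 1)
y = np.concatenate([np.ones(n), np.zeros(n)])
clf = LogisticRegression().fit(X, y)

# The classifier's logit approximates r*(x) = log p(x)/q(x).
# Averaging it over samples from p gives a Monte Carlo KL estimate.
logits = clf.decision_function(xp.reshape(-1, 1))
kl_est = logits.mean()
kl_true = (mu_p - mu_q) ** 2 / (2 * sigma ** 2)
print(f"estimated KL: {kl_est:.3f}, true KL: {kl_true:.3f}")
```

For this Gaussian pair the true log ratio $$\log\frac{p(x)}{q(x)} = \frac{1}{2} - x$$ is linear in $$x$$, so a linear logistic model can represent it exactly; for more complex densities the classifier would need richer features or a neural network.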