Density ratio trick

In machine learning, the ratio \(p(x)/q(x)\) is sometimes the only quantity that needs to be estimated. In that case, it can be sufficient to learn the ratio \(r(x) = p(x)/q(x)\) directly, instead of estimating the densities \(p(x)\) and \(q(x)\) themselves.

Density ratio estimation for the KL divergence

Classifier approach

The KL divergence \(\mathrm{KL}\left[p(x)\|q(x)\right]\) is defined by:

\[ \mathrm{KL}\left[p(x)\|q(x)\right] = \int p(x) \log \frac{p(x)}{q(x)} \,\mathrm{d}x \]
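As a quick numerical check (not from the original: it assumes two unit-variance Gaussians with means 1 and 0 as \(p(x)\) and \(q(x)\), for which the closed-form value is \(0.5\)), the integral can be estimated as the expectation of the log ratio under \(p(x)\). Note that this baseline still evaluates both densities explicitly:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Samples from p(x) = N(1, 1); q(x) = N(0, 1) is only evaluated, never sampled.
x = rng.normal(loc=1.0, scale=1.0, size=100_000)

# KL[p||q] = E_{x~p}[log p(x) - log q(x)], estimated by Monte Carlo.
kl_mc = np.mean(norm.logpdf(x, loc=1.0, scale=1.0) - norm.logpdf(x, loc=0.0, scale=1.0))
print(kl_mc)  # close to the closed-form value 0.5
```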

If we sample \(x\) with equal probability from \(p(x)\) and \(q(x)\), and we train a classifier \(p(x\in p|x) = \frac{1}{1+\exp(-\log r(x))} = \frac{r(x)}{1+r(x)}\) to distinguish whether \(x\) was sampled from \(p(x)\) or \(q(x)\), then the optimal classifier \(p^\star(x\in p|x)\) is:

\[ p^\star(x\in p|x) = \frac{p(x)}{p(x)+q(x)} \]
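This follows from Bayes' rule, since each source is chosen with probability \(\tfrac{1}{2}\):

\[ p^\star(x\in p|x) = \frac{\tfrac{1}{2}\,p(x)}{\tfrac{1}{2}\,p(x) + \tfrac{1}{2}\,q(x)} = \frac{p(x)}{p(x)+q(x)} \]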

Solving \(\frac{r^\star(x)}{1+r^\star(x)} = \frac{p(x)}{p(x)+q(x)}\) gives \(r^\star(x) = \frac{p(x)}{q(x)}\); in other words, the classifier's optimal logit recovers the log density ratio:

\[ \log r^\star(x) = \log \frac{p(x)}{q(x)} \]

Thus, whenever we can sample \(x\) from both \(p(x)\) and \(q(x)\), we can train this classifier and estimate the KL divergence as the average logit over samples from \(p(x)\): \(\mathrm{KL}\left[p(x)\|q(x)\right] \approx \frac{1}{N}\sum_{i=1}^{N} \log r^\star(x_i)\) with \(x_i \sim p(x)\).
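A minimal sketch of this procedure, assuming the same two Gaussians as above and scikit-learn's LogisticRegression as the classifier (illustrative choices, not from the original): samples from \(p(x)\) are labelled 1, samples from \(q(x)\) are labelled 0, and the KL divergence is estimated from the trained classifier's logits alone, without ever evaluating \(p(x)\) or \(q(x)\).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 100_000

# Equal numbers of samples from p(x) = N(1, 1) (label 1) and q(x) = N(0, 1) (label 0).
x_p = rng.normal(loc=1.0, scale=1.0, size=n)
x_q = rng.normal(loc=0.0, scale=1.0, size=n)
X = np.concatenate([x_p, x_q]).reshape(-1, 1)
y = np.concatenate([np.ones(n), np.zeros(n)])

# With equal class priors, the classifier's logit approximates log r(x) = log p(x)/q(x).
clf = LogisticRegression().fit(X, y)
log_r = clf.decision_function(x_p.reshape(-1, 1))

# KL[p||q] ~= average estimated log ratio over samples from p(x).
print("estimated KL:", log_r.mean())  # close to the true value 0.5
```

Because the true log ratio here is linear in \(x\), a linear logistic regression is well specified; for more complex distributions the classifier would need to be correspondingly more flexible.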