# Overview

The density $$p_m(\mathbf{x};\theta)$$ of an undirected graphical model is often specified through a tractable but unnormalized density function $$\tilde{p}_m(\mathbf{x};\theta)$$:

$p_m(\mathbf{x};\theta) = \frac{1}{Z(\theta)} \, \tilde{p}_m(\mathbf{x};\theta)$

where $$Z(\theta)=\int \tilde{p}_m(\mathbf{x};\theta)\,d\mathbf{x}$$ is the partition function of $$\tilde{p}_m(\mathbf{x};\theta)$$, which usually cannot be computed exactly.
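To make the role of $$Z(\theta)$$ concrete, here is a minimal sketch for a one-dimensional toy model where the integral is still easy. The unnormalized density $$\tilde{p}_m(x;\theta)=\exp(-\theta x^2/2)$$, the function names, and the grid bounds are all assumptions chosen for illustration; they do not come from any particular model discussed here. In one dimension a simple Riemann sum recovers the known value $$\sqrt{2\pi/\theta}$$, but the same approach does not scale to high-dimensional $$\mathbf{x}$$, which is why $$Z(\theta)$$ is usually treated as intractable.

```python
import numpy as np

def p_tilde(x, theta):
    """Unnormalized density exp(-theta * x^2 / 2) (hypothetical toy model)."""
    return np.exp(-0.5 * theta * x ** 2)

def partition_function_grid(theta, lo=-20.0, hi=20.0, num=200_001):
    """Approximate Z(theta) with a Riemann sum on a uniform 1-D grid."""
    x = np.linspace(lo, hi, num)
    dx = x[1] - x[0]
    return float(np.sum(p_tilde(x, theta)) * dx)

theta = 2.0
z_grid = partition_function_grid(theta)
z_exact = float(np.sqrt(2.0 * np.pi / theta))  # closed form for this toy model only

def p_m(x, theta, z):
    """Normalized density p_m = p_tilde / Z."""
    return p_tilde(x, theta) / z

print(z_grid, z_exact)
```

Dividing `p_tilde` by the estimated `z_grid` gives a density that integrates to approximately one on the same grid.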

The expectation of the energy function $$U(\mathbf{x};\theta)$$ over a given dataset $$\mathbf{X} = \{\mathbf{x}_1,\dots,\mathbf{x}_N\}$$ is the average negative log-likelihood, i.e.:

$U(\mathbf{X};\theta) = -\frac{1}{N} \sum_{i=1}^N \log p_m(\mathbf{x}_i;\theta) = \log Z(\theta) - \frac{1}{N} \sum_{i=1}^N \log \tilde{p}_m(\mathbf{x}_i;\theta)$

Ideally, one would obtain the optimal parameters $$\theta^{\star}$$ by minimizing $$U(\mathbf{X};\theta)$$. However, since $$Z(\theta)$$ cannot be computed exactly, evaluating and minimizing $$U(\mathbf{X};\theta)$$ requires special techniques, which broadly fall into the following categories:

1. Approximating $$Z(\theta)$$, e.g., by using Monte Carlo methods.
2. Directly finding the optimum $$\theta^{\star}$$ of $$U(\mathbf{X};\theta)$$ without having to estimate $$Z(\theta)$$.
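As an illustration of the first category, the following sketch estimates $$\log Z(\theta)$$ by importance sampling and plugs it into $$U(\mathbf{X};\theta) = \log Z(\theta) - \frac{1}{N}\sum_{i=1}^N \log \tilde{p}_m(\mathbf{x}_i;\theta)$$. It relies on the identity $$Z(\theta) = \mathbb{E}_{q}[\tilde{p}_m(\mathbf{x};\theta)/q(\mathbf{x})]$$ for a proposal density $$q$$ that covers the support of $$\tilde{p}_m$$. The toy density $$\exp(-\theta x^2/2)$$, the Gaussian proposal, the sample sizes, and all function names are assumptions made for this example, and importance sampling is only one of several Monte Carlo options. Note that the logarithm of an unbiased estimate of $$Z(\theta)$$ is itself a slightly biased estimate of $$\log Z(\theta)$$.

```python
import numpy as np

def log_p_tilde(x, theta):
    """Log of the unnormalized toy density exp(-theta * x^2 / 2) (assumed model)."""
    return -0.5 * theta * x ** 2

def log_z_importance(theta, n_samples=100_000, proposal_std=2.0, seed=0):
    """Estimate log Z(theta) using samples from a Gaussian proposal q = N(0, proposal_std^2)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, proposal_std, size=n_samples)
    # Log density of the proposal at the sampled points.
    log_q = -0.5 * (x / proposal_std) ** 2 - np.log(proposal_std * np.sqrt(2.0 * np.pi))
    # Log importance weights log(p_tilde / q).
    log_w = log_p_tilde(x, theta) - log_q
    # Log of the mean weight, shifted by the maximum for numerical stability.
    m = np.max(log_w)
    return float(m + np.log(np.mean(np.exp(log_w - m))))

def average_nll(data, theta, log_z):
    """U(X; theta) = log Z(theta) - mean_i log p_tilde(x_i; theta)."""
    return float(log_z - np.mean(log_p_tilde(data, theta)))

theta = 2.0
rng = np.random.default_rng(1)
# Synthetic dataset drawn from the normalized toy model N(0, 1/theta).
data = rng.normal(0.0, 1.0 / np.sqrt(theta), size=1_000)

log_z_hat = log_z_importance(theta)
log_z_exact = float(0.5 * np.log(2.0 * np.pi / theta))  # known only for this toy model
nll_hat = average_nll(data, theta, log_z_hat)
nll_exact = average_nll(data, theta, log_z_exact)
print(log_z_hat, log_z_exact, nll_hat, nll_exact)
```

The proposal is deliberately wider than the target so that the importance weights stay bounded; with a poorly matched proposal the variance of the estimate can become very large, which is one motivation for the second category of methods.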