The density \(p_m(\mathbf{x};\theta)\) of an undirected graphical model is often specified through a tractable but unnormalized density function \(\tilde{p}_m(\mathbf{x};\theta)\):

\[ p_m(\mathbf{x};\theta) = \frac{1}{Z(\theta)} \, \tilde{p}_m(\mathbf{x};\theta) \]

where \(Z(\theta)=\int \tilde{p}_m(\mathbf{x};\theta)\,d\mathbf{x}\) is the partition function of \(\tilde{p}_m(\mathbf{x};\theta)\), which usually cannot be computed exactly.
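To make the role of \(Z(\theta)\) concrete, here is a minimal sketch for a one-dimensional toy case. The model \(\tilde{p}_m(x;\theta) = \exp(-\theta x^2/2)\), the function names, and the grid bounds are all assumptions chosen for illustration; they do not come from any particular library. In one dimension the integral can be approximated on a grid and checked against the closed form \(\sqrt{2\pi/\theta}\). The same brute-force approach becomes infeasible as the dimension of \(\mathbf{x}\) grows, which is why \(Z(\theta)\) is treated as intractable in general.

```python
import numpy as np

def p_tilde(x, theta):
    """Unnormalized density exp(-theta * x^2 / 2) (hypothetical toy model)."""
    return np.exp(-0.5 * theta * x**2)

def partition_function_grid(theta, lo=-20.0, hi=20.0, n=200_001):
    """Approximate Z(theta) with a Riemann sum on a uniform grid.

    Only practical in very low dimensions: the number of grid points
    grows exponentially with the dimension of x.
    """
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    return float(np.sum(p_tilde(x, theta)) * dx)

theta = 2.0
Z_numeric = partition_function_grid(theta)
Z_exact = float(np.sqrt(2.0 * np.pi / theta))  # closed form for this toy model only

print(Z_numeric, Z_exact)
```

Dividing `p_tilde(x, theta)` by `Z_numeric` then gives a properly normalized density for this toy model.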

The expectation of the energy function \(U(\mathbf{x};\theta) = -\log p_m(\mathbf{x};\theta)\) over a given dataset \(\mathbf{X} = \{\mathbf{x}_1,\dots,\mathbf{x}_N\}\) is the average negative log-likelihood:

\[ U(\mathbf{X};\theta) = -\frac{1}{N} \sum_{i=1}^N \log p_m(\mathbf{x}_i;\theta) = \log Z(\theta) - \frac{1}{N} \sum_{i=1}^N \log \tilde{p}_m(\mathbf{x}_i;\theta) \]

Ideally, one would obtain the optimal parameters \(\theta^{\star}\) by minimizing \(U(\mathbf{X};\theta)\). However, since \(Z(\theta)\) cannot be computed exactly, neither can \(U(\mathbf{X};\theta)\), so minimizing it requires special techniques. These fall broadly into the following categories:

  1. To approximate \(Z(\theta)\), e.g., by using Monte Carlo methods.
  2. To directly find the optimum \(\theta^{\star}\) of \(U(\mathbf{X};\theta)\) without having to estimate \(Z(\theta)\).
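As an illustration of the first category, the following sketch estimates \(\log Z(\theta)\) by importance sampling, one simple Monte Carlo method, and plugs the estimate into \(U(\mathbf{X};\theta)\). It relies on the identity \(Z(\theta) = \mathbb{E}_{q}\left[\tilde{p}_m(\mathbf{x};\theta)/q(\mathbf{x})\right]\) for a proposal density \(q\) that covers the support of \(\tilde{p}_m\). The toy model \(\tilde{p}_m(x;\theta) = \exp(-\theta x^2/2)\), the Gaussian proposal, and all function names are assumptions made for this example. Note that the logarithm of a sample mean is a consistent but biased estimate of \(\log Z(\theta)\).

```python
import numpy as np

def log_p_tilde(x, theta):
    """Log of the unnormalized density exp(-theta * x^2 / 2) (hypothetical toy model)."""
    return -0.5 * theta * x**2

def log_Z_importance(theta, rng, n_samples=200_000, sigma_q=2.0):
    """Estimate log Z(theta) with importance sampling.

    The proposal q is N(0, sigma_q^2); it is an assumption of this sketch
    and should have heavier tails than the model for the weights to behave.
    """
    x = rng.normal(0.0, sigma_q, size=n_samples)
    log_q = -0.5 * (x / sigma_q) ** 2 - np.log(sigma_q * np.sqrt(2.0 * np.pi))
    log_w = log_p_tilde(x, theta) - log_q       # log importance weights
    m = np.max(log_w)                           # shift for numerical stability
    return float(m + np.log(np.mean(np.exp(log_w - m))))

def average_nll(X, theta, log_Z):
    """U(X; theta) = log Z(theta) - mean_i log p_tilde(x_i; theta)."""
    return float(log_Z - np.mean(log_p_tilde(X, theta)))

rng = np.random.default_rng(0)
theta = 2.0

# Synthetic dataset drawn from the normalized toy model N(0, 1/theta).
X = rng.normal(0.0, 1.0 / np.sqrt(theta), size=1000)

log_Z_hat = log_Z_importance(theta, rng)
log_Z_exact = float(0.5 * np.log(2.0 * np.pi / theta))  # known only for this toy model
U_hat = average_nll(X, theta, log_Z_hat)

print(log_Z_hat, log_Z_exact, U_hat)
```

In this toy setting the estimate can be compared with the closed form; for a realistic undirected model no such check is available, and the quality of the estimate depends heavily on how well the proposal matches \(\tilde{p}_m\).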

Estimating \(Z(\theta)\)

Finding the Optimum \(\theta^{\star}\) without Estimating \(Z(\theta)\)