Image Segmentation


For each pixel \(x_{ij}\) of an image, predict its segmentation class \(c_{ij}\).

Supervised Methods

Fully convolutional networks for semantic segmentation

Figure 1: The Architecture of "Fully convolutional networks for semantic segmentation"

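Long, Shelhamer, and Darrell (2015) turn a pre-trained convolutional neural network into a fully convolutional one. They predict class scores from feature maps at several depths of the network, up-sample the coarser score maps with deconvolution (transposed convolution) layers, and sum them with the finer ones through skip connections to compose the pixel-wise classification output.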
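
A minimal plain-Python sketch of one up-sample-and-fuse step, for illustration only. It assumes one 2-D score map per class, a stride-2 transposed convolution whose 4×4 kernel is fixed to bilinear interpolation (FCN initializes its up-sampling layers this way and then learns the intermediate ones), and a skip connection that adds the up-sampled coarse scores to the finer-level scores before a per-pixel argmax. The final up-sampling to full image resolution is omitted, and the function names are hypothetical.

```python
def bilinear_kernel(size):
    """size x size kernel that performs bilinear interpolation in a transposed convolution."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    weights = [1 - abs(r - center) / factor for r in range(size)]
    # The 2-D kernel is the outer product of the 1-D interpolation weights.
    return [[wr * wc for wc in weights] for wr in weights]


def transposed_conv2d(x, kernel, stride, padding):
    """Scatter-add each input value times the kernel into the output (transposed convolution)."""
    h, w, k = len(x), len(x[0]), len(kernel)
    full_h, full_w = (h - 1) * stride + k, (w - 1) * stride + k
    out = [[0.0] * full_w for _ in range(full_h)]
    for i in range(h):
        for j in range(w):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    # Crop `padding` rows/columns from every border; with k=4, stride=2, padding=1 the size doubles.
    return [row[padding:full_w - padding] for row in out[padding:full_h - padding]]


def fuse_and_predict(coarse_scores, fine_scores):
    """Up-sample coarse per-class scores 2x, add finer scores (skip connection), argmax per pixel."""
    kernel = bilinear_kernel(4)  # assumption: fixed bilinear weights instead of learned ones
    fused = []
    for coarse, fine in zip(coarse_scores, fine_scores):
        up = transposed_conv2d(coarse, kernel, stride=2, padding=1)
        fused.append([[u + f for u, f in zip(ur, fr)] for ur, fr in zip(up, fine)])
    h, w = len(fused[0]), len(fused[0][0])
    return [[max(range(len(fused)), key=lambda c: fused[c][i][j]) for j in range(w)]
            for i in range(h)]


# Usage: two classes, 2x2 coarse scores and 4x4 finer scores (all zeros here).
coarse = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 4.0]]]
fine = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
labels = fuse_and_predict(coarse, fine)
# labels[0] == [0, 0, 0, 0]; labels[3] == [0, 1, 1, 1]
```

Fixing the kernel to bilinear weights keeps the sketch deterministic; in a trained network these weights would be parameters updated by back-propagation.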

U-Net: Convolutional networks for biomedical image segmentation

Figure 2: The Architecture of "U-Net"
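  • Hierarchical up-sampling: at each level, the expansive path applies an up-convolution and concatenates the result with the feature map of the same level from the contracting path.
  • Weighted cross-entropy loss, which balances the loss across classes and forces the network to learn the thin background borders that separate touching cells:
    \[ \begin{align} \mathcal{L} &= -\sum_{\mathbf{x}} w(\mathbf{x}) \log p_{\ell(\mathbf{x})}(\mathbf{x}) \\ w(\mathbf{x}) &= w_c(\mathbf{x}) + w_0 \cdot \exp \left( -\frac{\left(d_1(\mathbf{x}) + d_2(\mathbf{x})\right)^2}{2 \sigma^2} \right) \end{align} \]
    where \(p_{\ell(\mathbf{x})}(\mathbf{x})\) is the predicted (softmax) probability of the true class \(\ell(\mathbf{x})\) of pixel \(\mathbf{x}\), \(w_c(\mathbf{x})\) is the weight that balances the class frequencies for the class of the pixel, and \(w_0\) and \(\sigma\) are hyper-parameters. \(d_1(\mathbf{x})\) and \(d_2(\mathbf{x})\) are the distances from the background pixel \(\mathbf{x}\) to the border of the nearest and the second-nearest cell, respectively.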
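
A brute-force sketch of the weight map and the weighted loss, under stated assumptions: the input is a 2-D instance label map (0 for background, a positive id per cell), the distance to a cell's border is approximated by the distance between pixel centres to the nearest pixel of that cell, and the border term is added only to background pixels, as in the description above. The defaults \(w_0 = 10\) and \(\sigma = 5\) follow the values reported in the U-Net paper; the class weights \(w_c\) are passed in by the caller, and the function names are hypothetical.

```python
import math


def unet_weight_map(instances, class_weights, w0=10.0, sigma=5.0):
    """w(x) = w_c(x) + w0 * exp(-(d1 + d2)^2 / (2 sigma^2)) for background pixels.

    instances: 2-D list, 0 = background, k > 0 = id of the cell covering the pixel.
    class_weights: {0: w_c for background, 1: w_c for foreground}.
    """
    h, w = len(instances), len(instances[0])
    # Collect the pixel coordinates of every cell instance.
    cells = {}
    for i in range(h):
        for j in range(w):
            if instances[i][j] > 0:
                cells.setdefault(instances[i][j], []).append((i, j))
    weights = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            is_fg = instances[i][j] > 0
            weights[i][j] = class_weights[1 if is_fg else 0]
            if is_fg or len(cells) < 2:
                continue  # border term only for background pixels with >= 2 cells around
            # Assumption: distance to a cell's border = distance to its nearest pixel.
            d = sorted(min(math.hypot(i - a, j - b) for a, b in pix) for pix in cells.values())
            d1, d2 = d[0], d[1]
            weights[i][j] += w0 * math.exp(-((d1 + d2) ** 2) / (2 * sigma ** 2))
    return weights


def weighted_cross_entropy(probs, labels, weights):
    """L = -sum_x w(x) * log p_{l(x)}(x); probs[i][j] is the softmax vector at pixel (i, j)."""
    return -sum(weights[i][j] * math.log(probs[i][j][labels[i][j]])
                for i in range(len(labels)) for j in range(len(labels[0])))


# Usage: two one-pixel cells separated by a single background pixel.
instances = [[1, 0, 2]]
wmap = unet_weight_map(instances, class_weights={0: 1.0, 1: 1.0})
labels = [[1 if v > 0 else 0 for v in row] for row in instances]
probs = [[[0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]]
loss = weighted_cross_entropy(probs, labels, wmap)
```

The middle pixel gets weight \(1 + 10 e^{-0.08}\) because both cells are one pixel away. The nested search costs O(pixels × cell pixels), so a real pipeline would more likely use a distance transform per cell.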

Evaluation Metrics

Let \(n_{ij}\) be the number of pixels of class \(i\) predicted to belong to class \(j\). Suppose there are \(k\) different classes, and let \(t_i = \sum_{j} n_{ij}\) be the total number of pixels of class \(i\). Then we have the following metrics for image segmentation (Long, Shelhamer, and Darrell 2015):

  • Pixel accuracy \[ \text{Pixel Acc} = \frac{\sum_i n_{ii}}{\sum_i t_i} \]

  • Mean accuracy \[ \text{Mean Acc} = \frac{1}{k} \sum_i \frac{n_{ii}}{t_i} \]

  • Mean IoU (Intersection over Union) \[ \text{Mean IoU} = \frac{1}{k} \sum_{i} \frac{n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}} \]

  • Frequency-weighted IoU \[ \text{FW IoU} = \frac{1}{\sum_j t_j} \sum_i \frac{t_i \, n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}} \]
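
The four metrics can be computed from the confusion matrix \(n_{ij}\) as in the following plain-Python sketch. It assumes integer class labels in \(0, \dots, k-1\) and that every class occurs in the ground truth, so that \(t_i > 0\); the function names are hypothetical.

```python
def confusion_matrix(truth, pred, k):
    """n[i][j] = number of pixels of true class i predicted as class j."""
    n = [[0] * k for _ in range(k)]
    for t_row, p_row in zip(truth, pred):
        for t, p in zip(t_row, p_row):
            n[t][p] += 1
    return n


def segmentation_metrics(n):
    """Pixel accuracy, mean accuracy, mean IoU and frequency-weighted IoU from n[i][j]."""
    k = len(n)
    t = [sum(row) for row in n]                                     # t_i = sum_j n_ij
    predicted = [sum(n[j][i] for j in range(k)) for i in range(k)]  # sum_j n_ji
    # Assumes t_i > 0 for every class, otherwise mean accuracy divides by zero.
    iou = [n[i][i] / (t[i] + predicted[i] - n[i][i]) for i in range(k)]
    total = sum(t)
    return {
        "pixel_acc": sum(n[i][i] for i in range(k)) / total,
        "mean_acc": sum(n[i][i] / t[i] for i in range(k)) / k,
        "mean_iou": sum(iou) / k,
        "weighted_iou": sum(t[i] * iou[i] for i in range(k)) / total,
    }


# Usage: a 2x3 image with two classes.
truth = [[0, 0, 1], [1, 1, 1]]
pred = [[0, 1, 1], [1, 1, 0]]
m = segmentation_metrics(confusion_matrix(truth, pred, k=2))
# pixel_acc = 4/6, mean_acc = (1/2 + 3/4)/2 = 0.625,
# mean_iou = (1/3 + 3/5)/2 = 7/15, weighted_iou = (2*(1/3) + 4*(3/5))/6 = 23/45
```

In this example the rarer class 0 has the lower IoU (1/3), so mean IoU, which weights both classes equally, is lower than the frequency-weighted IoU.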

References

Long, Jonathan, Evan Shelhamer, and Trevor Darrell. 2015. “Fully Convolutional Networks for Semantic Segmentation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431–40.