# Image Segmentation

## Overview

Image segmentation assigns a class label to every pixel: for each pixel $$x_{ij}$$ in an image, predict its segmentation class $$c_{ij}$$.
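
As a minimal sketch of what "predict a class per pixel" means in practice, suppose a model has already produced one score map per class. The function name `predict_labels` and the nested-list layout `scores[c][i][j]` are assumptions made for this illustration; the label map is the per-pixel argmax over classes.

```python
def predict_labels(scores):
    """Return the label map c[i][j] = argmax over classes of scores[c][i][j].

    `scores` is a hypothetical nested list: one H x W score map per class.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][i][j]) for j in range(width)]
        for i in range(height)
    ]


# Two classes on a 2 x 2 image (made-up scores).
scores = [
    [[0.9, 0.2], [0.4, 0.1]],  # class 0
    [[0.1, 0.8], [0.6, 0.9]],  # class 1
]
labels = predict_labels(scores)
```

The output has the same height and width as the input image, with one class index per pixel.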

## Supervised Methods

### Fully convolutional networks for semantic segmentation

Long, Shelhamer, and Darrell (2015) proposed using deconvolutional (transposed convolution) layers to up-sample intermediate feature maps taken from different levels of a pre-trained convolutional neural network, and combining the up-sampled maps to compose the pixel-wise classification output.
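
To make the up-sampling step concrete, here is a one-dimensional sketch of a transposed convolution in plain Python. It is a simplification under stated assumptions, not the network from the paper: the function name `transposed_conv1d` is hypothetical, the signal is 1-D rather than a 2-D feature map, and the kernel is fixed to linear-interpolation weights for illustration, whereas in a network these weights would be learned.

```python
def transposed_conv1d(signal, kernel, stride):
    """Up-sample `signal` by scattering a scaled copy of `kernel` per input value.

    Each input element i adds signal[i] * kernel into the output starting at
    position i * stride, so overlapping copies are summed.
    """
    out_len = (len(signal) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, value in enumerate(signal):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


# Stride 2 with a linear-interpolation kernel (illustrative, not learned).
coarse = [1.0, 3.0]
upsampled = transposed_conv1d(coarse, kernel=[0.5, 1.0, 0.5], stride=2)
```

With this kernel the interior of the output interpolates linearly between the coarse values; a learned kernel can deviate from plain interpolation where that helps the pixel-wise prediction.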

### U-Net

• Hierarchical deconvolution at different levels.
• Weighted cross-entropy loss: balances the loss across classes and forces the network to put emphasis on the cell boundaries. \begin{align} \mathcal{L} &= -\sum_{\mathbf{x}} w(\mathbf{x}) \log p(\mathbf{x}) \\ w(\mathbf{x}) &= w_c(\mathbf{x}) + w_0 \cdot \exp \left( -\frac{\left( d_1(\mathbf{x}) + d_2(\mathbf{x}) \right)^2}{2 \sigma^2} \right) \end{align} where $$p(\mathbf{x})$$ is the predicted probability of the true class at pixel $$\mathbf{x}$$, $$w_c(\mathbf{x})$$ is the class-balancing weight for the class of the pixel, and $$w_0$$ and $$\sigma$$ are hyper-parameters. $$d_1(\mathbf{x})$$ and $$d_2(\mathbf{x})$$ are the distances from the background pixel $$\mathbf{x}$$ to the closest and second-closest cell, respectively.
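
The weight map can be sketched in a few lines of plain Python. This is an illustration under assumptions, not a reference implementation: the function names are hypothetical, the distances are passed in directly rather than computed from a segmentation mask, and the defaults `w0=10.0` and `sigma=5.0` are not taken from these notes. The exponent uses the squared sum of the two distances, $$(d_1 + d_2)^2$$.

```python
import math


def boundary_weight(d1, d2, class_weight, w0=10.0, sigma=5.0):
    """w(x) = w_c(x) + w0 * exp(-(d1 + d2)^2 / (2 * sigma^2)).

    d1, d2: distances to the closest and second-closest cell.
    w0, sigma: hyper-parameters (default values are assumptions).
    """
    return class_weight + w0 * math.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))


def weighted_cross_entropy(true_class_probs, weights):
    """Negative weighted log-likelihood over a flat list of pixels."""
    return -sum(w * math.log(p) for p, w in zip(true_class_probs, weights))


# A background pixel squeezed between two cells versus one far from any cell.
near = boundary_weight(d1=1.0, d2=1.0, class_weight=1.0)
far = boundary_weight(d1=30.0, d2=40.0, class_weight=1.0)
loss = weighted_cross_entropy([0.9, 0.9], [near, far])
```

Pixels in narrow gaps between cells receive a weight close to `class_weight + w0`, while pixels far from any cell fall back to the class weight alone, so errors on thin separating borders dominate the loss.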

## Evaluation

### Metrics

Let $$n_{ij}$$ be the number of pixels of class $$i$$ predicted to belong to class $$j$$. Suppose there are $$k$$ different classes, and let $$t_i = \sum_{j} n_{ij}$$ be the total number of pixels of class $$i$$. Then the following metrics are used for image segmentation (Long, Shelhamer, and Darrell 2015):

• Pixel accuracy: $\text{Pixel Acc} = \frac{\sum_i n_{ii}}{\sum_i t_i}$

• Mean accuracy: $\text{Mean Acc} = \frac{1}{k} \sum_i \frac{n_{ii}}{t_i}$

• Mean IoU (Intersection over Union): $\text{Mean IoU} = \frac{1}{k} \sum_{i} \frac{n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}}$

• Weighted IoU: $\text{Weighted IoU} = \frac{1}{\sum_j t_j} \sum_i \frac{t_i n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}}$
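
All four metrics can be computed from the confusion matrix $$n_{ij}$$. The sketch below uses plain Python lists to stay dependency-free; the function name `segmentation_metrics` and the example matrix are made up for illustration, and the code assumes every class has at least one ground-truth pixel (otherwise $$t_i = 0$$ and the per-class ratios are undefined).

```python
def segmentation_metrics(n):
    """Compute segmentation metrics from a confusion matrix.

    n[i][j] = number of pixels of class i predicted as class j.
    Assumes t_i > 0 for every class i.
    """
    k = len(n)
    t = [sum(n[i]) for i in range(k)]                               # t_i = sum_j n_ij
    predicted = [sum(n[j][i] for j in range(k)) for i in range(k)]  # sum_j n_ji
    total = sum(t)
    # Per-class IoU: intersection n_ii over union t_i + sum_j n_ji - n_ii.
    iou = [n[i][i] / (t[i] + predicted[i] - n[i][i]) for i in range(k)]
    return {
        "pixel_acc": sum(n[i][i] for i in range(k)) / total,
        "mean_acc": sum(n[i][i] / t[i] for i in range(k)) / k,
        "mean_iou": sum(iou) / k,
        "weighted_iou": sum(t[i] * iou[i] for i in range(k)) / total,
    }


# Two classes: rows are true classes, columns are predicted classes.
confusion = [[3, 1],
             [2, 4]]
metrics = segmentation_metrics(confusion)
```

For this matrix the pixel accuracy is 7/10, the mean accuracy 17/24, the mean IoU 15/28, and the weighted IoU 38/70. The weighted IoU is pulled toward the IoU of the more frequent class.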

# References

Long, Jonathan, Evan Shelhamer, and Trevor Darrell. 2015. “Fully Convolutional Networks for Semantic Segmentation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431–40.