Implementing the "GrabCut" Segmentation Technique as a
Plugin for the GIMP
Problem Statement
The aim of this project is to provide the open source community with a quick, easy to use segmentation tool. The GNU Image Manipulation Program (the GIMP) is an open source image editing package that requires just such a tool. The aim of this project can therefore be met by creating a tool for the GIMP that performs image segmentation.
So what exactly is image segmentation?
Image editing requires the ability to quickly and easily extract foreground objects from digital images. A foreground object refers to any object of interest in an image. The background of the image refers to all pixels in the image that are not part of the foreground object. The process of separating an image into foreground and background parts is known as image segmentation. In the pictures below a soldier has been set as a foreground object in the picture, and segmented to eliminate the background. Humans perform image segmentation very easily by using object recognition. Children going to preschool learn how to segment images by cutting an object out of a picture along a dotted line. We find image segmentation to be an extremely simple task, but it is definitely not a trivial task for computers.
Original Image | Segmented Image |
In order for computers to perform image segmentation the algorithm performing the segmentation needs to use information encapsulated in the digital image to calculate the best segmentation. Computers have no means of intelligently recognising objects, and so many different methods have been developed in order to segment images. The segmentation process in based on various features found in the image. This might be colour information that is used to create histograms, or information about the pixels that indicate edges or boundaries or texture information.
Edge detection and region detection methods are most commonly used to segment images. Boundary information is the edge information in an image which can be gained by noting the difference between the colours of nearby pixels. A large difference in colour will indicate an edge. A region is the area enclosed by an edge, and the colour of neighbouring pixels in a region are similar.
Region and edge detection methods are sufficient for segmenting images with clear boundaries, however if the edges of the object to be segmented are fuzzy these methods do not produce satisfactory segmentation results. In order to produce the best segmentation of an image with fuzzy boundaries it is necessary to perform matting. Matting is a technique which smoothes the boundaries of segmented objects, and makes their appearance more natural.
Pixels in digital images have individual alpha values. An alpha value is the opacity that a pixels has, and is allowed to vary between 0 and 255. Thus a pixel with an alpha value of 0 will be totally transparent, and an alpha value of 255 will cause the pixel to be totally opaque. Without matting, a segmented object will have all its pixels set to an alpha value of 255. Matting is the process which smoothes the boundaries of objects by setting alpha values along the objects boundary to change smoothly from 0 to 255. In the alpha matte below white represents alpha values of 255, black represents alpha values of 0 and values in between are represented by shades of grey.
The original image | The alpha matte after segmentation |
"GrabCut"
Most segmentation techniques make use of either edge or region information contained in the image in order to perform segmentation. "GrabCut" is an innovative segmentation technique that uses both region and boundary information contained in an image in order to perform segmentation. "GrabCut" also performs image segmentation in a novel way by using graphs to store region and boundary information. A Min-Cut/Max-Flow algorithm, which is a graph cut technique, is used to segment the graph and in doing so segment the image. "GrabCut" also includes a matting technique which is used to calculate the alpha matte for boundaries of segmented regions.
How "GrabCut" Works
"GrabCut" is a segmentation technique that uses graph cuts to perform segmentation. Like most segmentation techniques "GrabCut" uses information encapsulated in the image. Most segmentation techniques make use of either edge information or region information in the image. "GrabCut" makes use of both edge and region information. This information is used to create an energy function which, when minimized, produces the best segmentation.
In order to perform segmentation a graph is built, where nodes in the graph represent pixels in the image. In addition two special nodes are also created. These are the Sink and Source nodes. Every pixel node in the graph is connected to the Source and Sink node. The Source node represents the foreground of the image, and the Sink node the background. In order to segment the image the Source and Sink nodes must be separated.
The energy function is incorporated into the graph as weights between pixel nodes and weights between pixel and Source or Sink nodes. Weights between pixel nodes are determined by edge information in the image. Thus a strong indication of an edge between two pixels (a large difference in pixel colour) results in a very small weight between two pixel nodes. The region information determines the weights between pixels nodes and the Source and Sink nodes. These weights are calculated by determining the probability of the pixel node being part of the background or foreground region.
In order for foreground and background regions to be created, some pixels in the image need to be labelled before segmentation as either foreground or background. This is referred to as the clue marking stage. Any pixels that are labelled during this stage are set as hard constraints. This means that during the segmentation process, hard labelled pixels cannot change their labelling.
A Min-cut/Max-Flow algorithm is used to segment the graph. This algorithm determines the minimum cost cut that will separate the Source and Sink nodes. The cost of the cut is determined by the sum of all the weights of the links that are cut. Once the Source and Sink nodes are separated, all pixel nodes connected to the Source node become part of the foreground, and the rest become part of the background.
The figure below shows the process of labelling an image, creating the graph, and then segmenting the graph to produce a segmentation of the image.
A simplified diagram of the "GrabCut" approach |
"My Results"
I have implemented the "GrabCut" segmentation technique as a plugin for the GIMP, and here follow a few examples of the type of results that have been obtained:
Segmentation Results:
Input image with clues drawn | Segmentation results |
Matting Results:
Matting is performed along the right hand side of the leaf | Matting performed on the horses mane |