Your complete guide to image segmentation

Posted March 2, 2022 - Updated February 23, 2023
A city street which includes tall buildings, cars, streetlights and pedestrians. Each object is colored differently to represent instance segmentation for autonomous vehicles.

Computer vision (CV) has advanced rapidly over the last few years. At its core, it is the technology that allows machines to interpret visual information about their surroundings, loosely mirroring what humans do. While the human brain naturally multitasks and makes quick decisions, transferring this capability to machines has required decades of research and experimentation. Today, we can build computer vision models that detect objects, determine shapes, predict object movements and take actions based on what they perceive. Self-driving cars, aerial mapping, surveillance applications and extended reality technologies such as AR and VR all build on the progress made in computer vision models.

The most popular way to train CV applications and perception models is to label the objects in an image with bounding boxes, which supports the task commonly referred to as object detection. A more granular method, which trains models at the pixel level, is segmentation. Read on to explore the different types of image segmentation.

What is image segmentation?

Each type of image annotation trains a machine learning (ML) algorithm to produce a particular kind of output. To understand image segmentation, let’s compare the different types of annotations using simple examples.

This image contains four separate images labeled A, B, C and D. 

Image A in the top left-hand corner shows a dog with a ball in its mouth running in a field. There is a pink bounding box around the entire photo with a label reading "Dog." The caption under the image says "A - Image classification". 

Image B in the top right-hand corner shows a dog with a ball in its mouth running in a field. There is a pink bounding box around just the dog with a label reading "Dog". The caption under the image says "B - Object detection and localization." 

Image C in the bottom left-hand corner shows a dog and a cat sitting together on a bench. There is a pink bounding box around the dog with a label reading "Dog" and a green bounding box around the cat with a label reading "Cat." The caption under the image says "C - Multi-object detection and localization." 

Image D in the bottom right-hand corner shows a dog and a cat sitting together on a bench. There are green outlines around both the dog and the cat separately with corresponding labels reading "Dog" and "Cat." The caption under the image says "D - Semantic segmentation."

Image A has a single “Dog” classification tag, which helps image classification models identify whether a given image contains a dog.

In Image B, a single dog is marked with a 2D bounding box annotation, which allows an object detection and localization model to predict both the presence of a dog and its location in the image.

Image C has both a dog and a cat, each marked with its own 2D bounding box. These annotations could help a multi-object detection and localization model detect both animals and understand where exactly they are located in the image.
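To make the difference between these annotation types concrete, here is a minimal sketch of how they might be stored. The names ClassificationLabel, BoundingBox and boxes_overlap, the inclusive (x_min, y_min, x_max, y_max) pixel convention and all coordinates are hypothetical, chosen for illustration rather than taken from any standard annotation format.

```python
from dataclasses import dataclass

# Hypothetical annotation records; not a real dataset schema.

@dataclass
class ClassificationLabel:
    """Image A: a single tag describing the whole image."""
    label: str

@dataclass
class BoundingBox:
    """Images B and C: one rectangle per object, inclusive pixel coordinates."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        # Number of pixels inside the box, counting both edges.
        return (self.x_max - self.x_min + 1) * (self.y_max - self.y_min + 1)

def boxes_overlap(a: BoundingBox, b: BoundingBox) -> bool:
    # Two boxes overlap if their ranges intersect on both the x and y axes.
    return (a.x_min <= b.x_max and b.x_min <= a.x_max
            and a.y_min <= b.y_max and b.y_min <= a.y_max)

image_a = ClassificationLabel("Dog")
image_b = [BoundingBox("Dog", 40, 30, 210, 180)]
image_c = [BoundingBox("Dog", 10, 20, 120, 200), BoundingBox("Cat", 100, 60, 230, 200)]

print(image_b[0].area())                      # 25821
print(boxes_overlap(image_c[0], image_c[1]))  # True
```

Neither record says which pixels inside a box actually belong to the animal, and the dog and cat boxes share some pixels; that is the gap segmentation fills.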

Although the 2D bounding box annotations in Image B and Image C help models detect object classes and predict their locations, they do not capture the exact shape of each object, and every box also includes some background. This is where segmentation becomes critical for complex computer vision models.

Segmentation partitions a given image by assigning each of its pixels to a class, which provides an accurate representation of object shapes. Every pixel in the image belongs to at least one class, as opposed to object detection, where only the areas inside the bounding boxes are labeled and those boxes can overlap. This provides a more granular understanding of the contents of the image. The goal is to recognize and understand what an image contains at the pixel level. To this end, annotators are tasked with separating an image into multiple segments and assigning every pixel in each segment to a corresponding class label.

For example, in Image B and Image C, object detection shows which classes are present and roughly where, but tells us nothing about their shapes. In Image D, by contrast, segmentation gives us pixel-wise information about each object along with its class.
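A small sketch can show what a bounding box loses compared with a pixel mask. It assumes a toy 6x6 mask in which 0 marks background and 1 marks the dog (values invented for illustration), and uses hypothetical helpers, bbox_from_mask and pixel_count, written in plain Python.

```python
# Toy semantic mask: 0 = background, 1 = "Dog" (made-up values).
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

def bbox_from_mask(mask, class_id):
    """Return the tightest (x_min, y_min, x_max, y_max) box around class_id pixels."""
    coords = [(x, y) for y, row in enumerate(mask)
              for x, v in enumerate(row) if v == class_id]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

def pixel_count(mask, class_id):
    # How many pixels actually belong to the class.
    return sum(v == class_id for row in mask for v in row)

x0, y0, x1, y1 = bbox_from_mask(mask, 1)
box_pixels = (x1 - x0 + 1) * (y1 - y0 + 1)
dog_pixels = pixel_count(mask, 1)
print(f"box covers {box_pixels} pixels, the dog itself only {dog_pixels}")
# box covers 16 pixels, the dog itself only 12
```

The box can always be derived from the mask, but not the other way around: the four background pixels inside the box are indistinguishable from dog pixels once only the box is kept.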

Different types of image segmentation tasks

Image segmentation tasks can be classified into three groups based on the amount and type of information they convey: semantic, instance and panoptic segmentation. Let’s explore the different types of image segmentation tasks.

Semantic (non-instance) segmentation

Semantic segmentation, also known as non-instance segmentation, specifies the shape, size and form of objects in addition to their location. It is primarily used when a model needs to know definitively whether an image contains an object of interest and which sections of the image don’t belong to that object. Pixels belonging to a particular class are simply labeled with that class; no other information is recorded, so separate objects of the same class are not distinguished from one another.
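As a rough illustration, a semantic segmentation label can be stored as a grid holding one class ID per pixel. The sketch below assumes a hypothetical ID mapping (0 = background, 1 = dog, 2 = cat) and a tiny hand-made mask; the helper names are illustrative only. It answers the two questions above: whether the image contains a class, and which pixels do not belong to it.

```python
CLASSES = {0: "background", 1: "dog", 2: "cat"}  # hypothetical class-ID mapping

# Toy semantic mask in the spirit of Image D: every pixel holds exactly one class ID.
semantic_mask = [
    [0, 1, 1, 0, 0],
    [1, 1, 1, 0, 2],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 2, 2],
]

def contains(mask, class_id):
    # Does any pixel belong to the class?
    return any(v == class_id for row in mask for v in row)

def not_class(mask, class_id):
    # Binary mask: 1 where the pixel does NOT belong to the class.
    return [[int(v != class_id) for v in row] for row in mask]

def class_areas(mask):
    # Pixel count per class name (size information a bounding box cannot give).
    counts = {}
    for row in mask:
        for v in row:
            counts[CLASSES[v]] = counts.get(CLASSES[v], 0) + 1
    return counts

print(contains(semantic_mask, 2))  # True
print(class_areas(semantic_mask))  # {'background': 8, 'dog': 7, 'cat': 5}
```

Because two separate dogs would both be stored as 1, this representation cannot tell how many dogs are present; that is the gap instance segmentation addresses.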

Instance segmentation

Instance segmentation tracks the presence, location, number, size and shape of objects in an image. The goal is to understand the image in finer detail at every pixel. Pixels are therefore assigned to individual “instances” rather than only to classes, so that overlapping or similar objects can be separated along their boundaries.
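One simple way to picture an instance segmentation label, sketched here under assumed conventions, is a grid of instance IDs (0 for background) plus a lookup table from each instance to its class. The IDs, the table and the helper functions are hypothetical; the point is that the number, size and location of each object follow directly from such a label.

```python
# Toy instance mask: 0 = background, other values are instance IDs (hypothetical).
instance_mask = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0],
    [3, 3, 3, 0, 0],
]
instance_class = {1: "dog", 2: "dog", 3: "cat"}  # class of each instance

def instance_pixels(mask):
    # Map each instance ID to the list of (x, y) pixels it covers.
    pixels = {}
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v != 0:
                pixels.setdefault(v, []).append((x, y))
    return pixels

def describe(mask, instance_class):
    # Per instance: class, size (pixel area) and location (centroid).
    out = {}
    for inst_id, pts in instance_pixels(mask).items():
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        out[inst_id] = (instance_class[inst_id], len(pts), (cx, cy))
    return out

def count_per_class(mask, instance_class):
    # Number of distinct objects per class.
    counts = {}
    for inst_id in instance_pixels(mask):
        cls = instance_class[inst_id]
        counts[cls] = counts.get(cls, 0) + 1
    return counts

print(count_per_class(instance_mask, instance_class))  # {'dog': 2, 'cat': 1}
```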

Non-instance vs. instance segmentation

The key difference between non-instance and instance segmentation can be understood with the help of the following example.

This image contains two separate images labeled E and F. 

Image E on the left-hand side shows seven dogs lying and sitting on stairs. Each dog is outlined and highlighted in pink and labeled "Dog." The caption under the photo reads "E - Non-instance segmentation." 

Image F on the right-hand side shows seven dogs lying and sitting on stairs. Each dog is outlined and highlighted in its own color. From left to right, the dogs are labeled "Dog 1" (pink), "Dog 2" (green), "Dog 3" (orange), "Dog 4" (light blue), "Dog 5" (dark blue), "Dog 6" (yellow) and "Dog 7" (purple). 

The caption under the photo reads "F - Instance segmentation."

In Image E, instances are not distinguished: all objects are simply labeled “Dog” and their pixels are marked in pink. In Image F, each instance is marked uniquely: the objects are labeled independently as Dog 1, Dog 2 and so on, and each has its own color to separate the objects of interest.
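The relationship between Image F and Image E can be sketched as a one-way conversion: replacing each instance ID with its class ID turns an instance mask into a non-instance mask, after which the individual dogs can no longer be told apart. The toy mask, the class ID and the helper name to_semantic are assumptions made for illustration.

```python
# Image F-style instance mask: 0 = background, 1..3 = Dog 1..Dog 3 (toy data).
instance_mask = [
    [1, 1, 0, 2],
    [1, 0, 0, 2],
    [0, 3, 3, 0],
]
DOG = 1  # hypothetical semantic class ID for "Dog"

def to_semantic(mask, class_of_instance):
    # Replace each instance ID with its class ID; background stays 0.
    return [[class_of_instance[v] if v else 0 for v in row] for row in mask]

semantic = to_semantic(instance_mask, {1: DOG, 2: DOG, 3: DOG})
n_instances = len({v for row in instance_mask for v in row if v})
n_labels_after = len({v for row in semantic for v in row if v})
print(n_instances, n_labels_after)  # 3 1
```

Going from instance to non-instance labels is trivial, but the reverse requires finding object boundaries again, which is why instance annotation is the more detailed (and more laborious) of the two.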

Panoptic segmentation

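Panoptic segmentation is the most informative of the three tasks because it combines semantic and instance segmentation: every pixel receives a class label, and pixels that belong to countable objects are also assigned to a specific instance. This provides granular information for advanced ML algorithms.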

Take a look at the different types of segmentation in the example below.

This image has four separate images labeled G, H, I and J. 

Image G in the top left-hand corner shows a city street which includes tall buildings, cars, buses and pedestrians. The caption below the image reads "G - Original Image." 

Image H in the top right-hand corner shows a city street which includes tall buildings, cars, buses and pedestrians. All of the cars are outlined and highlighted in a yellow color with the label "Car." All buildings are outlined and highlighted in a blue color with the label "Building." All buses are outlined and highlighted in a darker blue color with the label "Bus." The road is highlighted and outlined with a pink color and labeled "Road." The sky is outlined and highlighted in an orange color and labeled "Sky." The caption below the image reads "H - Semantic segmentation." 

Image I in the bottom left-hand corner shows a city street which includes tall buildings, cars, buses and pedestrians. All of the cars are outlined and highlighted in separate colors and labeled Car 1, Car 2, Car 3 and so on. All pedestrians are outlined and highlighted in separate colors. All buses are outlined and highlighted in separate colors and labeled Bus 1, Bus 2 and so on. The caption below the image reads "I - Instance segmentation."

Image J in the bottom right-hand corner shows a city street which includes tall buildings, cars, buses and pedestrians. All of the cars are outlined and highlighted in separate colors and labeled Car 1, Car 2, Car 3 and so on. All pedestrians are outlined and highlighted in separate colors. All buses are outlined and highlighted in separate colors and labeled Bus 1, Bus 2 and so on. The road is highlighted and outlined with a pink color and labeled "Road." The sky is outlined and highlighted in an orange color and labeled "Sky." The caption below the image reads "J - Panoptic segmentation."

Image H shows segmented classes without instances, tagged as Car, Building, Bus, Road and Sky. Each class is assigned its own color, and every pixel belonging to that class is marked in that color.

Image I illustrates instance segmentation, where segments are tied to specific objects. For example, the cars are tagged as Car 1, Car 2, Car 3 (instances), and each instance is assigned a different color to separate its region from the others.

Image J combines both approaches: countable objects are labeled as instances (Car 1, Car 2, Car 3 and so on), while regions such as Sky and Road are labeled as non-instance classes.
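One common way to store a panoptic label is to pack the class and the instance into a single integer per pixel, as class_id * divisor + instance_id; some public datasets, such as Cityscapes, use a similar scheme for their instance IDs. The sketch below assumes a divisor of 1000, hypothetical class IDs and a tiny hand-made map loosely modeled on Image J, where non-instance classes like Sky and Road always carry instance ID 0.

```python
LABEL_DIVISOR = 1000       # assumption: must exceed the largest instance ID per class
SKY, ROAD, CAR = 1, 2, 3   # hypothetical class IDs
THINGS = {CAR}             # classes that get instance IDs; Sky and Road do not

def encode(class_id, instance_id=0):
    # Pack (class, instance) into one integer per pixel.
    if class_id not in THINGS and instance_id != 0:
        raise ValueError("non-instance classes such as Sky or Road have no instance ID")
    return class_id * LABEL_DIVISOR + instance_id

def decode(panoptic_id):
    # Unpack back into (class_id, instance_id).
    return divmod(panoptic_id, LABEL_DIVISOR)

# Toy 3x4 panoptic map: sky on top, two cars on the road, road below.
panoptic = [
    [encode(SKY)] * 4,
    [encode(CAR, 1), encode(CAR, 1), encode(ROAD), encode(CAR, 2)],
    [encode(ROAD)] * 4,
]
segments = sorted({v for row in panoptic for v in row})
print([decode(s) for s in segments])  # [(1, 0), (2, 0), (3, 1), (3, 2)]
```

Dividing out the instance part (panoptic_id // LABEL_DIVISOR) recovers a semantic map like Image H, while the Car pixels alone still form an instance map like Image I.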

Popular computer vision applications that use image segmentation

Segmentation is used for the granular understanding of images in a variety of industries. It is especially popular in the autonomous driving industry, as self-driving cars perform complex robotics tasks and require a deep understanding of their surroundings. In this area of research and experimentation, information about every pixel is critical and may influence the accuracy of the perception model.

A city street which includes tall buildings, cars, streetlights and pedestrians. Each object is colored differently. The caption below the image reads: "Instance segmentation for autonomous vehicles."

Another common application is full-pixel, non-instance segmentation for training perception models to identify objects of interest in images captured by distant cameras, as in geospatial applications.

A farmer's field containing a growing crop, a plain dirt field and small shrubs. The growing crop is outlined and highlighted in a yellow color. The dirt field is outlined and highlighted in a pink color and the shrubs are outlined and highlighted in a blue color. The caption below the image reads: "Non-instance segmentation for geospatial application."

Other geospatial applications using semantic segmentation include geosensing for land use mapping via satellite imagery, traffic management, city planning and road monitoring. Land cover information is also critical for various applications that monitor areas of deforestation and urbanization. Typically, each image pixel is segmented and classified into a specific type of land cover, such as urban areas, agricultural land or water bodies.
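Because every pixel carries a land cover class, summary statistics follow directly from the mask. The sketch below assumes hypothetical class IDs (0 = urban, 1 = agricultural, 2 = water) and two tiny made-up tiles covering the same area at different dates. It computes the share of each land cover type and counts pixels that changed from agricultural to urban, the kind of signal used to monitor urbanization.

```python
from collections import Counter

URBAN, AGRI, WATER = 0, 1, 2  # hypothetical class IDs
LAND_COVER = {URBAN: "urban", AGRI: "agricultural", WATER: "water"}

# Two toy classified tiles of the same area, captured at different dates.
tile_before = [
    [1, 1, 1, 2],
    [1, 1, 2, 2],
    [0, 1, 1, 2],
    [0, 0, 1, 1],
]
tile_after = [
    [1, 1, 1, 2],
    [0, 1, 2, 2],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
]

def land_cover_fractions(mask):
    # Share of pixels in each land cover class.
    counts = Counter(v for row in mask for v in row)
    total = sum(counts.values())
    return {LAND_COVER[k]: counts[k] / total for k in sorted(counts)}

def transitions(before, after, src, dst):
    # Number of pixels whose class changed from src to dst between the two dates.
    return sum(b == src and a == dst
               for row_b, row_a in zip(before, after)
               for b, a in zip(row_b, row_a))

print(land_cover_fractions(tile_before))
# {'urban': 0.1875, 'agricultural': 0.5625, 'water': 0.25}
print(transitions(tile_before, tile_after, AGRI, URBAN))  # 2
```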

A satellite image of farmers' fields. Each field is outlined and highlighted with a blue color. The caption below the image reads: "Non-instance segmentation for geo-sensing application."

In precision farming, semantic segmentation of crops and weeds helps robots decide in real time where to trigger weeding actions. Such computer vision systems can significantly reduce the manual monitoring of agricultural activities.
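The sketch below is a simplification of how a per-pixel crop/weed mask could drive an action. It assumes hypothetical class IDs (0 = soil, 1 = crop, 2 = weed), splits the image into vertical strips (for example, one per weeding tool) and flags the strips whose weed-pixel share exceeds an assumed threshold. The function name and the 10% default are illustrative, not taken from any real system.

```python
SOIL, CROP, WEED = 0, 1, 2  # hypothetical class IDs

def weeding_targets(mask, n_zones, min_weed_fraction=0.1):
    """Return indices of vertical strips whose weed-pixel share exceeds the threshold.

    Any leftover columns when the width is not divisible by n_zones are ignored
    in this sketch.
    """
    zone_width = len(mask[0]) // n_zones
    targets = []
    for z in range(n_zones):
        cols = range(z * zone_width, (z + 1) * zone_width)
        pixels = [row[c] for row in mask for c in cols]
        if pixels.count(WEED) / len(pixels) > min_weed_fraction:
            targets.append(z)
    return targets

# Toy crop/weed mask: a crop row on the left, weeds scattered to the right.
mask = [
    [0, 1, 0, 0, 2, 0],
    [1, 1, 0, 0, 2, 2],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
]
print(weeding_targets(mask, n_zones=3))  # [1, 2]
```

Raising min_weed_fraction makes the robot act only on denser weed patches; with 0.2, only strip 2 (3 of 8 pixels are weeds) would be targeted.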

A view of a farmer's field from the sky with a combine harvesting crops. The crops are outlined and highlighted with a green color. The combine is outlined and highlighted with an orange color. The previously harvested field, which now only contains dirt, is outlined and highlighted with a pink color. The trees, shrubs and grass outside of the field are outlined and highlighted with a yellow color. The caption below the image reads: "Non-instance segmentation for precision agriculture."

For fashion eCommerce brands, semantic segmentation makes it possible to automate complex tasks such as clothing parsing. Fine-grained clothing categorization demands a high level of judgment because of the semantics of the clothing, the variability of human poses and the potentially large number of classes and attributes involved.

A young female model sitting on a street curb. The model's face, hair and legs are outlined and highlighted with the color blue. The model's sweatshirt is outlined and highlighted with the color pink. The model's boots are outlined and highlighted with the color yellow. The model's handbag is outlined and highlighted with the color red. The model's shorts are outlined and highlighted with the color green. 

The caption below the image reads: "Instance segmentation for eCommerce fashion application."

Facial feature recognition is another common area of interest. The algorithms help estimate gender, expression/emotion, age, ethnicity and more by studying facial features. Factors like varying lighting conditions, facial expressions, orientation, occlusion and image resolution increase the complexity of these segmentation tasks.

A closeup photo of a middle aged man's face. The man's lips are outlined in red and labeled "lips," the man's teeth are outlined in purple and labeled "mouth," the man's face is outlined in red and the man's neck is outlined in purple. The caption below the image reads: "Non-instance segmentation for facial recognition application."

Computer vision technologies are also growing in popularity in the healthcare industry, including cancer research. A common use case is instance segmentation that detects the shapes of individual cancerous cells to speed up the diagnosis process.

A view of cells through a microscope. Portions of each cell are outlined and highlighted in yellow. The caption of the image reads: "Instance segmentation for cancer cell identification."
