Indian Traffic Semantic Segmentation.

Harish Kumar S
7 min read · Mar 17, 2021
Image Source : https://www.cogitotech.com/semantic-segmentation/

Table of Contents :

-Computer Vision.

  • What are the major goals of computer vision?

-What is Semantic Segmentation?

  • Major use cases of Semantic Segmentation.

-Information about Dataset.

-Segmentation Task.

  • Structure of Data.
  • Pre-processing.
  • Applying U-Net to segment the Images.

-Summary.

-References.

Computer Vision :

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images from cameras and videos and deep learning models, machines can accurately identify and classify objects — and then react to what they “see.”

Source : https://www.sas.com/en_in/insights/analytics/computer-vision.html

What are the major goals of computer vision?

The major goals of computer vision include image classification, object detection and localization, and image segmentation.

In this post, our goal is to segment Indian traffic images obtained from videos of traffic scenes recorded in various cities of India.

What is Semantic Segmentation?

Image semantic segmentation consists of classifying each pixel of an image (or just several of them) into a category, each category corresponding to an object or a part of the image (road, sky, …). This task is part of the concept of scene understanding. Since we are predicting a label for every pixel in the image, this task is commonly referred to as dense prediction.

One important thing to note is that we are not separating instances of the same class; we only care about the category of each pixel. In other words, semantic segmentation does not distinguish between two objects of the same category. There is another type of segmentation, called instance segmentation, which does separate individual objects regardless of their category.
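To make the distinction concrete, here is a toy example (the values are purely illustrative): in a semantic map, both cars share one class id, while an instance map separates them.

import numpy as np

# 0 = background, 1 = car, 2 = road; two car blobs, left and right.
semantic_map = np.array([[1, 1, 0, 1],
                         [1, 1, 0, 1],
                         [0, 0, 0, 0],
                         [2, 2, 2, 2]])   # both cars are labeled 1

instance_map = np.array([[1, 1, 0, 2],
                         [1, 1, 0, 2],
                         [0, 0, 0, 0],
                         [3, 3, 3, 3]])   # each car gets its own id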

Major use cases of Semantic Segmentation :

  • Autonomous Vehicles.

  • Medical Diagnostics.

  • Satellite Imaging Systems.

Information about Dataset :

While several datasets for autonomous navigation have become available in recent years, they have tended to focus on structured driving environments. This usually corresponds to well-delineated infrastructure such as lanes, a small number of well-defined categories for traffic participants, low variation in object or background appearance, and strong adherence to traffic rules. The Indian Driving Dataset (IDD) is a novel dataset for road scene understanding in unstructured environments where the above assumptions are largely not satisfied. It consists of 10,000 images, finely annotated with 34 classes, collected from 182 drive sequences on Indian roads. The label set is expanded in comparison to popular benchmarks such as Cityscapes, to account for new classes.

The dataset consists of images obtained from a front-facing camera attached to a car. The car was driven around Hyderabad and Bangalore and their outskirts. The images are mostly at 1080p resolution, but there are also some images at 720p and other resolutions.

Dataset source: Indian Driving Dataset (IDD).

Segmentation Task :

Structure of Data :

|--- data
|-----| ---- images
|-----| ------|----- Scene 1
|-----| ------|--------| ----- Frame 1 (image 1)
|-----| ------|--------| ----- Frame 2 (image 2)
|-----| ------|--------| ----- ...
|-----| ------|----- Scene 2
|-----| ------|--------| ----- Frame 1 (image 1)
|-----| ------|--------| ----- Frame 2 (image 2)
|-----| ------|--------| ----- ...
|-----| ------|----- .....
|-----| ---- masks
|-----| ------|----- Scene 1
|-----| ------|--------| ----- json 1 (labeled objects in image 1)
|-----| ------|--------| ----- json 2 (labeled objects in image 2)
|-----| ------|--------| ----- ...
|-----| ------|----- Scene 2
|-----| ------|--------| ----- json 1 (labeled objects in image 1)
|-----| ------|--------| ----- json 2 (labeled objects in image 2)
|-----| ------|--------| ----- ...
|-----| ------|----- .....

Pre-processing :

Creating a DataFrame of image paths and their respective JSON paths.

# This snippet will create a dataframe with two columns ['images', 'json'].
# The column 'images' will have paths to images.
# The column 'json' will have paths to json files.
import os
import pandas as pd

path = 'data'
G = []
for i in sorted(os.listdir(path)):            # 'images', then 'masks'
    lst = []
    path1 = os.path.join(path, i)
    print(path1)
    for j in sorted(os.listdir(path1)):       # scene folders
        path2 = os.path.join(path1, j)
        for k in sorted(os.listdir(path2)):   # frames / json files
            path3 = os.path.join(path2, k)
            lst.append(path3)
    G.append(lst)
# sorted() keeps the two columns aligned, since os.listdir returns
# entries in arbitrary order.
data = pd.DataFrame(list(zip(G[0], G[1])), columns=['images', 'json'])
data.head()
Sample Dataframe.

Structure of JSON file

Each file has 3 attributes (an illustrative example follows this list).

  • imgHeight: Which tells the height of the image.
  • imgWidth: Which tells the width of the image.
  • objects: A list of objects; each object has two attributes:
  1. label: The type of the object.
  2. polygon: A list of two-element lists representing the coordinates of the polygon.
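A minimal illustrative example of one annotation file (the field names match the attributes above; the values are made up):

{
  "imgHeight": 1080,
  "imgWidth": 1920,
  "objects": [
    {"label": "road",
     "polygon": [[0, 1080], [0, 700], [1920, 700], [1920, 1080]]},
    {"label": "car",
     "polygon": [[850, 600], [1100, 600], [1100, 760], [850, 760]]}
  ]
}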

Unique Label Computation

From the given JSON files, we extract the labels and find the number of unique labels.

import json

total_label = []
for file in data['json'].values:
    with open(file) as f:                  # parse each annotation file
        Data = json.load(f)
        for obj in Data['objects']:
            total_label.append(obj['label'])
total_label = set(total_label)             # keep only the unique labels
len(total_label)
Labels.
label_clr = {'road': 10, 'parking': 20, 'drivable fallback': 20, 'sidewalk': 30,
             'non-drivable fallback': 40, 'rail track': 40, 'person': 50,
             'animal': 50, 'rider': 60, 'motorcycle': 70, 'bicycle': 70,
             'autorickshaw': 80, 'car': 80, 'truck': 90, 'bus': 90,
             'vehicle fallback': 90, 'trailer': 90, 'caravan': 90, 'curb': 100,
             'wall': 100, 'fence': 110, 'guard rail': 110, 'billboard': 120,
             'traffic sign': 120, 'traffic light': 120, 'pole': 130,
             'polegroup': 130, 'obs-str-bar-fallback': 130, 'building': 140,
             'bridge': 140, 'tunnel': 140, 'vegetation': 150, 'sky': 160,
             'fallback background': 160, 'unlabeled': 0, 'out of roi': 0,
             'ego vehicle': 170, 'ground': 180, 'rectification border': 190,
             'train': 200}

Here a number is assigned to each object type; grouping related labels together gives 21 distinct values (0, 10, …, 200).

Note that each object's number has been multiplied by 10; this is just to make different objects visually distinguishable in the segmentation map.

Before passing the masks to the model, you might need to divide the mask array by 10 to recover the class indices 0–20.
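For example (a minimal sketch; the file path is illustrative), a saved mask can be converted back to class indices and one-hot encoded before training:

import numpy as np
from PIL import Image
from tensorflow.keras.utils import to_categorical

mask = np.array(Image.open('data/out/201/frame0001.png'))  # values 0, 10, ..., 200
mask = mask // 10                                # recover class indices 0..20
one_hot = to_categorical(mask, num_classes=21)   # shape (H, W, 21)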

Extracting the polygons from the JSON files and creating segmentation masks by drawing sets of polygons.

From the given image width, height and polygon coordinates, we draw a coloured mask that segments the different objects.

import numpy as np
from PIL import Image, ImageDraw
from tqdm import tqdm

# get_poly is a helper (defined in the repository) that parses a json file
# and returns the image width, height, labels and polygon vertex lists.
n = []
out_path = 'data/out'
final_path = os.path.join(out_path, '201')   # '201' is the first scene folder
os.mkdir(final_path)
n.append('201')
for files in tqdm(data['json'].values):
    w, h, labels, vertexlist = get_poly(files)
    img = Image.new("RGB", (w, h))
    img1 = ImageDraw.Draw(img)
    n.append(files[10:13])                   # files[10:13] is the scene id
    final_path = os.path.join(out_path, files[10:13])
    if n[-2] != n[-1]:                       # new scene -> new output folder
        os.mkdir(final_path)
    for i in range(len(labels)):
        if len(vertexlist[i]) > 1:
            img1.polygon(vertexlist[i], fill=label_clr[labels[i]])
    img = np.array(img)
    im = Image.fromarray(img[:, :, 0])       # keep a single channel as the mask
    im.save(os.path.join(final_path, files[14:-5] + ".png"))

The above snippet creates a mask for every image, stores it in “.png” format, and the mask paths are then appended to the DataFrame created above.
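The post does not show that step explicitly; a minimal sketch of how the mask paths could be attached to the DataFrame (assuming the same path slicing as in the snippet above):

# Derive each mask path from its json path; f[10:13] is the scene id
# and f[14:-5] is the frame name, as in the snippet above.
data['mask'] = [os.path.join(out_path, f[10:13], f[14:-5] + '.png')
                for f in data['json'].values]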

Sample image with its two masks.

This is how the mask gets created.

Applying U-Net to segment the Images :

U-Net Architecture.

Reference paper for U-Net : https://arxiv.org/abs/1505.04597

We import a pretrained U-Net architecture from the segmentation_models library.

Ref : https://github.com/qubvel/segmentation_models

import tensorflow as tf

os.environ["SM_FRAMEWORK"] = "tf.keras"   # must be set before the import below
import segmentation_models as sm
from segmentation_models import Unet
from segmentation_models.metrics import iou_score

tf.keras.backend.set_image_data_format('channels_last')
tf.keras.backend.clear_session()

# Load the U-Net model with a ResNet50 encoder initialized with ImageNet
# weights. "classes" is the number of classes in the dataset (21 here);
# IMAGE_SHAPE and n_classes are defined earlier in the notebook.
backbone = 'resnet50'
preprocess_input = sm.get_preprocessing(backbone)
model = Unet(backbone_name=backbone, input_shape=IMAGE_SHAPE, classes=n_classes,
             activation='softmax', encoder_freeze=True, encoder_weights='imagenet',
             decoder_block_type='upsampling')

Here, ResNet50 is used as the encoder (the downsampling path) of the U-Net, with its ImageNet weights frozen; the decoder then upsamples the feature maps back to the input resolution.
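Note that the preprocess_input function obtained above should be applied to the images before they reach the network. A minimal sketch (the 256×256 batch shape is an assumption; use whatever IMAGE_SHAPE was set to):

import numpy as np

x = np.random.rand(2, 256, 256, 3).astype('float32') * 255.0  # dummy image batch
x = preprocess_input(x)    # backbone-specific normalization for resnet50
pred = model.predict(x)    # softmax maps of shape (2, 256, 256, n_classes)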

Metrics and Loss

# Combined categorical cross-entropy + Dice loss, with IoU score as the metric.
optim = tf.keras.optimizers.Adam(0.0001)
total_loss = sm.losses.cce_dice_loss   # categorical cross-entropy + dice loss
model.compile(optim, total_loss, metrics=[iou_score])

Dice Loss

The Dice coefficient is essentially a measure of overlap between two samples. It ranges from 0 to 1, where a value of 1 denotes perfect and complete overlap. The Dice loss used here is simply 1 − Dice.

Dice = 2|A ∩ B| / (|A| + |B|)

Dice Coefficient.

IOU Score

The IoU metric measures the number of pixels common to both the target and prediction masks, divided by the total number of pixels present across both masks.

IoU = |target ∩ prediction| / |target ∪ prediction|

IOU Score.
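Both metrics can be written in a few lines of NumPy for binary masks (a minimal sketch mirroring the formulas above, not the segmentation_models implementation):

import numpy as np

def dice_coefficient(target, prediction):
    # 2 * |A ∩ B| / (|A| + |B|)
    intersection = np.logical_and(target, prediction).sum()
    return 2.0 * intersection / (target.sum() + prediction.sum())

def iou_np(target, prediction):   # named iou_np to avoid clashing with iou_score
    # |A ∩ B| / |A ∪ B|
    intersection = np.logical_and(target, prediction).sum()
    union = np.logical_or(target, prediction).sum()
    return intersection / union

target = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
prediction = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
print(dice_coefficient(target, prediction))  # 2*2 / (3+3) ≈ 0.667
print(iou_np(target, prediction))            # 2 / 4 = 0.5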

Training the U-Net Model

The model is trained for 50 epochs.
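The train_dataloader and val_dataloader passed below are built elsewhere in the repository. A minimal sketch of such a loader as a tf.keras Sequence (not the author's implementation; the batch size, image size and the data['mask'] column sketched earlier are assumptions):

import numpy as np
from PIL import Image
from tensorflow.keras.utils import Sequence, to_categorical

class SegmentationLoader(Sequence):
    # Yields (preprocessed image batch, one-hot mask batch) pairs.
    def __init__(self, image_paths, mask_paths, batch_size=8, size=(256, 256)):
        self.image_paths, self.mask_paths = image_paths, mask_paths
        self.batch_size, self.size = batch_size, size

    def __len__(self):
        return len(self.image_paths) // self.batch_size

    def __getitem__(self, idx):
        s = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        imgs, masks = [], []
        for ip, mp in zip(self.image_paths[s], self.mask_paths[s]):
            imgs.append(np.array(Image.open(ip).resize(self.size)))
            # nearest-neighbour resize keeps mask values intact; // 10
            # recovers the class indices 0..20
            mask = np.array(Image.open(mp).resize(self.size, Image.NEAREST)) // 10
            masks.append(to_categorical(mask, num_classes=21))
        return preprocess_input(np.array(imgs, dtype='float32')), np.array(masks)

train_dataloader = SegmentationLoader(data['images'].values, data['mask'].values)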

# train_steps, valid_steps and callbacks are defined elsewhere in the repository.
history = model.fit_generator(train_dataloader, steps_per_epoch=train_steps,
                              epochs=50, validation_data=val_dataloader,
                              validation_steps=valid_steps, callbacks=callbacks)
IOU Score and Loss over 50 epochs.

Summary :

The IoU scores on the train and validation data after the 50th epoch are 0.4715 and 0.4199 respectively.

Sample Outputs.

References :

The entire code can be found in this GitHub repository.
