Semantic Segmentation using a Fully Convolutional Neural Network
This repository contains a set of Python scripts to train and test a fully convolutional neural network for semantic segmentation. The network is based on the architecture described in the paper by Jonathan Long et al.
How to Train the Model
- Since the network uses VGG-16 weights, first download the pre-trained VGG-16 weights from https://www.cs.toronto.edu/~frossard/vgg16/vgg16_weights.npz and save them in the
- Download the KITTI dataset and save it in the
- Next, open a command window, type python fcn.py, and hit the Enter key.
Note that training checkpoints are saved to the checkpoints/kitti folder and logs are saved to the graphs/kitti folder. To inspect the training process, start TensorBoard with the command tensorboard --logdir=graphs/kitti.
The following images show sample output obtained with the trained model.
We implement the FCN-8s model described in the paper by Jonathan Long et al. The following figure shows the architecture of the network. We generated this figure using TensorBoard.
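To make the FCN-8s structure concrete, the sketch below traces feature-map sizes through the encoder and decoder: the VGG-16 pooling stages 3, 4, and 5 reduce the input by factors of 8, 16, and 32, and the decoder upsamples by 2, 2, and 8 while fusing the pool4 and pool3 outputs. The function name `fcn8s_shapes` and the 160x576 input size are illustrative assumptions, not taken from this repository, and the sketch assumes input dimensions divisible by 32.

```python
def fcn8s_shapes(height, width):
    """Trace feature-map sizes through an FCN-8s style encoder/decoder.

    Assumes height and width are divisible by 32, so that each of the
    five 2x2 pooling stages of VGG-16 halves the size exactly.
    """
    if height % 32 or width % 32:
        raise ValueError("height and width must be divisible by 32")

    # Encoder: pooling stages 3, 4 and 5 reduce the input by 8, 16 and 32.
    pool3 = (height // 8, width // 8)
    pool4 = (height // 16, width // 16)
    pool5 = (height // 32, width // 32)

    # Decoder: upsample 2x and fuse with pool4, upsample 2x and fuse with
    # pool3, then upsample 8x back to the input resolution.
    up_to_pool4 = (pool5[0] * 2, pool5[1] * 2)
    up_to_pool3 = (up_to_pool4[0] * 2, up_to_pool4[1] * 2)
    output = (up_to_pool3[0] * 8, up_to_pool3[1] * 8)

    # The skip connections only work if the sizes line up.
    assert up_to_pool4 == pool4 and up_to_pool3 == pool3
    return {"pool3": pool3, "pool4": pool4, "pool5": pool5, "output": output}


if __name__ == "__main__":
    # 160x576 is a hypothetical input size used only for illustration.
    for name, shape in fcn8s_shapes(160, 576).items():
        print(name, shape)
```

The final 8x upsampling step is what gives FCN-8s its name: predictions are made at one eighth of the input resolution before being scaled back to full size.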
The following table describes the main functionality of the Python scripts in this repository.
||This is the main script of the repository. The key methods of this script are:
||This script contains the loss function we optimize during training.|
||This script contains utility functions for generating training and testing batches.|
||This script contains utility functions for building a fully convolutional network using VGG-16 pre-trained weights.|
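The table does not spell out which loss the training script optimizes. A common choice for FCN-style segmentation is per-pixel softmax cross-entropy, averaged over all pixels; whether this repository uses exactly that form is an assumption. The dependency-free sketch below shows the idea, with `softmax` and `pixelwise_cross_entropy` as hypothetical names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixelwise_cross_entropy(logits, labels):
    """Mean cross-entropy over all pixels.

    logits: list of per-pixel class-score lists, e.g. [[road, not_road], ...]
    labels: list of integer class indices, one per pixel
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    losses = [
        -math.log(softmax(scores)[label])
        for scores, label in zip(logits, labels)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Two pixels, two classes: one confident correct prediction and one
    # undecided prediction.
    print(pixelwise_cross_entropy([[4.0, 0.0], [0.0, 0.0]], [0, 1]))
```

In a real training script the same computation would be done on tensors by the deep learning framework rather than on Python lists.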
The KITTI dataset
To train the semantic segmentation network, we used the KITTI dataset. The dataset consists of 289 training and 290 test images. It contains three different categories of road scenes (training/test image counts in parentheses):
- uu - urban unmarked (98/100)
- um - urban marked (95/96)
- umm - urban multiple marked lanes (96/94)
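As a quick consistency check, the sketch below reads the numbers in parentheses as (training, test) image counts and confirms that they add up to the 289 training and 290 test images stated above. The names `KITTI_ROAD_SPLITS` and `split_totals` are illustrative only.

```python
# (training, test) image counts per category, copied from the list above.
KITTI_ROAD_SPLITS = {
    "uu": (98, 100),   # urban unmarked
    "um": (95, 96),    # urban marked
    "umm": (96, 94),   # urban multiple marked lanes
}


def split_totals(splits):
    """Return the total number of training and test images."""
    train = sum(n_train for n_train, _ in splits.values())
    test = sum(n_test for _, n_test in splits.values())
    return train, test


if __name__ == "__main__":
    print(split_totals(KITTI_ROAD_SPLITS))
```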
Training the Model
When training any deep learning model, selecting suitable hyper-parameters plays a big role. For this project, we selected the following hyper-parameters:
|Learning Rate||1e-5||We used
|Number of epochs||25||The training dataset is small, with only 289 training examples. Hence, we use a moderate number of epochs.|
|Batch Size||8||Based on the size of the training dataset, we selected a batch size of 8 images.|
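To see what these settings mean in terms of optimizer updates, the sketch below computes the number of steps per epoch and in total for 289 examples, a batch size of 8, and 25 epochs. Whether the final partial batch of each epoch is used or dropped is an assumption here (the `drop_remainder` flag is hypothetical), so both cases are covered.

```python
import math


def training_steps(num_examples, batch_size, epochs, drop_remainder=False):
    """Return (steps_per_epoch, total_steps) for a training run."""
    if drop_remainder:
        # Discard the final, smaller batch of each epoch.
        steps_per_epoch = num_examples // batch_size
    else:
        # Keep the final, smaller batch of each epoch.
        steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


if __name__ == "__main__":
    print(training_steps(289, 8, 25))
    print(training_steps(289, 8, 25, drop_remainder=True))
```

With the partial batch kept, each epoch has 37 steps (36 full batches plus one batch with a single image), for 925 steps in total.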
In this project, we investigated how to use a fully convolutional neural network for semantic segmentation. We tested our model against the KITTI dataset. The results indicate that our model is quite capable of separating road pixels from the rest. However, we would like to work on the following additional tasks to increase the accuracy of our model.
- Data Augmentation: During testing, we found that our model failed to label the road surface when lighting was inadequate. We think data augmentation can be used to generate additional training examples with different lighting conditions, which should help overcome this issue.
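One simple way to simulate different lighting conditions is to scale pixel intensities by a random factor. The sketch below is a minimal, dependency-free illustration of that idea, not an implementation from this repository: the function names and the default factor range of 0.4 to 1.2 are assumptions, and images are represented as nested lists of (r, g, b) tuples, whereas a real pipeline would more likely operate on arrays.

```python
import random


def adjust_brightness(image, factor):
    """Scale every channel by `factor`, clipping to the 0-255 range.

    image: list of rows, each row a list of (r, g, b) integer tuples
    """
    return [
        [
            tuple(min(255, max(0, round(channel * factor))) for channel in pixel)
            for pixel in row
        ]
        for row in image
    ]


def random_brightness(image, low=0.4, high=1.2, rng=random):
    """Darken or brighten an image by a factor drawn uniformly from [low, high]."""
    factor = rng.uniform(low, high)
    return adjust_brightness(image, factor)


if __name__ == "__main__":
    sample = [[(100, 200, 50), (10, 20, 30)]]
    print(adjust_brightness(sample, 0.5))
    print(random_brightness(sample, rng=random.Random(0)))
```

Because a brightness change does not move any pixels, the ground-truth segmentation mask for an augmented image can be reused unchanged.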