Counting 3,834 Big Data & Machine Learning Frameworks, Toolsets, and Examples...
Suggestion? Feedback? Tweet @stkim1


YoloV3 Implemented in TensorFlow 2.0

This repo provides a clean implementation of YoloV3 in TensorFlow 2.0 using all the best practices.

Key Features

  • TensorFlow 2.0
  • yolov3 with pre-trained Weights
  • yolov3-tiny with pre-trained Weights
  • Inference example
  • Transfer learning example
  • Eager mode training with tf.GradientTape
  • Graph mode training with
  • Functional model with tf.keras.layers
  • Input pipeline using
  • Tensorflow Serving
  • Vectorized transformations
  • GPU accelerated
  • Fully integrated with absl-py from
  • Clean implementation
  • Following the best practices
  • MIT License

demo demo




pip install -r requirements.txt


conda env create -f environment.yml
conda activate yolov3-tf2

Convert pre-trained Darknet weights

# yolov3
wget -O data/yolov3.weights

# yolov3-tiny
wget -O data/yolov3-tiny.weights
python --weights ./data/yolov3-tiny.weights --output ./checkpoints/ --tiny


# yolov3
python --image ./data/meme.jpg

# yolov3-tiny
python --weights ./checkpoints/ --tiny --image ./data/street.jpg

# webcam
python --video 0

# video file
python --video path_to_file.mp4 --weights ./checkpoints/ --tiny


You need to generate tfrecord following the TensorFlow Object Detection API. For example you can use Microsoft VOTT to generate such dataset. You can also use this script to create the pascal voc dataset.

python --batch_size 8 --dataset ~/Data/voc2012.tfrecord --val_dataset ~/Data/voc2012_val.tfrecord --epochs 100 --mode eager_tf --transfer fine_tune

python --batch_size 8 --dataset ~/Data/voc2012.tfrecord --val_dataset ~/Data/voc2012_val.tfrecord --epochs 100 --mode fit --transfer none

python --batch_size 8 --dataset ~/Data/voc2012.tfrecord --val_dataset ~/Data/voc2012_val.tfrecord --epochs 100 --mode fit --transfer no_output

python --batch_size 8 --dataset ~/Data/voc2012.tfrecord --val_dataset ~/Data/voc2012_val.tfrecord --epochs 10 --mode eager_fit --transfer fine_tune --weights ./checkpoints/ --tiny

Tensorflow Serving

You can export the model to tf serving

python --output serving/yolov3/1/
# verify tfserving graph
saved_model_cli show --dir serving/yolov3/1/ --tag_set serve --signature_def serving_default

The inputs are preprocessed images (see dataset.transform_iamges)

outputs are

yolo_nms_0: bounding boxes
yolo_nms_1: scores
yolo_nms_2: classes
yolo_nms_3: numbers of valid detections

Benchmark (No Training Yet)

Numbers are obtained with rough calculations from

Macbook Pro 13 (2.7GHz i5)

Detection 416x416 320x320 608x608
YoloV3 1000ms 500ms 1546ms
YoloV3-Tiny 100ms 58ms 208ms

Desktop PC (GTX 970)

Detection 416x416 320x320 608x608
YoloV3 74ms 57ms 129ms
YoloV3-Tiny 18ms 15ms 28ms

AWS g3.4xlarge (Tesla M60)

Detection 416x416 320x320 608x608
YoloV3 66ms 50ms 123ms
YoloV3-Tiny 15ms 10ms 24ms

Darknet version of YoloV3 at 416x416 takes 29ms on Titan X. Considering Titan X has about double the benchmark of Tesla M60, Performance-wise this implementation is pretty comparable.

Implementation Details

Eager execution

Great addition for existing TensorFlow experts. Not very easy to use without some intermediate understanding of TensorFlow graphs. It is annoying when you accidentally use incompatible features like tensor.shape[0] or some sort of python control flow that works fine in eager mode, but totally breaks down when you try to compile the model to graph.

model(x) vs. model.predict(x)

When calling model(x) directly, we are executing the graph in eager mode. For model.predict, tf actually compiles the graph on the first run and then execute in graph mode. So if you are only running the model once, model(x) is faster since there is no compilation needed. Otherwise, model.predict or using exported SavedModel graph is much faster (by 2x).


Extremely useful for debugging purpose, you can set breakpoints anywhere. You can compile all the keras fitting functionalities with gradient tape using the run_eagerly argument in model.compile. From my limited testing, all training methods including GradientTape,, eager or not yeilds similar performance. But graph mode is still preferred since it's a tiny bit more efficient.


@tf.function is very cool. It's like an in-between version of eager and graph. You can step through the function by disabling tf.function and then gain performance when you enable it in production. Important note, you should not pass any non-tensor parameter to @tf.function, it will cause re-compilation on every call. I am not sure whats the best way other than using globals. (abseil)

Absolutely amazing. If you don't know already, is officially used by internal projects at Google. It standardizes application interface for Python and many other languages. After using it within Google, I was so excited to hear abseil going open source. It includes many decades of best practices learned from creating large size scalable applications. I literally have nothing bad to say about it, strongly recommend to everybody.

Loading pre-trained Darknet weights

very hard with pure functional API because the layer ordering is different in tf.keras and darknet. The clean solution here is creating sub-models in keras. Keras is not able to save nested model in h5 format properly, TF Checkpoint is recommended since its offically supported by TensorFlow.


It doesn't work very well for transfer learning. There are many articles and github issues all over the internet. I used a simple hack to make it work nicer on transfer learning with small batches.

Command Line Args Reference
  --output: path to output
    (default: './checkpoints/')
  --[no]tiny: yolov3 or yolov3-tiny
    (default: 'false')
  --weights: path to weights file
    (default: './data/yolov3.weights')
  --classes: path to classes file
    (default: './data/coco.names')
  --image: path to input image
    (default: './data/girl.png')
  --output: path to output image
    (default: './output.jpg')
  --[no]tiny: yolov3 or yolov3-tiny
    (default: 'false')
  --weights: path to weights file
    (default: './checkpoints/')
  --batch_size: batch size
    (default: '8')
    (an integer)
  --classes: path to classes file
    (default: './data/coco.names')
  --dataset: path to dataset
    (default: '')
  --epochs: number of epochs
    (default: '2')
    (an integer)
  --learning_rate: learning rate
    (default: '0.001')
    (a number)
  --mode: <fit|eager_fit|eager_tf>: fit:, eager_fit:, eager_tf: custom GradientTape
    (default: 'fit')
  --size: image size
    (default: '416')
    (an integer)
  --[no]tiny: yolov3 or yolov3-tiny
    (default: 'false')
  --transfer: <none|darknet|no_output|frozen|fine_tune>: none: Training from scratch, darknet: Transfer darknet, no_output: Transfer all but output, frozen: Transfer and
    freeze all, fine_tune: Transfer all and freeze darknet only
    (default: 'none')
  --val_dataset: path to validation dataset
    (default: '')
  --weights: path to weights file
    (default: './checkpoints/')


It is pretty much impossible to implement this from the yolov3 paper alone. I had to reference the official (very hard to understand) and many un-official (many minor errors) repos to piece together the complete picture.