Densely Connected Convolutional Networks (DenseNets)
This repository contains the code for DenseNet introduced in the following paper
Densely Connected Convolutional Networks (CVPR 2017, Best Paper Award)
Gao Huang*, Zhuang Liu*, Laurens van der Maaten and Kilian Weinberger (* Authors contributed equally).
Now with a much more memory-efficient implementation! Please check the technical report and code for more information.
The code is built on fb.resnet.torch.
Citation
If you find DenseNet useful in your research, please consider citing:
@inproceedings{huang2017densely,
title={Densely connected convolutional networks},
author={Huang, Gao and Liu, Zhuang and van der Maaten, Laurens and Weinberger, Kilian Q },
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2017}
}
Other Implementations
Our [Caffe], our memory-efficient [Caffe], our memory-efficient [PyTorch],
[PyTorch] by Andreas Veit, [PyTorch] by Brandon Amos, [PyTorch] by Federico Baldassarre,
[MXNet] by Nicatio,
[MXNet] by Xiong Lin,
[MXNet] by miraclewkf,
[Tensorflow] by Yixuan Li,
[Tensorflow] by Laurent Mazare,
[Tensorflow] by Illarion Khlestov,
[Lasagne] by Jan Schlüter,
[Keras] by tdeboissiere,
[Keras] by Roberto de Moura Estevão Filho,
[Keras] by Somshubra Majumdar,
[Chainer] by Toshinori Hanya,
[Chainer] by Yasunori Kudo,
[Torch 3D-DenseNet] by Barry Kui,
[Keras] by Christopher Masch.
Note that we only list some early implementations here. If you would like to add yours, please submit a pull request.
Some Follow-up Projects
 Multi-Scale Dense Convolutional Networks for Efficient Prediction
 DSOD: Learning Deeply Supervised Object Detectors from Scratch
 CondenseNet: An Efficient DenseNet using Learned Group Convolutions
 Fully Convolutional DenseNets for Semantic Segmentation
Introduction
DenseNet is a network architecture in which each layer is directly connected to every other layer in a feed-forward fashion (within each dense block). For each layer, the feature maps of all preceding layers are treated as separate inputs, whereas its own feature maps are passed on as inputs to all subsequent layers. This connectivity pattern yields state-of-the-art accuracies on CIFAR-10/100 (with or without data augmentation) and SVHN. On the large-scale ILSVRC 2012 (ImageNet) dataset, DenseNet achieves accuracy similar to ResNet while using less than half the number of parameters and roughly half the number of FLOPs.
Figure 1: A dense block with 5 layers and growth rate 4.
Figure 2: A deep DenseNet with three dense blocks.
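The connectivity pattern can be made concrete by counting channels. The following is a minimal sketch in plain Python, not code from this repository; the function name `dense_block_channels` and the example values are illustrative assumptions. It assumes each layer concatenates the block input with all earlier outputs and contributes `growth_rate` new feature maps.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Count channels flowing through one dense block.

    Returns the number of input channels seen by each layer,
    plus the number of channels leaving the block.
    """
    inputs_per_layer = []
    channels = in_channels
    for _ in range(num_layers):
        # Each layer sees the block input plus every earlier layer's output.
        inputs_per_layer.append(channels)
        # It then adds growth_rate new feature maps for all later layers.
        channels += growth_rate
    return inputs_per_layer, channels


# Illustrative values only: 16 channels entering, 4 layers, growth rate 4.
per_layer, out_channels = dense_block_channels(16, 4, 4)
print(per_layer)     # [16, 20, 24, 28]
print(out_channels)  # 32
```

The input width of a layer grows linearly with its position in the block, which is why a small growth rate is enough to keep the network narrow.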
Usage
 Install Torch and required dependencies like cuDNN. See the instructions here for a step-by-step guide.
 Clone this repo:
git clone https://github.com/liuzhuang13/DenseNet.git
As an example, the following command trains a DenseNet-BC with depth L=100 and growth rate k=12 on CIFAR-10:
th main.lua -netType densenet -dataset cifar10 -batchSize 64 -nEpochs 300 -depth 100 -growthRate 12
As another example, the following command trains a DenseNet-BC with depth L=121 and growth rate k=32 on ImageNet:
th main.lua -netType densenet -dataset imagenet -data [dataFolder] -batchSize 256 -nEpochs 90 -depth 121 -growthRate 32 -nGPU 4 -nThreads 16 -optMemory 3
Please refer to fb.resnet.torch for data preparation.
DenseNet and DenseNet-BC
By default, the code runs with the DenseNet-BC architecture, which has 1x1 convolutional bottleneck layers and compresses the number of channels at each transition layer by a factor of 0.5. To run with the original DenseNet, simply use the options -bottleneck false and -reduction 1.
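To illustrate what the compression factor does to the channel count, here is a hedged sketch in plain Python. It assumes the CIFAR layout described in the paper: three dense blocks with the layers split evenly among them, half as many layers per block when bottlenecks are used (each bottleneck layer contains two convolutions), and 2k channels entering the first block for DenseNet-BC. The function name and its parameters are hypothetical and are not read from this repository's code.

```python
import math


def densenet_channel_trace(depth, growth_rate, bottleneck=True,
                           reduction=0.5, num_blocks=3):
    """Return (layers per block, channels at the end of each dense block)."""
    layers_per_block = (depth - 4) // num_blocks
    if bottleneck:
        # A bottleneck layer is a 1x1 conv followed by a 3x3 conv.
        layers_per_block //= 2
    channels = 2 * growth_rate  # assumed width before the first block
    trace = []
    for block in range(num_blocks):
        channels += layers_per_block * growth_rate
        trace.append(channels)
        if block < num_blocks - 1:
            # Transition layer: keep floor(reduction * channels) feature maps.
            channels = math.floor(channels * reduction)
    return layers_per_block, trace


# DenseNet-BC with L=100, k=12 under the assumptions above.
print(densenet_channel_trace(100, 12))  # (16, [216, 300, 342])
```

Calling the same function with `bottleneck=False, reduction=1` mimics the effect of `-bottleneck false -reduction 1`: no layers are halved and the transition layers keep every channel.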
Memory-efficient implementation (newly added feature on June 6, 2017)
There is an option -optMemory which is very useful for reducing the GPU memory footprint when training a DenseNet. By default, the value is set to 2, which activates the shareGradInput function (with small modifications from here). There are two extremely memory-efficient modes (-optMemory 3 or -optMemory 4) which use a customized densely connected layer. With -optMemory 4, the largest 190-layer DenseNet-BC on CIFAR can be trained on a single NVIDIA TitanX GPU (using 8.3G of 12G) instead of fully using four GPUs with the standard (recursive concatenation) implementation.
More details about the memory-efficient implementation are discussed here.
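A rough way to see why recursive concatenation is memory hungry: if every layer keeps its own concatenated copy of all earlier feature maps, the number of stored channels grows quadratically with the number of layers, whereas storing each feature map once grows linearly. The sketch below only counts channels under that simplified assumption. It does not model what -optMemory actually allocates, and both function names are hypothetical.

```python
def channels_stored_naive(in_channels, num_layers, growth_rate):
    """Each layer holds a private concatenated input copy plus its own output."""
    total = 0
    channels = in_channels
    for _ in range(num_layers):
        total += channels      # concatenated input copy held by this layer
        total += growth_rate   # new feature maps produced by this layer
        channels += growth_rate
    return total


def channels_stored_shared(in_channels, num_layers, growth_rate):
    """Each feature map is stored once and shared by all later layers."""
    return in_channels + num_layers * growth_rate


# Illustrative values: 24 input channels, growth rate 12.
for n in (8, 16, 32):
    print(n, channels_stored_naive(24, n, 12), channels_stored_shared(24, n, 12))
```

Doubling the number of layers roughly quadruples the naive count but only doubles the shared one, which is the gap the memory-efficient modes aim to close.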
Results on CIFAR
The table below shows the test error rates (%) of DenseNets on the CIFAR datasets. The "+" mark at the end denotes standard data augmentation (random crop after zero-padding, and horizontal flip). For a DenseNet model, L denotes its depth and k denotes its growth rate. On CIFAR-10 and CIFAR-100 without data augmentation, a Dropout layer with drop rate 0.2 is introduced after each convolutional layer except the very first one.
| Model | Parameters | CIFAR-10 | CIFAR-10+ | CIFAR-100 | CIFAR-100+ |
|---|---|---|---|---|---|
| DenseNet (L=40, k=12) | 1.0M | 7.00 | 5.24 | 27.55 | 24.42 |
| DenseNet (L=100, k=12) | 7.0M | 5.77 | 4.10 | 23.79 | 20.20 |
| DenseNet (L=100, k=24) | 27.2M | 5.83 | 3.74 | 23.42 | 19.25 |
| DenseNet-BC (L=100, k=12) | 0.8M | 5.92 | 4.51 | 24.15 | 22.27 |
| DenseNet-BC (L=250, k=24) | 15.3M | 5.19 | 3.62 | 19.64 | 17.60 |
| DenseNet-BC (L=190, k=40) | 25.6M | - | 3.46 | - | 17.18 |
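The standard augmentation marked with "+" (random crop after zero-padding, and horizontal flip) can be sketched as follows. This is a minimal illustration on a single-channel image stored as a list of rows, not the data loader used by this repository. The padding of 4 pixels and the flip probability of 0.5 are assumptions, and the function name `augment` is hypothetical.

```python
import random


def augment(image, pad=4, rng=random):
    """Zero-pad, take a random crop of the original size, then maybe flip."""
    h, w = len(image), len(image[0])
    blank = [[0] * (w + 2 * pad) for _ in range(pad)]
    middle = [[0] * pad + list(row) + [0] * pad for row in image]
    padded = blank + middle + [list(r) for r in blank]
    # Pick the top-left corner of an h x w window inside the padded image.
    top = rng.randint(0, 2 * pad)
    left = rng.randint(0, 2 * pad)
    crop = [row[left:left + w] for row in padded[top:top + h]]
    # Mirror left-right with probability 0.5 (assumed).
    if rng.random() < 0.5:
        crop = [row[::-1] for row in crop]
    return crop


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
augmented = augment(image, pad=1, rng=random.Random(0))
print(len(augmented), len(augmented[0]))  # 3 3
```

Passing a seeded `random.Random` instance keeps the sketch reproducible; the output always has the same height and width as the input.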
Results on ImageNet and Pretrained Models
Torch
Models in the original paper
The Torch models are trained under the same setting as in fb.resnet.torch. The error rates shown are 224x224 1-crop test errors.
| Network | Top-1 error | Torch Model |
|---|---|---|
| DenseNet-121 (k=32) | 25.0 | Download (64.5MB) |
| DenseNet-169 (k=32) | 23.6 | Download (114.4MB) |
| DenseNet-201 (k=32) | 22.5 | Download (161.8MB) |
| DenseNet-161 (k=48) | 22.2 | Download (230.8MB) |
Models in the tech report
More accurate models trained with the memory-efficient implementation in the technical report.
| Network | Top-1 error | Torch Model |
|---|---|---|
| DenseNet-264 (k=32) | 22.1 | Download (256MB) |
| DenseNet-232 (k=48) | 21.2 | Download (426MB) |
| DenseNet-cosine-264 (k=32) | 21.6 | Download (256MB) |
| DenseNet-cosine-264 (k=48) | 20.4 | Download (557MB) |
Caffe
https://github.com/shicai/DenseNet-Caffe.
PyTorch
PyTorch documentation on models. We would like to thank @gpleiss for this nice work in PyTorch.
Keras, Tensorflow and Theano
https://github.com/flyyufelix/DenseNet-Keras.
MXNet
https://github.com/miraclewkf/DenseNet.
Wide-DenseNet for better Time/Accuracy and Memory/Accuracy Tradeoff
If you use DenseNet as a model in your learning task, we recommend using a wide and shallow DenseNet to reduce memory and time consumption, following the strategy of wide residual networks. To obtain a wide DenseNet, we set the depth to be smaller (e.g., L=40) and the growthRate to be larger (e.g., k=48).
We tested a set of Wide-DenseNet-BCs and compared their memory and time with the DenseNet-BC (L=100, k=12) shown above. We obtained the statistics using a single TITAN X card, with batch size 64, and without any memory optimization.
| Model | Parameters | CIFAR-10+ | CIFAR-100+ | Time per Iteration | Memory |
|---|---|---|---|---|---|
| DenseNet-BC (L=100, k=12) | 0.8M | 4.51 | 22.27 | 0.156s | 5452MB |
| Wide-DenseNet-BC (L=40, k=36) | 1.5M | 4.58 | 22.30 | 0.130s | 4008MB |
| Wide-DenseNet-BC (L=40, k=48) | 2.7M | 3.99 | 20.29 | 0.165s | 5245MB |
| Wide-DenseNet-BC (L=40, k=60) | 4.3M | 4.01 | 19.99 | 0.223s | 6508MB |
Observations:
 Wide-DenseNet-BC (L=40, k=36) uses less memory/time while achieving about the same accuracy as DenseNet-BC (L=100, k=12).
 Wide-DenseNet-BC (L=40, k=48) uses about the same memory/time as DenseNet-BC (L=100, k=12), while being much more accurate.
Thus, for practical use, we suggest picking one model from those Wide-DenseNet-BCs.
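The observations can be checked with a little arithmetic on the table. The sketch below copies the time and memory columns from the table above and expresses each wide model relative to DenseNet-BC (L=100, k=12); the helper name `relative_cost` is hypothetical.

```python
# (time per iteration in seconds, memory in MB), copied from the table above.
baseline = (0.156, 5452)  # DenseNet-BC (L=100, k=12)
wide = {
    "Wide-DenseNet-BC (L=40, k=36)": (0.130, 4008),
    "Wide-DenseNet-BC (L=40, k=48)": (0.165, 5245),
    "Wide-DenseNet-BC (L=40, k=60)": (0.223, 6508),
}


def relative_cost(model, reference):
    """Return (time ratio, memory ratio) of a model against a reference."""
    time_ratio = model[0] / reference[0]
    memory_ratio = model[1] / reference[1]
    return round(time_ratio, 2), round(memory_ratio, 2)


for name, stats in wide.items():
    print(name, relative_cost(stats, baseline))
# Wide-DenseNet-BC (L=40, k=36) (0.83, 0.74)
# Wide-DenseNet-BC (L=40, k=48) (1.06, 0.96)
# Wide-DenseNet-BC (L=40, k=60) (1.43, 1.19)
```

With k=36 the wide model needs about 83% of the time and 74% of the memory of the baseline; with k=48 both ratios are close to 1.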
Updates
08/23/2017:
 Add supporting code, so one can simply git clone and run.
06/06/2017:
 Support ultra memory-efficient training of DenseNet with customized densely connected layer.
 Support memory-efficient training of DenseNet with standard densely connected layer (recursive concatenation) by fixing the shareGradInput function.
05/17/2017:
 Add Wide-DenseNet.
 Add Keras, Tensorflow and Theano links for pretrained models.
04/20/2017:
 Add usage of models in PyTorch.
03/29/2017:
 Add the code for ImageNet training.
12/03/2016:
 Add ImageNet results and pretrained models.
 Add DenseNet-BC structures.
Contact
liuzhuangthu at gmail.com
gh349 at cornell.edu
Any discussions, suggestions and questions are welcome!