

Attention Transfer

PyTorch code for "Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer"
Conference paper at ICLR2017:

What's in this repo so far:

  • Activation-based AT code for CIFAR-10 experiments
  • Code for ImageNet experiments (ResNet-18-ResNet-34 student-teacher)


Coming:

  • grad-based AT
  • Scenes and CUB activation-based AT code
  • Pretrained with activation-based AT ResNet-18

The code uses PyTorch. Note that the original experiments were done using torch-autograd. So far we have validated that the CIFAR-10 experiments are exactly reproducible in PyTorch, and we are in the process of doing so for ImageNet (results are very slightly worse in PyTorch, due to hyperparameters).


    author = {Sergey Zagoruyko and Nikos Komodakis},
    title = {Paying More Attention to Attention: Improving the Performance of
             Convolutional Neural Networks via Attention Transfer},
    booktitle = {ICLR},
    url = {},
    year = {2017}}


First install PyTorch, then install torchnet:

pip install git+https://github.com/pytorch/tnt.git@master

Then install OpenCV with Python bindings (e.g. conda install -c menpo opencv3), and other Python packages:

pip install -r requirements.txt
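
After installing, a quick check that the main dependencies are importable can save a failed training run. The snippet below is a minimal sketch, not part of the repository; it assumes the packages are exposed under the module names `torch`, `torchnet`, and `cv2`, and the helper `missing_modules` is a hypothetical name introduced for illustration.

```python
import importlib.util

def missing_modules(names):
    """Return the module names from `names` that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Module names assumed for PyTorch, torchnet, and the OpenCV Python bindings.
required = ["torch", "torchnet", "cv2"]
missing = missing_modules(required)
print("missing:", missing if missing else "none")
```

If anything is listed as missing, revisit the corresponding install step above before training.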



This section describes how to reproduce the results in Table 1 of the paper.

First, train teachers:

python --save logs/resnet_40_1_teacher --depth 40 --width 1
python --save logs/resnet_16_2_teacher --depth 16 --width 2
python --save logs/resnet_40_2_teacher --depth 40 --width 2
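
To make the `--depth` and `--width` flags concrete, here is a small sketch of how they would map to a network layout under the usual Wide ResNet convention (depth = 6n + 4, three groups with base widths 16, 32, 64 scaled by the width factor). This convention is an assumption on our side; the repository's model code may differ in detail, and `wrn_config` is a hypothetical helper.

```python
def wrn_config(depth, width):
    """Blocks per group and channel widths for a WRN-depth-width network.

    Assumes the Wide ResNet convention: depth = 6n + 4 and three groups
    whose base widths (16, 32, 64) are multiplied by the width factor.
    """
    if (depth - 4) % 6 != 0:
        raise ValueError("depth should be of the form 6n + 4")
    n = (depth - 4) // 6
    widths = [int(base * width) for base in (16, 32, 64)]
    return n, widths

# The three teacher configurations trained above.
for depth, width in [(40, 1), (16, 2), (40, 2)]:
    n, widths = wrn_config(depth, width)
    print(f"WRN-{depth}-{width}: {n} blocks per group, widths {widths}")
```

Under this convention, WRN-16-2 and WRN-40-2 share the same channel widths and differ only in the number of blocks per group.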

To train with activation-based AT do:

python --save logs/at_16_1_16_2 --teacher_id resnet_16_2_teacher --beta 1e+3
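
For readers who want to see what `--beta` weights, the following is a rough sketch of an activation-based AT loss in the spirit of the paper: the attention map is the channel-wise mean of squared activations, flattened and L2-normalized, and the student is penalized for differing from the teacher. It is written in NumPy so it runs without PyTorch; the function names, the mean-squared reduction, and the beta/2 weighting are assumptions for illustration, not the repository's exact code.

```python
import numpy as np

def attention_map(feats, eps=1e-12):
    """Spatial attention map of a feature tensor of shape (N, C, H, W).

    Squares the activations, averages over channels, flattens the spatial
    dimensions, and L2-normalizes each sample's map.
    """
    amap = np.mean(feats ** 2, axis=1).reshape(feats.shape[0], -1)  # (N, H*W)
    norm = np.linalg.norm(amap, axis=1, keepdims=True)
    return amap / np.maximum(norm, eps)

def at_loss(student_feats, teacher_feats):
    """Mean squared difference between student and teacher attention maps."""
    diff = attention_map(student_feats) - attention_map(teacher_feats)
    return float(np.mean(diff ** 2))

def total_loss(task_loss, student_groups, teacher_groups, beta=1e3):
    """Task loss plus beta/2 times the summed per-group AT losses (assumed weighting)."""
    at_terms = [at_loss(s, t) for s, t in zip(student_groups, teacher_groups)]
    return task_loss + beta / 2 * sum(at_terms)

rng = np.random.default_rng(0)
student = rng.standard_normal((4, 16, 8, 8))  # narrower student: 16 channels
teacher = rng.standard_normal((4, 32, 8, 8))  # wider teacher: 32 channels, same spatial size
print(at_loss(student, teacher))  # positive for differing maps
print(at_loss(teacher, teacher))  # identical maps give 0.0
```

Because the map averages over channels, student and teacher may have different channel counts as long as the spatial sizes match, which is what lets a width-1 student learn from a width-2 teacher.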

To train with KD:

python --save logs/kd_16_1_16_2 --teacher_id resnet_16_2_teacher --alpha 0.9
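
The role of `--alpha` can be illustrated with a generic knowledge distillation loss: a soft-target term against the teacher's temperature-softened outputs, mixed with the ordinary cross-entropy on the true labels. The sketch below uses NumPy and a temperature of 4; both the temperature value and the exact scaling are assumptions for illustration and may not match the repository's implementation.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def kd_loss(student_logits, teacher_logits, labels, alpha=0.9, T=4.0):
    """alpha-weighted soft-target KL term plus (1 - alpha) hard-label cross-entropy."""
    log_p_s = log_softmax(student_logits / T)
    log_p_t = log_softmax(teacher_logits / T)
    p_t = np.exp(log_p_t)
    # KL(teacher || student) per sample, averaged over the batch, scaled by T^2
    kl = np.mean(np.sum(p_t * (log_p_t - log_p_s), axis=1)) * T * T
    # Cross-entropy on the true labels at temperature 1
    ce = -np.mean(log_softmax(student_logits)[np.arange(len(labels)), labels])
    return float(alpha * kl + (1.0 - alpha) * ce)

student_logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
teacher_logits = np.array([[3.0, 0.0, -2.0], [0.0, 0.5, 4.0]])
labels = np.array([0, 2])
print(kd_loss(student_logits, teacher_logits, labels, alpha=0.9))
```

With alpha = 0 this reduces to plain cross-entropy training; with alpha close to 1 the student is driven almost entirely by the teacher's soft targets.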

We plan to add AT+KD with decaying beta to get the best knowledge transfer results soon.


Pretrained model

We provide a ResNet-18 model pretrained with activation-based AT:

Model                     val error
ResNet-18                 30.4, 10.8
ResNet-18-ResNet-34-AT    29.3, 10.0

Download link:

Model definition:

Convergence plot:

Train from scratch

Download pretrained weights for ResNet-34 (see also functional-zoo for more information):


Prepare the data following fb.resnet.torch and run training (e.g. using 2 GPUs):

python --imagenetpath ~/ILSVRC2012 --depth 18 --width 1 \
                   --teacher_params resnet-34-export.hkl --gpu_id 0,1 --ngpu 2 \
                   --beta 1e+3