Bottom-Up and Top-Down Attention for Visual Question Answering
A TensorFlow implementation of the winning entry of the 2017 VQA Challenge.
The model details are described in "Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge".
This implementation is motivated by the PyTorch implementation link.
This code was developed in collaboration with vaicarran.
- This code does not use the Visual Genome dataset for pretraining.
- More hidden units are used than in the original paper (512 → 1024).
- Batch normalization is used in the classifier.
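To illustrate the batch-normalized classifier mentioned above, here is a minimal NumPy sketch. The two-layer FC structure and layer sizes are assumptions for illustration, not the repository's exact TensorFlow graph:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize activations across the batch dimension (training-mode statistics)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def classifier(joint, w1, w2):
    """Hypothetical two-layer answer classifier: FC -> batch norm -> ReLU -> FC."""
    h = joint @ w1          # first fully connected layer
    h = batch_norm(h)       # the batch norm this repo adds to the classifier
    h = np.maximum(h, 0.0)  # ReLU
    return h @ w2           # answer logits
```

In the TensorFlow version this would correspond to inserting a batch-normalization layer between the classifier's hidden layer and its nonlinearity.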
I checked the final results, and they can differ depending on whether early stopping is used.
| Model | Validation Accuracy | Training Time |
| --- | --- | --- |
| Reported Model | 63.15 | 12–18 hours (Tesla K40) |
| TF Model | 61–64 | < 1 hour (Tesla P40) |
Proposed Model (in paper)
Implemented Graph (tensorboard)
Learning curve (score)
- ./dataset.py : data preprocessing and TensorFlow dataset modules
- ./models/ops.py : TensorFlow operation wrappers
- ./models/vqa_model.py : model class
- ./models/language_model.py : word embedding and question embedding
- ./models/top_down_attention.py : proposed top-down attention module
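The top-down attention module scores each image region feature against the question embedding and returns a softmax-weighted sum of the regions. A minimal NumPy sketch of that idea (the concatenate-then-linear scorer here is a simplification; the paper uses a gated nonlinear layer, and all shapes are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def top_down_attention(v, q, w_a):
    """v: (k, d_v) image region features; q: (d_q,) question embedding;
    w_a: (d_v + d_q, 1) hypothetical projection giving one score per region."""
    k = v.shape[0]
    q_tiled = np.tile(q, (k, 1))                          # broadcast question to every region
    scores = np.concatenate([v, q_tiled], axis=1) @ w_a   # (k, 1) unnormalized scores
    alpha = softmax(scores, axis=0)                       # attention weights over regions
    return (alpha * v).sum(axis=0)                        # attended feature, shape (d_v,)
```

With a zero projection matrix the weights are uniform and the output reduces to the mean region feature, which is a quick sanity check on the normalization.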
Make sure you are on a machine with an NVIDIA GPU and Python 3, with about 100 GB of disk space.
This code needs more memory than the PyTorch version (70–80 GB).
Some issues are still being resolved to improve memory efficiency.
PRs that address them are always welcome.
- tensorflow 1.13.0
- h5py
The data download and preprocessing modules are from original_repo and [pytorch_repo](https://github.com/hengyuan-hu/bottom-up-attention-vqa).
Make sure the dataset is downloaded and preprocessed:
>> mkdir data
>> sh tools/download.sh
>> sh tools/process.sh
For various hyperparameter settings, refer to the arguments in main.py:
>> python main.py
If you want to visualize your results and graph:
>> tensorboard --logdir='./tensorboard' (--ip=YOUR_IP) (--port=YOUR_PORT)
Evaluation on the validation data runs at every epoch.
Early stopping is also supported to prevent overfitting.
However, because of the memory issue, that code in
main.py is commented out.
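For reference, the patience-based early-stopping rule described above can be sketched as follows (the patience value is an assumption for illustration, not the repository's setting):

```python
def early_stopping(val_scores, patience=3):
    """Return the epoch index at which training would stop: stop once the best
    validation score has not improved for `patience` consecutive epochs."""
    best, best_epoch = float("-inf"), 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            best, best_epoch = score, epoch       # new best: reset the patience window
        elif epoch - best_epoch >= patience:
            return epoch                          # no improvement for `patience` epochs
    return len(val_scores) - 1                    # ran to the end without stopping
```

Because validation accuracy is computed every epoch, re-enabling this check (once the memory issue is fixed) only requires tracking the best score seen so far.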