(tl;dr) 1M iterations checkpoint file | Released under MIT License

word_counts.txt (at this repository)

model.ckpt-1000000.index (at this repository. Place it in the same folder as the model.)

Show and Tell : A Neural Image Caption Generator

Pretrained model for the TensorFlow implementation, found at tensorflow/models, of the image-to-text paper described in:

"Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge."

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan.

IEEE Transactions on Pattern Analysis and Machine Intelligence (2016).

Full text available at:

Generating Captions


  1. Follow the steps at im2txt to clone the repository, install Bazel, etc.
  2. Download the model checkpoint: 1M iterations checkpoint file | Released under MIT License
  3. Clone the repository: git clone
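
Before running inference, it can help to confirm that the checkpoint pieces sit side by side. The following is a minimal sketch, not part of im2txt: `check_checkpoint` is a hypothetical helper, the directory in `CHECKPOINT_PATH` is a placeholder, and the data file name `model.ckpt-1000000.data-00000-of-00001` is an assumption based on the index file name in this repository.

```shell
# Hypothetical helper (not part of im2txt): checks that both checkpoint
# files exist for a given checkpoint prefix.
check_checkpoint() {
  # $1 is the checkpoint prefix, i.e. the path WITHOUT the
  # .data-00000-of-00001 or .index suffix.
  local prefix="$1"
  local suffix
  local missing=0
  for suffix in index data-00000-of-00001; do
    if [ ! -f "${prefix}.${suffix}" ]; then
      echo "missing: ${prefix}.${suffix}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Placeholder location (assumption): point this at wherever you saved the files.
CHECKPOINT_PATH="${HOME}/im2txt/model/model.ckpt-1000000"
check_checkpoint "${CHECKPOINT_PATH}" || echo "Fix the checkpoint layout before running inference."
```

If the helper reports a missing file, move the downloaded checkpoint and the cloned `model.ckpt-1000000.index` into the same folder and run the check again.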
# Path to checkpoint file.
# Note that CHECKPOINT_PATH does not include the data-00000-of-00001 suffix.
# Also make sure model.ckpt-1000000.index (cloned from this repository)
# is in the same location as the downloaded checkpoint file.

# Vocabulary file generated by the preprocessing script.
# Since the tokenizer version may differ, use the word_counts.txt file supplied in this repository.

# JPEG image file to caption.

# Build the inference binary.
bazel build -c opt im2txt/run_inference

# Run inference to generate captions.
bazel-bin/im2txt/run_inference \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --vocab_file=${VOCAB_FILE} \