
Last Commit
May. 25, 2018
Jun. 17, 2017

A TensorFlow Implementation of the Transformer: Attention Is All You Need


Requirements

  • NumPy >= 1.11.1
  • TensorFlow >= 1.2 (Probably 1.1 should work, too, though I didn't test it)
  • regex
  • nltk

Why This Project?

I tried to implement the idea in Attention Is All You Need. The authors claim that their model, the Transformer, outperforms the state of the art in machine translation using only attention: no CNNs, no RNNs. How cool is that! At the end of the paper, they promise to make their code available soon, but apparently it is not out yet. I have two goals with this project. One is to fully understand the paper; it's often hard for me to get a good grasp of an idea before writing some code for it. The other is to share my code with people who are interested in this model before the official code is unveiled.

Differences from the original paper

I don't intend to replicate the paper exactly. Rather, I aim to implement the main ideas in the paper and verify them in a SIMPLE and QUICK way. In this respect, some parts of my code differ from the paper. Among them:

  • I used the IWSLT 2016 de-en dataset, not the WMT dataset, because the former is much smaller and requires no special preprocessing.
  • I built the vocabulary from words, not subwords, for simplicity. Of course, you can try BPE or word-piece if you want.
  • I used learned (parameterized) positional encodings. The paper uses a sinusoidal formula, but Noam, one of the authors, says both work. See the discussion on Reddit.
  • The paper adjusts the learning rate according to the global step. I fixed the learning rate at a small value, 0.0001, simply because training was fast enough with the small dataset (only a couple of hours on a single GTX 1060!).
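
For reference, the two pieces of the paper that this project replaces can be sketched in a few lines of plain Python. This is only an illustration of the formulas in the paper, not code from this repository: the function names are made up for the example, and the defaults d_model=512 and warmup_steps=4000 are the paper's base-model values, not settings used here.

```python
import math


def sinusoidal_position_encoding(max_len, d_model):
    """Fixed encoding from the paper: sin on even dimensions, cos on odd ones."""
    table = []
    for pos in range(max_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the same frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def warmup_learning_rate(step, d_model=512, warmup_steps=4000):
    """Schedule from the paper: linear warmup, then inverse square root decay.

    `step` is the global training step and must start at 1.
    """
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# The constant learning rate this project uses instead of the schedule.
FIXED_LEARNING_RATE = 0.0001

# Compare the schedule with the fixed rate at a few global steps.
for step in (1, 1000, 4000, 100000):
    print(step, warmup_learning_rate(step), FIXED_LEARNING_RATE)
```

The schedule peaks at step == warmup_steps and decays afterwards, whereas the fixed rate stays constant for the whole run. A learned positional encoding is simply a trainable table of the same shape as the one returned by sinusoidal_position_encoding.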

File description

  • includes all hyper parameters that are needed.
  • creates vocabulary files for the source and the target.
  • contains functions regarding loading and batching data.
  • has all building blocks for encoder/decoder networks.
  • has the model.
  • is for evaluation.
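
As a rough illustration of what the vocabulary and data-loading steps described above involve, here is a minimal word-level sketch in plain Python. It is not the code from this repository: the special-token names, the min_count parameter, and all function names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical special tokens; the project's actual names may differ.
PAD, UNK, BOS, EOS = "<PAD>", "<UNK>", "<S>", "</S>"


def build_vocab(sentences, min_count=1):
    """Word-level vocabulary, most frequent words first, specials up front."""
    counts = Counter(word for s in sentences for word in s.split())
    words = [w for w, c in counts.most_common() if c >= min_count]
    return [PAD, UNK, BOS, EOS] + words


def encode(sentence, word2idx, max_len):
    """Map words to ids, append EOS, then pad (or truncate) to max_len."""
    ids = [word2idx.get(w, word2idx[UNK]) for w in sentence.split()]
    ids = ids[: max_len - 1] + [word2idx[EOS]]
    return ids + [word2idx[PAD]] * (max_len - len(ids))


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


corpus = ["Ich hielt dagegen", "Aber ich habe es nicht hingekriegt"]
vocab = build_vocab(corpus)
word2idx = {w: i for i, w in enumerate(vocab)}
encoded = [encode(s, word2idx, max_len=8) for s in corpus]
for batch in batches(encoded, batch_size=1):
    print(batch)
```

Because the vocabulary is built from whole words, any word not seen during preprocessing maps to the unknown token, which is the trade-off for skipping subword segmentation.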


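Training

  • STEP 1. Download the IWSLT 2016 German-English parallel corpus and extract it to the corpora folder.
wget -qO- --show-progress | tar xz; mv de-en corpora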
  • STEP 2. Adjust hyper parameters in if necessary.
  • STEP 3. Run to generate vocabulary files to the preprocessed folder.
  • STEP 4. Run or download the pretrained files.

Training Loss and Accuracy

  • Training Loss

  • Training Accuracy


Evaluation

  • Run


Results

I got a BLEU score of 17.14. (Recall that I trained on a small dataset with a limited vocabulary.) Some of the evaluation results are as follows. Details are available in the results folder.

source: Sie war eine jährige Frau namens Alex
expected: She was a yearold woman named Alex
got: She was a woman named yearold name

source: Und als ich das hörte war ich erleichtert
expected: Now when I heard this I was so relieved
got: And when I heard that I was an

source: Meine Kommilitonin bekam nämlich einen Brandstifter als ersten Patienten
expected: My classmate got an arsonist for her first client
got: Because my first came from an in patients

source: Das kriege ich hin dachte ich mir
expected: This I thought I could handle
got: I'll go ahead and I thought

source: Aber ich habe es nicht hingekriegt
expected: But I didn't handle it
got: But I didn't it

source: Ich hielt dagegen
expected: I pushed back
got: I thought about it

source: Das ist es was Psychologen einen AhaMoment nennen
expected: That's what psychologists call an Aha moment
got: That's what a like a

source: Meldet euch wenn ihr in euren ern seid
expected: Raise your hand if you're in your s
got: Get yourself in your s

source: Ich möchte ein paar von euch sehen
expected: I really want to see some twentysomethings here
got: I want to see some of you

source: Oh yeah Ihr seid alle unglaublich
expected: Oh yay Y'all's awesome
got: Oh yeah you all are incredibly

source: Dies ist nicht meine Meinung Das sind Fakten
expected: This is not my opinion These are the facts
got: This is not my opinion These are facts
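
To give a feel for how a BLEU-style score rewards the outputs above, here is a minimal sentence-level sketch in plain Python. It is not the script used to obtain the 17.14 figure, which is a corpus-level score on a 0 to 100 scale. This version is unsmoothed, uses a single reference, and returns a value between 0 and 1; the function names are made up for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed single-reference BLEU: clipped n-gram precisions
    combined by a geometric mean, times a brevity penalty."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Each candidate n-gram is credited at most as often as it
        # appears in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


print(sentence_bleu("This is not my opinion These are the facts",
                    "This is not my opinion These are facts"))
print(sentence_bleu("I pushed back", "I thought about it"))
```

The last example pair above scores high because only one word is missing, while "I thought about it" shares no bigram with "I pushed back" and therefore scores zero without smoothing. That is one reason BLEU is normally reported over a whole test set rather than per sentence.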