Last Commit
May. 13, 2019
Nov. 23, 2017


A PyTorch implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model.


  • Install Python 3
  • Install PyTorch == 0.2.0
  • Install the remaining requirements:
    pip install -r requirements.txt
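
Before installing anything else, it can help to confirm the basics are in place. The following is a minimal sketch, not part of this repository: the function name check_environment is hypothetical, and it only checks that the interpreter is Python 3 and that the listed modules can be found. It does not verify the PyTorch version.

```python
import importlib.util
import sys


def check_environment(required_modules=("torch",)):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration, not part of this repository.
    """
    problems = []
    # The requirements above ask for Python 3.
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required, found %d.%d" % sys.version_info[:2])
    # Look each top-level module up without importing it.
    for name in required_modules:
        if importlib.util.find_spec(name) is None:
            problems.append("module not installed: " + name)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

If the script prints nothing, Python 3 and the torch module were both found.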


I used the LJSpeech dataset, which consists of pairs of text transcripts and wav files. The complete dataset (13,100 pairs) can be downloaded from the LJSpeech dataset page. For the preprocessing code, I referred to an existing implementation.
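
To illustrate what "pairs of text and wav files" looks like on disk, here is a minimal sketch that reads the pairs. It assumes the layout of the public LJSpeech-1.1 release: a metadata.csv file with '|'-separated fields (clip ID, transcript, normalized transcript) next to a wavs/ directory. The function name load_ljspeech_pairs is hypothetical and is not necessarily how this repository loads its data.

```python
import csv
import os


def load_ljspeech_pairs(data_path):
    """Return a list of (wav_path, text) pairs read from metadata.csv.

    Hypothetical helper; assumes the LJSpeech-1.1 directory layout.
    """
    pairs = []
    metadata_file = os.path.join(data_path, "metadata.csv")
    with open(metadata_file, encoding="utf-8", newline="") as f:
        # Fields are separated by '|'. Quoting is disabled because the
        # transcripts themselves may contain quotation marks.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            clip_id = row[0]
            text = row[-1]  # last field: the normalized transcript if present
            wav_path = os.path.join(data_path, "wavs", clip_id + ".wav")
            pairs.append((wav_path, text))
    return pairs
```

With the complete dataset extracted, len(load_ljspeech_pairs(data_path)) should match the 13,100 pairs mentioned above.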

File description

  • includes all the hyperparameters that are needed.
  • loads the training data and preprocesses text into indices and wav files into spectrograms. The preprocessing code for text is in the text/ directory.
  • contains all methods, including CBHG, highway, prenet, and so on.
  • contains the networks, including the encoder, decoder and post-processing network.
  • is for training.
  • is for generating TTS samples.
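
The "text to index" step mentioned above can be sketched as follows. This is an illustration only: the symbol set, the padding and end-of-sequence markers, and the function names are assumptions, and the actual code in the text/ directory may normalize text and choose symbols differently.

```python
# Assumed symbol table: padding, end-of-sequence, then lowercase letters
# and basic punctuation. The repository's own table may differ.
PAD = "_"
EOS = "~"
CHARS = "abcdefghijklmnopqrstuvwxyz !'(),-.:;?"
SYMBOLS = [PAD, EOS] + list(CHARS)
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(SYMBOLS)}


def text_to_sequence(text):
    """Map text to a list of integer indices, ending with the EOS index."""
    # Lowercase the text and silently drop characters outside CHARS.
    seq = [CHAR_TO_INDEX[ch] for ch in text.lower() if ch in CHARS]
    seq.append(CHAR_TO_INDEX[EOS])
    return seq


def sequence_to_text(seq):
    """Inverse mapping, useful for checking what the model actually sees."""
    return "".join(SYMBOLS[i] for i in seq)
```

For example, text_to_sequence("ab") gives [2, 3, 1]: indices 0 and 1 are reserved for padding and end-of-sequence, so "a" starts at 2.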

Training the network

  • STEP 1. Download and extract the LJSpeech data to any directory you want.
  • STEP 2. Adjust the hyperparameters, especially 'data_path', which is the directory where you extracted the files, and the others if necessary.
  • STEP 3. Run the training script.
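
A wrong 'data_path' is an easy mistake to make in STEP 2, so a quick check before training can save time. The sketch below is not part of this repository: the function name check_data_path is hypothetical, and it assumes the extracted directory follows the LJSpeech-1.1 layout (a metadata.csv file and a wavs/ directory).

```python
import os


def check_data_path(data_path):
    """Return a list of problems found under data_path; empty means OK.

    Hypothetical helper; assumes the LJSpeech-1.1 directory layout.
    """
    if not os.path.isdir(data_path):
        return ["data_path is not a directory: " + data_path]
    problems = []
    if not os.path.isfile(os.path.join(data_path, "metadata.csv")):
        problems.append("missing metadata.csv")
    wav_dir = os.path.join(data_path, "wavs")
    if not os.path.isdir(wav_dir):
        problems.append("missing wavs/ directory")
    elif not any(name.endswith(".wav") for name in os.listdir(wav_dir)):
        problems.append("wavs/ contains no .wav files")
    return problems
```

An empty list means the directory looks usable; otherwise each entry names what is missing.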

Generate TTS wav file

  • STEP 1. Run the script for generating TTS samples. Make sure the restore step matches a checkpoint that was saved during training.
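
One way to pick a valid restore step is to list the steps for which checkpoints exist. The sketch below is purely illustrative: the file naming scheme checkpoint_<step>.pth.tar and both function names are assumptions, so adjust the pattern to whatever the training script actually writes.

```python
import os
import re

# Assumed naming scheme, e.g. "checkpoint_60000.pth.tar". This is a
# hypothetical pattern; the repository may name its files differently.
CHECKPOINT_PATTERN = re.compile(r"^checkpoint_(\d+)\.pth\.tar$")


def available_steps(checkpoint_dir):
    """Return the sorted training steps that have a checkpoint file."""
    steps = []
    for name in os.listdir(checkpoint_dir):
        match = CHECKPOINT_PATTERN.match(name)
        if match:
            steps.append(int(match.group(1)))
    return sorted(steps)


def resolve_restore_step(checkpoint_dir, restore_step=None):
    """Return a step that exists on disk, defaulting to the latest one."""
    steps = available_steps(checkpoint_dir)
    if not steps:
        raise FileNotFoundError("no checkpoints found in " + checkpoint_dir)
    if restore_step is None:
        return steps[-1]
    if restore_step not in steps:
        raise ValueError("no checkpoint for step %d; available: %s"
                         % (restore_step, steps))
    return restore_step
```

Calling resolve_restore_step with no explicit step returns the most recent checkpoint, which is usually the one to synthesize from.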


  • You can check the generated samples in the 'samples/' directory. The model was trained for only 60K steps, so the quality is not good yet.



  • Any comments on the code are always welcome.