Word-level language modeling RNN

This example trains a multi-layer RNN (Elman, GRU, or LSTM) on a language modeling task. By default, the training script uses the provided PTB (Penn Treebank) dataset. The trained model can then be used by the generate script to generate new text.

The model uses the nn.RNN module (and its sister modules nn.GRU and nn.LSTM) which will automatically use the cuDNN backend if run on CUDA with cuDNN installed.
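
As a rough sketch of how the --model choice could map onto these modules (the helper name build_rnn and the small sizes are hypothetical, chosen for illustration, and are not taken from this repository's code):

```python
import torch
import torch.nn as nn


def build_rnn(rnn_type, ninp, nhid, nlayers, dropout=0.5):
    """Hypothetical helper: map a --model string to a torch.nn recurrent module."""
    if rnn_type in ("LSTM", "GRU"):
        # nn.LSTM and nn.GRU take the same leading constructor arguments.
        return getattr(nn, rnn_type)(ninp, nhid, nlayers, dropout=dropout)
    # Elman RNNs are nn.RNN with either a tanh or a ReLU nonlinearity.
    nonlinearities = {"RNN_TANH": "tanh", "RNN_RELU": "relu"}
    if rnn_type not in nonlinearities:
        raise ValueError(f"unknown model type: {rnn_type}")
    return nn.RNN(ninp, nhid, nlayers,
                  nonlinearity=nonlinearities[rnn_type], dropout=dropout)


# By default the input shape is (sequence length, batch, features).
rnn = build_rnn("GRU", ninp=8, nhid=16, nlayers=2)
output, hidden = rnn(torch.zeros(5, 3, 8))
```

The same call works unchanged on a GPU once the module and its input are moved there; whether cuDNN is used is decided by PyTorch, not by this helper.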

During training, if a keyboard interrupt (Ctrl-C) is received, training is stopped and the current model is evaluated against the test dataset.
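
The interrupt behaviour can be sketched with a plain try/except around the epoch loop. The function and callback names below are hypothetical stand-ins for the real training and evaluation routines, and the simulated interrupt replaces an actual Ctrl-C:

```python
def train_with_interrupt(train_one_epoch, evaluate_on_test, epochs):
    """Run up to `epochs` epochs; on Ctrl-C, stop early and still evaluate."""
    completed = 0
    try:
        for epoch in range(1, epochs + 1):
            train_one_epoch(epoch)
            completed = epoch
    except KeyboardInterrupt:
        print("Exiting from training early")
    # Reached both after a normal finish and after an interrupt.
    return completed, evaluate_on_test()


# Simulate Ctrl-C arriving during the third epoch.
def fake_epoch(epoch):
    if epoch == 3:
        raise KeyboardInterrupt


completed, test_loss = train_with_interrupt(fake_epoch, lambda: 4.2, epochs=10)
```

Because the evaluation call sits after the try block rather than inside it, the test pass runs exactly once in either case.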

The script accepts the following arguments:

optional arguments:
  -h, --help                 show this help message and exit
  --data DATA                location of the data corpus
  --model MODEL              type of recurrent net (RNN_TANH, RNN_RELU, LSTM, GRU)
  --emsize EMSIZE            size of word embeddings
  --nhid NHID                number of hidden units per layer
  --nlayers NLAYERS          number of layers
  --lr LR                    initial learning rate
  --clip CLIP                gradient clipping
  --epochs EPOCHS            upper epoch limit
  --batch-size N             batch size
  --bptt BPTT                sequence length
  --dropout DROPOUT          dropout applied to layers (0 = no dropout)
  --decay DECAY              learning rate decay per epoch
  --tied                     tie the word embedding and softmax weights
  --seed SEED                random seed
  --cuda                     use CUDA
  --log-interval N           report interval
  --result-path SAVE         path to save the final model
  --LAMBDA LAMBDA_VALUE      constant to multiply with center loss
  --ALPHA ALPHA_VALUE        learning rate to update embedding centroids
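
The roles of --LAMBDA and --ALPHA can be illustrated with a minimal pure-Python sketch of a center loss in the style of Wen et al. (2016). This README does not describe the actual implementation, so the following are assumptions: features are vectors (for example hidden states), each feature's class is an integer label (for example the target word), the loss is averaged over the batch, and the function names are hypothetical.

```python
def center_loss(features, labels, centers):
    """0.5 * mean squared distance between each feature and its class centroid."""
    total = 0.0
    for x, y in zip(features, labels):
        total += 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    return total / len(features)


def update_centers(features, labels, centers, alpha):
    """Move each centroid toward its assigned features with learning rate alpha."""
    new_centers = {k: list(c) for k, c in centers.items()}
    for k, c in centers.items():
        members = [x for x, y in zip(features, labels) if y == k]
        for d in range(len(c)):
            # delta = sum(c - x) / (1 + count); the +1 keeps empty classes stable.
            delta = sum(c[d] - x[d] for x in members) / (1 + len(members))
            new_centers[k][d] = c[d] - alpha * delta
    return new_centers


features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
centers = {0: [0.0, 0.0], 1: [0.0, 0.0]}

LAMBDA, ALPHA = 10.0, 0.5   # illustrative values only
cross_entropy = 4.3         # placeholder for the language-model loss
total_loss = cross_entropy + LAMBDA * center_loss(features, labels, centers)
centers = update_centers(features, labels, centers, ALPHA)
```

Under these assumptions, LAMBDA scales how strongly features are pulled toward their centroids relative to the cross-entropy term, while ALPHA only controls how quickly the centroids themselves move.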

With these arguments, a variety of models can be tested. As an example, the following arguments produce slower but better models:

python --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied # Test perplexity of 72.78
python --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied --LAMBDA 10 # Test perplexity of 68.42
python --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied --LAMBDA 20 # Test perplexity of 68.87
python --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied --LAMBDA 40 # Test perplexity of 70.45

These perplexities are equal to or better than those in Recurrent Neural Network Regularization (Zaremba et al. 2014) and are similar to those in Using the Output Embedding to Improve Language Models (Press & Wolf 2016) and Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling (Inan et al. 2016), though both of the latter papers report better perplexities by using a form of recurrent dropout (variational dropout).
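
For readers unfamiliar with the metric, perplexity is the exponential of the average per-word negative log-likelihood. A minimal sketch, assuming losses are measured in nats (the function name is hypothetical):

```python
import math


def perplexity(token_nll):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nll) / len(token_nll))


# A model that assigns probability 1/4 to every target word has perplexity 4.
uniform = [-math.log(0.25)] * 100
ppl = perplexity(uniform)

# Average per-word loss corresponding to the baseline perplexity of 72.78.
baseline_loss = math.log(72.78)
```

Read this way, a test perplexity of 72.78 corresponds to an average loss of about 4.29 nats per word, so the gap between the reported runs is small in loss terms.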