Python: 3.6 Tensorflow: 1.6 License: MIT

Author: Han Xiao

A collection of frequently used deep learning blocks that I have implemented in Tensorflow. It covers the core tasks in NLP, such as embedding, encoding, matching and pooling. All implementations follow a modularized design pattern which I call the "block-design". More details can be found in my blog post.


  • Python >= 3.6
  • Tensorflow >= 1.6


A collection of sequence encoding blocks. Input is a sequence with shape of [B, L, D], output is another sequence in [B, L, D'], where B is batch size, L is the length of the sequence and D and D' are the dimensions.

| Name | Dependencies | Description | Reference |
| --- | --- | --- | --- |
| LSTM_encode | | a fast multi-layer bidirectional LSTM implementation based on CudnnLSTM. Expect to be 5~10x faster than the standard tf LSTMCell. However, it can only run on GPU. | Tensorflow doc on CudnnLSTM |
| TCN_encode | Res_DualCNN_encode | a temporal convolution network described in the paper, basically a multi-layer dilated CNN with special padding to ensure the causality | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| Res_DualCNN_encode | CNN_encode | a sub-block used by TCN_encode. It is a two-layer CNN with spatial dropout in-between, then followed by a residual connection and a layer-norm. | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| CNN_encode | | a standard conv1d implementation on L axis, with the possibility to set different paddings | Convolutional Neural Networks for Sentence Classification |
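
The "special padding to ensure the causality" mentioned for TCN_encode can be illustrated with a small NumPy sketch. This is not the Tensorflow code from this repository; the function name `causal_conv1d` and its arguments are hypothetical and chosen only for illustration. The idea is to pad `(K - 1) * dilation` zeros on the left of the L axis, so that the output at step t only sees inputs at steps <= t.

```python
import numpy as np

def causal_conv1d(x, kernel, dilation=1):
    """Dilated causal convolution along the L axis.

    x:      [B, L, D] input sequence
    kernel: [K, D, D_out] filter taps (hypothetical layout for this sketch)
    returns [B, L, D_out]
    """
    B, L, D = x.shape
    K, _, D_out = kernel.shape
    # Left-pad only, so no output position can look into the future.
    pad = (K - 1) * dilation
    x_padded = np.pad(x, ((0, 0), (pad, 0), (0, 0)))
    out = np.zeros((B, L, D_out))
    for k in range(K):
        # Tap k reads the input (K - 1 - k) * dilation steps in the past.
        start = k * dilation
        out += x_padded[:, start:start + L, :] @ kernel[k]
    return out
```

Changing the input at step t should leave all outputs before t untouched, which is a quick way to check causality of any such block.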

A collection of sequence matching blocks, aka attention. The input is two sequences: context with shape [B, L_c, D], and query with shape [B, L_q, D]. The output is a sequence with the same length as context, i.e. with shape [B, L_c, D]. Each position in the output encodes the relevance of that position in context to the complete query.

| Name | Dependencies | Description | Reference |
| --- | --- | --- | --- |
| Attentive_match | | basic attention mechanism with different scoring functions, also supports future blinding. | additive: Neural machine translation by jointly learning to align and translate; scaled: Attention is all you need |
| Transformer_match | | a multi-head attention block from "Attention is all you need" | Attention is all you need |
| AttentiveCNN_match | Attentive_match | the light version of attentive convolution, with the possibility of future blinding to ensure causality. | Attentive Convolution |
| BiDaf_match | | attention flow layer used in bidaf model. | Bidirectional Attention Flow for Machine Comprehension |
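
To make the shapes concrete, here is a minimal NumPy sketch of scaled dot-product matching with optional future blinding. It is an illustration under my own assumptions, not the Tensorflow implementation of Attentive_match; the names `scaled_dot_attention` and `causal` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_attention(context, query, causal=False):
    """context: [B, L_c, D], query: [B, L_q, D] -> output: [B, L_c, D]."""
    D = context.shape[-1]
    # Relevance of every context position to every query position.
    scores = context @ query.transpose(0, 2, 1) / np.sqrt(D)  # [B, L_c, L_q]
    if causal:
        # Future blinding: context position i may only see query positions j <= i.
        L_c, L_q = scores.shape[1:]
        future = np.triu(np.ones((L_c, L_q), dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)
    weights = softmax(scores, axis=-1)
    # Each context position becomes a weighted sum over the whole query.
    return weights @ query
```

With `causal=True` and the same tensor passed as both context and query, the first output position can only attend to itself, so it reproduces the first input position.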

A collection of pooling blocks. They fuse/reduce along the time axis L. Input is a sequence with shape of [B, L, D], output has shape [B, D].

| Name | Dependencies | Description | Reference |
| --- | --- | --- | --- |
| SWEM_pool | | do pooling on the input sequence, supports max/avg. pooling, hierarchical avg. max pooling. | Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms |

There are also some convolution-based pooling blocks built on SWEM_pool, but they are experimental, so I will not list them here.
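
The three pooling modes of SWEM_pool can be sketched in NumPy as follows. This is a simplified illustration, not the code in this repository: the function name `swem_pool`, the `mode` strings and the `window` argument are hypothetical, and padding masks are ignored for brevity.

```python
import numpy as np

def swem_pool(x, mode="max", window=3):
    """Reduce x: [B, L, D] along the time axis L to [B, D]."""
    if mode == "max":
        return x.max(axis=1)
    if mode == "avg":
        return x.mean(axis=1)
    if mode == "hier":
        # Hierarchical pooling: average inside each sliding window,
        # then take the max over all windows.
        L = x.shape[1]
        if L < window:
            raise ValueError("sequence is shorter than the window")
        windows = np.stack(
            [x[:, i:i + window].mean(axis=1) for i in range(L - window + 1)],
            axis=1,
        )  # [B, L - window + 1, D]
        return windows.max(axis=1)
    raise ValueError("unknown mode: %s" % mode)
```

The hierarchical mode keeps some local word-order information that plain max or average pooling discards.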

A collection of positional encoding blocks for the sequence.

| Name | Dependencies | Description | Reference |
| --- | --- | --- | --- |
| SinusPositional_embed | | generate a sinusoid signal that has the same length of the input sequence | Attention is all you need |
| Positional_embed | | parameterize the absolute position of the tokens in the input sequence | A Convolutional Encoder Model for Neural Machine Translation |
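
For reference, the sinusoid signal from "Attention is all you need" can be generated with a few lines of NumPy. This sketch follows the formula in the paper (sine on even channels, cosine on odd channels); the function name `sinusoid_positions` is hypothetical and the details of SinusPositional_embed in this repository may differ.

```python
import numpy as np

def sinusoid_positions(length, dim):
    """Return a [L, D] signal that can be broadcast-added to a [B, L, D] input."""
    positions = np.arange(length)[:, None]   # [L, 1]
    channels = np.arange(dim)[None, :]       # [1, D]
    # Channels 2i and 2i+1 share the frequency 1 / 10000^(2i / D).
    rates = 1.0 / np.power(10000.0, (2 * (channels // 2)) / dim)
    angles = positions * rates               # [L, D]
    signal = np.zeros((length, dim))
    signal[:, 0::2] = np.sin(angles[:, 0::2])
    signal[:, 1::2] = np.cos(angles[:, 1::2])
    return signal
```

Since the signal has no trainable parameters, it works for any sequence length, in contrast to a parameterized absolute position embedding.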

A collection of multi-task learning blocks. So far only the "cross-stitch block" is available.

| Name | Dependencies | Description | Reference |
| --- | --- | --- | --- |
| CrossStitch | | a cross-stitch block, modeling the correlation & self-correlation of two tasks | Cross-stitch Networks for Multi-task Learning |
| Stack_CrossStitch | CrossStitch | stacking multiple cross-stitch blocks together with shared/separated input | Cross-stitch Networks for Multi-task Learning |
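
The core of a cross-stitch unit, as described in the referenced paper, is a 2x2 mixing of the activations of two tasks. The sketch below assumes one scalar weight per task pair and uses a hypothetical function name `cross_stitch`; in a real model `alpha` would be a trainable variable, and the CrossStitch block in this repository may parameterize it differently.

```python
import numpy as np

def cross_stitch(x_a, x_b, alpha):
    """Mix the activations of task A and task B.

    x_a, x_b: arrays of the same shape, e.g. [B, L, D]
    alpha:    [2, 2] matrix; the diagonal holds the self-correlation weights,
              the off-diagonal holds the cross-task weights
    """
    out_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    out_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return out_a, out_b
```

With `alpha` set to the identity matrix the two tasks stay fully separated; larger off-diagonal values share more between them.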

A collection of auxiliary functions, e.g. masking, normalizing, slicing.


Run for a simple test on toy data.