Author: [Han Xiao](https://hanxiao.github.io)
A collection of frequently-used deep learning blocks I have implemented in Tensorflow. It covers the core tasks in NLP such as embedding, encoding, matching and pooling. All implementations follow a modularized design pattern which I call the "block-design". More details can be found in my blog post.
## Requirements

- Python >= 3.6
- Tensorflow >= 1.6
## Encoding Blocks

A collection of sequence encoding blocks. Input is a sequence with shape of `[B, L, D]`, output is another sequence in `[B, L, D']`, where `B` is the batch size, `L` is the length of the sequence, and `D` and `D'` are the input and output dimensions, respectively.
| Name | Description | Reference |
| --- | --- | --- |
| `LSTM_encode` | a fast multi-layer bidirectional LSTM implementation based on `CudnnLSTM` | Tensorflow doc on `CudnnLSTM` |
| `TCN_encode` | a temporal convolution network described in the paper, basically a multi-layer dilated CNN with special padding to ensure the causality | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| `Res_DualCNN_encode` | a sub-block used by `TCN_encode` | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| `CNN_encode` | a classic CNN encoder for sequences | Convolutional Neural Networks for Sentence Classification |
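To make the `[B, L, D]` → `[B, L, D']` contract concrete, here is a minimal sketch of a bidirectional-LSTM encode block. It uses the stock `tf.nn.bidirectional_dynamic_rnn` rather than the `CudnnLSTM` kernel the block above is built on; the function name, parameters and defaults are illustrative, not the repo's actual API:

```python
import tensorflow as tf

def lstm_encode_sketch(seqs, seq_lens, num_units=128, scope='lstm_encode'):
    """seqs: float32 [B, L, D]; seq_lens: int32 [B]. Returns [B, L, 2 * num_units]."""
    with tf.variable_scope(scope):
        cell_fw = tf.nn.rnn_cell.LSTMCell(num_units)
        cell_bw = tf.nn.rnn_cell.LSTMCell(num_units)
        # run the sequence in both directions, respecting the true lengths
        (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
            cell_fw, cell_bw, seqs, sequence_length=seq_lens, dtype=tf.float32)
        # concatenating both directions gives D' = 2 * num_units
        return tf.concat([out_fw, out_bw], axis=-1)
```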
## Matching Blocks

A collection of sequence matching blocks, aka. attention. Inputs are two sequences: a context in the shape of `[B, L_c, D]`, and a query in the shape of `[B, L_q, D]`. The output is a sequence that has the same length as the context, i.e. with shape of `[B, L_c, D]`. Each position in the output encodes the relevance of that position in the context to the complete query.
| Name | Description | Reference |
| --- | --- | --- |
| `Attentive_match` | basic attention mechanism with different scoring functions, also supports future blinding | |
| `Transformer_match` | a multi-head attention block from "Attention is all you need" | Attention is all you need |
| `AttentiveCNN_match` | the light version of attentive convolution, with the possibility of future blinding to ensure causality | Attentive Convolution |
| `BiDaf_match` | the attention flow layer used in the BiDAF model | Bidirectional Attention Flow for Machine Comprehension |
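To illustrate the matching interface, here is a minimal sketch of a basic dot-product attention that aligns the query to the context, in the spirit of the basic attention block above; other scoring functions and future blinding are omitted, and all names are illustrative:

```python
import tensorflow as tf

def attentive_match_sketch(context, query, query_mask=None):
    """context: [B, L_c, D]; query: [B, L_q, D]; query_mask: float32 [B, L_q]."""
    # dot-product score between every context/query position pair: [B, L_c, L_q]
    scores = tf.matmul(context, query, transpose_b=True)
    if query_mask is not None:
        # push padded query positions towards -inf before the softmax
        scores += (1.0 - tf.expand_dims(query_mask, 1)) * -1e9
    alpha = tf.nn.softmax(scores)  # normalize over the last axis, i.e. L_q
    # each context position becomes a weighted sum over the query: [B, L_c, D]
    return tf.matmul(alpha, query)
```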
## Pooling Blocks

A collection of pooling blocks. It fuses/reduces on the time axis `L`. Input is a sequence with shape of `[B, L, D]`, output is in `[B, D]`.
| Name | Description | Reference |
| --- | --- | --- |
| `SWEM_pool` | do pooling on the input sequence; supports max/avg. pooling and hierarchical avg./max pooling | Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms |
There are also some convolution-based pooling blocks built on `SWEM_pool`, but they are for experimental purposes, so I will not list them here.
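As a sketch of the pooling interface, the following shows masked max/avg. pooling over the time axis, reducing `[B, L, D]` to `[B, D]`; hierarchical pooling is omitted and the names are illustrative, not the repo's actual API:

```python
import tensorflow as tf

def swem_pool_sketch(seqs, seq_mask, method='avg'):
    """seqs: [B, L, D]; seq_mask: float32 [B, L], 1 for real tokens, 0 for padding."""
    mask = tf.expand_dims(seq_mask, axis=-1)  # [B, L, 1]
    if method == 'max':
        # push padded positions towards -inf so they never win the max
        return tf.reduce_max(seqs + (1.0 - mask) * -1e9, axis=1)
    # average over the unpadded length only
    return tf.reduce_sum(seqs * mask, axis=1) / tf.maximum(
        tf.reduce_sum(mask, axis=1), 1.0)
```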
## Embedding Blocks

A collection of positional encoding blocks for sequences.
| Name | Description | Reference |
| --- | --- | --- |
| `SinusPositional_embed` | generates a sinusoid signal that has the same length as the input sequence | Attention is all you need |
| `Positional_embed` | parameterizes the absolute position of the tokens in the input sequence | A Convolutional Encoder Model for Neural Machine Translation |
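For reference, a minimal sketch of the sinusoid signal from "Attention is all you need", precomputed in numpy and returned as a `[1, L, D]` constant that broadcasts over the batch; the function name is illustrative:

```python
import numpy as np
import tensorflow as tf

def sinus_positional_sketch(max_len, dim):
    """Returns a constant [1, max_len, dim] positional signal."""
    pos = np.arange(max_len)[:, None]                     # [L, 1]
    i = np.arange(dim)[None, :]                           # [1, D]
    angles = pos / np.power(10000.0, 2 * (i // 2) / dim)  # [L, D]
    signal = np.zeros((max_len, dim), dtype=np.float32)
    signal[:, 0::2] = np.sin(angles[:, 0::2])             # even dims: sine
    signal[:, 1::2] = np.cos(angles[:, 1::2])             # odd dims: cosine
    return tf.constant(signal[None, ...])                 # [1, L, D]
```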
## Multi-task Blocks

A collection of multi-task learning blocks. So far only the "cross-stitch block" is available.
| Name | Description | Reference |
| --- | --- | --- |
| `CrossStitch` | a cross-stitch block, modeling the correlation & self-correlation of two tasks | Cross-stitch Networks for Multi-task Learning |
| `CrossStitch_chain` | stacks multiple cross-stitch blocks together with shared/separated input | Cross-stitch Networks for Multi-task Learning |
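A cross-stitch unit simply re-mixes the activations of the two tasks through a small learnable matrix; initialized near identity, each task starts out keeping mostly its own signal. A minimal sketch for two tasks, with illustrative names and initialization:

```python
import tensorflow as tf

def cross_stitch_sketch(act_a, act_b, scope='cross_stitch'):
    """act_a, act_b: same-shaped activations from task A and task B."""
    with tf.variable_scope(scope):
        # 2x2 mixing matrix; near-identity init keeps each task's own signal at first
        alpha = tf.get_variable(
            'alpha', initializer=tf.constant([[0.9, 0.1], [0.1, 0.9]]))
        out_a = alpha[0, 0] * act_a + alpha[0, 1] * act_b
        out_b = alpha[1, 0] * act_a + alpha[1, 1] * act_b
        return out_a, out_b
```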
## Auxiliary Functions

A collection of auxiliary functions, e.g. masking, normalizing, slicing.
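A typical helper of this kind turns sequence lengths into the `[B, L]` float mask used by the pooling and matching sketches above; a minimal, illustrative example:

```python
import tensorflow as tf

def make_mask_sketch(seq_lens, max_len):
    """seq_lens: int32 [B]; returns a float32 [B, max_len] mask, 1 for real tokens."""
    return tf.cast(tf.sequence_mask(seq_lens, maxlen=max_len), tf.float32)
```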
## Run

Run `app.py` for a simple test on toy data.