Learning Approximate Inference Networks for Structured Prediction
First, prepare the Figment dataset from http://cistern.cis.lmu.de/figment/
Download the entity dataset and the entity embeddings (around 2 GB) into data/figment. Be sure to unzip the entity dataset.
Next, prepare the Bibtex dataset from http://mulan.sourceforge.net/datasets.html
Finally, download the Bookmarks dataset and place it in data/bookmarks. There is no need to run any preprocessing script.
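As a rough setup sketch (the archive name below is a placeholder, not the actual file name on the dataset page):

mkdir -p data/figment data/bookmarks
wget -P data/figment http://cistern.cis.lmu.de/figment/ENTITY_DATASET.zip   # placeholder file name
unzip data/figment/ENTITY_DATASET.zip -d data/figment

Once the data is in place, train a model with: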
python -m mains.infnet --config configs/figment.json
python -m mains.infnet --config configs/bibtex.json
python -m mains.infnet --config configs/bookmarks.json
base/: Contains the base model and base trainer classes, from which all models and trainers inherit.
configs/: Contains the configuration files, stored in JSON format. All hyper-parameters are stored here.
data/: Contains scripts that process the raw data files and store them in pickle format.
data_loader/: Contains the DataGenerator class, which is used to get data from the pipeline. Since most of our models are small, a naive implementation was sufficient; for bigger datasets, it may be worth looking into the TensorFlow Dataset API.
mains/: Contains the main entry point, infnet.py. It takes the configuration file and uses it to initialize the model, trainer, and hyper-parameters, and to choose which quantities to save for TensorBoard, etc. A rough sketch of this flow is given after this list.
models/: Contains the model definitions, each a class of its own. There are four such classes: EnergyNet, InferenceNet, FeatureNet, and Spen. The first three are simple feed-forward networks; Spen is the actual model used, and it combines the different networks together.
trainers/: Contains the trainer, which schedules training, evaluation, and TensorBoard logging, among other things.
utils/: Contains utility functions such as process_config, which parses the configuration file.
run.py: Used for hyper-parameter tuning.
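As a rough sketch of how these pieces fit together, the flow inside mains/infnet.py looks roughly like the following. The module paths and constructor signatures here are assumptions inferred from the directory layout above, not the exact code in this repository:

import tensorflow as tf

from data_loader.data_generator import DataGenerator   # assumed module path
from models.spen import Spen                           # assumed module path
from trainers.trainer import Trainer                   # assumed module path
from utils.config import process_config                # assumed module path

def main():
    # Parse the JSON config; every hyper-parameter lives in this object.
    config = process_config("configs/figment.json")
    # Yields batches of (inputs, labels) from the pickled data files.
    data = DataGenerator(config)
    # Spen wires FeatureNet, EnergyNet, and InferenceNet together.
    model = Spen(config)
    sess = tf.Session()
    # Schedules training, evaluation, and TensorBoard logging.
    trainer = Trainer(sess, model, data, config)
    trainer.train()

if __name__ == "__main__":
    main()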
Config File Information

The configuration keys are listed below, grouped as in the JSON files. An illustrative example configuration and a note on F1 averaging follow the list.
exp_name: Name of the experiment
data: Contains info about the data
dataset: Name of the dataset
data_dir: Path for the top directory of data
splits: Splits for train, validation, test.
embeddings: True/False. Set to true if pre-trained embeddings are available.
vocab: Same as above, for the vocabulary.
data_generator: Name of the data generator class defined in data_loader/.
tensorboard_train: Set to true to save TensorBoard summaries during stage 2.
tensorboard_infer: Same as above, for stage 3.
feature_size: Size of the hidden layer in the feature network.
label_measurements: Same as above, for the energy network.
type_vocab_size: Number of output labels.
entities_vocab_size: Size of the embedding lookup table.
embeddings_tune: Set to true if the embedding vectors should be updated during training.
max_to_keep: Required by the TensorFlow saver used to save checkpoints.
num_epochs: Number of epochs in each stage
train: Info about how to train
diff_type: Choice of the \nabla operator from the paper.
batch_size: Batch size for training.
state_size: Not required. Kept for historical reasons.
hidden_units: Number of hidden units in the inference and feature networks (depends on whether embeddings is true or false).
lr_*: Learning rate for optimizing the corresponding variable.
lambda_*: Regularization weight for the corresponding variable.
lambda_pretrain_bias: How much to weigh the pre-trained network (another term in the paper).
wgan_mode: Whether to use the improved WGAN penalty.
lamb_wgan: Regularization weight for the WGAN penalty.
ssvm: Implementation of SPEN (Belanger and McCallum, 2016). Incomplete, since we could not find an implementation of entropic gradient descent.
enable: Whether to use SSVM training or not.
steps: Number of optimization steps in SSVM inference.
eval: Set to true to use SSVM inference at evaluation time.
lr_inference: Learning rate for SSVM inference.
eval_print: What to print during evaluation.
f1: F1 score.
pretrain: Energy / loss of the pre-trained network.
infnet: Energy / loss of the inference network.
f1_score_mode: Set to examples to compute the F1 score averaged over examples, or to label to average over labels. The paper averages over examples (see the illustration after this list).
threshold: Decision threshold, tuned on the validation set.
time_taken: Reports timing; covers only the training/inference step time, not the whole run.
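For concreteness, here is a pared-down configuration in the spirit of configs/figment.json. The key names follow the list above, but the nesting and values are illustrative guesses rather than tuned settings from this repository; diff_type and the lr_* / lambda_* keys are omitted because their exact names and admissible values are not listed above.

{
  "exp_name": "figment_infnet",
  "data": {
    "dataset": "figment",
    "data_dir": "data/figment",
    "splits": [0.8, 0.1, 0.1],
    "embeddings": true,
    "vocab": true,
    "data_generator": "DataGenerator"
  },
  "tensorboard_train": true,
  "tensorboard_infer": false,
  "feature_size": 200,
  "label_measurements": 50,
  "type_vocab_size": 102,
  "entities_vocab_size": 200000,
  "embeddings_tune": false,
  "max_to_keep": 5,
  "num_epochs": 100,
  "train": {
    "batch_size": 128,
    "hidden_units": 200,
    "wgan_mode": true,
    "lamb_wgan": 10.0
  },
  "ssvm": {
    "enable": false,
    "steps": 30,
    "eval": false,
    "lr_inference": 0.1
  },
  "eval_print": {
    "f1": true,
    "pretrain": true,
    "infnet": true,
    "f1_score_mode": "examples",
    "threshold": 0.5,
    "time_taken": true
  }
}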
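To make f1_score_mode concrete, here is a small standalone illustration using scikit-learn (not part of this repository): average="samples" corresponds to the example-averaged F1 used in the paper, and average="macro" to label-averaged F1.

import numpy as np
from sklearn.metrics import f1_score

# Two examples, three labels (multi-label targets).
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 1], [1, 1, 0]])

# F1 per example, then averaged over examples (the paper's setting).
print(f1_score(y_true, y_pred, average="samples"))  # 0.833...
# F1 per label, then averaged over labels.
print(f1_score(y_true, y_pred, average="macro"))    # 0.888...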
Major thanks to Lifu Tu and Kevin Gimpel, the authors of the paper we implemented, for sharing their Theano code and responding promptly to our queries about the paper. We also thank David Belanger for the Bookmarks dataset and his original SPEN implementation.
We also thank the authors of the TensorFlow project template https://github.com/MrGemy95/Tensorflow-Project-Template, which served as a starting point for our project.
Lifu Tu and Kevin Gimpel. Learning approximate inference networks for structured prediction. CoRR, abs/1803.03376, 2018. URL http://arxiv.org/abs/1803.03376