Created: May 29, 2018
Last commit: Jun. 7, 2018

Scikit Learn Model Persistence via MsgPack

Introduction

scikit-learn suggests using pickle to store a model after training.

This approach has known issues:

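  • security: a pickle file contains byte codes that are executed on load, so loading an untrusted file can run arbitrary code
  • maintainability: the model should be loaded with the same scikit-learn version that saved it
  • slow: the file contains byte codes, not only the trained weights

For example, loading an MLPClassifier pickled under scikit-learn 0.18 with version 0.19.1 emits:

UserWarning: Trying to unpickle estimator MLPClassifier from version 0.18 when using version 0.19.1. This might lead to breaking code or invalid results. Use at your own risk.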
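To make the security point concrete: a pickle stream is a small program of opcodes, and its REDUCE opcode calls whatever callable the file names. The sketch below uses only the standard library and a harmless `len` call where an attacker could name something dangerous; the class name `Innocent` is hypothetical.

```python
import pickle
import pickletools


class Innocent:
    """Hypothetical class whose pickle tells the loader to call an arbitrary function."""

    def __reduce__(self):
        # Instead of the object's state, pickle records "call len('abc')".
        # A malicious file could name os.system or any other callable the same way.
        return (len, ("abc",))


payload = pickle.dumps(Innocent())

# The byte stream is a program; REDUCE is the opcode that performs the call.
opcodes = {op.name for op, _arg, _pos in pickletools.genops(payload)}

restored = pickle.loads(payload)  # runs len("abc") while unpickling
print(restored)  # prints 3, not an Innocent instance
```

Nothing in `pickle.loads` checks what the named callable does, which is why the Python documentation warns against unpickling untrusted data.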

Our approach

To persist a model instance, we construct a dictionary containing:

  • keyword params used to construct the instance
  • the values of trainable parameters (e.g., weights)
  • other needed instance properties
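A dictionary of this shape can be sketched as follows, assuming scikit-learn is installed. The `REGISTRY`, the `to_dict`/`from_dict` helpers, and the attribute lists are hypothetical illustrations, not the `sklearn_msgpack` API; `LogisticRegression` and `StandardScaler` are chosen only because their fitted state is small. `get_params()` supplies the keyword params, and the listed attributes carry the trained values.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical registry: class name -> (class, fitted attributes needed to predict/transform).
# Supporting another class means adding one entry here.
REGISTRY = {
    "LogisticRegression": (LogisticRegression,
                           ["coef_", "intercept_", "classes_", "n_features_in_"]),
    "StandardScaler": (StandardScaler,
                       ["mean_", "var_", "scale_", "n_samples_seen_", "n_features_in_"]),
}


def to_dict(est):
    """Describe a fitted estimator as plain data (arrays are still numpy arrays here)."""
    _, attrs = REGISTRY[type(est).__name__]
    return {
        "class": type(est).__name__,
        "params": est.get_params(),  # keyword params for the constructor
        "attrs": {name: getattr(est, name) for name in attrs},  # trained values
    }


def from_dict(state):
    """Rebuild the estimator from its dictionary without calling fit()."""
    cls, attrs = REGISTRY[state["class"]]
    est = cls(**state["params"])
    for name in attrs:
        setattr(est, name, state["attrs"][name])
    return est


X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)
scaler = StandardScaler().fit(X)

clf_restored = from_dict(to_dict(clf))
scaler_restored = from_dict(to_dict(scaler))
```

Which attributes count as "needed" depends on the class and on the scikit-learn version, so each entry has to be checked against the methods it must support.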

Then we use MsgPack to store this dictionary.
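MsgPack has no native array type, so numpy arrays in such a dictionary must be converted before packing. This sketch assumes the `msgpack` Python package and numpy; the `__ndarray__` tag and the `pack`/`unpack`/`save`/`load` helpers are hypothetical and not necessarily how `sklearn_msgpack` encodes arrays.

```python
import msgpack
import numpy as np


def encode(obj):
    """msgpack 'default' hook: store numeric arrays as dtype + shape + raw bytes."""
    if isinstance(obj, np.ndarray):
        if obj.dtype.hasobject:
            raise TypeError("object arrays cannot be stored as raw bytes")
        return {"__ndarray__": True, "dtype": obj.dtype.str,
                "shape": list(obj.shape), "data": obj.tobytes()}
    if isinstance(obj, np.generic):  # numpy scalars such as np.int64
        return obj.item()
    raise TypeError(f"cannot serialize {type(obj)!r}")


def decode(obj):
    """msgpack 'object_hook': rebuild arrays tagged by encode()."""
    if obj.get("__ndarray__"):
        # frombuffer returns a read-only view; call .copy() if it must be modified.
        return np.frombuffer(obj["data"], dtype=obj["dtype"]).reshape(obj["shape"])
    return obj


def pack(state):
    return msgpack.packb(state, default=encode, use_bin_type=True)


def unpack(data):
    # raw=False decodes strings as str; tuples come back as lists.
    return msgpack.unpackb(data, object_hook=decode, raw=False)


def save(path, state):
    with open(path, "wb") as f:
        f.write(pack(state))


def load(path):
    with open(path, "rb") as f:
        return unpack(f.read())
```

Storing arrays as raw bytes keeps the file to the trained values plus the metadata needed to rebuild them.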

Supported classes

It's very easy to add support for more classes.

Performance

Compared with pickle, we have seen more than 25x faster loading for MLPClassifier and 150x faster loading for TfidfVectorizer.

Files are also 7x smaller for MLPClassifier and 50x smaller for TfidfVectorizer.

Usage

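import sklearn_msgpack
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
X, y = load_iris(return_X_y=True)
clf = MLPClassifier(max_iter=1000).fit(X, y)        # train a supported estimator
sklearn_msgpack.save_to_file('tmp.mpack', clf)      # persist it as a MsgPack file
# ...
clf = sklearn_msgpack.load_from_file('tmp.mpack')   # restore it without pickle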