
Last Commit
Jun. 7, 2018
May. 29, 2018

Scikit Learn Model Persistence via MsgPack


scikit-learn suggests using pickle to store a model after training.

There are known issues with this approach:

  • security - loading a pickle file can execute arbitrary code, because the file holds instructions for rebuilding objects, not just data
  • maintainability - loading requires the same version of sklearn that saved the model
  • speed - loading is slow because the file holds those instructions, not only the trained weights

A version mismatch produces a warning like this:

UserWarning: Trying to unpickle estimator MLPClassifier from version 0.18 when using version 0.19.1. This might lead to breaking code or invalid results. Use at your own risk.
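
The security concern can be illustrated with a small sketch. It uses only the standard pickle module; the Payload class is a hypothetical example, and the harmless built-in sum stands in for whatever function a malicious file might name.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle names a function to call on load."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call sum([1, 2, 3])".
        # A malicious file could name any importable callable here instead.
        return (sum, ([1, 2, 3],))


data = pickle.dumps(Payload())

# Loading does not return a Payload; it returns whatever the call produced.
result = pickle.loads(data)
print(result)
```

The loader runs the call chosen by the file's author, which is why a pickle from an untrusted source should not be loaded.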

Our approach

To persist a model instance, we construct a dictionary containing

  • keyword params used to construct the instance
  • the value of trainable parameters (ex. weights)
  • other needed instance properties

Then we use MsgPack to store this dictionary.
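
The idea can be sketched as follows. This is not the sklearn_msgpack implementation: TinyLinearModel, to_dict, from_dict, pack, and unpack are hypothetical names, the toy class stands in for a real estimator, and the dictionary layout is an assumption. The msgpack calls (packb, unpackb) come from the msgpack Python package; the JSON fallback is only there so the sketch runs where msgpack is not installed.

```python
import json

try:
    import msgpack  # third-party package: pip install msgpack

    def pack(obj):
        return msgpack.packb(obj, use_bin_type=True)

    def unpack(data):
        return msgpack.unpackb(data, raw=False)
except ImportError:
    # Stand-in only, so the sketch still runs without msgpack installed.
    def pack(obj):
        return json.dumps(obj).encode("utf-8")

    def unpack(data):
        return json.loads(data.decode("utf-8"))


class TinyLinearModel:
    """Hypothetical stand-in for a scikit-learn estimator."""

    def __init__(self, alpha=1.0, fit_intercept=True):
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def get_params(self):
        return {"alpha": self.alpha, "fit_intercept": self.fit_intercept}

    def fit(self, X, y):
        # Placeholder "training": column means of X and the mean of y.
        n = len(X)
        self.n_features_ = len(X[0])
        self.coef_ = [sum(row[j] for row in X) / n for j in range(self.n_features_)]
        self.intercept_ = sum(y) / n if self.fit_intercept else 0.0
        return self

    def predict(self, X):
        return [sum(c * v for c, v in zip(self.coef_, row)) + self.intercept_
                for row in X]


def to_dict(model):
    """Collect the three kinds of state listed above into plain data."""
    return {
        "params": model.get_params(),                      # constructor kwargs
        "trained": {"coef_": model.coef_,                  # trainable values
                    "intercept_": model.intercept_},
        "other": {"n_features_": model.n_features_},       # other properties
    }


def from_dict(state):
    """Rebuild the instance from its kwargs, then restore the saved attributes."""
    model = TinyLinearModel(**state["params"])
    for group in ("trained", "other"):
        for name, value in state[group].items():
            setattr(model, name, value)
    return model


model = TinyLinearModel(alpha=0.5).fit([[1.0, 2.0], [3.0, 4.0]], [1.0, 0.0])
blob = pack(to_dict(model))          # bytes that could be written to a file
restored = from_dict(unpack(blob))
```

Because the stored dictionary holds only plain data (numbers, strings, lists), loading it cannot trigger code execution, and it does not depend on how the estimator class is laid out internally. A real estimator would also need its NumPy arrays converted to a serializable form.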

Supported classes

It's very easy to add more.


Compared with pickle, we have seen more than 25x faster loading for MLPClassifier and 150x faster loading for TfidfVectorizer.

Files are also smaller: 7x smaller for MLPClassifier and 50x smaller for TfidfVectorizer.


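import sklearn_msgpack

# clf is an already-trained estimator of a supported class
sklearn_msgpack.save_to_file('tmp.mpack', clf)
# ...
# Later, restore the estimator from the file
clf = sklearn_msgpack.load_from_file('tmp.mpack')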