Counting 2,870 Big Data & Machine Learning Frameworks, Toolsets, and Examples...

CaffeOnSpark

1238

CaffeOnSpark brings deep learning to Hadoop and Spark clusters

algebird

1673

Abstract Algebra for Scala

deeplearning4j

9137

Deep Learning for Java, Scala & Clojure on Hadoop & Spark With GPUs - From Skymind

elassandra

945

Elassandra = Cassandra + Elasticsearch

fastText

14511

Library for fast text representation and classification.

zipline

7172

Zipline, a Pythonic Algorithmic Trading Library

xgboost

12391

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow

smile

3836

Statistical Machine Intelligence & Learning Engine

h2o-3

3155

Open Source Fast Scalable Machine Learning API For Smarter Applications (Deep Learning, Gradient Boosting, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA...)

mldb

433

MLDB is the Machine Learning Database

dask

3006

Versatile parallel programming with task scheduling

librdkafka

1981

The Apache Kafka C/C++ library

persistent-rnn

539

Fast Recurrent Networks Library

warp-ctc

3256

Fast parallel CTC.

thrift

4947

The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between multiple languages

distributed

560

Distributed computation in Python

incubator-systemml

688

SystemML is a flexible, scalable machine learning system

pagmo

193

A C++ / Python platform to perform parallel computations of optimization tasks (global and local) via the asynchronous generalized island model. State-of-the-art optimization algorithms are included. A common interface is provided to other optimization frameworks/algorithms such as NLOPT, SciPy, SNOPT, IPOPT, and GSL

machinelearning

2743

A cross-platform open-source machine learning framework which makes machine learning accessible to .NET developers

thunder

705

Scalable analysis of images and time series

avsc

563

Avro for JavaScript

incubator-predictionio

11325

PredictionIO, a machine learning server for developers and ML engineers. Built on Apache Spark, HBase and Spray.

aerosolve

4375

A machine learning package built for humans.

swift

2540

Swift for TensorFlow documentation repository.

sod

501

An Embedded, Modern Computer Vision & Machine Learning Library

sketches-core

505

Core Sketch Library.

translate

248

A PyTorch library for machine translation that provides training for sequence-to-sequence models

incubator-beam

1924

Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines

ml5-library

1140

Friendly machine learning for the web!

jargon

56

Tokenizers and lemmatizers for Go