Counting 2,153 Big Data & Machine Learning Frameworks, Toolsets, and Examples...
Suggestion? Feedback? Tweet @stkim1

Caffe2

6527

A lightweight, modular, and scalable deep learning framework.

horovod

1116

Distributed training framework for TensorFlow.

mxnet

12411

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, JavaScript and more

mapd-core

1305

The MapD Core database

LightGBM

4369

A fast, distributed, high-performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms

wallaroo

562

Ultrafast and elastic data processing

luigi

8229

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

lab

4371

A customisable 3D platform for agent-based AI research

skale-engine

202

High-performance distributed data processing engine

thrill

327

An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++

vitess

5274

Vitess is a database clustering system for horizontal scaling of MySQL.

nupic

5321

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

ClickHouse

3289

ClickHouse is a free analytic DBMS for big data.

plaidml

727

PlaidML is a framework for making deep learning work everywhere.

hadoop

4538

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware

elasticsearch

27340

Open Source, Distributed, RESTful Search Engine

Paddle

6028

PArallel Distributed Deep LEarning

spark

15460

Spark is a fast and general cluster computing system for Big Data

druid

5824

Column-oriented distributed data store ideal for powering interactive applications

kylo

308

A data lake management software platform and framework for enabling scalable enterprise-class data lakes on Apache Hadoop and Spark

stroom

68

Stroom is a highly scalable data storage, processing and analysis platform.

grakn

698

A Hyper-Relational Database for Knowledge-Oriented Systems

mesos

3457

Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks

alluxio

3258

Alluxio (formerly Tachyon): Virtual Distributed Storage at Memory Speed

hue

2607

Let’s Big Data. Hue is an open source Web interface for analyzing data with Hadoop and Spark.

cassandra

4038

Apache Cassandra is a highly scalable partitioned row store. Rows are organized into tables with a required primary key.

presto

6859

Distributed SQL query engine for big data

flink

3044

Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities

incubator-airflow

6719

Airflow is a platform to programmatically author, schedule and monitor workflows

caffe

21753

Caffe: a fast open framework for deep learning.