Counting 3,146 Big Data & Machine Learning Frameworks, Toolsets, and Examples...

beringei

2729

Beringei is a high-performance, in-memory storage engine for time series data.

lab

5320

A customisable 3D platform for agent-based AI research

drill

1035

Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems

incubator-impala

201

Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters

LightGBM

6567

A fast, distributed, high-performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms

ncnn

4683

A high-performance neural network inference framework optimized for the mobile platform

chainer

4131

A flexible framework of neural networks for deep learning

gpdb

2714

Greenplum Database

horovod

3773

Distributed training framework for TensorFlow.

keras

33776

Deep Learning for humans

Caffe2

8293

A lightweight, modular, and scalable deep learning framework.

elasticsearch

34595

Open Source, Distributed, RESTful Search Engine

skale-engine

295

High-performance distributed data processing engine

mace

2494

MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

vitess

6574

Vitess is a database clustering system for horizontal scaling of MySQL.

pai

473

A cluster management and resource scheduling platform for AI, built on a mature design with a proven track record in Microsoft's large-scale production environment

luigi

10079

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

onyx

1785

Distributed, masterless, high-performance, fault-tolerant data processing

mlflow

2198

Open source platform for the complete machine learning lifecycle

Paddle

7550

PArallel Distributed Deep LEarning

kylo

510

A data lake management software platform and framework for enabling scalable enterprise-class data lakes on Apache Hadoop and Spark

bookkeeper

412

A scalable, fault-tolerant, low-latency storage service optimized for append-only workloads.

ClickHouse

4761

ClickHouse is a free analytic DBMS for big data.

PyTorch

18927

A Python package that provides tensor computation (like NumPy) with strong GPU acceleration and deep neural networks built on a tape-based autograd system

spark

18874

Spark is a fast and general cluster computing system for Big Data

pachyderm

3069

Containerized Data Analytics

presto

8112

Distributed SQL query engine for big data

incubator-airflow

9441

Airflow is a platform to programmatically author, schedule and monitor workflows

calcite

836

Apache Calcite is a dynamic data management framework.

hive

2113

The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL