Counting 1,868 Big Data & Machine Learning Frameworks, Toolsets, and Examples...

crate

1809

A distributed SQL database that makes it simple to store and analyze massive amounts of machine data in real-time.

grakn

605

A Hyper-Relational Database for Knowledge-Oriented Systems

universe

6113

Universe: a software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications.

hadoop

4024

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware

luigi

7887

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

incubator-airflow

6257

Airflow is a platform to programmatically author, schedule and monitor workflows

hbase

1520

Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable

hive

1550

The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL

distributedlog

2177

A high-performance replicated log service. (Development has moved to the Apache Incubator.)

neon

3254

Fast, scalable, easy-to-use Python based Deep Learning Framework by Nervana™

spark

14694

Spark is a fast and general cluster computing system for Big Data

elasticsearch

25875

Open Source, Distributed, RESTful Search Engine

caffe

20743

Caffe: a fast open framework for deep learning.

vitess

5071

Vitess is a database clustering system for horizontal scaling of MySQL.

vespa

2229

An engine for low-latency computation over large data sets. It stores and indexes your data such that queries, selection and processing over the data can be performed at serving time.

CNTK

12746

Microsoft Cognitive Toolkit (CNTK)

Theano

7075

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

chainer

3039

A flexible framework of neural networks for deep learning

druid

5561

Column-oriented distributed data store ideal for powering interactive applications

alluxio

3174

Alluxio, formerly Tachyon: a virtual distributed storage system at memory speed

Paddle

5586

PArallel Distributed Deep LEarning

LightGBM

3926

A fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms

lab

4190

A customisable 3D platform for agent-based AI research

arrow

1305

Arrow is a set of technologies that enable big-data systems to process and move data fast

pinot

1636

A realtime distributed OLAP datastore

presto

6593

Distributed SQL query engine for big data

Caffe2

6040

A lightweight, modular, and scalable deep learning framework.

incubator-hawq

290

A Hadoop-native SQL query engine that combines the key technological advantages of an MPP database with the scalability and convenience of Hadoop

storm

4552

Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation