Counting 3,663 Big Data & Machine Learning Frameworks, Toolsets, and Examples...
tensorflow

123617

An Open Source Machine Learning Framework for Everyone

pachyderm

3546

Containerized Data Analytics

norikra

360

Schemaless Stream Processing (Complex Event Processing) Server with SQL

Gaffer

1509

A large-scale entity and relation database supporting very large graphs containing rich, aggregated properties on the nodes and edges. Several storage options are available, including Accumulo, HBase and Parquet.

elasticsearch

39383

Open Source, Distributed, RESTful Search Engine

keras

39371

Deep Learning for humans

storm

5602

Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation

tidb

17846

TiDB is a distributed HTAP database compatible with the MySQL protocol

incubator-hawq

471

A Hadoop-native SQL query engine that combines the key technological advantages of an MPP database with the scalability and convenience of Hadoop

gpdb

3078

Greenplum Database

pinot

2205

A realtime distributed OLAP datastore

spark

21065

Spark is a fast and general cluster computing system for Big Data

geode

1502

Apache Geode is a data management platform that provides real-time, consistent access to data-intensive applications throughout widely distributed cloud architectures

pai

841

A platform for cluster management and resource scheduling for AI that incorporates a mature design with a proven track record in Microsoft's large-scale production environment

vitess

7685

Vitess is a database clustering system for horizontal scaling of MySQL.

vespa

2762

An engine for low-latency computation over large data sets. It stores and indexes your data such that queries, selection and processing over the data can be performed at serving time.

onyx

1856

Distributed, masterless, high performance, fault tolerant data processing

caffe

27580

Caffe: a fast open framework for deep learning.

flink

7980

Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities

luigi

11164

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

PyTorch

26100

A Python package that provides Tensor computation (like NumPy) with strong GPU acceleration and Deep Neural Networks built on a tape-based autograd system

incubator-airflow

11557

Airflow is a platform to programmatically author, schedule and monitor workflows

root

774

A modular scientific software framework. It provides all the functionalities needed to deal with big data processing, statistical analysis, visualisation and storage. It is mainly written in C++ but integrated with other languages such as Python and R.

ncnn

5841

A high-performance neural network inference framework optimized for the mobile platform

horovod

5733

Distributed training framework for TensorFlow, Keras, PyTorch, and MXNet.

calcite

1104

Apache Calcite is a dynamic data management framework.

ParlAI

4304

A framework for training and evaluating AI models on a variety of openly available dialog datasets.

Paddle

8321

PArallel Distributed Deep LEarning

beringei

2838

Beringei is a high performance, in-memory storage engine for time series data.