Counting 3,742 Big Data & Machine Learning Frameworks, Toolsets, and Examples...
Suggestion? Feedback? Tweet @stkim1

hbase

2727

Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable.

cassandra

5096

Apache Cassandra is a highly scalable, partitioned row store. Rows are organized into tables with a required primary key.

peloton

388

A unified resource scheduler that co-schedules mixed types of workloads, such as batch, stateless, and stateful jobs, in a single cluster for better resource utilization.

mace

3190

MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

zookeeper

6166

ZooKeeper is a centralized service for maintaining configuration information and naming, providing distributed synchronization, and providing group services.

hadoop

8875

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware.

pilosa

1639

An open source, distributed bitmap index that dramatically accelerates queries across multiple, massive data sets.
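To make the bitmap-index idea concrete, here is a minimal toy sketch in Python. It does not use Pilosa's API. It assumes plain Python integers serve as bitsets: each (field, value) pair gets one bitmap whose bit i is set when record i has that value, so a conjunctive filter becomes a single bitwise AND. The class `BitmapIndex` and the helper `to_ids` are hypothetical names used only for illustration.

```python
from collections import defaultdict


class BitmapIndex:
    """Hypothetical toy bitmap index; not Pilosa's API."""

    def __init__(self):
        # (field, value) -> bitset stored as a Python int
        self._bitmaps = defaultdict(int)

    def set_bit(self, field, value, record_id):
        # Mark that record `record_id` has `field == value`.
        self._bitmaps[(field, value)] |= 1 << record_id

    def row(self, field, value):
        # .get avoids inserting empty bitmaps for unseen values.
        return self._bitmaps.get((field, value), 0)


def to_ids(bits):
    """Return the record ids whose bits are set, in ascending order."""
    ids, i = [], 0
    while bits:
        if bits & 1:
            ids.append(i)
        bits >>= 1
        i += 1
    return ids


idx = BitmapIndex()
records = [
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "L"},
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
]
for rid, rec in enumerate(records):
    for field, value in rec.items():
        idx.set_bit(field, value, rid)

# "color = red AND size = L" is one bitwise AND of two bitmaps.
matches = idx.row("color", "red") & idx.row("size", "L")
print(to_ids(matches))          # [0, 3]
print(bin(matches).count("1"))  # 2
```

Production bitmap indexes typically add compression and sharding across nodes. This sketch shows only the core AND-of-bitmaps step.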

incubator-metron

608

Metron integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis

incubator-hawq

471

A Hadoop-native SQL query engine that combines the key technological advantages of an MPP database with the scalability and convenience of Hadoop.

kylo

622

A data lake management software platform and framework for enabling scalable enterprise-class data lakes on Apache Hadoop and Spark

torchnet

971

Torch on steroids

aresdb

1872

A GPU-powered real-time analytics storage and query engine.

skale-engine

334

High performance distributed data processing engine

Gaffer

1512

A large-scale entity and relation database supporting very large graphs containing rich, aggregated properties on the nodes and edges. Several storage options are available, including Accumulo, HBase, and Parquet.

mapd-core

1851

The MapD Core database

cstore_fdw

1258

Columnar store for analytics with Postgres, developed by Citus Data

NSDb

37

A streaming-oriented time-series database optimized for the serving layer.

citus

3415

Scalable PostgreSQL for multi-tenant and real-time workloads

horovod

6029

Distributed training framework for TensorFlow, Keras, PyTorch, and MXNet.

Caffe2

8449

A lightweight, modular, and scalable deep learning framework.

storm

5641

Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing real-time computation.

incubator-impala

276

Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters

impala

276

A distributed, parallel C++ query engine that lets you analyze, transform and combine data stored in Apache Hadoop clusters

ignite

2454

The Apache Ignite In-Memory Data Fabric is a high-performance, integrated, and distributed in-memory platform for computing and transacting on large-scale data sets in real time, orders of magnitude faster than is possible with traditional disk-based or flash technologies.

bfs

2461

The Baidu File System.

apex-core

319

Apache Apex is a unified platform for big data stream and batch processing

neon

3755

Fast, scalable, easy-to-use Python-based deep learning framework by Nervana™

federated

356

A framework for implementing federated learning

vespa

2810

An engine for low-latency computation over large data sets. It stores and indexes your data such that queries, selection and processing over the data can be performed at serving time.

Lasagne

3611

Lightweight library to build and train neural networks in Theano