Counting 2,975 Big Data & Machine Learning Frameworks, Toolsets, and Examples...

calcite

768

Apache Calcite is a dynamic data management framework.

Lasagne

3480

Lightweight library to build and train neural networks in Theano

spark

18083

Spark is a fast and general cluster computing system for Big Data

Caffe2

8159

A lightweight, modular, and scalable deep learning framework.

horovod

3164

Distributed training framework for TensorFlow.

arrow

2185

Arrow is a set of technologies that enable big-data systems to process and move data fast

ignite

1830

The Apache Ignite In-Memory Data Fabric is a high-performance, integrated, and distributed in-memory platform for computing and transacting on large-scale data sets in real time, orders of magnitude faster than is possible with traditional disk-based or flash technologies.

wallaroo

1024

Ultrafast and elastic data processing

storm

5171

Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing real-time computation

caffe

24874

Caffe: a fast open framework for deep learning.

curator

1097

Curator is a set of Java libraries that make using Apache ZooKeeper much easier

hadoop

7654

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware

kafka

8952

Kafka™ is used for building real-time data pipelines and streaming apps

snappydata

757

SnappyData: OLTP + OLAP Database built on Apache Spark

CNTK

14809

Microsoft Cognitive Toolkit (CNTK)

incubator-airflow

8692

Airflow is a platform to programmatically author, schedule and monitor workflows

nupic

5632

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

lab

5194

A customisable 3D platform for agent-based AI research

PyTorch

17142

A Python package that provides Tensor computation (like NumPy) with strong GPU acceleration and Deep Neural Networks built on a tape-based autograd system

amazon-dsstne

4161

Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon-developed library for building Deep Learning (DL) machine learning (ML) models

zookeeper

4738

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services

bookkeeper

371

A scalable, fault-tolerant, and low-latency storage service optimized for append-only workloads.

Paddle

7242

PArallel Distributed Deep LEarning

incubator-impala

176

Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters

LightGBM

6028

A fast, distributed, high-performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms

mesos

3783

Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks

incubator-metron

468

Metron integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis

mapd-core

1524

The MapD Core database

hive

1978

The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL

accumulo

386

Apache Accumulo is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval