Counting 3,301 Big Data & Machine Learning Frameworks, Toolsets, and Examples...
Suggestion? Feedback? Tweet @stkim1

PyTorch

21207

A Python package that provides Tensor computation (like NumPy) with strong GPU acceleration and deep neural networks built on a tape-based autograd system

incubator-metron

540

Metron integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis

spark

19476

Spark is a fast and general cluster computing system for Big Data

kafka

10119

Kafka™ is used for building real-time data pipelines and streaming apps

ClickHouse

5466

ClickHouse is a free analytic DBMS for big data.

storm

5399

Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation

LightGBM

7167

A fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms

cassandra

4791

Apache Cassandra is a highly-scalable partitioned row store. Rows are organized into tables with a required primary key

keras

35372

Deep Learning for humans

horovod

4266

Distributed training framework for TensorFlow.

ignite

2082

The Apache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.

alluxio

3747

Alluxio, formerly Tachyon, is a virtual distributed storage system at memory speed

caffe

26188

Caffe: a fast open framework for deep learning.

root

687

A modular scientific software framework. It provides all the functionalities needed to deal with big data processing, statistical analysis, visualisation and storage. It is mainly written in C++ but integrated with other languages such as Python and R.

stroom

89

Stroom is a highly scalable data storage, processing and analysis platform.

zookeeper

5340

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services

lucene-solr

2063

Apache Solr is a search engine server that uses Apache Lucene

hadoop

8187

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware

plaidml

1557

PlaidML is a framework for making deep learning work everywhere.

CNTK

15387

Microsoft Cognitive Toolkit (CNTK)

ncnn

4957

A high-performance neural network inference framework optimized for the mobile platform

incubator-airflow

9900

Airflow is a platform to programmatically author, schedule and monitor workflows

hive

2199

The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL

arrow

2689

Arrow is a set of technologies that enable big-data systems to process and move data fast

sqoop

481

Sqoop allows easy imports and exports of data sets between databases and HDFS

flink

4911

Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities

hbase

2360

Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable

Lasagne

3546

Lightweight library to build and train neural networks in Theano

citus

3051

Scalable PostgreSQL for multi-tenant and real-time workloads

lab

5419

A customisable 3D platform for agent-based AI research