Counting 3,463 Big Data & Machine Learning Frameworks, Toolsets, and Examples...

parquet-mr

746

Parquet-MR contains the Java implementation of the Parquet format

kylin

1897

Apache Kylin is an open source distributed analytics engine, contributed by eBay Inc., that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets

incubator-myriad

139

Deploy Apache YARN Applications Using Apache Mesos

lucene-solr

2257

Apache Solr is a search engine server that uses Apache Lucene

mesos

4023

Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks

spark

20348

Spark is a fast and general cluster computing system for Big Data

LightGBM

7653

A fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms

flink

5958

Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities

hive

2328

The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL

ignite

2254

The Apache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real time, orders of magnitude faster than possible with traditional disk-based or flash technologies.

incubator-airflow

10769

Airflow is a platform to programmatically author, schedule and monitor workflows

cassandra

4916

Apache Cassandra is a highly scalable partitioned row store. Rows are organized into tables with a required primary key

kafka

10903

Kafka™ is used for building real-time data pipelines and streaming apps

calcite

1004

Apache Calcite is a dynamic data management framework.

hbase

2520

Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable

accumulo

540

Apache Accumulo is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval

arrow

3170

Arrow is a set of technologies that enable big-data systems to process and move data fast

caffe

26896

Caffe: a fast open framework for deep learning.

hadoop

8488

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware

snappydata

831

SnappyData: OLTP + OLAP Database built on Apache Spark

neon

3715

Fast, scalable, easy-to-use Python-based Deep Learning Framework by Nervana™

CNTK

15698

Microsoft Cognitive Toolkit (CNTK)

chainer

4490

A flexible framework of neural networks for deep learning

tensorflow

119164

An Open Source Machine Learning Framework for Everyone

storm

5520

Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation

tidb

17027

TiDB is a distributed HTAP database compatible with the MySQL protocol

pilosa

1569

An open source, distributed bitmap index that dramatically accelerates queries across multiple, massive data sets.

keras

37633

Deep Learning for humans

Caffe2

8406

A lightweight, modular, and scalable deep learning framework.

horovod

5050

Distributed training framework for TensorFlow.