Counting 3,384 Big Data & Machine Learning Frameworks, Toolsets, and Examples...

chainer

4382

A flexible neural network framework for deep learning

universe

6949

Universe: a software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications.

Knet.jl

756

The Koç University deep learning framework implemented in Julia by Deniz Yuret and collaborators. It supports GPU operation and automatic differentiation using dynamic computational graphs for models defined in plain Julia.

citus

3106

Scalable PostgreSQL for multi-tenant and real-time workloads

vespa

2651

An engine for low-latency computation over large data sets. It stores and indexes your data such that queries, selection and processing over the data can be performed at serving time.

incubator-hawq

445

A Hadoop-native SQL query engine that combines the key technological advantages of an MPP database with the scalability and convenience of Hadoop

kafka-monitor

1084

Kafka Monitor is a framework to implement and execute long-running Kafka system tests in a real cluster

kylin

1784

Apache Kylin is an open source distributed analytics engine, contributed by eBay Inc., that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets

Caffe2

8394

A lightweight, modular, and scalable deep learning framework.

samza

497

Apache Samza is a distributed stream processing framework

accumulo

419

Apache Accumulo is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval

mesos

3980

Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks

ClickHouse

5695

ClickHouse is a free analytic DBMS for big data.

terrapin

166

Terrapin is a low-latency serving system providing random access over large data sets generated by Hadoop jobs and stored on HDFS clusters

lab

5472

A customisable 3D platform for agent-based AI research

zookeeper

5529

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services

incubator-myriad

137

Deploy Apache YARN Applications Using Apache Mesos

impala

225

A distributed, parallel C++ query engine that lets you analyze, transform and combine data stored in Apache Hadoop clusters

root

706

A modular scientific software framework. It provides the functionality needed for big data processing, statistical analysis, visualisation and storage. It is mainly written in C++ but integrates with other languages such as Python and R.

drill

1073

Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems

gobblin

1449

Universal data ingestion framework for Hadoop.

spliceengine

103

The SpliceSQL Engine

parquet-mr

727

Parquet-MR contains the Java implementation of the Parquet format

pilosa

1543

An open source, distributed bitmap index that dramatically accelerates queries across multiple, massive data sets.

pinot

2060

A realtime distributed OLAP datastore

cstore_fdw

1183

Columnar store for analytics with Postgres, developed by Citus Data

kudu

755

Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data

wallaroo

1224

Build and scale real-time data applications as easily as writing a Python script

camus

775

LinkedIn's previous-generation Kafka-to-HDFS pipeline.

trafodion

178

A web-scale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop.