Parquet-MR contains the Java implementation of the Parquet format
Apache Kylin is an open source distributed analytics engine, contributed by eBay Inc., that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets
Deploy Apache YARN Applications Using Apache Mesos
Apache Solr is a search engine server that uses Apache Lucene
Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks
Spark is a fast and general cluster computing system for Big Data
A fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms
Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities
The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL
The Apache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real time, orders of magnitude faster than possible with traditional disk-based or flash technologies.
Airflow is a platform to programmatically author, schedule and monitor workflows
Apache Cassandra is a highly-scalable partitioned row store. Rows are organized into tables with a required primary key
Kafka™ is used for building real-time data pipelines and streaming apps
Apache Calcite is a dynamic data management framework.
Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable
Apache Accumulo is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval
Arrow is a set of technologies that enable big-data systems to process and move data fast
Caffe: a fast open framework for deep learning.
Apache Hadoop is a framework for running applications on large clusters built of commodity hardware
SnappyData: OLTP + OLAP Database built on Apache Spark
Fast, scalable, easy-to-use Python-based Deep Learning Framework by Nervana™
Microsoft Cognitive Toolkit (CNTK)
A flexible framework of neural networks for deep learning
An Open Source Machine Learning Framework for Everyone
Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing real-time computation
TiDB is a distributed HTAP database compatible with the MySQL protocol
An open source, distributed bitmap index that dramatically accelerates queries across multiple, massive data sets.
Deep Learning for humans
A lightweight, modular, and scalable deep learning framework.
Distributed training framework for TensorFlow.