Open Source, Distributed, RESTful Search Engine
Containerized Data Analytics
Universal data ingestion framework for Hadoop.
Siddhi CEP is a lightweight, easy-to-use open source Complex Event Processing (CEP) engine under the Apache Software License v2.0
Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines
Acceleration package for neural networks on multi-core CPUs
Vitess is a database clustering system for horizontal scaling of MySQL.
Distributed SQL query engine for big data
Let’s Big Data. Hue is an open source Web interface for analyzing data with Hadoop and Spark.
A realtime distributed OLAP datastore
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services
The Apache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.
Caffe: a fast open framework for deep learning.
Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation
The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL
Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities
Airflow is a platform to programmatically author, schedule and monitor workflows
Apache Hadoop is a framework for running applications on large clusters built of commodity hardware
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Column-oriented distributed data store ideal for powering interactive applications
A Python package to manage extremely large amounts of data
Spark is a fast and general cluster computing system for Big Data
Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable
Base classes to use when writing tests with Spark
Microsoft Cognitive Toolkit (CNTK)
Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks
Kafka™ is used for building real-time data pipelines and streaming apps
Bare bones Python implementations of some of the foundational Machine Learning models and algorithms.
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.
Apache Cassandra is a highly-scalable partitioned row store. Rows are organized into tables with a required primary key