A fast, distributed, high-performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms
A Python package that provides tensor computation (like NumPy) with strong GPU acceleration and deep neural networks built on a tape-based autograd system
A data lake management software platform and framework for enabling scalable enterprise-class data lakes on Apache Hadoop and Spark
An engine for low-latency computation over large data sets. It stores and indexes your data such that queries, selection and processing over the data can be performed at serving time.
A framework for training and evaluating AI models on a variety of openly available dialog datasets.
PArallel Distributed Deep LEarning
ClickHouse is a free analytic DBMS for big data.
Beringei is a high-performance, in-memory storage engine for time series data.
Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems
The MapD Core database
A platform for cluster management and resource scheduling for AI that incorporates a mature design with a proven track record in Microsoft's large-scale production environment
Ranger is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform
An open source, distributed bitmap index that dramatically accelerates queries across multiple, massive data sets.
Let’s Big Data. Hue is an open source Web interface for analyzing data with Hadoop and Spark.
Open Source, Distributed, RESTful Search Engine
Containerized Data Analytics
Schemaless Stream Processing (Complex Event Processing) Server with SQL
Distributed, masterless, high performance, fault tolerant data processing
Infinispan is an open source data grid platform and highly scalable NoSQL cloud data store.
Airflow is a platform to programmatically author, schedule and monitor workflows
Apache Solr is a search engine server that uses Apache Lucene
The Baidu File System.
Apache Hadoop is a framework for running applications on large clusters built of commodity hardware
Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with built-in Hadoop support.
The Apache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.
Vitess is a database clustering system for horizontal scaling of MySQL.
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services
A realtime distributed OLAP datastore