Powering In-Memory Analytics
Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.
Major components of the project include:
- The Arrow Columnar In-Memory Format
- C++ libraries
- C bindings using GLib
- Go libraries
- Java libraries
- Plasma Object Store: a shared-memory blob store, part of the C++ codebase
- Python libraries
- Ruby libraries
- Rust libraries
What's in the Arrow libraries?
The reference Arrow libraries contain a number of distinct software components:
- Columnar vector and table-like containers (similar to data frames) supporting flat or nested types
- Fast, language-agnostic metadata messaging layer (using Google's FlatBuffers library)
- Reference-counted off-heap buffer memory management, for zero-copy memory sharing and handling memory-mapped files
- Low-overhead IO interfaces to files on disk and HDFS (C++ only)
- Self-describing binary wire formats (streaming and batch/file-like) for remote procedure calls (RPC) and interprocess communication (IPC)
- Integration tests for verifying binary compatibility between the implementations (e.g. sending data from Java to C++)
- Conversions to and from other in-memory data structures
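To make the first and third items above more concrete, here is a minimal sketch of a columnar layout with zero-copy slicing. It uses only the Python standard library (`array` and `memoryview`) and is not the Arrow API. The sample records and the names `ids`, `prices`, and `tail` are made up for illustration.

```python
from array import array

# Hypothetical sample records (id, price); not taken from the Arrow project.
rows = [(1, 2.5), (2, 4.0), (3, 5.5), (4, 7.0)]

# Column-oriented layout: one contiguous, typed buffer per field,
# instead of one object per record.
ids = array("q", (r[0] for r in rows))     # 64-bit signed integers
prices = array("d", (r[1] for r in rows))  # 64-bit floats

# A memoryview exposes the column's buffer without copying it.
price_view = memoryview(prices)
tail = price_view[2:]  # zero-copy slice: shares memory with `prices`

# Writing through the original column is visible through the slice,
# which shows that no copy was made.
prices[3] = 9.0
tail_values = tail.tolist()   # [5.5, 9.0]
column_sum = sum(price_view)  # scans one contiguous buffer
```

The Arrow libraries provide the same idea with a standardized format, nested types, and reference-counted buffers that can be shared across processes and languages. This sketch only mirrors the basic layout.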
Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved:
- Join the mailing list: send an email to [email protected]. Share your ideas and use cases for the project.
- Follow our activity on JIRA
- Learn the format
- Contribute code to one of the reference implementations
How to Contribute
We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the github.com/apache/arrow repository.
If you’d like to report a bug but don’t have time to fix it, you can still post it on JIRA or email the mailing list at [email protected].
To contribute a patch:
- Break your work into small, single-purpose patches if possible. It’s much harder to merge in a large change with a lot of disjoint features.
- Create a JIRA for your patch on the Arrow Project JIRA.
- Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request name with the JIRA name (for an example, see https://github.com/apache/arrow/pull/240).
- Make sure that your code passes the unit tests. You can find instructions on how to run the unit tests for each Arrow component in its respective README file.
- Add new unit tests for your code.
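The pull request naming step above can be checked mechanically before you submit. The following is a rough sketch, not a project tool. It assumes that JIRA names look like `ARROW-<number>` and that the title separates the name from the description with a colon and a space. The helper `jira_key` and the sample titles are hypothetical.

```python
import re

# Assumption: issue keys follow the pattern "ARROW-<number>" and the title
# looks like "ARROW-<number>: <description>". Adjust if the convention differs.
JIRA_PREFIX = re.compile(r"^(ARROW-\d+): \S")


def jira_key(title):
    """Return the JIRA key that prefixes a pull request title, or None."""
    match = JIRA_PREFIX.match(title)
    return match.group(1) if match else None


# Made-up titles, for illustration only.
good = jira_key("ARROW-123: Fix typo in reader docs")
bad = jira_key("Fix typo in reader docs")
```

Here `good` holds the extracted key and `bad` is `None`, so a pre-submit script could warn when the prefix is missing.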
Thank you in advance for your contributions!