Hail is an open-source, general-purpose, Python-based data analysis tool with additional data types and methods for working with genomic data. Hail is used throughout academia and industry as the analytical engine for major studies, projects, and services, including the Genome Aggregation Database (gnomad.broadinstitute.org) and the Neale lab mega-GWAS (nealelab.is/uk-biobank).
Unlike the Python and R scientific computing stacks, Hail:
- scales from laptop to large compute cluster or cloud, with the same code
- is designed to work with datasets that do not fit in memory
- has first-class support for multi-dimensional structured data, like genomic data as in this tutorial
Hail's methods are primarily written in Python, using primitives for distributed queries and linear algebra implemented in Scala, Spark, and increasingly C++. We invite the scientific community to use Hail to develop, share, and apply new methods at scale!
See the homepage for more info on using Hail.
Hail is committed to open-source development. If you'd like to contribute to the development of methods or infrastructure, please:
- see the For Software Developers section of the installation guide for info on compiling Hail
- chat with us about development in our Zulip chatroom
- visit the Development Forum for longer-form discussions
Hail uses a continuous deployment approach to software development, which means we frequently add new features. We announce changes to Hail on the Discussion Forum and recommend creating an account there so that you can subscribe to these updates.
The Hail team is embedded in the Neale lab at the Stanley Center for Psychiatric Research of the Broad Institute of MIT and Harvard and the Analytic and Translational Genetics Unit of Massachusetts General Hospital.
Contact the Hail team at
Follow Hail on Twitter @hailgenetics.
If you use Hail for published work, please cite the software:
- Hail, https://github.com/hail-is/hail