DeepChem aims to provide a high quality open-source toolchain that democratizes the use of deep-learning in drug discovery, materials science, quantum chemistry, and biology.
Table of contents:
- Getting Started
- Contributing to DeepChem
- DeepChem Publications
- Corporate Supporters
- About Us
deepchem currently supports both Python 2.7 and Python 3.5, and is supported on 64-bit Linux and Mac OSX. Please make sure you follow the directions below precisely. While you may already have system versions of some of our dependencies, there is no guarantee that deepchem will work with versions other than those specified below.
Note that when using Ubuntu 16.04 server or similar environments, you may need to ensure libxrender is provided via e.g.:
sudo apt-get install -y libxrender-dev
Using a conda environment
You can install deepchem in a new conda environment using the conda commands in scripts/install_deepchem_conda.sh. Installing via this script ensures that you are installing from source.
git clone https://github.com/deepchem/deepchem.git # Clone deepchem source code from GitHub
cd deepchem
If you don't want GPU support:
bash scripts/install_deepchem_conda.sh deepchem # If you don't want GPU support
If you want GPU support:
gpu=1 bash scripts/install_deepchem_conda.sh deepchem # If you want GPU support
Running the script with gpu=0 (gpu=0 bash scripts/install_deepchem_conda.sh deepchem) also installs the CPU-only version.
source activate deepchem
python setup.py install # Manual install
nosetests -a '!slow' -v deepchem --nologcapture # Run tests
This creates a new conda environment named deepchem and installs the required dependencies in it. To access it, use the conda activate deepchem command if your conda version is >= 4.4, or the source activate deepchem command if your conda version is < 4.4.
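The version rule above can be written as a small helper. This is a minimal sketch: the function name activation_command is hypothetical and is not part of deepchem or conda, and it assumes a plain dotted version string such as the one reported by conda --version.

```python
def activation_command(conda_version, env_name="deepchem"):
    """Return the shell command that activates env_name for this conda version.

    Hypothetical helper for illustration; not part of deepchem or conda.
    """
    # Compare only the major and minor components, e.g. "4.5.11" -> (4, 5).
    major, minor = (int(part) for part in conda_version.split(".")[:2])
    if (major, minor) >= (4, 4):
        return "conda activate " + env_name
    return "source activate " + env_name


print(activation_command("4.5.11"))  # conda 4.4 or newer
print(activation_command("4.3.30"))  # older conda
```

Comparing integer tuples rather than raw strings keeps versions such as 4.10 ordered after 4.4.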
Easy Install via Conda
conda install -c deepchem -c rdkit -c conda-forge -c omnia deepchem=2.1.0
Easy Install installs the latest stable version of deepchem and does not install from source. If you need to install from source, follow the steps under Using a conda environment.
Using a Docker Image
Using a docker image requires an NVIDIA GPU. If you do not have a GPU, please follow the directions for using a conda environment. In order to get GPU support you will have to use the nvidia-docker plugin.
# This will download the latest stable deepchem docker image into your images
docker pull deepchemio/deepchem
# This will create a container out of our latest image with GPU support
nvidia-docker run -i -t deepchemio/deepchem
# You are now in a docker container whose python has deepchem installed
# For example you can run our tox21 benchmark
cd deepchem/examples
python benchmark.py -d tox21
# Or you can start playing with it in the command line
pip install jupyter
ipython
import deepchem as dc
Question: I'm seeing some failures in my test suite having to do with MKL
Intel MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so.
Answer: This is a general issue with the newest version of scikit-learn enabling MKL by default. This doesn't play well with many Linux systems. See BVLC/caffe#3884 for discussions. The following seems to fix the issue:
conda install nomkl numpy scipy scikit-learn numexpr
conda remove mkl mkl-service
Afterwards you can go through other tutorials, and look through our examples in the examples directory. To apply deepchem to a new problem, try starting from one of the existing examples or tutorials and modifying it step by step to work with your new use case. If you have questions or comments you can raise them on our gitter.
Accepted input formats for deepchem include csv, pkl.gz, and sdf files. For example, with a csv input, in order to build models, we expect the following columns to have entries for each row in the csv file.
- A column containing SMILES strings [1].
- A column containing an experimental measurement.
- (Optional) A column containing a unique compound identifier.
Here's an example of a potential input file.
| Compound ID | measured log solubility in mols per litre | smiles |
Here the "smiles" column contains the SMILES string, the "measured log solubility in mols per litre" contains the experimental measurement and "Compound ID" contains the unique compound identifier.
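The column layout described above can be checked with a short script before any model is built. This is a minimal sketch using only the Python standard library: the helper read_rows is hypothetical and not part of deepchem, and the two data rows are made-up placeholder values, not real measurements.

```python
import csv
import io

# Column names follow the example header above. The rows are made-up
# placeholder values for illustration, not real measurements.
EXAMPLE_CSV = """Compound ID,measured log solubility in mols per litre,smiles
example-1,-0.5,CCO
example-2,-1.0,c1ccccc1
"""


def read_rows(csv_text, smiles_col, measurement_col, id_col=None):
    """Parse CSV text and return (id, smiles, measurement) tuples.

    Raises ValueError if a requested column is absent from the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    required = [smiles_col, measurement_col]
    if id_col is not None:
        required.append(id_col)
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    rows = []
    for record in reader:
        # The identifier column is optional, so fall back to None without it.
        compound_id = record[id_col] if id_col is not None else None
        rows.append((compound_id, record[smiles_col], float(record[measurement_col])))
    return rows


rows = read_rows(
    EXAMPLE_CSV,
    smiles_col="smiles",
    measurement_col="measured log solubility in mols per litre",
    id_col="Compound ID",
)
for row in rows:
    print(row)
```

Failing early on a missing column gives a clearer error than a failure later during featurization.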
[1] Anderson, Eric, Gilman D. Veith, and David Weininger. "SMILES, a line notation and computerized interpreter for chemical structures." US Environmental Protection Agency, Environmental Research Laboratory, 1987.
Most machine learning algorithms require that input data form vectors. However, input data for drug-discovery datasets routinely come in the format of lists of molecules and associated experimental readouts. To transform lists of molecules into vectors, we need to use subclasses of DeepChem's dc.data.DataLoader such as dc.data.SDFLoader. Users can subclass dc.data.DataLoader to load arbitrary file formats. All loaders must be passed a dc.feat.Featurizer object. DeepChem provides a number of different subclasses of dc.feat.Featurizer for convenience.
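The loader and featurizer relationship can be illustrated without installing anything. The sketch below is a toy model of the pattern only: ToyFeaturizer and ToyCSVLoader are hypothetical classes written for this example, they are not DeepChem's dc.feat.Featurizer or dc.data.DataLoader, and the real classes have their own interfaces. The symbol-count encoding is a placeholder, not a chemically meaningful fingerprint.

```python
import csv
import io


class ToyFeaturizer:
    """Hypothetical stand-in for a featurizer: maps one SMILES string to a vector."""

    ALPHABET = "CNOcno"

    def featurize(self, smiles):
        # Count occurrences of a few atom symbols. This is a toy encoding,
        # not a chemically meaningful fingerprint.
        return [smiles.count(symbol) for symbol in self.ALPHABET]


class ToyCSVLoader:
    """Hypothetical loader that must be given a featurizer, mirroring the text."""

    def __init__(self, task, smiles_field, featurizer):
        self.task = task
        self.smiles_field = smiles_field
        self.featurizer = featurizer

    def load(self, csv_text):
        # Return (X, y): one feature vector and one measurement per row.
        X, y = [], []
        for record in csv.DictReader(io.StringIO(csv_text)):
            X.append(self.featurizer.featurize(record[self.smiles_field]))
            y.append(float(record[self.task]))
        return X, y


# Placeholder values for illustration, not real measurements.
data = "smiles,activity\nCCO,1.0\nc1ccccc1,0.0\n"
loader = ToyCSVLoader(task="activity", smiles_field="smiles", featurizer=ToyFeaturizer())
X, y = loader.load(data)
print(X)
print(y)
```

Keeping the featurizer as a separate object passed to the loader means the same file-parsing code can be reused with different molecular representations.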
In-depth performance tables for DeepChem models are available on MoleculeNet.ai.
Join us on gitter at https://gitter.im/deepchem/Lobby. This is probably the easiest place to ask simple questions or float requests for new features.
- Computational Modeling of β-secretase 1 (BACE-1) Inhibitors using Ligand Based Approaches
- Low Data Drug Discovery with One-Shot Learning
- MoleculeNet: A Benchmark for Molecular Machine Learning
- Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity
DeepChem is possible due to notable contributions from many people including Peter Eastman, Evan Feinberg, Joe Gomes, Karl Leswing, Vijay Pande, Aneesh Pappu, Bharath Ramsundar and Michael Wu (alphabetical ordering). DeepChem was originally created by Bharath Ramsundar with encouragement and guidance from Vijay Pande.
DeepChem started as a Pande group project at Stanford, and is now developed by many academic and industrial collaborators. DeepChem actively encourages new academic and industrial groups to contribute!
DeepChem is supported by a number of corporate partners who use DeepChem to solve interesting problems.
DeepChem has transformed how we think about building QSAR and QSPR models when very large data sets are available; and we are actively using DeepChem to investigate how to best combine the power of deep learning with next generation physics-based scoring methods.
DeepCrystal was an early adopter of DeepChem, which we now rely on to abstract away some of the hardest pieces of deep learning in drug discovery. By open sourcing these efficient implementations of chemically / biologically aware deep-learning systems, DeepChem puts the latest research into the hands of the scientists that need it, materially pushing forward the field of in-silico drug discovery in the process.