# Deep Learning - The Straight Dope

## Abstract

This repo contains an incremental sequence of notebooks designed to teach deep learning, MXNet, and the `gluon` interface. Our goal is to leverage the strengths of Jupyter notebooks to present prose, graphics, equations, and code together in one place. If we're successful, the result will be a resource that could simultaneously serve as a book, course material, a prop for live tutorials, and a source of useful code to plagiarise (with our blessing). To our knowledge, no other source either (1) teaches the full breadth of concepts in modern deep learning or (2) interleaves an engaging textbook with runnable code. We'll find out by the end of this venture whether or not that void exists for a good reason.

Another unique aspect of this book is its authorship process. We are developing this resource fully in public view and are making it available for free in its entirety. While the book has a few primary authors to set the tone and shape the content, we welcome contributions from the community and hope to coauthor chapters and entire sections with experts and community members. Already we've received contributions ranging from typo corrections to full working examples.

## Implementation in MXNet

Throughout this book, we rely upon MXNet to teach core concepts, advanced topics, and a full complement of applications. MXNet is widely used in production environments owing to its strong reputation for speed. Now with `gluon`, MXNet's new imperative interface (currently in alpha), doing research in MXNet is easy.

## Dependencies

To run these notebooks, you'll want to build MXNet from source. Fortunately, this is easy (especially on Linux) if you follow these instructions. You'll also want to install Jupyter and use Python 3 (because it's 2017).

## Table of contents

### Part 1: Crashcourse

- 0 - Preface
- 1 - Introduction
- 2 - Manipulating data with NDArray
- 3 - Linear Algebra
- 4 - Probability and Statistics
- 5 - Automatic differentiation via `autograd`
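
The `autograd` chapter's central idea fits in a few lines. Below is a minimal sketch of reverse-mode automatic differentiation in plain Python; the `Var` class and its methods are illustrative inventions for this README, not MXNet's actual API:

```python
class Var:
    """A scalar value that records how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # (parent, local_gradient) pairs

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, upstream=1.0):
        # Accumulate gradients via the chain rule, walking the graph backward.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)

x = Var(2.0)
y = Var(3.0)
z = x * y + x       # z = xy + x
z.backward()
print(x.grad)       # dz/dx = y + 1 = 4.0
print(y.grad)       # dz/dy = x = 2.0
```

MXNet's `autograd` applies the same bookkeeping to whole NDArrays instead of scalars, and lets you toggle recording on and off around your forward pass.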

### Part 2: Introduction to Supervised Learning

- 1 - Linear Regression *(from scratch)*
- 2 - Linear Regression *(with `gluon`)*
- 3 - Multiclass Logistic Regression *(from scratch)*
- 4 - Multiclass Logistic Regression *(with `gluon`)*
- 5 - Overfitting and regularization *(from scratch)*
- L1 and L2 Regularization *(in `gluon`)* **Roadmap**
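
To preview what a *(from scratch)* chapter looks like, here is a sketch of linear regression trained with batch gradient descent. It uses numpy rather than `mxnet.ndarray`, and the synthetic data and hyperparameters are made up for illustration; the *(with `gluon`)* version replaces this loop with `gluon.nn.Dense`, a loss from `gluon.loss`, and a `gluon.Trainer`:

```python
import numpy as np

# Synthetic data: y = Xw + b + noise, with known true parameters.
rng = np.random.default_rng(0)
true_w, true_b = np.array([2.0, -3.4]), 4.2
X = rng.normal(size=(1000, 2))
y = X @ true_w + true_b + 0.01 * rng.normal(size=1000)

# Batch gradient descent on squared error.
w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(100):
    err = X @ w + b - y              # predictions minus targets
    w -= lr * X.T @ err / len(y)     # gradient of the loss w.r.t. w
    b -= lr * err.mean()             # gradient of the loss w.r.t. b

print(w, b)  # close to [2.0, -3.4] and 4.2
```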

### Part 3: Deep neural networks (DNNs)

- 1 - Multilayer Perceptrons *(from scratch!)*
- 2 - Multilayer Perceptrons *(with `gluon`!)*
- Dropout Regularization *(from scratch)* **Roadmap**
- Dropout Regularization *(with `gluon`)* **Roadmap**
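
Dropout regularization "from scratch" really is only a few lines: at training time, zero each activation with probability `p` and rescale the survivors so the expected value is unchanged (so-called inverted dropout). The helper below is a numpy sketch of that idea, not the `gluon.nn.Dropout` implementation:

```python
import numpy as np

def dropout(h, p, rng):
    """Zero each entry of h with probability p; rescale the rest."""
    if p == 0.0:
        return h
    mask = rng.random(h.shape) > p   # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)      # rescale so the expectation matches h

rng = np.random.default_rng(0)
h = np.ones((2, 8))
print(dropout(h, 0.5, rng))  # roughly half the entries are 0, the rest 2.0
```

At test time you simply skip the mask, which is why the rescaling belongs in the training pass.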

### Part 3.5: Plumbing

- A look under the hood of `mxnet.gluon`
- Writing custom layers
- Advanced Data IO

### Part 4: Convolutional neural networks (CNNs)

- 1 - Convolutional Neural Network *(from scratch!)*
- 2 - Convolutional Neural Network *(with `gluon`!)*
- Batch Normalization *(from scratch)* **Roadmap**
- Batch Normalization *(with `gluon`)* **Roadmap**
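
The core operation of a convolutional layer, shown "from scratch", is a 2-D cross-correlation between an input window and a kernel. The sketch below handles a single channel in numpy; `gluon.nn.Conv2D` provides the batched, multi-channel, GPU-ready version:

```python
import numpy as np

def corr2d(X, K):
    """Slide kernel K over input X, summing elementwise products."""
    kh, kw = K.shape
    out = np.empty((X.shape[0] - kh + 1, X.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (X[i:i + kh, j:j + kw] * K).sum()
    return out

X = np.arange(9.0).reshape(3, 3)
K = np.array([[0.0, 1.0], [2.0, 3.0]])
print(corr2d(X, K))  # [[19. 25.] [37. 43.]]
```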

### Part 5: Recurrent neural networks (RNNs)

- 1 - Simple RNNs *(from scratch)*
- 2 - Simple RNNs *(with `gluon`)* **Roadmap**
- 3 - LSTM RNNs *(from scratch)*
- LSTMs *(with `gluon`)* **Roadmap**
- GRUs *(from scratch)* **Roadmap**
- GRUs *(with `gluon`)* **Roadmap**
- Dropout for recurrent nets **Roadmap**
- Zoneout regularization **Roadmap**
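
A simple RNN "from scratch" updates a hidden state one time step at a time: `h_t = tanh(x_t W_xh + h_{t-1} W_hh + b)`. LSTMs and GRUs refine this update with gates. The numpy sketch below (weight names and sizes are made up for illustration) shows one unrolled forward pass; `gluon.rnn` ships the real layers:

```python
import numpy as np

def rnn_step(x_t, h, W_xh, W_hh, b):
    """One recurrent update: mix the new input with the previous state."""
    return np.tanh(x_t @ W_xh + h @ W_hh + b)

rng = np.random.default_rng(0)
n_in, n_hid, T = 4, 3, 5
W_xh = rng.normal(scale=0.1, size=(n_in, n_hid))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))
b = np.zeros(n_hid)

h = np.zeros(n_hid)
for t in range(T):  # unroll over a random input sequence
    h = rnn_step(rng.normal(size=n_in), h, W_xh, W_hh, b)
print(h.shape)  # (3,)
```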

### Part 6: Computer vision (CV)

- Network of networks (inception & co) **Roadmap**
- Residual networks **Roadmap**
- Object detection
- Fully-convolutional networks **Roadmap**
- Siamese (conjoined?) networks **Roadmap**
- Embeddings (pairwise and triplet losses) **Roadmap**
- Inceptionism / visualizing feature detectors **Roadmap**
- Style transfer **Roadmap**

### Part 7: Natural language processing (NLP)

- Word embeddings (Word2Vec) **Roadmap**
- Sentence embeddings (SkipThought) **Roadmap**
- Sentiment analysis **Roadmap**
- Sequence-to-sequence learning (machine translation) **Roadmap**
- Sequence transduction with attention (machine translation) **Roadmap**
- Named entity recognition **Roadmap**
- Image captioning **Roadmap**

### Part 8: Unsupervised Learning

- Introduction to autoencoders **Roadmap**
- Convolutional autoencoders (introduce upconvolution) **Roadmap**
- Denoising autoencoders **Roadmap**
- Variational autoencoders **Roadmap**
- Clustering **Roadmap**

### Part 9: Adversarial learning

- Two Sample Tests **Roadmap**
- Finding adversarial examples **Roadmap**
- Adversarial training **Roadmap**

### Part 10: Generative adversarial networks (GANs)

- Introduction to GANs **Roadmap**
- DCGAN **Roadmap**
- Wasserstein-GANs **Roadmap**
- Energy-based GANs **Roadmap**
- Conditional GANs **Roadmap**
- Image transduction GANs (Pix2Pix) **Roadmap**
- Learning from Synthetic and Unsupervised Images **Roadmap**

### Part 11: Deep reinforcement learning (DRL)

- Introduction to reinforcement learning **Roadmap**
- Deep contextual bandits **Roadmap**
- Deep Q-networks **Roadmap**
- Policy gradient **Roadmap**
- Actor-critic gradient **Roadmap**

### Part 12: Variational methods and uncertainty

- Dropout-based uncertainty estimation (BALD) **Roadmap**
- Weight uncertainty (Bayes-by-backprop) **Roadmap**
- Variational autoencoders **Roadmap**

### Part 13: Optimization

- SGD **Roadmap**
- Momentum **Roadmap**
- AdaGrad **Roadmap**
- RMSProp **Roadmap**
- Adam **Roadmap**
- AdaDelta **Roadmap**
- SGLD / SGHNT **Roadmap**
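
The optimizers listed above are all variations on one skeleton: transform the gradient, then step. As a taste, here is a sketch of momentum (heavy-ball) descent on the 1-D quadratic `f(w) = w**2`; the variable names and hyperparameters are illustrative, not from any MXNet API:

```python
w, v = 5.0, 0.0      # parameter and velocity
lr, mu = 0.1, 0.9    # learning rate and momentum coefficient

for _ in range(100):
    grad = 2.0 * w   # f'(w) for f(w) = w**2
    v = mu * v + grad  # velocity accumulates a decaying sum of past gradients
    w -= lr * v
print(w)  # near the minimum at 0
```

AdaGrad, RMSProp, Adam, and friends replace the velocity rule with per-parameter rescalings of the gradient, but the update loop keeps this shape.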

### Part 14: Distributed and high-performance learning

- Distributed optimization (Asynchronous SGD, ...) **Roadmap**
- Training with Multiple GPUs
- Fast & flexible: combining imperative & symbolic nets with HybridBlocks
- Training with Multiple Machines **Roadmap**
- Combining imperative deep learning with symbolic graphs **Roadmap**

### Part 15: Hacking MXNet

- Custom Operators
- ...

### Part 16: Audio Processing

- Intro to automatic speech recognition **Roadmap**
- Connectionist temporal classification (CTC) for unaligned sequences **Roadmap**
- Combining static and sequential data **Roadmap**

### Part 17: Recommender systems

- Latent factor models **Roadmap**
- Deep latent factor models **Roadmap**
- Bilinear models **Roadmap**
- Learning from implicit feedback **Roadmap**

### Part 18: Time series

- Forecasting **Roadmap**
- Modeling missing data **Roadmap**
- Combining static and sequential data **Roadmap**

### Appendix 1: Cheatsheets

- `gluon` **Roadmap**
- PyTorch to MXNet **Roadmap**
- Tensorflow to MXNet **Roadmap**
- Math to MXNet **Roadmap**

## Choose your own adventure

I've designed these tutorials so that you can traverse the curriculum in one of three ways.

- Anarchist - Choose whatever you want to read, whenever you want to read it.
- Imperialist - Proceed through all tutorials in order. In this fashion you will be exposed to each model first from scratch, writing all the code yourself except for the basic linear algebra primitives and automatic differentiation.
- Capitalist - If you don't care how things work (or already know) and just want to see working code in `gluon`, you can skip the *(from scratch!)* tutorials and go straight to the production-like code using the high-level `gluon` front end.

## Authors

This evolving creature is a collaborative effort. So far, some amount of credit (and blame) can be shared by:

- Zachary C. Lipton (@zackchase)
- Mu Li (@mli)
- Alex Smola (@smolix)
- Eric Junyuan Xie (@piiswrong)

## Inspiration

In creating these tutorials, I have drawn inspiration from some of the resources that first taught me machine learning and how to program with Theano and PyTorch:

- Soumith Chintala's *Deep Learning with PyTorch: A 60 Minute Blitz*
- Alec Radford's *Bare-bones intro to Theano*
- Video of Alec's intro to deep learning with Theano
- Chris Bishop's *Pattern Recognition and Machine Learning*

## Contribute

- Already, in the short time this project has been off the ground, we've gotten some helpful PRs from the community with pedagogical suggestions, typo corrections, and other useful fixes. If you're inclined, please contribute!