# Practical_RL
A course on reinforcement learning in the wild. Taught on-campus at HSE (in Russian) and maintained to be friendly to online students (both English and Russian).
## Manifesto
* __Optimize for the curious.__ For all the materials that aren't covered in detail there are links to more information and related materials (D. Silver/Sutton/blogs/whatever). Assignments will have bonus sections if you want to dig deeper.
* __Practicality first.__ Everything essential to solving reinforcement learning problems is worth mentioning. We won't shy away from covering tricks and heuristics. For every major idea there should be a lab that makes you "feel" it on a practical problem.
* __Git-course.__ Know a way to make the course better? Noticed a typo in a formula? Found a useful link? Made the code more readable? Made a version for an alternative framework? You're awesome! Pull-request it!
## Course info
* Lecture slides are here.
* Telegram chat room for YSDA & HSE students is here.
* Online student survival guide
* Installing the libraries: guide and issues thread
* Magical button that launches you into the course environment:
* Anonymous feedback form for everything that didn't go through email.
* About the course
* A large list of RL materials: awesome-rl
## HSE and YSDA students
This section is strictly for on-campus HSE and YSDA students.
* Anytask course is http://anytask.org/course/272
* HSE invite is reHroOk
* YSDA invite is TBA
## RL reading group
* Reading group chat room
* Everyone who wants to attend the RL reading group, ping Pavel Shvechikov at [email protected]
## Announcements
* 2017.12.29 - The HSE track for fall'2017 is officially over. Next is spring'18 @ HSE & YSDA.
* 2017.10.02 - The week4 homework is yet to be published; the week3 and week4 deadlines are shifted one week into the future.
* 2017.09.24 - Week3 homework published; we're sorry for the delay.
* 2017.09.13 - The Gym website seems to have gone down indefinitely. Therefore:
  * week0 homework: Bonus I counts as 2 points if you beat mean reward +5.0 for Taxi-v1 or +0.95 on FrozenLake8x8;
  * week1 homework: instead of 1 point for task 2.2 and 3 points for task 2.3, you get 4 points for 2.3;
  * since you can't submit, just ignore any instructions to do so. We'll push the changes this weekend to avoid merge conflicts for students.
* 2017.09.04 - The first class just happened. Anytask submission form TBA.
## Syllabus
The syllabus is approximate: the lectures may occur in a slightly different order and some topics may end up taking two weeks.

### week0: Welcome to Reinforcement Learning
* Lecture: RL problems around us. Decision processes. Basic genetic algorithms.
* Seminar: Welcome into OpenAI Gym, basic genetic algorithms.
* Homework description: see week0/README.md
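As a taste of what the seminar covers, here is a minimal genetic-algorithm sketch: select the fittest individuals, copy and mutate them to form the next generation. The toy fitness function and all constants are illustrative, not taken from the course assignments.

```python
import numpy as np

np.random.seed(0)

def fitness(x):
    """Toy objective with its maximum (0.0) at the all-ones vector."""
    return -np.sum((x - 1.0) ** 2)

pop = np.random.randn(50, 5)                          # random initial population
for _ in range(100):
    scores = np.array([fitness(x) for x in pop])
    parents = pop[np.argsort(scores)[-10:]]           # selection: keep the top 10
    children = parents[np.random.randint(10, size=40)]
    children = children + 0.1 * np.random.randn(40, 5)  # mutation
    pop = np.vstack([parents, children])              # elitism: parents survive

best = pop[np.argmax([fitness(x) for x in pop])]
```

In the RL setting, an "individual" becomes a whole policy and the fitness is the reward collected by running it in the environment.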

### week1: RL as black-box optimization
* Lecture: Recap on genetic algorithms; evolutionary strategies. Stochastic optimization, cross-entropy method. Parameter-space search vs action-space search.
* Seminar: Tabular CEM for Taxi-v0, deep CEM for Box2D environments.
* Homework description: see week1/README.md
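The core of the cross-entropy method fits in a few lines: sample candidates from a parametric distribution, keep an elite fraction, and refit the distribution to the elites. A minimal sketch on a toy 1-D objective (function name and constants are illustrative, not from the assignment):

```python
import numpy as np

np.random.seed(0)

def cem_optimize(f, mu=0.0, sigma=2.0, n_samples=100, elite_frac=0.2, n_iters=50):
    """Cross-entropy method: sample, select elites, refit, repeat."""
    n_elite = int(n_samples * elite_frac)
    for _ in range(n_iters):
        samples = np.random.normal(mu, sigma, size=n_samples)
        elites = samples[np.argsort([f(x) for x in samples])[-n_elite:]]
        # refit the Gaussian; the small floor on sigma prevents premature collapse
        mu, sigma = elites.mean(), elites.std() + 1e-3
    return mu

best_x = cem_optimize(lambda x: -(x - 3.0) ** 2)  # maximum at x = 3
```

In the seminar the same loop runs over policies instead of scalars: each sample is a full policy (a table or network weights) and f is the mean session reward.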

### week2: Value-based methods
* Lecture: Discounted reward MDP. Value-based approach. Value iteration. Policy iteration. Where discounted reward fails.
* Seminar: Value iteration.
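Value iteration itself is just repeated Bellman backups until the values stop changing. A minimal sketch on a hypothetical two-state deterministic MDP (the transition table is made up for illustration):

```python
import numpy as np

# Hypothetical deterministic MDP: transitions[s][a] = (next_state, reward).
# Action 1 always pays +1, so V*(s) = 1 / (1 - gamma) = 10 for both states.
transitions = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 1.0)},
}
gamma = 0.9

V = np.zeros(2)
for _ in range(100):  # Bellman backup: V(s) <- max_a [ r(s,a) + gamma * V(s') ]
    V = np.array([
        max(r + gamma * V[s_next] for (s_next, r) in transitions[s].values())
        for s in sorted(transitions)
    ])
```

The greedy policy with respect to the converged V is then optimal for this MDP.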

### week3: Model-free reinforcement learning
* Lecture: Q-learning. SARSA. Off-policy vs on-policy algorithms. N-step algorithms. TD(lambda).
* Seminar: Q-learning vs SARSA vs Expected Value SARSA
* HSE homework deadline: 23:59 13.10.17
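The lecture's central contrast fits into two update rules: Q-learning bootstraps from the greedy action in the next state (off-policy), while SARSA bootstraps from the action actually taken (on-policy). A minimal tabular sketch; the state/action names and constants are illustrative:

```python
from collections import defaultdict

alpha, gamma = 0.5, 0.99      # learning rate and discount, illustrative values
Q = defaultdict(float)        # Q[(state, action)], defaults to 0

def q_learning_update(s, a, r, s_next, actions):
    """Off-policy: bootstrap from the best action available in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy: bootstrap from the action a_next actually taken in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

Expected Value SARSA replaces the single bootstrapped term with the expectation of Q over the policy's action distribution in s_next.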

### week4_recap: Deep learning recap
* Lecture: Deep learning 101
* Seminar: Simple image classification with convnets
* HSE homework deadline: 23:59 13.10.17

### week4: Approximate reinforcement learning
* Lecture: Infinite/continuous state spaces. Value function approximation. Convergence conditions. Multiple-agent trick; experience replay, target networks, double/dueling/bootstrap DQN, etc.
* Seminar: Approximate Q-learning with experience replay (CartPole, Atari).
* HSE homework deadline: 23:59 20.10.17
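Of the stabilization tricks listed above, experience replay is the easiest to sketch: store transitions in a bounded buffer and train on uniformly sampled minibatches, which breaks the temporal correlation of consecutive frames. A minimal sketch, not the course's reference implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (s, a, r, s_next, done) transitions;
    old transitions are evicted automatically once capacity is reached."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        batch = random.sample(list(self.buffer), batch_size)
        # transpose into parallel tuples: states, actions, rewards, ...
        return list(zip(*batch))
```

A DQN-style training loop would push every environment step into the buffer and fit the Q-network on a sampled batch, with targets computed by a periodically synced target network.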

### week5: Exploration in reinforcement learning
* Lecture: Contextual bandits. Thompson sampling, UCB, Bayesian UCB. Exploration in model-based RL, MCTS. "Deep" heuristics for exploration.
* Seminar: Bayesian exploration for contextual bandits. UCB for MCTS.
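For reference, the UCB1 rule from the lecture: pick the arm maximizing its empirical mean plus an exploration bonus that shrinks as the arm gets pulled more often. A minimal sketch (the function name is ours):

```python
import math

def ucb1_select(counts, values, t):
    """counts[i] = pulls of arm i, values[i] = its empirical mean reward,
    t = total pulls so far. Untried arms are selected first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [v + math.sqrt(2 * math.log(t) / n)
              for n, v in zip(counts, values)]
    return scores.index(max(scores))
```

The same score, with node visit counts in place of arm pulls, drives tree-node selection in UCT-style MCTS.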

### week6: Policy gradient methods I
* Lecture: Motivation for policy-based methods, policy gradient, log-derivative trick, REINFORCE/cross-entropy method, variance reduction (baseline), advantage actor-critic (incl. GAE)
* Seminar: REINFORCE, advantage actor-critic
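The log-derivative trick in its smallest form: for a softmax policy, the gradient of log pi(a) has a closed form (one-hot(a) minus the probability vector), and scaling it by (reward - baseline) gives an unbiased policy-gradient step. A toy REINFORCE run on a hypothetical 2-armed bandit; all constants are illustrative:

```python
import numpy as np

np.random.seed(0)

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Bandit: arm 1 always pays 1, arm 0 pays 0. The policy should learn arm 1.
logits = np.zeros(2)
lr, baseline = 0.1, 0.0
for _ in range(500):
    probs = softmax(logits)
    a = np.random.choice(2, p=probs)
    r = 1.0 if a == 1 else 0.0
    baseline = 0.9 * baseline + 0.1 * r       # moving-average baseline
    grad_logp = -probs
    grad_logp[a] += 1.0                       # grad of log pi(a) for a softmax
    logits += lr * (r - baseline) * grad_logp # log-derivative trick, ascent step

final_probs = softmax(logits)
```

With full episodes instead of single pulls, r becomes the discounted return from each step, and replacing the baseline with a learned value function gives the advantage actor-critic.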

### week7_recap: Recurrent neural networks recap
* Lecture: Problems with sequential data. Recurrent neural networks. Backprop through time. Vanishing & exploding gradients. LSTM, GRU. Gradient clipping.
* Seminar: Character-level RNN language model

### week7: Partially observable MDPs
* Lecture: POMDP intro. POMDP learning (agents with memory). POMDP planning (POMCP, etc.)
* Seminar: Deep kung-fu & Doom with recurrent A3C and DRQN

### week8: Applications II
* Lecture: Reinforcement learning as a general way to optimize non-differentiable losses. G2P, machine translation, conversation models, image captioning, discrete GANs. Self-critical sequence training.
* Seminar: Simple neural machine translation with self-critical sequence training

### week9: Policy gradient methods II
* Lecture: Trust-region policy optimization. NPO/PPO. Deterministic policy gradient. DDPG. Bonus: DPG for discrete action spaces.
* Seminar: Approximate TRPO for simple robotic tasks.
## Course staff
Course materials and teaching by
* Fedor Ratnikov - lectures, seminars, hw checkups
* Oleg Vasilev - seminars, hw checkups, technical support
* Pavel Shvechikov - lectures, seminars, hw checkups, reading group
* Alexander Fritsler - lectures, seminars, hw checkups
## Contributions
* Using pictures from the Berkeley AI course
* Massively referring to CS294
* Several TensorFlow assignments by Scitator
* A lot of fixes from arogozhnikov
* Other awesome people: see GitHub contributors
## fall'17 changes
* Better support for TensorFlow & PyTorch
* Our notation is now compatible with Sutton's
* Reworked & rebalanced some assignments
* Added more practice on model-based RL