What is Fit ML
Fit Machine Learning (FitML) is a blog that houses a collection of Python machine learning articles and examples, often focusing on Reinforcement Learning. Here you will find code related to Q-Learning, Actor-Critic, MDPs, the Bellman equation, OpenAI solutions, and custom approaches to solving some of the toughest and most interesting problems to date (yes, I am biased).
Who is Michel Aka
Michel is an AI researcher and a graduate of the University of Montreal who currently works in the healthcare industry.
Optimal Policy Tree Search
This is an RL technique that computes the estimated expected sum of rewards for n time steps ahead. It has the advantage of yielding a better estimate of the value of following a specific policy; however, it is computationally expensive and memory inefficient. With a supercomputer and a very large amount of memory, this technique would do extremely well on problems and environments with discrete action spaces. I believe AlphaGo uses a variant of this technique.
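To make the idea of looking n time steps ahead concrete, here is a minimal sketch. It is not the code from the blog's examples: the corridor environment, the `step` function, and the names `lookahead_value` and `best_action` are all hypothetical, and the environment is assumed to be deterministic so that the expectation reduces to a plain sum.

```python
from typing import Sequence, Tuple

# Hypothetical toy environment (an assumption for illustration):
# a 1-D corridor with positions 0..4; reaching position 4 pays +1.
GOAL = 4

def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Deterministic transition: action 0 moves left, action 1 moves right."""
    next_state = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def lookahead_value(state: int, depth: int, actions: Sequence[int],
                    gamma: float = 0.9) -> float:
    """Best discounted sum of rewards reachable within `depth` steps."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in actions:
        next_state, reward, done = step(state, action)
        # Expand the subtree below this action unless the episode ended.
        future = 0.0 if done else lookahead_value(next_state, depth - 1, actions, gamma)
        best = max(best, reward + gamma * future)
    return best

def best_action(state: int, depth: int, actions: Sequence[int],
                gamma: float = 0.9) -> int:
    """Pick the first action whose `depth`-step lookahead value is highest."""
    best_a, best_q = actions[0], float("-inf")
    for action in actions:
        next_state, reward, done = step(state, action)
        future = 0.0 if done else lookahead_value(next_state, depth - 1, actions, gamma)
        q = reward + gamma * future
        if q > best_q:
            best_a, best_q = action, q
    return best_a
```

The tree has one branch per action at every level, so the number of nodes grows roughly as the number of actions raised to the power n, which is where the computational and memory cost comes from.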
See examples and find out more about Optimal Policy Tree Search here.
Selective Memory
As far as I know, no one in the literature has implemented this technique before.
The intuition behind Policy Gradient is that it optimizes the parameters of the network in the direction of a higher expected sum of rewards. What if we could do the same in a way that is computationally more efficient and also turns out to be more intuitive? Enter what I am calling Selective Memory.
We choose what to commit to memory based on the actual sum of rewards.
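One way to read "commit to memory based on the actual sum of rewards" is sketched below. The selection rule used here (keep an episode only if its return beats the running average of all returns seen so far) is an assumption made for illustration, not necessarily the criterion used in the blog's code, and the class name `EpisodeFilter` is hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[tuple, int]  # (state, action) pair

class EpisodeFilter:
    """Keeps (state, action) pairs only from episodes with an above-average return."""

    def __init__(self) -> None:
        self.samples: List[Sample] = []
        self.mean_return = 0.0
        self.episodes_seen = 0

    def consider(self, episode: Sequence[Sample], rewards: Sequence[float]) -> bool:
        """Store the episode if its actual sum of rewards beats the running mean."""
        total = sum(rewards)
        # Assumed rule: always keep the first episode, then require improvement.
        keep = self.episodes_seen == 0 or total > self.mean_return
        # Update the running mean over every episode seen, kept or not.
        self.episodes_seen += 1
        self.mean_return += (total - self.mean_return) / self.episodes_seen
        if keep:
            self.samples.extend(episode)
        return keep
```

The stored pairs could then serve as supervised training data for a policy network, with the state as input and the chosen action as the label.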
Find out more here.
Q-Learning
Q-Learning is a well-known Reinforcement Learning approach, popularized by Google DeepMind when they used it to master multiple early console-era games. Q-Learning uses the Bellman equation to estimate the expected sum of rewards for each action in order to determine which action to take. It works especially well in discrete action spaces and on problems where the mapping f(S)->Q is differentiable, which is not always the case.
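As a rough illustration of the Bellman-based update, here is a tabular sketch. It swaps the neural network of a Deep Q-Network for a plain dictionary so the update rule is easy to see; the function name `q_update` and the default values for `alpha` and `gamma` are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, int], float]

def q_update(q: QTable, state: Hashable, action: int, reward: float,
             next_state: Hashable, done: bool, actions: Sequence[int],
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Apply one Q-Learning update and return the new Q(state, action)."""
    # Bellman target: immediate reward plus the discounted best next value.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Usage: unseen (state, action) pairs start at 0.0.
q_table: QTable = defaultdict(float)
q_update(q_table, state=3, action=1, reward=1.0, next_state=4,
         done=True, actions=(0, 1))
```

Acting greedily then means choosing, in each state, the action with the highest stored Q value.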
Find out more about Q-Learning here.
Actor-Critic Approaches
Actor-Critic is an RL technique that combines the Policy Gradient approach with a Critic (a Q-value estimator).
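The combination can be sketched with a single update step. This is a minimal tabular version under assumptions of my own choosing: a softmax policy over per-state action preferences (`theta`) as the actor, a table of Q values updated SARSA-style as the critic, and hypothetical learning rates. The blog's implementations may differ.

```python
import math
from typing import List

def softmax(prefs: List[float]) -> List[float]:
    """Turn action preferences into probabilities (shifted for stability)."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic_step(theta: List[List[float]], q: List[List[float]],
                      s: int, a: int, r: float, s_next: int, a_next: int,
                      done: bool, actor_lr: float = 0.1,
                      critic_lr: float = 0.5, gamma: float = 0.9) -> None:
    """Update critic and actor in place from one transition."""
    # Critic: temporal-difference update of the Q estimate for (s, a).
    target = r + (0.0 if done else gamma * q[s_next][a_next])
    q[s][a] += critic_lr * (target - q[s][a])
    # Actor: policy-gradient step, weighted by the critic's Q(s, a).
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        # Gradient of log softmax with respect to the preference of action b.
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += actor_lr * q[s][a] * grad_log_pi
```

After a rewarded transition, the preference for the action taken goes up and the others go down, in proportion to how good the critic currently thinks that action is.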
Find out more about Actor-Critic here.
Recommended Progression for the Newcomer
|Optimal Policy Tree Search||Cartpole_OPTS.py|
|Q-Learning / Deep-QN||