NIPS2017: Learning to run
This repository contains software required for participation in the NIPS 2017 Challenge: Learning to Run. See more details about the challenge here. Please read about the latest changes and the logistics of the second round here (last update October 14th).
In this competition, you are tasked with developing a controller to enable a physiologically-based human model to navigate a complex obstacle course as quickly as possible. You are provided with a human musculoskeletal model and a physics-based simulation environment where you can synthesize physically and physiologically accurate motion. Potential obstacles include external obstacles like steps, or a slippery floor, along with internal obstacles like muscle weakness or motor noise. You are scored based on the distance you travel through the obstacle course in a set amount of time.
To model physics and biomechanics we use OpenSim - a biomechanical physics environment for musculoskeletal simulations.
Anaconda is required to run our simulations. Anaconda will create a virtual environment with all the necessary libraries, to avoid conflicts with libraries in your operating system. You can get Anaconda from https://www.continuum.io/downloads. In the following instructions we assume that Anaconda is successfully installed.
We support Windows, Linux, and Mac OSX (all in 64-bit). To install our simulator, you first need to create a conda environment with the OpenSim package.
On Windows, open a command prompt and type:
conda create -n opensim-rl -c kidzik opensim git python=2.7
activate opensim-rl
On Linux/OSX, run:
conda create -n opensim-rl -c kidzik opensim git python=2.7
source activate opensim-rl
These commands will create a virtual environment on your computer with the necessary simulation libraries installed. Next, you need to install our python reinforcement learning environment. Type (on all platforms):
conda install -c conda-forge lapack git
pip install git+https://github.com/stanfordnmbl/osim-rl.git
If the command
python -c "import opensim" runs smoothly, you are done! Otherwise, please refer to our FAQ section.
source activate opensim-rl activates the anaconda virtual environment. You need to type it every time you open a new terminal.
To execute 200 iterations of the simulation enter the
python interpreter and run the following:
from osim.env import RunEnv

env = RunEnv(visualize=True)
observation = env.reset(difficulty = 0)
for i in range(200):
    observation, reward, done, info = env.step(env.action_space.sample())
env.action_space.sample() returns a random vector for muscle activations, so, in this example, muscles are activated randomly (red indicates an active muscle and blue an inactive muscle). Clearly with this technique we won't go too far.
Your goal is to construct a controller, i.e. a function from the state space (current positions, velocities and accelerations of joints) to the action space (muscle excitations), that will enable the model to travel as far as possible in a fixed amount of time. Suppose you trained a neural network mapping observations (the current state of the model) to actions (muscle excitations), i.e. you have a function
action = my_controller(observation), then
# ...
total_reward = 0.0
for i in range(200):
    # make a step given by the controller and record the state and the reward
    observation, reward, done, info = env.step(my_controller(observation))
    total_reward += reward
    if done:
        break

# Your reward is
print("Total reward %f" % total_reward)
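To make the shape of such a function concrete, here is a minimal sketch of a hand-written controller that can be plugged into the loop above. It is only an illustration, not a competitive solution: the factory name `make_open_loop_controller` and the `period` parameter are hypothetical, and the controller ignores the observation entirely, exciting the two legs in anti-phase.

```python
import math

N_MUSCLES = 18  # length of the action vector described in this README


def make_open_loop_controller(period=100):
    """Build a hypothetical controller that ignores the observation and
    excites the right and left leg muscles with opposite-phase sine waves."""
    state = {"step": 0}  # internal step counter kept between calls

    def my_controller(observation):
        phase = 2.0 * math.pi * (state["step"] % period) / period
        right = 0.5 + 0.5 * math.sin(phase)
        left = 0.5 + 0.5 * math.sin(phase + math.pi)
        state["step"] += 1
        # Keep values inside [0, 1]; right-leg muscles first, then left-leg.
        right = min(1.0, max(0.0, right))
        left = min(1.0, max(0.0, left))
        half = N_MUSCLES // 2
        return [right] * half + [left] * half

    return my_controller


my_controller = make_open_loop_controller(period=100)
```

A learned policy would replace the body of `my_controller` while keeping the same signature: one observation in, one list of 18 excitations out.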
There are many ways to construct the function
my_controller(observation). We will show how to do it with a DDPG (Deep Deterministic Policy Gradients) algorithm, using
keras-rl. If you already have experience with training reinforcement learning models, you can skip the next section and go to evaluation.
Training your first model
Below we present how to train a basic controller using keras-rl. First you need to install extra packages:
conda install keras -c conda-forge
pip install git+https://github.com/matthiasplappert/keras-rl.git
git clone http://github.com/stanfordnmbl/osim-rl.git
keras-rl is an excellent package compatible with OpenAI Gym, which allows you to quickly build your first models!
Go to the
scripts subdirectory from this repository
There are two scripts:
- example.py for training (and testing) an agent using the DDPG algorithm.
- submit.py for submitting the result to crowdAI.org
To train the agent, run:
python example.py --visualize --train --model sample
To test the trained agent on the gait example (walk as far as possible), run:
python example.py --visualize --test --model sample
Note that it will take a while to train this model. You can find many tutorials, frameworks and lessons on-line. We particularly recommend:
Tutorials & Courses on Reinforcement Learning:
- Berkeley Deep RL course by Sergey Levine
- Intro to RL on Karpathy's blog
- Intro to RL by Tambet Matiisen
- Deep RL course of David Silver
- A comprehensive list of deep RL resources
Frameworks and implementations of algorithms:
OpenSim and Biomechanics:
- OpenSim Documentation
- Muscle models
- Publication describing OpenSim
- Publication describing Simbody (multibody dynamics engine)
This list is by no means exhaustive. If you find some resources particularly well-fit for this tutorial, please let us know!
Your task is to build a function f which takes the current state observation (a 41-dimensional vector) and returns the muscle excitations action (an 18-dimensional vector) in a way that maximizes the reward.
The trial ends either if the pelvis of the model goes below
0.65 meters or if you reach
1000 iterations (corresponding to
10 seconds in the virtual environment). Your total reward is the position of the pelvis on the
x axis after the last iteration minus a penalty for using ligament forces. Ligaments are tissues which prevent your joints from bending too much - overusing these tissues leads to injuries, so we want to avoid it. The penalty in the total reward is equal to the sum of forces generated by ligaments over the trial, divided by
After each iteration you get a reward equal to the change in the x position of the pelvis during this iteration minus the magnitude of the ligament forces used in that iteration.
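The relation between the per-iteration reward and the total reward can be sketched as follows. This is an illustration under stated assumptions, not the environment's actual code: the function names are hypothetical, and the scaling constant for the ligament penalty is left as a required `penalty_scale` argument rather than guessed.

```python
def step_reward(x_before, x_after, ligament_forces, penalty_scale):
    """Reward for one iteration: pelvis progress along x minus a scaled
    ligament penalty. `penalty_scale` is a placeholder, not a known value."""
    penalty = sum(abs(f) for f in ligament_forces) / float(penalty_scale)
    return (x_after - x_before) - penalty


def total_reward(pelvis_xs, forces_per_step, penalty_scale):
    """Sum the per-iteration rewards over a trial.

    pelvis_xs       -- pelvis x position before the first step and after each step
    forces_per_step -- one list of ligament forces per step
    """
    total = 0.0
    for i, forces in enumerate(forces_per_step):
        total += step_reward(pelvis_xs[i], pelvis_xs[i + 1], forces, penalty_scale)
    return total
```

Because the position differences telescope, the summed reward equals the final pelvis x position minus the starting position minus the accumulated penalty, which matches the description of the total reward.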
You can test your model on your local machine. For submission, you will need to interact with the remote environment: crowdAI sends you the current
observation and you need to send back the action you take in the given state. You will be evaluated at three different levels of difficulty. For details, please refer to Details of the environment.
Assuming your controller is trained and is represented as a function
my_controller(observation) returning an
action you can submit it to crowdAI through interaction with an environment there:
import opensim as osim
from osim.http.client import Client
from osim.env import RunEnv

# Settings
remote_base = "http://grader.crowdai.org:1729"
crowdai_token = "[YOUR_CROWD_AI_TOKEN_HERE]"

client = Client(remote_base)

# Create environment
observation = client.env_create(crowdai_token)

# IMPLEMENTATION OF YOUR CONTROLLER
# my_controller = ... (for example the one trained in keras_rl)

while True:
    [observation, reward, done, info] = client.env_step(my_controller(observation), True)
    print(observation)
    if done:
        observation = client.env_reset()
        if not observation:
            break

client.submit()
In the place of
[YOUR_CROWD_AI_TOKEN_HERE] put your token from the profile page from crowdai.org website.
Note that during the submission, the environment will get restarted. Since the environment is stochastic, you will need to submit three trials -- this way we make sure that your model is robust.
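Before talking to the grader, it can help to dry-run the submission loop offline. The sketch below uses a hypothetical `FakeClient` that only mimics the four methods used in the submission script (`env_create`, `env_step`, `env_reset`, `submit`). Its behavior, the `render` argument name, and the trial and step counts are assumptions made for illustration; it does not reproduce the real server.

```python
class FakeClient(object):
    """Hypothetical offline stand-in for the remote client (illustration only)."""

    def __init__(self, n_trials=3, steps_per_trial=5):
        self.n_trials = n_trials
        self.steps_per_trial = steps_per_trial
        self.trial = 0
        self.step = 0
        self.submitted = False

    def env_create(self, token):
        self.trial, self.step = 1, 0
        return [0.0] * 41  # dummy 41-value observation

    def env_step(self, action, render=False):
        if len(action) != 18:
            raise ValueError("expected 18 muscle excitations")
        self.step += 1
        done = self.step >= self.steps_per_trial
        return [[0.0] * 41, 0.0, done, {}]

    def env_reset(self):
        # Return a falsy value once all trials have been played.
        if self.trial >= self.n_trials:
            return None
        self.trial += 1
        self.step = 0
        return [0.0] * 41

    def submit(self):
        self.submitted = True


def run_submission(client, controller, token):
    """Same control flow as the submission script; returns the step count."""
    observation = client.env_create(token)
    steps = 0
    while True:
        observation, reward, done, info = client.env_step(controller(observation), True)
        steps += 1
        if done:
            observation = client.env_reset()
            if not observation:
                break
    client.submit()
    return steps
```

For example, `run_submission(FakeClient(), lambda obs: [0.0] * 18, "token")` walks through three short dummy trials and then calls `submit()`.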
In order to avoid overfitting to the training environment, the top participants (those who obtained 15.0 points or more) will be asked to resubmit their solutions in the second round of the challenge. Environments in the second round will have the same structure but 10 obstacles and different seeds. In each submission, there will be 10 simulations. Each participant will have a limit of 3 submissions. The final ranking will be based on the results from the second round.
- You are not allowed to use external datasets (e.g., kinematics of people walking)
- Organizers reserve the right to modify challenge rules as required.
Details of the environment
In order to create an environment, use:
from osim.env import RunEnv

env = RunEnv(visualize = True)
visualize - turn the visualizer on and off
reset(difficulty = 2, seed = None)
Restart the environment with a given difficulty level and a seed.
difficulty - one of:
- 0 - no obstacles,
- 1 - 3 randomly positioned obstacles (balls fixed in the ground),
- 2 - same as 1, but the strength of the psoas muscles (the muscles that help bend the hip joint in the model) also varies. The muscle strength is set to z * 100%, where z is a normal variable with mean 1 and standard deviation 0.1.
seed - starting seed for the random number generator. If the seed is None, generation from the previous seed is continued.
Your solution will be graded in the environment with difficulty = 2, yet it might be easier to train your model with difficulty = 0 first and then retrain with a higher difficulty.
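The seeding rule and the psoas strength randomization can be illustrated with a small stand-alone sketch. This is not the environment's implementation: the class name `PsoasSampler` is hypothetical, the default starting seed is arbitrary, and drawing the left and right strengths independently is an assumption.

```python
import random


class PsoasSampler(object):
    """Hypothetical sketch of how reset(difficulty, seed) could draw
    psoas strengths; only the mean 1 and standard deviation 0.1 come
    from the text above."""

    def __init__(self):
        self.rng = random.Random(0)  # arbitrary starting seed (assumption)

    def reset(self, difficulty=2, seed=None):
        # With seed=None the generator simply continues from its current state.
        if seed is not None:
            self.rng.seed(seed)
        if difficulty < 2:
            return [1.0, 1.0]  # full strength for left and right psoas
        # Assumption: left and right strengths are drawn independently.
        return [self.rng.gauss(1.0, 0.1) for _ in range(2)]
```

Resetting twice with the same explicit seed reproduces the same strengths, while resetting with `seed=None` continues the random stream and gives a new draw.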
step(action)
Make one iteration of the simulation.
action - a list of length 18 of continuous values in [0,1] corresponding to excitation of muscles.
The function returns:
observation - a list of length 41 of real values corresponding to the current state of the model. Variables are explained in the section "Physics and biomechanics of the model".
reward - reward gained in the last iteration. The reward is computed as a change in position of the pelvis along the x axis minus the penalty for the use of ligaments. See the "Physics and biomechanics of the model" section for details.
done - indicates if the move was the last step of the environment. This happens if either 1000 iterations were reached or the pelvis height is below 0.65 meters.
info - for compatibility with OpenAI Gym, currently not used.
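When writing your own controller, it can be convenient to sanitize actions before passing them to the environment and to mirror the termination rules in your own bookkeeping. The helpers below are a hypothetical sketch based only on the limits stated in this README (18 excitations in [0,1], 1000 iterations, pelvis height of 0.65 meters); they are not part of the osim-rl API.

```python
MAX_STEPS = 1000          # iteration limit stated in this README
MIN_PELVIS_HEIGHT = 0.65  # meters, as stated in this README
N_MUSCLES = 18


def clip_action(action):
    """Return a copy of `action` with every excitation forced into [0, 1]."""
    if len(action) != N_MUSCLES:
        raise ValueError("expected %d excitations, got %d" % (N_MUSCLES, len(action)))
    return [min(1.0, max(0.0, float(a))) for a in action]


def is_done(step_count, pelvis_height):
    """Mirror the two termination rules: iteration limit or fallen pelvis."""
    return step_count >= MAX_STEPS or pelvis_height < MIN_PELVIS_HEIGHT
```

A typical use would be `env.step(clip_action(raw_action))` when the raw output of a policy network is not guaranteed to stay within bounds.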
Physics and biomechanics of the model
The model is implemented in OpenSim, which relies on the Simbody physics engine. Note that, given recent successes in model-free reinforcement learning, expertise in biomechanics is not required to successfully compete in this challenge.
To summarize briefly, the agent is a musculoskeletal model that includes body segments for each leg, a pelvis segment, and a single segment to represent the upper half of the body (trunk, head, arms). The segments are connected with joints (e.g., knee and hip) and the motion of these joints is controlled by the excitation of muscles. The muscles in the model have complex paths (e.g., muscles can cross more than one joint and there are redundant muscles). The muscle actuators themselves are also highly nonlinear. For example, there is a first order differential equation that relates the electrical signal the nervous system sends to a muscle (the excitation) to the activation of a muscle (which describes how much force a muscle will actually generate given the muscle's current force-generating capacity). Given the musculoskeletal structure of bones, joints, and muscles, at each step of the simulation (corresponding to 0.01 seconds), the engine:
- computes activations of muscles from the excitations vector provided to the step function,
- actuates muscles according to these activations,
- computes torques generated due to muscle activations,
- computes forces caused by contacting the ground,
- computes velocities and positions of joints and bodies,
- generates a new state based on forces, velocities, and positions of joints.
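The first order relation between excitation and activation mentioned above can be illustrated with a deliberately simplified sketch. It is not the muscle model used by OpenSim: the function name `update_activation` and the time constants `tau_act` and `tau_deact` are illustrative assumptions, and only the 0.01 second step comes from the text. The sketch integrates da/dt = (u - a) / tau exactly over one step, assuming the excitation u is held constant during the step.

```python
import math


def update_activation(activation, excitation, dt=0.01, tau_act=0.01, tau_deact=0.04):
    """Advance muscle activation by one simulation step.

    Uses a faster time constant when the muscle is being switched on
    (excitation above activation) than when it is relaxing. The time
    constants are placeholder values chosen for illustration.
    """
    tau = tau_act if excitation > activation else tau_deact
    # Exact solution of da/dt = (u - a) / tau for constant u over dt.
    return excitation + (activation - excitation) * math.exp(-dt / tau)


# Activation lags behind a sudden full excitation and approaches it over time.
a = 0.0
history = []
for _ in range(5):
    a = update_activation(a, 1.0)
    history.append(a)
```

The point of the sketch is that an action does not set muscle force directly: activation follows excitation with a delay, which a controller has to learn to anticipate.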
In each action, the following 18 muscles are actuated (9 per leg):
- biceps femoris,
- gluteus maximus,
- rectus femoris,
- tibialis anterior.
The action vector corresponds to these muscles in the same order (9 muscles of the right leg first, then 9 muscles of the left leg).
The observation contains 41 values:
- position of the pelvis (rotation, x, y)
- velocity of the pelvis (rotation, x, y)
- rotation of each ankle, knee and hip (6 values)
- angular velocity of each ankle, knee and hip (6 values)
- position of the center of mass (2 values)
- velocity of the center of mass (2 values)
- positions (x, y) of head, pelvis, torso, left and right toes, left and right talus (14 values)
- strength of left and right psoas: 1 for difficulty < 2, otherwise a random normal variable with mean 1 and standard deviation 0.1 fixed for the entire simulation
- next obstacle: x distance from the pelvis, y position of the center relative to the ground, radius.
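To work with named groups instead of raw indices, the observation can be split according to the list above. The sketch below assumes that the values appear in the vector in the same order as in the list; that ordering, the group names, and the helper name `split_observation` are assumptions and should be checked against the environment before relying on them.

```python
# (name, number of values) in the order of the list above; sizes sum to 41.
OBSERVATION_LAYOUT = [
    ("pelvis_position", 3),          # rotation, x, y
    ("pelvis_velocity", 3),          # rotation, x, y
    ("joint_rotation", 6),           # each ankle, knee and hip
    ("joint_angular_velocity", 6),   # each ankle, knee and hip
    ("center_of_mass_position", 2),
    ("center_of_mass_velocity", 2),
    ("body_positions", 14),          # head, pelvis, torso, toes, talus (x, y each)
    ("psoas_strength", 2),           # left and right
    ("next_obstacle", 3),            # x distance, y of center, radius
]


def split_observation(observation):
    """Split a 41-value observation into a dict of named groups."""
    expected = sum(size for _, size in OBSERVATION_LAYOUT)
    if len(observation) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(observation)))
    groups = {}
    start = 0
    for name, size in OBSERVATION_LAYOUT:
        groups[name] = list(observation[start:start + size])
        start += size
    return groups
```

For instance, `split_observation(observation)["next_obstacle"]` would give the three obstacle values under the ordering assumption stated above.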
For more details on the simulation framework, please refer to [1]. For more specific information about the muscle model we use, please refer to [2] or to the OpenSim documentation.
[1] Delp, Scott L., et al. "OpenSim: open-source software to create and analyze dynamic simulations of movement." IEEE Transactions on Biomedical Engineering 54.11 (2007): 1940-1950.
[2] Thelen, D.G. "Adjustment of muscle mechanics model parameters to simulate dynamic contractions in older adults." ASME Journal of Biomechanical Engineering 125 (2003): 70–77.
Frequently Asked Questions
I'm getting 'version GLIBCXX_3.4.21 not defined in file libstdc++.so.6 with link time reference' error
If you are getting this error:
ImportError: /opensim-rl/lib/python2.7/site-packages/opensim/libSimTKcommon.so.3.6: symbol _ZTVNSt7__cxx1119basic_istringstreamIcSt11char_traitsIcESaIcEEE, version GLIBCXX_3.4.21 not defined in file libstdc++.so.6 with link time reference
Try conda install libgcc.
Can I use languages other than python?
Do you have a docker container?
Yes, you can use https://hub.docker.com/r/stanfordnmbl/opensim-rl/. Note that connecting a display to a Docker container can be tricky and is system dependent. Nevertheless, a display is not necessary for training your models, and the Docker container can be handy for using multiple machines.
Some libraries are missing. What is required to run the environment?
Most of the libraries by default exist in major distributions of operating systems or are automatically downloaded by the conda environment. Yet, sometimes things are still missing. The minimal set of dependencies under Linux can be installed with
sudo apt install libquadmath0 libglu1-mesa libglu1-mesa-dev libsm6 libxi-dev libxmu-dev liblapack-dev
Please try to find equivalent libraries for your OS and let us know, and we will add them here.
Why are there no energy constraints?
Please refer to the issue https://github.com/stanfordnmbl/osim-rl/issues/34.
I have some memory leaks, what can I do?
I only see a python3 environment for Linux. How do I install the Windows environment?
Please refer to https://github.com/stanfordnmbl/osim-rl/issues/29
How to visualize observations when running simulations on the server?
Please refer to https://github.com/stanfordnmbl/osim-rl/issues/59
I still have more questions, how can I contact you?
For questions related to the challenge please use the challenge forum. For issues and problems related to installation process or to the implementation of the simulation environment feel free to create an issue on GitHub.
This challenge would not be possible without: