Reinforcement Learning in Continuous Action Spaces | DDPG Tutorial (PyTorch)

In this tutorial we will code a deep deterministic policy gradient (DDPG) agent in PyTorch to beat the continuous lunar lander environment.

DDPG combines the best of Deep Q Learning and Actor Critic Methods into an algorithm that can solve environments with continuous action spaces. We will have an actor network that learns the (deterministic) policy, coupled with a critic network that learns the action-value function. We will make use of a replay buffer to maximize sample efficiency, as well as target networks to assist in convergence and stability of the algorithm.
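The replay buffer mentioned above can be sketched in a few lines of NumPy. This is a minimal, illustrative version (class and method names are my own, not necessarily those used in the video): it stores transitions in circular arrays, overwriting the oldest entries when full, and samples uniformly at random.

```python
import numpy as np

class ReplayBuffer:
    """Minimal uniform-sampling replay buffer (sketch; sizes are illustrative)."""
    def __init__(self, max_size, input_dims, n_actions):
        self.mem_size = max_size
        self.mem_cntr = 0  # total transitions ever stored
        self.state_memory = np.zeros((max_size, input_dims), dtype=np.float32)
        self.new_state_memory = np.zeros((max_size, input_dims), dtype=np.float32)
        self.action_memory = np.zeros((max_size, n_actions), dtype=np.float32)
        self.reward_memory = np.zeros(max_size, dtype=np.float32)
        self.terminal_memory = np.zeros(max_size, dtype=bool)

    def store_transition(self, state, action, reward, state_, done):
        idx = self.mem_cntr % self.mem_size  # wrap around: overwrite oldest
        self.state_memory[idx] = state
        self.action_memory[idx] = action
        self.reward_memory[idx] = reward
        self.new_state_memory[idx] = state_
        self.terminal_memory[idx] = done
        self.mem_cntr += 1

    def sample_buffer(self, batch_size):
        max_mem = min(self.mem_cntr, self.mem_size)  # only sample filled slots
        batch = np.random.choice(max_mem, batch_size)
        return (self.state_memory[batch], self.action_memory[batch],
                self.reward_memory[batch], self.new_state_memory[batch],
                self.terminal_memory[batch])
```

Sampling random minibatches from this buffer breaks the temporal correlation between consecutive transitions, which is what makes off-policy learning with a deep network stable enough to work.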

To deal with the explore-exploit dilemma, we will introduce noise into the agent’s action choice function. This is Ornstein-Uhlenbeck noise, which models the temporal correlations of Brownian motion.

Keep in mind that the performance you see is from an agent that is still in training mode, i.e. it still has some noise in its actions. A fully trained agent in evaluation mode will perform even better. You can fix this up in the code by adding a parameter to the choose action function, and omitting the noise when you pass in a flag indicating you are in evaluation mode.
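That evaluation-mode tweak might look like the following sketch. The function name and signature are hypothetical; here the actor is any callable mapping an observation to a deterministic action (in the PyTorch version it would be a forward pass through the actor network), and the noise is a callable like the OU process described above.

```python
import numpy as np

def choose_action(actor, observation, noise, evaluate=False):
    """Hypothetical action-selection helper with an evaluation-mode flag."""
    mu = actor(observation)        # deterministic policy output
    if not evaluate:
        mu = mu + noise()          # add exploration noise only while training
    return np.clip(mu, -1.0, 1.0)  # lunar lander actions live in [-1, 1]
```

Passing `evaluate=True` skips the noise entirely, so you see the pure deterministic policy the agent has actually learned.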

#DeepDeterministicPolicyGradients #DDPG #ContinuousLunarLander

Learn how to turn deep reinforcement learning papers into code:

Deep Q Learning:

Actor Critic Methods:

Curiosity Driven Deep Reinforcement Learning

Natural Language Processing from First Principles: Learning Fundamentals

Here are some books / courses I recommend (affiliate links):
Grokking Deep Learning in Motion:
Grokking Deep Learning:
Grokking Deep Reinforcement Learning:

Come hang out on Discord here:

