Policy Gradients Are Easy In Keras | Deep Reinforcement Learning Tutorial

Today you’re going to learn how to code a policy gradient agent in the Keras framework. As a bonus, you’ll get to see how to use custom loss functions. The policy gradient algorithm, REINFORCE specifically, is a Monte Carlo reinforcement learning method that approximates the optimal policy for the reinforcement learning agent.

It works by shifting the policy, a probability distribution for action selection, in the direction of the actions that produce the largest advantage. Here, advantage is defined as the discounted sum of the rewards that follow a given time step, which is usually called the return from that step.
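To make the update concrete, here is a minimal NumPy sketch of the two quantities described above: the discounted sum of rewards following each time step, and a REINFORCE-style loss that weights the log probability of each chosen action by that sum. The function names, the default gamma of 0.99, and the choice to include the reward at the current step in the sum are assumptions made for illustration; they are not taken from the video's code.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Discounted sum of rewards from each time step to the end of the episode.

    Assumption: the reward received at step t is included in the sum for step t.
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each step reuses the sum already computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(action_probs, actions, returns):
    """Negative log probability of the chosen actions, weighted by the returns.

    action_probs: array of shape (T, n_actions), the policy output at each step.
    actions: integer action indices of length T.
    returns: weights of length T, e.g. the output of discounted_returns.
    """
    # Clip to avoid log(0) when the policy assigns a near-zero probability.
    probs = np.clip(np.asarray(action_probs, dtype=np.float64), 1e-8, 1.0)
    chosen = probs[np.arange(len(actions)), actions]
    # Minimizing this raises the probability of actions followed by large returns.
    return -np.sum(np.log(chosen) * np.asarray(returns, dtype=np.float64))
```

For example, discounted_returns([1, 1, 1], gamma=0.5) gives [1.75, 1.5, 1.0]. In a Keras custom loss the same arithmetic would be written with backend tensor operations rather than NumPy, and many implementations also normalize the returns before using them as weights.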

While it’s incredibly powerful, it does have some drawbacks. In particular, it’s not very sample efficient. We throw out the memory with each episode; there is no replay memory buffer. Also, the agent’s policy is sensitive to perturbations in the network parameters, which results in instability in training.
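The "no replay buffer" point can be pictured with a small sketch of an episode-only memory: transitions are stored while the episode runs, used once for a single update, and then discarded. The class name EpisodeMemory and its methods are hypothetical names chosen for this illustration, not the video's implementation.

```python
class EpisodeMemory:
    """Holds the transitions of the current episode only (hypothetical helper)."""

    def __init__(self):
        self.states = []
        self.actions = []
        self.rewards = []

    def store(self, state, action, reward):
        # Called once per environment step.
        self.states.append(state)
        self.actions.append(action)
        self.rewards.append(reward)

    def clear(self):
        # Called after the end-of-episode update; nothing is kept for later reuse.
        self.states = []
        self.actions = []
        self.rewards = []

    def __len__(self):
        return len(self.rewards)
```

Because every transition is used for exactly one update before being dropped, the agent needs many fresh episodes to learn, which is the sample inefficiency mentioned above.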

Nevertheless, we’re able to get good performance in the Lunar Lander environment from the OpenAI Gym.

#PolicyGradients #Keras #Reinforce

Learn how to turn deep reinforcement learning papers into code:

Deep Q Learning:

Actor Critic Methods:

Curiosity Driven Deep Reinforcement Learning

Natural Language Processing from First Principles:
https://www.udemy.com/course/natural-language-processing-from-first-principles/?couponCode=NLP1-OCT-21

Reinforcement Learning Fundamentals

Here are some books / courses I recommend (affiliate links):
Grokking Deep Learning in Motion: https://bit.ly/3fXHy8W
Grokking Deep Learning: https://bit.ly/3yJ14gT
Grokking Deep Reinforcement Learning: https://bit.ly/2VNAXql

Come hang out on Discord here:

Website: https://www.neuralnet.ai
Github: https://github.com/philtabor
Twitter: https://twitter.com/MLWithPhil
