Can AI Learn to Cooperate? Multi Agent Deep Deterministic Policy Gradients (MADDPG) in PyTorch

Multi-agent deep deterministic policy gradients (MADDPG) is one of the first successful algorithms for multi-agent artificial intelligence. Cooperation and competition among AI agents is going to be critical as applications of deep learning expand in our daily lives. In this tutorial, we are going to read through the paper together and then code up the entire multi-agent actor critic algorithm from scratch in the PyTorch framework.

The main innovation of this algorithm is the use of centralized training and decentralized execution. In brief, we’re going to give each agent’s critic network access to the observations and actions of all the agents in the simulation, which is the centralized training part. The actor networks will only have access to their own perspective, hence the decentralized execution.
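To make the split between what a critic sees and what an actor sees concrete, here is a minimal, framework-free sketch in plain Python. It only shows how the inputs are assembled, not the networks themselves. The observation and action sizes, and the helper names `actor_input_dim`, `critic_input_dim` and `build_critic_input`, are made up for illustration and are not taken from the paper or the tutorial code.

```python
# Hypothetical observation and action sizes for three agents (illustrative only).
obs_dims = [8, 10, 10]
act_dims = [5, 5, 5]


def actor_input_dim(agent_idx, obs_dims):
    """An actor only sees its own agent's observation."""
    return obs_dims[agent_idx]


def critic_input_dim(obs_dims, act_dims):
    """A critic sees the observations and actions of every agent."""
    return sum(obs_dims) + sum(act_dims)


def build_critic_input(all_obs, all_actions):
    """Flatten per-agent observations and actions into one critic input vector."""
    flat = []
    for obs in all_obs:
        flat.extend(obs)
    for act in all_actions:
        flat.extend(act)
    return flat


# Dummy data with the right shapes: one observation and one action per agent.
all_obs = [[0.0] * d for d in obs_dims]
all_actions = [[0.0] * d for d in act_dims]

critic_in = build_critic_input(all_obs, all_actions)  # joint input for any critic
actor_in = all_obs[0]                                 # agent 0's actor input
```

In a real implementation these vectors would most likely be tensors and the concatenation would be done per batch, but the dimension bookkeeping stays the same: the critic input grows with the number of agents, while each actor input does not.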

We are going to use OpenAI’s multi-agent particle environment for training and testing our agents. I’ll show you how to get it from GitHub and install the requirements in a virtual environment. We’ll cover some of the ways in which the new environments differ from the classic OpenAI Gym environments, and then we’re off to coding our agents.

You can read along with the paper here:

You can find the environment here:

Code for this tutorial is here:

Learn how to turn deep reinforcement learning papers into code:

Deep Q Learning:

Actor Critic Methods:

Curiosity Driven Deep Reinforcement Learning

Natural Language Processing from First Principles: Learning Fundamentals

Here are some books / courses I recommend (affiliate links):
Grokking Deep Learning in Motion:
Grokking Deep Learning:
Grokking Deep Reinforcement Learning:

Come hang out on Discord here:


Timestamps:
00:00 Intro
02:28 Abstract
03:18 Paper Intro
08:13 Related Works
09:02 Markov Decision Processes
10:35 Q Learning Explained
15:25 Policy Gradients Explained
19:14 Why Multi Agent Actor Critic is Hard
20:15 DDPG Explained
24:21 MADDPG Explained
29:11 Experiments
37:57 How to Implement MADDPG
40:54 MADDPG Algorithm
42:23 Hyperparameters for MADDPG
43:42 Multi Agent Particle Environment
45:09 Environment Install & Testing
55:37 Coding the Replay Buffer
01:07:34 Actor & Critic Networks
01:15:36 Coding the Agent
01:26:05 Coding the MADDPG Class
01:39:23 Coding the Utility Function
01:40:13 Coding the Main Loop
01:46:58 Moment of Truth
01:52:09 Testing on Physical Deception
01:55:48 Conclusion & Results
