Teaching Robots to Walk with Proximal Policy Optimization (PPO) | Reinforcement Learning for Robots

Among the successes of modern bipedal robotics, deep reinforcement learning has been conspicuously absent. That is, until a group from Berkeley applied Proximal Policy Optimization (PPO) to teach a bipedal robot named Cassie how to walk. They trained in simulation with MuJoCo, coupled with judicious use of domain randomization, to get the robot walking in the real world. In this video, we’ll analyze their paper and see how they did it.

