Asynchronous Methods for Deep Reinforcement Learning: Labyrinth

The video shows an agent collecting rewards in previously unseen mazes using only raw pixels as input. The agent was trained with the Asynchronous Advantage Actor-Critic (A3C) algorithm, and during training it was rewarded only for picking up apples and reaching orange portals.
Paper link – http://arxiv.org/pdf/1602.01783.pdf
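
For readers unfamiliar with the "advantage" in A3C, here is a minimal sketch of how n-step returns and advantage estimates can be computed for one short rollout. It is an illustration under stated assumptions, not the authors' code: the function name `n_step_advantages`, the reward values, the value estimates, and the discount factor are all hypothetical, and the neural network and the asynchronous workers are left out entirely.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Compute discounted n-step returns and advantages for one rollout.

    rewards:         rewards r_t received at each step of the rollout
    values:          critic estimates V(s_t) for the same steps (assumed given)
    bootstrap_value: V of the state after the last step (0.0 if terminal)
    """
    returns = []
    running = bootstrap_value
    # Walk the rollout backwards, accumulating the discounted return.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage = how much better the observed return was than the critic predicted.
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages


# Hypothetical rollout: a small reward at step 2 and a larger one at step 4.
rewards = [0.0, 1.0, 0.0, 10.0]
values = [0.5, 0.5, 0.5, 0.5]  # made-up critic estimates
returns, advantages = n_step_advantages(
    rewards, values, bootstrap_value=0.0, gamma=0.5
)
print(returns)     # [1.75, 3.5, 5.0, 10.0]
print(advantages)  # [1.25, 3.0, 4.5, 9.5]
```

In an actor-critic setup, the advantages would scale the policy-gradient term for the actor, while the returns would serve as regression targets for the critic.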
