Exploding And Vanishing Gradients
In very deep networks, gradients can shrink toward zero or grow without bound as they are backpropagated through many layers. These problems, known as vanishing and exploding gradients, make training unstable. In this video we introduce two Trainer flags: track_grad_norm, which logs gradient norms so you can spot vanishing and exploding gradients, and gradient_clip_val, which clips the gradient norm computed over all model parameters together.
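To see what clipping the global gradient norm means, here is a minimal, dependency-free sketch using plain Python floats as stand-ins for parameter gradients (the function name clip_global_norm is illustrative, not part of the Lightning API):

```python
import math

def clip_global_norm(grads, clip_val):
    # Compute the L2 norm over all gradients together, the same
    # "norm over all model parameters" that gradient_clip_val uses.
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > clip_val:
        # Rescale every gradient so the global norm equals clip_val.
        scale = clip_val / total_norm
        grads = [g * scale for g in grads]
    return grads

# Gradients with global norm 5.0 get scaled down to norm 1.0;
# gradients already below the threshold are left untouched.
print(clip_global_norm([3.0, 4.0], clip_val=1.0))  # [0.6, 0.8]
print(clip_global_norm([0.3, 0.4], clip_val=1.0))  # [0.3, 0.4]
```

In Lightning itself you don't write this by hand: you pass gradient_clip_val (and, in older versions, track_grad_norm) directly to the Trainer, and the clipping is applied for you on every optimizer step.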
Follow along with this notebook: https://bit.ly/33YzC1P
Lightning Website: https://www.pytorchlightning.ai/
Follow us on Twitter: https://twitter.com/PyTorchLightnin