Accumulating Gradients

Batch size is one of the most important hyperparameters in deep learning and has a major impact on both the accuracy and the training speed of a model. The batch size you can use is limited by GPU memory: as networks get larger, the maximum batch size that fits on a single GPU gets smaller. Gradient accumulation is a simple way to train with an effective batch size that would not fit into GPU memory, and in this video we show how trivial it is to enable in Lightning.
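To see why this works, here is a minimal framework-free sketch (plain Python, hypothetical toy data) of the math behind gradient accumulation: averaging the gradients of several small micro-batches reproduces the gradient of the full batch, so the optimizer step behaves as if the large batch had fit in memory.

```python
# Minimal sketch of gradient accumulation for 1-D linear regression,
# illustrating that averaged micro-batch gradients equal the full-batch gradient.

def grad(w, batch):
    # Gradient of the mean squared error (1/n) * sum((w*x - y)^2) w.r.t. w
    n = len(batch)
    return sum(2 * (w * x - y) * x for x, y in batch) / n

# Hypothetical toy dataset of (x, y) pairs.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5

# One pass over the full batch of 4 samples.
full = grad(w, data)

# The same batch split into 2 micro-batches of 2 samples each;
# the per-micro-batch gradients are averaged before the optimizer step.
micro = (grad(w, data[:2]) + grad(w, data[2:])) / 2

print(abs(full - micro) < 1e-12)  # → True: the two gradients match
```

In Lightning you don't write this loop yourself: passing `accumulate_grad_batches` to the `Trainer` (e.g. `Trainer(accumulate_grad_batches=4)`) makes it accumulate gradients over that many batches before each optimizer step.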

Follow along with this notebook:

Lightning Website:
Follow us on Twitter:
