The PyTorch Distributed stack is a set of PyTorch features that facilitate training models on distributed systems. PyTorch has introduced several new features in the distributed package to support larger scale and higher efficiency for both data and model parallelism. In this talk, Yanli Zhao (Software Engineer, Meta AI) shares tips on how to reduce memory footprint, fit larger models, and achieve significant speedups on distributed systems with features such as Zero Redundancy Optimizer, DistributedDataParallel, FullyShardedDataParallel, CUDA RDMA, and ShardedTensor.
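As a minimal sketch (not code from the talk), two of these features can be combined directly: wrapping a model in DistributedDataParallel for data parallelism, and using ZeroRedundancyOptimizer so each rank stores only its shard of the optimizer state instead of a full replica. For illustration the sketch assumes a single-process "gloo" process group on CPU; a real job would launch one process per GPU with `torchrun` and use the "nccl" backend.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.optim import ZeroRedundancyOptimizer

# Assumption for illustration: a single-process "gloo" group so the
# example runs anywhere; multi-GPU jobs would use torchrun + "nccl".
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# DDP replicates the model and all-reduces gradients across ranks.
model = DDP(torch.nn.Linear(16, 4))

# ZeroRedundancyOptimizer shards the Adam state (momenta, etc.) across
# ranks, cutting per-device optimizer memory roughly by world_size.
opt = ZeroRedundancyOptimizer(
    model.parameters(), optimizer_class=torch.optim.Adam, lr=1e-3
)

loss = model(torch.randn(8, 16)).sum()
loss.backward()
opt.step()
dist.destroy_process_group()
```

FullyShardedDataParallel goes one step further than this sketch by also sharding the parameters and gradients themselves, which is what allows models larger than a single device's memory to be trained.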
