With support from the NVIDIA Deep Learning Institute (DLI), a training workshop is offered to all RMACC attendees. Attendees will also receive information on how to become certified instructors, for example with community support from the Cyberinfrastructure Community-wide Mentorship Network (CCMNet), so that they can offer this course and other DLI materials in their own communities. Course content and learning objectives follow:
Modern deep learning challenges leverage increasingly large datasets and more complex models. As a result, significant computational power is required to train models effectively and efficiently. Learning to distribute data across multiple GPUs during model training opens up a wide range of new deep learning applications.
Learning Objectives
- Understand how data parallel deep learning training is performed using multiple GPUs
- Achieve maximum throughput when training, for the best use of multiple GPUs
- Distribute training to multiple GPUs using PyTorch Distributed Data Parallel
- Understand and utilize algorithmic considerations specific to multi-GPU training performance and accuracy
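As an illustration of the first objective, the idea behind data parallel training can be sketched in a few lines of plain Python. This is not taken from the workshop materials: the function names (`grad_mse`, `shard`, `all_reduce_mean`, `data_parallel_step`) and the toy model y = w * x are hypothetical, and the "workers" are simulated in one process rather than run on real GPUs. The sketch assumes equal-sized shards, in which case averaging the per-worker gradients reproduces the full-batch gradient.

```python
# Simulated data-parallel training step (single process, no GPUs required).
# Toy model: y = w * x, loss: mean squared error. All names are illustrative.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def shard(seq, num_workers):
    """Split seq into num_workers equal contiguous shards."""
    assert len(seq) % num_workers == 0, "sketch assumes equal-sized shards"
    size = len(seq) // num_workers
    return [seq[i * size:(i + 1) * size] for i in range(num_workers)]

def all_reduce_mean(values):
    """Stand-in for an all-reduce collective: every worker receives the mean."""
    return sum(values) / len(values)

def data_parallel_step(w, xs, ys, num_workers, lr):
    """One SGD step where each simulated worker sees only its own shard."""
    x_shards = shard(xs, num_workers)
    y_shards = shard(ys, num_workers)
    # Each worker computes a gradient on its local shard with the same weights.
    local_grads = [grad_mse(w, xk, yk) for xk, yk in zip(x_shards, y_shards)]
    # Gradients are averaged so that every worker applies the same update.
    g = all_reduce_mean(local_grads)
    return w - lr * g, g

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]  # data generated with true weight 2.0
w0 = 0.0

w1, g_parallel = data_parallel_step(w0, xs, ys, num_workers=4, lr=0.01)
g_single = grad_mse(w0, xs, ys)  # gradient a single worker would compute
print(g_parallel, g_single, w1)
```

In real multi-GPU training with PyTorch Distributed Data Parallel, each worker is a separate process holding a model replica, and the gradient averaging happens through collective communication during the backward pass rather than through a Python list.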