Accelerating CUDA C++ Applications with Multiple GPUs (ACCAMG)


Course Overview

Computationally intensive CUDA C++ applications in high-performance computing, data science, bioinformatics, and deep learning can be accelerated by using multiple GPUs, increasing throughput and/or reducing total runtime. When combined with the concurrent overlap of computation and memory transfers, computation can be scaled across multiple GPUs without paying additional memory-transfer cost. For organizations with multi-GPU servers, whether in the cloud or on NVIDIA DGX systems, these techniques enable you to achieve peak performance from GPU-accelerated applications. It is also important to implement these single-node, multi-GPU techniques before scaling your applications across multiple nodes.
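The copy/compute overlap mentioned above can be sketched with CUDA streams. This is a minimal illustration, not course material: the kernel, chunk count, and launch configuration are all illustrative choices.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: grid-stride loop doubling each element.
__global__ void doubleElements(float *a, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        a[i] *= 2.0f;
}

int main() {
    const int N = 1 << 24, numStreams = 4, chunk = N / numStreams;
    float *h, *d;
    cudaMallocHost(&h, N * sizeof(float));  // pinned memory enables async copies
    cudaMalloc(&d, N * sizeof(float));

    cudaStream_t streams[numStreams];
    for (int s = 0; s < numStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each chunk's H2D copy, kernel, and D2H copy are issued in that chunk's
    // stream, so transfers for one chunk overlap with compute for another.
    for (int s = 0; s < numStreams; ++s) {
        int off = s * chunk;
        cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        doubleElements<<<256, 256, 0, streams[s]>>>(d + off, chunk);
        cudaMemcpyAsync(h + off, d + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < numStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h);
    cudaFree(d);
}
```

On the Nsight Systems timeline, a pattern like this shows transfer and kernel activity from different streams executing concurrently rather than serially.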

Prerequisites

  • Professional experience programming CUDA C/C++ applications, including the use of the nvcc compiler, kernel launches, grid-stride loops, host-to-device and device-to-host memory transfers, and CUDA error handling
  • Familiarity with the Linux command line
  • Experience using makefiles to compile C/C++ code

Suggested resources to satisfy prerequisites: Fundamentals of Accelerated Computing with CUDA C/C++, Ubuntu Command Line for Beginners (sections 1 through 5), Makefile Tutorial (through the Simple Examples section)
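As a rough self-check of the prerequisite skills (kernel launches, grid-stride loops, CUDA error handling), you should be comfortable reading a sketch like the following; the kernel and values are hypothetical:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: each thread processes multiple elements.
__global__ void scale(float *a, float factor, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        a[i] *= factor;
}

int main() {
    const int N = 1 << 20;
    float *a;
    cudaMallocManaged(&a, N * sizeof(float));
    for (int i = 0; i < N; ++i) a[i] = 1.0f;

    scale<<<128, 256>>>(a, 3.0f, N);  // kernel launch

    // Basic CUDA error handling: check launch errors and async errors.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) printf("Launch error: %s\n", cudaGetErrorString(err));
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) printf("Runtime error: %s\n", cudaGetErrorString(err));

    cudaFree(a);
}
```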

Course Objectives

  • Use concurrent CUDA streams to overlap memory transfers with GPU computation
  • Scale workloads across all available GPUs on a single node
  • Combine the use of copy/compute overlap with multiple GPUs
  • Use the NVIDIA Nsight™ Systems timeline to identify opportunities for improvement and to observe the impact of the techniques covered in the workshop
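Combining copy/compute overlap with multiple GPUs, as in the objectives above, can be sketched as follows. This is an outline under simplifying assumptions (data divides evenly across devices, at most 16 GPUs, illustrative kernel):

```cuda
#include <cuda_runtime.h>

__global__ void doubleElements(float *a, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        a[i] *= 2.0f;
}

int main() {
    int numGpus;
    cudaGetDeviceCount(&numGpus);

    const int N = 1 << 24;
    int chunk = N / numGpus;  // assumes N divides evenly, for brevity

    float *h;
    cudaMallocHost(&h, N * sizeof(float));  // pinned for async transfers

    float *d[16];
    cudaStream_t streams[16];

    // One slice of the data per GPU: each device gets its own allocation
    // and stream, so transfers and kernels on all devices run concurrently.
    for (int g = 0; g < numGpus; ++g) {
        cudaSetDevice(g);
        cudaMalloc(&d[g], chunk * sizeof(float));
        cudaStreamCreate(&streams[g]);
        int off = g * chunk;
        cudaMemcpyAsync(d[g], h + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[g]);
        doubleElements<<<256, 256, 0, streams[g]>>>(d[g], chunk);
        cudaMemcpyAsync(h + off, d[g], chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[g]);
    }
    for (int g = 0; g < numGpus; ++g) {
        cudaSetDevice(g);
        cudaStreamSynchronize(streams[g]);
        cudaStreamDestroy(streams[g]);
        cudaFree(d[g]);
    }
    cudaFreeHost(h);
}
```

Note that `cudaSetDevice` selects the device for subsequent allocations, stream creation, and launches, which is why it is called at the top of each per-GPU iteration.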

Follow-on Courses

Prices & Delivery Methods

Online training

Duration
1 day

Price
  • Request a quote
Classroom training

Duration
1 day

Price
  • Request a quote

Agenda

Currently there are no training dates scheduled for this course.