Fundamentals of Accelerated Computing with CUDA C/C++ Training

This workshop teaches the fundamental tools and techniques for accelerating C/C++ applications to run on massively parallel GPUs with CUDA®. You’ll learn how to write CUDA code, configure its parallel execution, optimize memory migration between the CPU and the GPU accelerator, and apply the workflow you’ve learned to a new task: accelerating a fully functional but CPU-only particle simulator for observable, substantial performance gains. At the end of the workshop, you’ll have access to additional resources for creating new GPU-accelerated applications on your own.

Course Details

Duration

1 day

Prerequisites

Basic C/C++ competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations
No previous knowledge of CUDA programming is assumed

Skills Gained

Write code to be executed by a GPU accelerator
Expose and express data and instruction-level parallelism in C/C++ applications using CUDA
Utilize CUDA-managed memory and optimize memory migration using asynchronous prefetching
Leverage command-line and visual profilers to guide your work
Utilize concurrent streams for instruction-level parallelism
Write GPU-accelerated CUDA C/C++ applications, or refactor existing CPU-only applications, using a profile-driven approach

Course Outline

Introduction
Accelerating Applications with CUDA C/C++
- Learn the essential syntax and concepts needed to write GPU-enabled C/C++ applications with CUDA.
- Write, compile, and run GPU code.
- Control the parallel thread hierarchy.
- Allocate and free memory for the GPU.
Managing Accelerated Application Memory with CUDA C/C++
- Learn to use the command-line profiler and CUDA-managed memory, focusing on observation-driven application improvements and a deep understanding of managed memory behavior.
- Profile CUDA code with the command-line profiler.
- Explore unified memory in depth.
- Optimize unified memory management.
Asynchronous Streaming and Visual Profiling for Accelerated Applications with CUDA C/C++
- Identify opportunities for improved memory management and instruction-level parallelism.
- Profile CUDA code with NVIDIA Nsight Systems.
- Use concurrent CUDA streams.
Final Review
