Python Edition: Fundamentals of Accelerated Computing with Modern CUDA

In this course, you’ll learn how to make Python fly with accelerated computing! Building on proven curricula from CUDA Python and modern CUDA C++ workshops, the tutorial uses CuPy for drop‑in NumPy acceleration, Numba CUDA for handcrafted kernels, nvmath‑python for fast math primitives, and the new cuda.cooperative APIs for cross‑block collaboration. Participants will explore GPU thread hierarchies, shared‑memory tiling, memory‑coalescing strategies, and other fundamentals that underlie high‑performance GPU code, all delivered through a Python‑first lens that preserves the language’s renowned readability and popularity.
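As a taste of the drop‑in acceleration style described above, the snippet below computes a squared‑distance matrix with NumPy broadcasting. With a GPU available, importing CuPy in place of NumPy runs the identical code on the device; the function name here is illustrative, not taken from the course material.

```python
import numpy as np

# With a GPU and CuPy installed you could instead write:
#   import cupy as xp
# and the identical code below would execute on the device.
xp = np

def pairwise_sq_dists(points):
    """Squared Euclidean distance matrix via broadcasting."""
    diff = points[:, None, :] - points[None, :, :]
    return (diff ** 2).sum(axis=-1)

# Three 2-D points: (0, 1), (2, 3), (4, 5)
pts = xp.arange(6.0).reshape(3, 2)
d = pairwise_sq_dists(pts)
```

Because CuPy mirrors the NumPy API, this "swap the import" pattern is often the first optimization step before reaching for custom kernels.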

  • Sept 23
    Magazinet Kongsberg
    1 day
    07:00 - 15:00 UTC
    Bryce Adelstein Lelbach

The learning happens entirely inside interactive Jupyter notebooks, where you can tweak parameters, rerun cells, and visualize results in real time. Step‑by‑step labs culminate in profile‑driven tuning sessions that capture execution traces with NVIDIA Nsight Systems, spotlight memory bottlenecks, and quantify the speed‑ups your optimizations unlock, mirroring the disciplined workflow championed in advanced CUDA C++ training. By the end, you’ll walk away with a practical toolkit for transforming everyday Python scripts into GPU‑powered engines and a systematic approach to squeezing every last flop from modern accelerators.

In this class, you'll learn to:

- Spot workloads ripe for GPU speed‑ups and explain the CUDA thread-block-grid model.
- Swap in CuPy or Numba to accelerate NumPy code with minimal changes.
- Write and coordinate custom CUDA kernels, including cuda.cooperative launches, entirely in Python.
- Maximize throughput via coalesced memory access, shared‑memory tiling, and lean host/device transfers.
- Profile, diagnose, and iterate on performance using Nsight Systems directly from Jupyter notebooks.
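To make the thread-block-grid model from the first objective concrete, here is a plain‑Python sketch (no GPU required) of the global‑index arithmetic every 1‑D CUDA kernel relies on: each thread computes `i = blockIdx * blockDim + threadIdx` and guards against running past the end of the array. The function name and the sequential loops are illustrative; on real hardware every (block, thread) pair runs in parallel.

```python
def emulate_1d_launch(n, threads_per_block=4):
    """Emulate a 1-D grid: record which (block, thread) pair covers each index."""
    # Ceiling division, as in a typical CUDA launch configuration.
    blocks = (n + threads_per_block - 1) // threads_per_block
    coverage = {}
    for block_idx in range(blocks):                   # hardware: blocks run in parallel
        for thread_idx in range(threads_per_block):   # hardware: so do threads
            i = block_idx * threads_per_block + thread_idx
            if i < n:  # bounds guard, essential when n % threads_per_block != 0
                coverage[i] = (block_idx, thread_idx)
    return coverage

cov = emulate_1d_launch(10)
```

Note that 10 elements with 4 threads per block need 3 blocks, and the last block leaves two of its threads idle behind the bounds guard; this is exactly the pattern a Numba CUDA kernel expresses with `cuda.grid(1)`.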

The course material is open source and can be found here: https://github.com/NVIDIA/accelerated-computing-hub/tree/main/gpu-python-tutorial
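The shared‑memory tiling strategy listed among the objectives can be sketched in NumPy as blocked matrix multiplication: each output tile is accumulated from sub‑blocks of the inputs, mirroring how a CUDA thread block stages tiles in shared memory for reuse. This is a CPU‑side sketch under that analogy, not the course's actual kernel code.

```python
import numpy as np

def tiled_matmul(a, b, tile=2):
    """Blocked matrix multiply. On a GPU, each (i, j) output tile maps to a
    thread block, and the a/b sub-blocks would be staged in shared memory."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=a.dtype)
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # Each sub-block is loaded once and reused across the whole
                # tile, cutting redundant global-memory traffic.
                c[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return c

a = np.arange(16.0).reshape(4, 4)
c = tiled_matmul(a, np.eye(4))
```

The payoff on a GPU is that each input element is read from global memory once per tile rather than once per output element, which is the memory‑traffic reduction the course quantifies with Nsight Systems.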

Bryce Adelstein Lelbach
Principal Architect at NVIDIA

Bryce Adelstein Lelbach has spent over a decade developing programming languages, compilers, and software libraries. He is passionate about parallel programming and strives to make it more accessible for everyone.

Bryce is a Principal Architect at NVIDIA, where he leads programming language efforts and drives the technical roadmap for NVIDIA's compute compilers and libraries.

He is one of the leaders of the systems programming language community, having served as chair of the Standard C++ Library Evolution group and the US standards committee for programming languages (INCITS/PL22). He has been an organizer and program chair for many conferences over the years.

On the C++ Committee, he has personally worked on concurrency primitives, parallel algorithms, executors, and multidimensional arrays. He is one of the founding developers of the HPX parallel runtime system.

Outside of work, Bryce is passionate about airplanes and watches. He lives in Midtown Manhattan with his girlfriend and dog.
