Name: PyTorch Symmetric Memory + NCCL Device APIs: A New Path Towards Multi-GPU Kernels - Ke Wen & Sylvain Jeaugey, NVIDIA
Start: 2026-04-08T13:30:00+0200
End: 2026-04-08T13:55:00+0200

7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

PyTorch Symmetric Memory + NCCL Device APIs: A New Path Towards Multi-GPU Kernels - Ke Wen & Sylvain Jeaugey, NVIDIA

Wednesday April 8, 2026 13:30 - 13:55 CEST

Master Stage

As large models shift toward inference and Mixture-of-Experts (MoE) architectures, small batch sizes and dynamic routing present new scaling challenges. Fused, customized multi-GPU kernels are emerging as the solution, but programming them for high performance remains difficult. This talk introduces a paradigm shift enabled by PyTorch Symmetric Memory and NCCL device APIs.

PyTorch Symmetric Memory provides a unified infrastructure for direct GPU-to-GPU memory access without CPU involvement. By leveraging symmetric tensor allocation and CUDA Graph-compatible signaling, it enables fine-grained, dynamic data exchange while bypassing traditional "send/receive" overhead.
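
As a rough illustration of the idea above, the following is a toy, single-process model of a symmetric allocation with one-sided writes and signaling. It runs on plain Python lists, involves no GPUs, and does not use the PyTorch Symmetric Memory API; the class `SymmetricBuffer` and the methods `put`, `signal` and `wait` are hypothetical names chosen only for this sketch.

```python
# Toy, single-process model of a symmetric-memory exchange (no GPUs involved).
# All names here are hypothetical and are not the PyTorch API.

class SymmetricBuffer:
    """One same-sized buffer per rank, plus one signal counter per rank."""

    def __init__(self, world_size, numel):
        self.world_size = world_size
        # "Symmetric" allocation: every rank holds a buffer of identical size.
        self.buffers = [[0.0] * numel for _ in range(world_size)]
        self.signals = [0] * world_size

    def put(self, dst_rank, offset, values):
        # One-sided write into the peer's buffer; the peer posts no receive.
        self.buffers[dst_rank][offset:offset + len(values)] = values

    def signal(self, dst_rank):
        # Tell the peer that new data has landed.
        self.signals[dst_rank] += 1

    def wait(self, rank, expected):
        # A real kernel would spin on a flag; here we only check the count.
        return self.signals[rank] >= expected


def ring_exchange(world_size, chunk):
    """Each rank writes its chunk into the buffer of the next rank."""
    sym = SymmetricBuffer(world_size, numel=chunk)
    for rank in range(world_size):
        peer = (rank + 1) % world_size
        sym.put(peer, 0, [float(rank)] * chunk)
        sym.signal(peer)
    # Every rank has received exactly one signal from its ring neighbor.
    assert all(sym.wait(r, expected=1) for r in range(world_size))
    return sym.buffers


buffers = ring_exchange(world_size=4, chunk=2)
```

In this model the writer drives the whole transfer, which is the property that removes the matched send/receive pair; how the real library allocates, maps and signals is not shown here.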

We further demonstrate how NCCL device APIs simplify this model using in-kernel primitives for NVLink and GPU-Initiated Networking (GIN). We will showcase practical examples of compute-communication fusion, such as AllGather-Matmul, and customized patterns like deduplicated expert all-to-all.
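
The two patterns named above can be sketched in plain Python, under the assumption that AllGather-Matmul processes the gathered operand one shard at a time and that deduplication means sending a token to each destination rank only once. This is not the speakers' implementation and uses no PyTorch or NCCL calls; `allgather_matmul`, `dedup_destinations` and the parameter `experts_per_rank` are hypothetical names for illustration.

```python
def matmul(a, b):
    """Plain row-by-column matrix product on lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def allgather_matmul(shards, b, rank):
    """Compute (all-gathered A) @ B one shard at a time.

    shards[r] holds the rows of A owned by rank r. The local shard is
    multiplied first; remote shards follow in ring order, which is where a
    fused kernel could overlap the transfer of the next shard with the
    matmul of the current one.
    """
    world_size = len(shards)
    out = [None] * world_size
    for step in range(world_size):
        src = (rank + step) % world_size  # step 0 is the local shard
        out[src] = matmul(shards[src], b)
    # Concatenate per-shard results in rank order.
    return [row for block in out for row in block]


def dedup_destinations(topk_experts, experts_per_rank):
    """For each token, list each destination rank once, even when several of
    its selected experts live on that rank."""
    return [sorted({e // experts_per_rank for e in experts})
            for experts in topk_experts]


shards = [[[1, 2]], [[3, 4]], [[5, 6]]]  # A split by rows across 3 ranks
b = [[1, 1], [0, 1]]
fused = allgather_matmul(shards, b, rank=1)
reference = matmul([row for s in shards for row in s], b)

# 3 tokens, top-2 routing, 2 experts per rank (experts 0-1 on rank 0, 2-3 on rank 1).
dests = dedup_destinations([[0, 1], [1, 2], [2, 3]], experts_per_rank=2)
```

The shard-wise result matches the unfused product, and the first and third tokens each need a single transfer even though they are routed to two experts.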

These abstractions represent one of the most significant evolutions in the PyTorch and NCCL ecosystems, offering a versatile path to high-performance distributed programming.

Speakers

Ke Wen

Principal Software Architect, NVIDIA

Ke Wen works on distributed features, including Symmetric Memory, multi-GPU kernels, Expert Parallelism, inference, pipelining and graph analysis.

Sylvain Jeaugey

Distinguished Engineer, NVIDIA

Sylvain has been developing the NCCL library since its inception in 2015. He has been working on optimizing communication libraries for large parallel systems for more than 20 years.

PyTorch Symmetric Memory and NCCL Device APIs pdf

Frameworks & Compilers

Audience Level: Any

PyTorch Conference Europe 2026
