7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

All times in this schedule are shown in CEST (UTC+2).
Wednesday April 8, 2026 13:30 - 13:55 CEST


As large models shift toward inference and Mixture-of-Experts (MoE) architectures, small batch sizes and dynamic routing present new scaling challenges. Fused, customized multi-GPU kernels are emerging as the solution, but programming them for high performance remains difficult. This talk introduces a paradigm shift enabled by PyTorch Symmetric Memory and NCCL device APIs.

PyTorch Symmetric Memory provides a unified infrastructure for direct GPU-to-GPU memory access without CPU involvement. By leveraging symmetric tensor allocation and CUDA Graph-compatible signaling, it enables fine-grained, dynamic data exchange while bypassing traditional "send/receive" overhead.

We further demonstrate how NCCL device APIs simplify this model using in-kernel primitives for NVLink and GPU-Initiated Networking (GIN). We will showcase practical examples of compute-communication fusion, such as AllGather-Matmul, and customized patterns like deduplicated expert all-to-all.
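To make the AllGather-Matmul idea concrete, here is a minimal single-process sketch of the decomposition that makes such fusion possible. It is not the speakers' implementation and uses neither PyTorch Symmetric Memory nor NCCL. The function names, the ring arrival order, and the row-sharded layout are all assumptions chosen for illustration. The point it shows is only that multiplying each shard as it arrives gives the same result as gathering everything first, which is what lets a fused kernel overlap communication with compute.

```python
# Toy model of AllGather-Matmul (illustrative only, no GPUs involved).
# Assumption: each "rank" owns a row shard of A, and every rank holds all of B.

def matmul(a, b):
    """Plain nested-list matrix multiply: (m x k) @ (k x n)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def allgather_then_matmul(shards, b):
    """Baseline: gather every shard first, then run one large matmul."""
    gathered = [row for shard in shards for row in shard]
    return matmul(gathered, b)

def fused_allgather_matmul(shards, b, rank):
    """Decomposed form: multiply each shard as soon as it 'arrives'.

    The local shard is processed first because it needs no communication,
    then peers are visited in ring order (a hypothetical schedule). In a real
    fused kernel the transfer of the next shard would overlap with the matmul
    on the current one; here arrival is modeled by simple list indexing.
    """
    world = len(shards)
    out = [None] * world
    for step in range(world):
        src = (rank + step) % world
        out[src] = matmul(shards[src], b)  # partial result for shard `src`
    # Concatenate partial results in rank order to form the full output.
    return [row for block in out for row in block]

# Three ranks, each holding a 2x2 shard of A; B is 2x3.
shards = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
b = [[1, 0, 2], [0, 1, 3]]

baseline = allgather_then_matmul(shards, b)
fused = [fused_allgather_matmul(shards, b, r) for r in range(len(shards))]
```

Every rank's `fused` result equals `baseline` regardless of the order in which shards arrive, because each output row block depends on exactly one input shard.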

These abstractions represent one of the most significant evolutions in the PyTorch and NCCL ecosystems, offering a versatile path to high-performance distributed programming.
Speakers
Ke Wen

Principal Software Architect, NVIDIA
Ke Wen works on distributed features, including Symmetric Memory, multi-GPU kernels, Expert Parallelism, inference, pipelining and graph analysis.
Sylvain Jeaugey

Distinguished Engineer, NVIDIA
Sylvain has been developing the NCCL library since its inception in 2015. He has been working on optimizing communication libraries for large parallel systems for more than 20 years.
Master Stage
