Name: Enabling State-of-the-art Asynchronous Execution in Torch.compile With CUDA Streams - Michael Lazos, Meta
Start: 2026-04-07T15:40:00+0200
End: 2026-04-07T16:05:00+0200

7-8 April, 2025
Paris, France
View More Details & Registration
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."

Enabling State-of-the-art Asynchronous Execution in Torch.compile With CUDA Streams - Michael Lazos, Meta

Tuesday April 7, 2026 15:40 - 16:05 CEST

Central Room

CUDA streams are a widely-used method for parallelizing GPU computation on NVIDIA GPUs. They have long been requested by our users and enable multiple key capabilities - overlapping communication and compute kernels, training on multiple batches in parallel and parallelizing kernels, all of which are needed for achieving SOTA training performance. Another key capability is activation offloading - this can be applied to any model to prevent OOMs by asynchronously storing activations in cpu memory until they are needed by the model.

Before this work, torch.compile previously would graph break on CUDA stream contexts, which can be costly for models that utilize streams. Although workarounds exist (e.g. wrapping stream manipulation into custom ops), these solutions add complexity and create friction in the user experience. By enabling seamless CUDA stream support in PT2, we allow our users to leverage the familiar eager APIs for stream assignment and synchronization directly within torch.compile. This not only simplifies the workflow but also ensures that models using custom streaming patterns can run efficiently out-of-the-box without manual intervention or code restructuring.

Speakers

Michael Lazos

Software Engineer, Meta

Michael Lazos is a software engineer at Meta where he contributes to torch.compile. His expertise spans both graph extraction with TorchDynamo and generating optimized kernels with the backend compiler TorchInductor. Previously, he was at Microsoft contributing to project Brainwave... Read More →

SOTA CUDA Streams PTC EU 26 pdf

Tuesday April 7, 2026 15:40 - 16:05 CEST
Central Room

Frameworks & Compilers

Audience Level Intermediate
Slides Attached Yes

PyTorch Conference Europe 2026

Michael Lazos

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Get help with the event