7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

All times in this schedule are shown in CEST (UTC+2).
Wednesday April 8, 2026 14:30 - 14:55 CEST


Custom kernels are essential for high-performance PyTorch workflows, but their integration often comes with a hidden cost. While torch.compile promises speedups, calling custom operations typically triggers graph breaks: fallbacks to eager mode that introduce overhead and can negate the expected performance gains.

In this session, we provide a practical roadmap for making your extensions "compiler-aware". Using the Transformer Engine project as a case study, we will show how to utilize the custom_op extension point to bridge the gap between high-performance kernels and the torch.compile stack.

What you will learn:
• Identifying the Friction: How to profile and detect graph breaks caused by custom extensions.
• The Registration Path: A walkthrough of the custom_op registration process for torch.compile.
• Solving the "Hard Parts": Strategies for handling complex Python-side logic that disrupts graph capture.
• Real-World Impact: How these integrations function within the Transformer Engine to maintain peak throughput.
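One way to detect graph breaks, sketched here under stated assumptions rather than taken from the talk, is to pass torch.compile a custom backend that counts how many graphs it is asked to compile. The names `counting_backend` and `step` are hypothetical, and `torch._dynamo.graph_break()` is used only as a stand-in for a call the compiler cannot trace.

```python
import torch
import torch._dynamo
import torch.fx

graphs_compiled = 0

def counting_backend(gm: torch.fx.GraphModule, example_inputs):
    # Called once per captured graph; run the graph unmodified.
    global graphs_compiled
    graphs_compiled += 1
    return gm.forward

def step(x):
    x = x * 2
    torch._dynamo.graph_break()  # stand-in for an untraceable extension call
    return x + 1

compiled_step = torch.compile(step, backend=counting_backend)
out = compiled_step(torch.ones(3))

# More than one compiled graph for a single function signals a graph break.
print(f"graphs compiled: {graphs_compiled}")
```

Setting the environment variable `TORCH_LOGS=graph_breaks` is another option for seeing where breaks occur, without changing the code.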

Who should join: This talk is designed for developers building custom PyTorch extensions who want to understand how advanced operations fit into the compiled stack.
Speakers
Kshiteej Kalambarkar

Software Engineer Frameworks, NVIDIA
Kshiteej Kalambarkar is a software engineer at NVIDIA specializing in PyTorch and compiler technologies, with experience in torch.compile and custom kernel integration.
Masaki Kozuki

Software Engineer, NVIDIA
Masaki Kozuki works on PyTorch at NVIDIA.
Pawel Gadzinski

Senior Performance Engineer - Deep Learning, NVIDIA
Pawel Gadzinski is a Deep Learning Performance Engineer at NVIDIA, where he works on the Transformer Engine library, enabling state-of-the-art techniques for accelerating transformer models on NVIDIA GPUs, with a focus on low-precision training.
Master Stage
