Loading…
7-8 April, 2025
Paris, France
View More Details & Registration
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."
Wednesday April 8, 2026 15:40 - 15:50 CEST


Unlocking state-of-the-art performance, this talk reveals how subgraph and custom operator autotuning in torch.compile deliver breakthrough speedups—surpassing previous SOTA for matmul and distributed collective ops.

DecomposeK is a novel subgraph optimization in PyTorch, designed to accelerate matrix multiplication when the inner dimension (K) is very large. DecomposeK achieves, delivering up to 28% speedup over ATen with activation fusion and 10% over ATen without fusion.

Building on subgraph infrastructure, we introduced Custom Op Autotuning, which benchmarks and selects the fastest kernel implementations for custom ops. This enables epilogue fusion and the first distributed collective op autotuning in PyTorch. We also introduce Range-based dispatch autotuning that enables dynamic selection of optimal implementations based on input shapes, ensuring performance that closely matches the theoretical best for each range. Our demo shows our autotuned kernels outperform Async TP Fused AG+MM by 9% and Async TP Fully Fused kernel by 41% across all input ranges.
Speakers
avatar for Elias Ellison

Elias Ellison

Software Engineer, Meta
Elias has been working on the PyTorch team for four years, most recently on the torch.compile stack
avatar for Paul Zhang

Paul Zhang

Software Engineer, Meta
Paul Zhang is currently a software engineer working on PyTorch and Triton at Meta, ensuring that PyTorch and PT2 best utilizes the hardware it is run on. Previous to this, Paul has done extensive work on recommendation systems for training and inference, optimizing performance and... Read More →
Wednesday April 8, 2026 15:40 - 15:50 CEST
Founders Cafe

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Share Modal

Share this link via

Or copy link