Name: Lightning Talk: Combo Kernels: Horizontal Fusion Optimization in torch.compile - Karthick Panner Selvam & Elias Ellison, Meta
Start: 2026-04-07T14:30:00+0200
End: 2026-04-07T14:40:00+0200

7-8 April 2026
Paris, France
Note: The schedule is subject to change.

You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to attend, please purchase a registration on the event registration page.

Times are shown in CEST (UTC/GMT +2).

Tuesday April 7, 2026 14:30 - 14:40 CEST

Master Stage

Combo kernels are a compiler optimization in PyTorch Inductor that horizontally fuses multiple independent operations into a single Triton kernel launch, reducing GPU kernel launch overhead and improving memory locality.

The Problem: Models often produce many small, independent operations, such as weight preprocessing and tensor copies. Each one is normally launched as its own kernel, and every launch incurs a fixed overhead. For models with many such operations, this launch overhead becomes a bottleneck.

The Solution: Combo kernels merge multiple operations into one kernel using a dispatch mechanism: each program's ID is compared against cumulative block boundaries to route it to the subkernel it should execute. This eliminates redundant launches while preserving the results of each original operation.
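The dispatch step can be illustrated with a small pure-Python simulation. This is a sketch under stated assumptions, not Inductor's generated code: the names `build_boundaries`, `dispatch`, and `run_combo` and the toy subkernels are hypothetical, and the GPU's parallel programs are replaced by a sequential loop. Each subkernel owns a contiguous range of program IDs; a running sum of block counts gives the boundaries, and the first boundary greater than a program ID identifies its subkernel.

```python
from itertools import accumulate


def build_boundaries(blocks_per_subkernel):
    """Cumulative end offsets of each subkernel's block range, e.g. [4, 2, 3] -> [4, 6, 9]."""
    return list(accumulate(blocks_per_subkernel))


def dispatch(pid, boundaries):
    """Route a global program ID to (subkernel index, block index local to that subkernel).

    Linear scan over the boundaries, analogous to an if/elif chain inside a kernel.
    """
    start = 0
    for sub, end in enumerate(boundaries):
        if pid < end:
            return sub, pid - start
        start = end
    raise ValueError(f"pid {pid} is outside the combined grid of {boundaries[-1]} blocks")


def run_combo(subkernels, blocks_per_subkernel):
    """Simulate one combined launch: each pid of a 1D grid runs one block of one subkernel."""
    boundaries = build_boundaries(blocks_per_subkernel)
    for pid in range(boundaries[-1]):  # sequential stand-in for parallel GPU programs
        sub, local_pid = dispatch(pid, boundaries)
        subkernels[sub](local_pid)


if __name__ == "__main__":
    BLOCK = 4
    src, dst = list(range(10)), [0] * 10
    weights = [1.0] * 6

    def copy_block(local_pid):  # toy subkernel 0: a tensor copy
        for i in range(local_pid * BLOCK, min((local_pid + 1) * BLOCK, len(src))):
            dst[i] = src[i]

    def scale_block(local_pid):  # toy subkernel 1: a weight-preprocessing step
        for i in range(local_pid * BLOCK, min((local_pid + 1) * BLOCK, len(weights))):
            weights[i] *= 0.5

    # ceil(10 / 4) = 3 blocks for the copy, ceil(6 / 4) = 2 blocks for the scaling
    run_combo([copy_block, scale_block], [3, 2])
    print(dst, weights)
```

Every program ID falls in exactly one range, so each block of each original operation runs exactly once; that is what lets one launch replace several without changing results. A binary search over the boundaries would also work; the linear scan is used here only because it stays simple for a handful of subkernels.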

Key Innovations:

Per-subkernel block dimensions: Each subkernel gets its own optimized block size instead of sharing one size across all, enabling better autotuning.

Flattened grid dispatch: We collapse each subkernel's multi-dimensional block grid into a single dimension, so the whole combo kernel is launched with one 1D grid.
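The two ideas above can be sketched together in plain Python. This is an illustration under assumptions, not Inductor's implementation: `SubkernelSpec`, `cdiv`, `unflatten`, and `locate` are hypothetical names, and the x-fastest ordering used to flatten a 2D block grid is an assumed convention. Each subkernel carries its own `XBLOCK`/`YBLOCK`, its block count is derived from those sizes, and a 2D block grid contributes `x_blocks * y_blocks` slots to a single 1D grid.

```python
from dataclasses import dataclass


def cdiv(a, b):
    """Ceiling division: blocks of size b needed to cover a elements."""
    return -(-a // b)


@dataclass
class SubkernelSpec:
    """Hypothetical description of one subkernel (names are illustrative)."""
    xnumel: int
    ynumel: int = 1
    XBLOCK: int = 128  # chosen per subkernel, not shared across the combo kernel
    YBLOCK: int = 1

    @property
    def x_blocks(self):
        return cdiv(self.xnumel, self.XBLOCK)

    @property
    def y_blocks(self):
        return cdiv(self.ynumel, self.YBLOCK)

    @property
    def num_blocks(self):
        # Flattened: a 2D block grid occupies x_blocks * y_blocks slots of the 1D grid
        return self.x_blocks * self.y_blocks


def combined_grid_size(specs):
    """Size of the single 1D launch grid that covers every subkernel."""
    return sum(spec.num_blocks for spec in specs)


def unflatten(local_pid, spec):
    """Recover (x_block, y_block) from a subkernel-local 1D index, x varying fastest."""
    return local_pid % spec.x_blocks, local_pid // spec.x_blocks


def locate(pid, specs):
    """Map a global 1D program ID to (subkernel index, x_block, y_block)."""
    start = 0
    for sub, spec in enumerate(specs):
        end = start + spec.num_blocks
        if pid < end:
            return (sub, *unflatten(pid - start, spec))
        start = end
    raise ValueError(f"pid {pid} is outside the combined grid")


if __name__ == "__main__":
    copy = SubkernelSpec(xnumel=1000, XBLOCK=256)  # 1D op: 4 blocks
    tile = SubkernelSpec(xnumel=64, ynumel=48, XBLOCK=32, YBLOCK=16)  # 2D op: 2 x 3 blocks
    specs = [copy, tile]
    for pid in range(combined_grid_size(specs)):  # 10 program IDs in total
        print(pid, locate(pid, specs))
```

With these illustrative sizes, the 1,000-element copy covers its data in 4 blocks at `XBLOCK=256`; if it had to share the 2D operation's `XBLOCK=32`, it would be split into `cdiv(1000, 32) == 32` blocks instead. Letting each subkernel keep its own sizes avoids forcing one compromise value on every operation in the combo kernel.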

Results: On H100 GPUs, combo kernels deliver geomean speedups of +7.38% on the HuggingFace benchmark suite and +5.97% on TorchBench. The optimization is also enabled by default in vLLM to accelerate LLM inference.
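The geomean figures condense many per-model results into one number. Assuming the usual definition, where each model's speedup is its baseline time divided by its optimized time and the suite result is the geometric mean of those ratios, reported as a percentage gain, the calculation looks like this. The function name and the timings are hypothetical and purely illustrative, not data from the talk.

```python
import math


def geomean_speedup_pct(baseline_times, optimized_times):
    """Geometric-mean speedup across benchmarks, as a percentage gain.

    speedup_i = baseline_i / optimized_i, geomean = exp(mean(log(speedup_i))).
    """
    if not baseline_times or len(baseline_times) != len(optimized_times):
        raise ValueError("expected two non-empty timing lists of equal length")
    logs = [math.log(b / o) for b, o in zip(baseline_times, optimized_times)]
    return (math.exp(sum(logs) / len(logs)) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical per-model latencies in ms (baseline vs. with combo kernels)
    baseline = [12.1, 8.0]
    optimized = [10.0, 8.0]
    print(f"{geomean_speedup_pct(baseline, optimized):+.2f}%")
```

Here the two speedups are 1.21x and 1.00x, so the geomean is 1.10x, printed as +10.00%. A geometric mean is the usual choice for speedup ratios because a 2x gain and a 2x loss cancel out, whereas an arithmetic mean would let a few outliers dominate.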

Speakers

Elias Ellison

Software Engineer, Meta

Elias has been working on the PyTorch team for four years, most recently on the torch.compile stack.

Karthick Panner Selvam

Software Engineer, Meta

Karthick Panner Selvam is a software engineer at Meta Superintelligence Lab, working on the PyTorch compiler team to improve performance and scalability for large models. He earned his PhD in Machine Learning for Systems at the University of Luxembourg, collaborating with Google DeepMind, ECMWF, and Frontier...

Slides: Combo Kernels Horizontal fusion optimization in torch.compile (PDF)

Frameworks & Compilers

Audience Level: Any
Slides Attached: Yes
