7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is displayed in CEST (UTC/GMT +2).
Wednesday April 8, 2026 11:35 - 12:00 CEST


Today, vLLM is the de facto industry standard for serving large language models and is widely adopted in production.

However, until recently, vLLM’s state-of-the-art performance depended largely on hand-written CUDA or HIP kernels. These kernels are typically carefully optimized for a specific GPU platform, which can pose a serious obstacle to porting vLLM to different hardware.

Leveraging Triton, we introduced a “Triton attention backend” to vLLM that delivers highly competitive performance across GPU platforms from a single code base, without any hand-written CUDA or HIP kernels. The Triton attention backend has become the default for AMD GPUs and is used in scenarios where other attention backends lack required features. Additionally, this backend automatically selects appropriate specialized kernels based on the model type or request length.
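The sketch below is a rough illustration of what request-length-based kernel dispatch could look like. The kernel names, the `Request` type, and the 4096-token threshold are hypothetical and are not taken from vLLM's source. The sketch covers only the request-length dimension, not selection by model type.

```python
from dataclasses import dataclass

# Hypothetical kernel identifiers, for illustration only (not vLLM names).
PREFILL_KERNEL = "prefill_attention"
DECODE_KERNEL = "decode_attention"
SPLIT_KV_DECODE_KERNEL = "split_kv_decode_attention"


@dataclass
class Request:
    query_len: int    # new tokens processed in this step
    context_len: int  # tokens already held in the KV cache


def select_kernel(req: Request, long_context_threshold: int = 4096) -> str:
    """Pick a specialized attention kernel from the request's shape.

    Assumed heuristic: multi-token queries go to a prefill kernel, and
    single-token (decode) queries over long contexts go to a split-KV
    kernel that spreads the KV cache across more GPU programs.
    """
    if req.query_len > 1:
        return PREFILL_KERNEL
    if req.context_len >= long_context_threshold:
        return SPLIT_KV_DECODE_KERNEL
    return DECODE_KERNEL
```

A real backend would weigh more factors, such as head dimensions, batch composition, and model features. The point here is only that the choice can be made per batch, from request metadata, before any kernel is launched.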

In this talk, we will present our recent advances, which consistently deliver high performance on both NVIDIA and AMD GPUs from a single Triton-only code base. We will cover the engineering and science behind this backend, including system aspects, kernel improvements, and launch grid optimizations.
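In Triton, launch grid optimization comes down to choosing how many program instances to launch along each grid axis. The sketch below is a hypothetical heuristic, not the one the speakers describe. For decode steps with few sequences, it splits each sequence's KV cache across several programs so that the GPU's streaming multiprocessors (compute units on AMD) are not left idle. The function name, the three-axis split-KV grid layout, the `block_size=64` default, and the SM count are all assumptions.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return -(-a // b)


def decode_launch_grid(num_seqs: int, num_kv_heads: int, max_context_len: int,
                       num_sms: int, block_size: int = 64) -> tuple:
    """Hypothetical grid (seqs, kv_heads, kv_splits) for a split-KV decode kernel."""
    # Programs launched without any KV splitting: one per (sequence, KV head).
    base_programs = num_seqs * num_kv_heads
    # Never split more finely than the number of KV blocks in the longest sequence.
    max_splits = cdiv(max_context_len, block_size)
    # Split just enough that every SM gets at least one program.
    wanted_splits = cdiv(num_sms, base_programs)
    num_splits = max(1, min(wanted_splits, max_splits))
    return (num_seqs, num_kv_heads, num_splits)
```

For example, a single sequence with 8 KV heads and an 8192-token context on a GPU with 132 SMs (an example value) yields a grid of `(1, 8, 17)`. A batch of 64 sequences already fills the GPU and yields `(64, 8, 1)`. Splitting the KV cache adds a final reduction step across splits, which is why the heuristic splits only as much as needed.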
Speakers

Jan van Lunteren

Senior Research Scientist, IBM Research
Jan van Lunteren is a Senior Research Scientist at IBM Research Zurich and holds MSc and PhD degrees in Electrical Engineering. His research has covered a broad range of topics, including high-speed networking, near-memory computing, and high-performance machine-learning inference...

Burkhard Ringlein

Research Staff Member, IBM Research
Dr. Burkhard Ringlein is a Research Staff Member in the AI Platform team of IBM Research, based in Zurich. He is an accomplished AI systems researcher and designs, builds, debugs, and optimizes practical systems for low-latency, high-throughput machine learning applications. Currently...
Master Stage
