7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is displayed in CEST (UTC/GMT +2).
Wednesday April 8, 2026 14:00 - 14:10 CEST


As PyTorch-based LLM serving matures, the challenge shifts from building monolithic inference stacks to integrating diverse hardware accelerators efficiently. This session explores how modular plugin architectures let PyTorch models run efficiently across backends, using the integration of AWS Trainium into vLLM through standardized interfaces as the worked example.

We'll examine how vLLM's Hardware Plugin architecture uses Python's entry_points for automatic platform detection, allowing hardware vendors to extend PyTorch inference without fragmenting the codebase. This delivers automatic device detection, modular feature development, and seamless integration with PyTorch's model loading patterns.

The technical deep dive covers how NeuronWorker and NeuronxDistributedModelRunner extend vLLM base classes, NKI kernels for attention and MoE, and continuous batching with prefill/decode separation. We'll demo Hugging Face models loading through standard vLLM APIs and executing on Trainium without hardware-specific code.
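
To make "continuous batching with prefill/decode separation" concrete, here is a toy scheduler simulation. It is a sketch under simplifying assumptions, not the scheduler used by vLLM or the Neuron plugin: `Request` and `run_continuous_batching` are hypothetical names, prefill is modeled as processing the whole prompt without emitting a token, and each decode step emits exactly one token per running request.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    req_id: str
    prompt_len: int       # tokens processed in the prefill step
    max_new_tokens: int   # decode steps needed before the request finishes
    generated: int = 0


def run_continuous_batching(
    requests: List[Request], max_batch_size: int
) -> List[Tuple[str, List[str], int]]:
    """Simulate scheduling; returns (phase, request ids, tokens processed) per step."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    waiting = deque(requests)
    running: List[Request] = []
    log: List[Tuple[str, List[str], int]] = []
    while waiting or running:
        # Fill free batch slots as soon as they open up (the "continuous" part).
        admitted: List[Request] = []
        while waiting and len(running) + len(admitted) < max_batch_size:
            admitted.append(waiting.popleft())
        if admitted:
            # Prefill runs as its own step, separate from decode.
            tokens = sum(r.prompt_len for r in admitted)
            log.append(("prefill", [r.req_id for r in admitted], tokens))
            running.extend(admitted)
            continue
        # Decode step: one new token for every running request.
        for r in running:
            r.generated += 1
        log.append(("decode", [r.req_id for r in running], len(running)))
        # Finished requests leave the batch, freeing slots for waiting ones.
        running = [r for r in running if r.generated < r.max_new_tokens]
    return log


if __name__ == "__main__":
    reqs = [Request("a", 8, 2), Request("b", 4, 1), Request("c", 6, 1)]
    for step in run_continuous_batching(reqs, max_batch_size=2):
        print(step)
```

The log shows the key property: a new request is admitted the moment another finishes, rather than waiting for the whole batch to drain, and prefill steps (many tokens per request) are kept apart from decode steps (one token per request).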

Attendees will learn how plugin architectures let hardware vendors join the PyTorch inference ecosystem while staying compatible with standard workflows.
Speakers

Maen Suleiman

Product Manager, Amazon

Yahav Biran

Principal Architect, Amazon
Yahav Biran is a Principal Architect at AWS, focusing on large-scale AI workloads. He contributes to open-source projects and publishes in AWS blogs and academic journals, including the AWS compute and AI blogs and the Journal of Systems Engineering. He frequently delivers technical...
Junior Stage
