7-8 April, 2026
Paris, France
View More Details & Registration
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."
Venue: Junior Stage
Tuesday, April 7
 

11:00 CEST

Lightning Talk: Training Embedding Model Resiliently for Multimodal Model Inference Routing - Huamin Chen, Red Hat & Haichen Zhang, AMD
Tuesday April 7, 2026 11:00 - 11:10 CEST
LLM systems increasingly rely on intelligent routing to balance cost, latency, and quality tradeoffs. The vLLM Semantic Router, a vLLM Ecosystem project, provides both semantic and performance level routing intelligence for Mixture-of-Multimodal Models (MoM) architectures, but its effectiveness depends on fast and accurate classifiers.

This talk presents our end-to-end journey training production-grade embedding and classification models on AMD GPUs using native PyTorch, achieving high GPU utilization with distributed training optimizations.

We introduce a multilingual text embedding model with a 32K context window and 2D Matryoshka support, and multimodal embedding models, trained on AMD GPUs using PyTorch DDP. The talk covers practical training optimizations for AMD ROCm. All training code uses native PyTorch distributed primitives, with additional enhancements to improve training stability and pipeline efficiency.

Attendees will learn how to train efficient classifiers for LLM routing systems and integrate these models into production inference pipelines.
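As a rough sketch of the distributed setup described in this abstract, here is a minimal PyTorch DDP training loop; the model, data, and hyperparameters are placeholders, and on AMD ROCm the same "cuda" device strings apply (the NCCL backend resolves to RCCL).

```python
# Minimal DDP skeleton (illustrative only; model, data, and loss are placeholders).
# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")          # on AMD ROCm this resolves to RCCL
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 768).cuda(local_rank)   # stand-in for the embedding model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(100):                     # stand-in for the real data pipeline
        batch = torch.randn(32, 1024, device=local_rank)
        loss = model(batch).pow(2).mean()
        opt.zero_grad()
        loss.backward()                      # gradients are all-reduced across ranks here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```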
Speakers

Huamin Chen

Technical Advisor, Microsoft
Dr. Huamin Chen is a passionate developer. He co-founded the Semantic Router project under the vLLM community. His recent contributions to the CNCF ecosystem include Project Kepler, TAG Environmental Sustainability, and Cloud Native AI WG. He is also one of the founding members... Read More →

Haichen Zhang

Senior AI Software Engineer, AMD
Haichen is the Senior AI Engineer for AMD AI Group, specializing in accelerating training and inference for large language models, recommender systems, computer vision (CV), and natural language processing (NLP) tailored to internet customers. Before joining AMD, Haichen worked at... Read More →
Tuesday April 7, 2026 11:00 - 11:10 CEST
Junior Stage

11:15 CEST

Lightning Talk: Flexible Deployment of PyTorch Models on MCU-Class Devices Using ExecuTorch - Robert Kalmar & Martin Pavella, NXP
Tuesday April 7, 2026 11:15 - 11:25 CEST
ExecuTorch has recently matured into a production-ready framework designed specifically for efficient edge deployment of PyTorch models. Its architecture supports a broad spectrum of hardware targets—from low-power, bare-metal or RTOS-based microcontrollers (MCUs) to higher-performance Linux- or Android-based microprocessor platforms—while meeting the demanding constraints of memory, compute, and power typically found in real-world embedded applications.
This talk focuses on the deployment flexibility ExecuTorch offers for MCU-class devices, highlighting how different backends enable efficient execution across heterogeneous compute units. We will explore CPU, DSP, and NPU acceleration paths using the Cortex-M, Cadence, Ethos-U, and eIQ Neutron backends, and discuss how these integrate into typical ML model deployment workflows.
To make the session practical and application-oriented, we will present an optimization journey aimed at reducing power consumption—an essential requirement for ML workloads in energy-constrained environments. Attendees will gain insights into backend selection, performance trade-offs, and best practices for deploying PyTorch models on edge devices.
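For orientation, the generic ExecuTorch export flow looks roughly like the sketch below; backend-specific delegation (e.g. to the Ethos-U or eIQ Neutron backends) happens via partitioners at the lowering stage and is omitted, and exact module paths may vary between ExecuTorch releases.

```python
# Generic ExecuTorch export sketch (backend delegation omitted; verify module
# paths against your installed ExecuTorch release).
import torch
from executorch.exir import to_edge

class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyNet().eval()
example_inputs = (torch.randn(1, 16),)

exported = torch.export.export(model, example_inputs)   # capture a full graph
edge = to_edge(exported)                                 # lower to the Edge dialect
# A backend partitioner (Cortex-M, Ethos-U, eIQ Neutron, ...) would be applied here.
et_program = edge.to_executorch()

with open("tinynet.pte", "wb") as f:                     # program consumed by the on-device runtime
    f.write(et_program.buffer)
```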
Speakers

Robert Kalmar

Principal AI/ML Engineer, NXP Semiconductors
Robert Kalmar is a Principal Machine Learning Engineer at NXP Semiconductors. He received his master’s degree in machine learning and intelligent systems from Brno University of Technology. At NXP he focuses on machine learning solution enablement for embedded and mobile devices... Read More →

Martin Pavella

ML SW Engineer, NXP Semiconductors
I hold a Master’s degree in Machine Learning from the Brno University of Technology, graduating with distinction at both bachelor’s and master’s levels. I am a mid-level AI/ML Software Engineer at NXP Semiconductors with 2.5+ years of experience. I won the 2025 iGEM overgraduate... Read More →
Tuesday April 7, 2026 11:15 - 11:25 CEST
Junior Stage
  Inference & Production

11:30 CEST

Why Classic IAM Collapses for Agents: Rethinking IAM for Agentic Systems - Parul Singh, Red Hat
Tuesday April 7, 2026 11:30 - 11:55 CEST
Autonomous AI agents increasingly reason, plan and act across tools, services and organizational boundaries. In these environments, traditional Identity and Access Management models begin to fail. Agents are not users and they are not static services. They act on behalf of others, change context during execution and operate with different levels of autonomy and risk.

This talk examines why classic IAM assumptions such as long-lived identities, static permissions, and check-once, trust-always authorization do not hold for agentic systems. We focus on three core questions:
1. How should agent identity be defined when tools, workflows, and execution context are dynamic?
2. How do we preserve accountability when agents act on behalf of users or other agents?
3. How should access change as agent behavior and risk evolve during execution?

We will outline a practical migration path from traditional IAM to agent-aware identity workflows. Our insights come from our collaboration with the COSAI Secure Design for Agentic Systems working group, which is focused on creating shared guidance and emerging standards. We will close with current open questions and how you can join us to help solve them.
Speakers

Parul Singh

Principal Software Engineer, Red Hat
Parul is a Principal Software Engineer in Red Hat's Office of the CTO, working on agentic systems and security. Her work focuses on trust, identity, and observability for autonomous AI agents, including delegation, provenance, and zero trust architectures for agentic workflows. She... Read More →
slides pdf
Tuesday April 7, 2026 11:30 - 11:55 CEST
Junior Stage
  Security & Privacy

12:00 CEST

Parameterized CUDA Graph Launch in PyTorch: CUDA Graphs Without the Pain - Daniel Galvez, NVIDIA
Tuesday April 7, 2026 12:00 - 12:25 CEST
Modern GPUs are fast enough that CPU kernel launch overhead has become a real bottleneck. CUDA Graphs can eliminate this overhead, but in practice they are hard to use and easy to get wrong.

When CUDA Graph capture fails, PyTorch users typically face two choices: fix the code that breaks capture—often with limited guidance—or capture only parts of the workload. Partial capture comes with sharp footguns, most notably large increases in device memory usage due to CUDA Graphs’ private memory pools.

This talk walks through the most common CUDA Graph capture failures seen in real PyTorch workloads and shows how to diagnose and fix them. It then presents an alternative to CUDA Graph Trees: Parameterized CUDA Graph launch, which automatically applies CUDA Graphs to only the compatible regions of a workload. All you need to do is make your workload compatible with torch.compile(). This enables CUDA Graph acceleration with minimal user effort and without increased memory usage.

Using this approach, llama3.1-70B in torchtitan runs with only a 2 GB memory increase over a non-graph baseline, compared to ~10 GB using traditional CUDA Graph techniques.
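For context on why capture is error-prone, the stock manual CUDA Graph capture/replay pattern in PyTorch is sketched below (this is the existing mechanism, not the parameterized launch presented in this talk): inputs must live in fixed "static" tensors that are copied into before every replay.

```python
# Stock manual CUDA Graph capture/replay (illustrative; not the parameterized
# launch mechanism described in this session).
import torch

model = torch.nn.Linear(1024, 1024).cuda().eval()
static_in = torch.randn(8, 1024, device="cuda")

# Warm up on a side stream so lazy initialization happens outside capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_in)
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_out = model(static_in)    # work is recorded, not executed

new_batch = torch.randn(8, 1024, device="cuda")
static_in.copy_(new_batch)           # replays always read the static input buffer
g.replay()
print(static_out.shape)
```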
Speakers

Daniel Galvez

Manager, NVIDIA
Daniel Galvez is an AI developer technology engineer working on speech recognition and natural language processing inference and training. He has contributed to software like PyTorch, NeMo, Megatron, ESPNet, vLLM, and TRT-LLM. He is currently working on reducing CPU overheads in CUDA... Read More →
Tuesday April 7, 2026 12:00 - 12:25 CEST
Junior Stage

13:45 CEST

Teaching PyTorch To Read Your Worst PDFs With Docling - Mingxuan Zhao & Peter Staar, IBM & Carol Chen, Red Hat
Tuesday April 7, 2026 13:45 - 14:10 CEST
Building production RAG pipelines starts with a problem most teams underestimate: getting clean, structured data out of real-world documents. PDFs lose table structure, figures get separated from captions, and multi-column layouts become unreadable. Before your PyTorch models even see your data, crucial information is already lost.
Docling is an open-source, MIT-licensed document parsing library that uses PyTorch-based deep learning models to understand documents the way humans read them. It preserves hierarchy, extracts structured data from tables and figures, and supports over ten common file formats through a consistent API. Because everything runs locally, it integrates cleanly into PyTorch-native workflows with low latency and no data leaving your infrastructure.
In this talk, I'll walk through Docling's PyTorch-powered architecture and show how to build document processing pipelines for RAG and other GenAI applications. I'll also share the architecture of real-world applications of Docling and how it has improved workflows. You'll leave with practical patterns for connecting Docling to your own PyTorch-based GenAI stack.
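As a minimal illustration of the workflow described here, the sketch below parses a PDF with Docling and prepares chunks for a RAG pipeline; the file name is a placeholder, and method names should be checked against your installed Docling version.

```python
# Minimal Docling sketch: parse a document locally, keep its structure, and
# produce chunks for retrieval. File name is a placeholder.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("quarterly_report.pdf")   # PyTorch layout/table models run locally

markdown = result.document.export_to_markdown()      # hierarchy, tables, captions preserved

# Naive chunking for illustration; a real pipeline would chunk and embed properly.
chunks = [p for p in markdown.split("\n\n") if p.strip()]
print(f"{len(chunks)} chunks ready for embedding")
```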
Speakers

Carol Chen

Principal AI Community Architect, Red Hat
Carol Chen is a Community Architect at Red Hat, having led several upstream communities including InstructLab, Ansible and ManageIQ. She has been actively involved in open source communities while working for Jolla and Nokia previously. In addition, she also has experiences in software... Read More →

Mingxuan Zhao

Software Developer/Developer Advocate, IBM
Ming Zhao is an open source developer and Developer Advocate at IBM Research, where he helps IBM leverage open technologies while building impactful tools and growing vibrant open-source communities. He’s passionate about making open tech accessible to all and ensuring developers... Read More →
Tuesday April 7, 2026 13:45 - 14:10 CEST
Junior Stage

14:45 CEST

Brevitas Quantization Library - Pablo Monteagudo Lago, AMD
Tuesday April 7, 2026 14:45 - 15:10 CEST
Brevitas is an open‑source PyTorch library from AMD designed to support the research of state‑of‑the‑art quantization methods, including Qronos (ICLR 2026) and MixQuant (arXiv). Built for flexibility and composability, it offers modular components for exploring reduced‑precision data paths and accuracy‑preserving techniques.
As generative models scale, post‑training quantization (PTQ) has become the preferred strategy for maintaining quality without retraining, yet PTQ methods are often applied in isolation due to fragmented tooling. Brevitas provides a unified environment for modern PTQ algorithms—including Qronos, SpinQuant and AutoRound—enabling practitioners to combine complementary techniques effectively.
Brevitas leverages the latest PyTorch features, like Dynamo for tracing and selectively modifying compute graphs—for example, by inserting rotation ops to mitigate outliers. It integrates with frameworks like transformers and supports export flows including vLLM and GGUF, ensuring a smooth transition from experimentation to deployment.
This talk shows how to use Brevitas for an end‑to‑end quantization flow, showcasing how its flexibility enables new research directions.
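To give a feel for Brevitas's drop-in layers, here is a small quantized model sketch; layer and argument names follow brevitas.nn's public API, while PTQ algorithms such as Qronos, SpinQuant, or AutoRound are configured on top of such models and are not shown.

```python
# Minimal Brevitas sketch: swap standard layers for quantized counterparts.
import torch
from brevitas.nn import QuantConv2d, QuantReLU, QuantLinear

class QuantTinyCNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = QuantConv2d(3, 8, kernel_size=3, weight_bit_width=4)  # 4-bit weights
        self.act = QuantReLU(bit_width=4)                                 # 4-bit activations
        self.head = QuantLinear(8 * 30 * 30, 10, bias=True, weight_bit_width=4)

    def forward(self, x):
        x = self.act(self.conv(x))
        return self.head(x.flatten(1))

model = QuantTinyCNN().eval()
print(model(torch.randn(1, 3, 32, 32)).shape)   # torch.Size([1, 10])
```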
Speakers

Pablo Monteagudo Lago

Research Scientist, AMD
Pablo Monteagudo is a research scientist in AMD Research and Advanced Development, based in Dublin. He specialises in co-design of neural networks and accelerators, in particular, working on topics involving neural network quantization, sparsity and accelerator design.
Tuesday April 7, 2026 14:45 - 15:10 CEST
Junior Stage
  Frameworks & Compilers

15:40 CEST

torch.compile and Diffusers: A Hands-On Guide to Peak Performance - Sayak Paul, Hugging Face
Tuesday April 7, 2026 15:40 - 16:05 CEST
This session shows how to use torch.compile with the Diffusers library to speed up diffusion models like Flux-1-Dev.

You'll learn practical techniques for both model authors and users. For authors, we cover how to make models compiler-friendly using fullgraph=True. For users, we explain regional compilation (which cuts compile time by 7x while keeping the same runtime gains) and how to avoid recompilations with dynamic=True.

We also cover real-world scenarios: running on memory-constrained GPUs using CPU offloading and quantization, and swapping LoRA adapters without triggering recompilation.

Key takeaways:
- Compiling just the Diffusion Transformer (DiT) delivers ~1.5x speedup on H100
- Regional compilation reduces cold-start compile time from 67s to 9.6s
- NF4 quantization cuts memory from 33GB to 15GB
- Combining quantization + offloading drops memory to 12.2GB
- LoRA hot-swap lets you switch adapters without recompiling

Whether you're building diffusion models or using them, this guide helps you get the best performance with minimal effort.
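A minimal sketch of the pattern discussed, compiling only the diffusion transformer of a Diffusers pipeline; the model ID and generation settings are illustrative, and regional compilation and NF4 quantization rely on additional Diffusers/bitsandbytes APIs not shown here.

```python
# Compile only the diffusion transformer of a Diffusers pipeline (illustrative).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# fullgraph=True surfaces graph breaks early; dynamic=True helps avoid recompilations.
pipe.transformer = torch.compile(pipe.transformer, fullgraph=True, dynamic=True)

image = pipe("a watercolor fox on a Paris street", num_inference_steps=28).images[0]
image.save("fox.png")
```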
Speakers

Sayak Paul

Research Engineer, Hugging Face
I am a Research Engineer at Hugging Face, working on image and video generation. My day-to-day includes maintaining the Diffusers library, training, and babysitting models. When I am not working, I can be found either watching Suits for the n-th time or playing the guitar.
Tuesday April 7, 2026 15:40 - 16:05 CEST
Junior Stage

16:10 CEST

Optimizing Reinforcement Learning at Trillion-Parameter Scale - Songlin Jiang, Aalto University & Mind Lab
Tuesday April 7, 2026 16:10 - 16:35 CEST
This talk will dive into how we implemented and optimized reinforcement learning on trillion-parameter Mixture-of-Experts reasoning models using veRL, Megatron-Bridge and vLLM. The session is useful to anyone building large-scale RL training systems.

For the first part, I will walk through the system design required to make RL work at this scale using LoRA: how LoRA adapters are implemented for expert layers, how adapters are sharded and fused under tensor/pipeline/expert parallelism, and most importantly, how refit (parameter sync) is implemented for LoRA between training backend (Megatron) and rollout engine (vLLM).

The second part of the talk focuses on training–inference mismatch in MoE RL. I will explain why common mitigations such as clipping and importance sampling can fail, and how we implement fixed Router Replay R3 across vLLM, veRL, and Megatron to align routing decisions between rollout and training.

This work was done together with Mind Lab, and some of the related blog posts are available at:
- https://macaron.im/mindlab/research/building-trillion-parameter-reasoning-rl-with-10-gpus
- https://macaron.im/mindlab/research/router-replay-r3-why-it-failed-and-how-we-fixed-it
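For readers unfamiliar with the adapter structure being sharded and synced here, a generic LoRA linear wrapper in plain PyTorch is sketched below; the production implementation inside Megatron expert layers is considerably more involved.

```python
# Generic LoRA wrapper (illustrative). Only the low-rank factors are trainable;
# in MoE RL these factors are what get sharded and synced to the rollout engine.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # frozen base weights
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

layer = LoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(2, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)
```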
Speakers

Songlin Jiang

Doctoral Researcher, Aalto University & Mind Lab
I am a doctoral researcher at Aalto University, focusing on reducing training and inference latency for Reinforcement Learning and Large Language Models (LLMs) on High-Performance Computing (HPC) clusters. I am also a passionate free software developer, a maintainer of VeRL, and a... Read More →
Tuesday April 7, 2026 16:10 - 16:35 CEST
Junior Stage
  Training Systems

16:40 CEST

Securing Agentic AI With PyTorch: Threat Modeling & LLM Red Teaming in Practice - Valeri Milke, VamiSec GmbH
Tuesday April 7, 2026 16:40 - 17:05 CEST
Agentic AI systems built with PyTorch introduce a new security paradigm: autonomous decision-making, tool usage, memory, and multi-step reasoning significantly expand the attack surface beyond traditional ML pipelines.

This session presents a practical, security-first approach to building and testing agentic AI systems using PyTorch, combining AI threat modeling and hands-on LLM security testing.

We introduce MAESTRO-based AI Threat Modeling to systematically identify risks across prompts, tools, memory, orchestration and model interactions. Building on this foundation, we demonstrate how the OWASP LLM Top 10 and the OWASP LLM Testing Guide can be applied to real PyTorch-based agent architectures.

The session includes a live demo of a prompt injection attack against an agentic workflow, showing how task delegation and tool invocation can be abused — and how developers can detect, mitigate and test these risks early in the AI development lifecycle.

Attendees will leave with concrete techniques to integrate AI security testing and threat modeling into PyTorch-based systems, bridging research, engineering and real-world AI risk.
Speakers

Valeri Milke

CEO, VamiSec GmbH
Valeri Milke is an AI security and cybersecurity specialist focusing on secure AI and agentic system design. He works at the intersection of PyTorch-based AI engineering, threat modeling and LLM security testing. His work includes AI red teaming, prompt injection analysis and the... Read More →
Tuesday April 7, 2026 16:40 - 17:05 CEST
Junior Stage
 
Wednesday, April 8
 

10:35 CEST

Beyond JSON-RPC: Scaling Model Context Protocols With gRPC in the PyTorch Ecosystem - Ashesh Vidyut & Madhav Bissa, Google
Wednesday April 8, 2026 10:35 - 11:00 CEST
Right now, MCP mostly relies on HTTP and STDIO. That works for simple scripts, but if you’re running high-performance PyTorch models in production, you’re going to hit a wall. When you’re moving large context windows or tensor metadata, the overhead of JSON-RPC starts to hurt.
We’re introducing SEP-1352, which adds gRPC as a native transport for MCP. Since gRPC is already the standard for microservices, it’s a natural fit for the PyTorch ecosystem. By using Protobuf instead of JSON, we get much higher throughput and lower latency—essentially making the communication between models and tools as fast as the models themselves.
In this session, we’ll cover:
- Why Protobuf matters: Moving away from bulky JSON to keep bandwidth low and speed high.
- Built-in streaming: How to use gRPC’s streaming to handle long-running model outputs without timeouts.
- Production-ready features: Using the same auth, load balancing, and service mesh (mTLS) you already use for your ML microservices.
- Upgrading your stack: How to move from PyTorch MCP HTTP services to MCP gRPC services without throwing away your existing infra.
Speakers

Ashesh Vidyut

Senior Software Engineer, Google


Madhav Bissa

Senior Software Engineer, Google
member, grpc-Go
Wednesday April 8, 2026 10:35 - 11:00 CEST
Junior Stage
  Agents & Interop

11:05 CEST

Lightning Talk: Accelerating PyTorch Models With Torch.compile's C++ Wrapper Mode - Bin Bao, Meta
Wednesday April 8, 2026 11:05 - 11:15 CEST
This lightning talk introduces torch.compile's C++ wrapper mode, a powerful feature that reduces CPU overhead and significantly improves model performance. As modern GPUs become increasingly powerful and compiler optimizations make GPU kernels run faster, CPU overhead has become more visible as the bottleneck. By generating optimized C++ code instead of Python, cpp-wrapper mode directly tackles this challenge.

While CUDAGraphs can also reduce CPU overhead, it is not always applicable—especially with highly dynamic input shapes. In these scenarios, cpp-wrapper mode provides a robust alternative with significant performance gains. Benchmark results from the OSS Huggingface suite demonstrate that cpp-wrapper mode delivers a 39% speedup over default torch.compile.

Attendees will learn when and how to leverage cpp-wrapper mode to overcome CPU-bound limitations and understand how this feature fits into PyTorch's performance optimization landscape, enabling them to build faster machine learning applications.
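A minimal sketch of opting into the C++ wrapper through torch.compile options; the "cpp_wrapper" key mirrors torch._inductor.config, and availability and defaults may differ across PyTorch versions.

```python
# Enable Inductor's C++ wrapper codegen for a compiled model (illustrative;
# check the flag against your PyTorch version).
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.GELU(),
    torch.nn.Linear(512, 512),
).cuda()

compiled = torch.compile(model, options={"cpp_wrapper": True})

x = torch.randn(64, 512, device="cuda")
print(compiled(x).shape)
```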
Speakers

Bin Bao

Software Engineer, Meta
Bin Bao is a software engineer working with the PyTorch Compiler team at Meta. He focuses on developing TorchInductor optimizations and AOTInductor for C++ deployment.
Wednesday April 8, 2026 11:05 - 11:15 CEST
Junior Stage
  Frameworks & Compilers

11:20 CEST

Lightning Talk: Building AI That Ops Teams Actually Trust - Robert King, Chronosphere / Palo Alto Networks
Wednesday April 8, 2026 11:20 - 11:30 CEST
You've built an AI that identifies root causes of incidents faster than any human could... but there's one problem: no one trusts it.

Ops teams are skeptical by nature. They've been burned by noisy alerts, black-box tools, and "intelligent" systems that weren't.
This talk covers what we learned building AI for incident response across enterprise environments: why technically correct recommendations get ignored, and how to design for skepticism from day one.

I'll share specific patterns that moved the needle:

- Validating agent responses before they reach users, catching hallucinations, weak reasoning, and overconfident outputs
- Explainability that fits the operator's mental model, not the data scientist's
- Feedback loops that improve the AI and build user trust simultaneously
- Rollout strategies that let teams build confidence gradually

Whether you're using LLMs, agents, or traditional ML for operational tasks, the trust problem is the same. Ship something wrong during an incident and you've lost your users for months.

You'll leave with a practical framework for validating AI outputs and building the kind of trust that gets recommendations acted on.
Speakers

Robert King

Senior Sales Engineer, Chronosphere
Robert is Lead Enterprise Solutions Engineer at Chronosphere and an OpenTelemetry contributor. He recently presented on AI Observability with OpenTelemetry at Cloud Native London https://www.youtube.com/live/qF4wz-pha1w?si=PFzjNcGkbD4pFKnA&t=625 and has spoken at AWS Summit, and other... Read More →
Wednesday April 8, 2026 11:20 - 11:30 CEST
Junior Stage
  Inference & Production

11:35 CEST

Accelerating Complex-Valued Tensors With Torch.compile - Hameer Abbasi, OpenTeams Inc.
Wednesday April 8, 2026 11:35 - 12:00 CEST
torch.compile has been invaluable in accelerating many machine learning and scientific computing workflows. It has become a one-shot way to get free performance for many kinds of programs and models.

However, it comes with its own set of limitations. One of these limitations is that, for a long time, torch.compile didn't accept complex-valued tensors. These tensors have many uses, from quantum mechanics to simplifying the physics for world models. Support for such tensors would accelerate many of these workflows.

In this talk, we will take a journey into the current progress for supporting such tensors in torch.compile; some of the encountered challenges and what we hope to achieve, including some side-benefits for reducing binary size by JIT-ing kernels on demand.
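As an example of the kind of complex-valued workload at stake, the sketch below compiles a small FFT-based filter; whether it runs without graph breaks or falls back to eager depends on your PyTorch version.

```python
# Complex-valued workload under torch.compile (illustrative; compiler support for
# complex dtypes is still evolving, so older versions fall back to eager).
import torch

@torch.compile
def spectral_filter(x: torch.Tensor, cutoff: int) -> torch.Tensor:
    spec = torch.fft.rfft(x)                       # complex64 intermediate
    keep = torch.arange(spec.shape[-1]) < cutoff   # low-pass mask
    return torch.fft.irfft(spec * keep, n=x.shape[-1])

signal = torch.randn(8, 1024)
print(spectral_filter(signal, 64).shape)
```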
Speakers

Hameer Abbasi

Senior Software Engineer I, OpenTeams, Inc.
Hameer Abbasi is a Senior Software Developer at OpenTeams, Inc. As part of his day job and also as a hobby, he has contributed to various projects in the scientific computing space, including NumPy, SciPy and PyTorch. He is also the lead maintainer of PyData/Sparse, a library for... Read More →
Wednesday April 8, 2026 11:35 - 12:00 CEST
Junior Stage
  Frameworks & Compilers

13:30 CEST

PyTorch on RISC-V: From Cross-Compilation To Native CI - Ludovic Henry, Meta
Wednesday April 8, 2026 13:30 - 13:55 CEST
As RISC-V matures into a viable architecture for AI and data center workloads, bringing first-class PyTorch support to the ecosystem is a critical milestone. This session provides a technical deep dive into the ongoing efforts to port PyTorch natively to RISC-V, moving beyond experimental cross-compilation toward a stable, tested, and optimized environment. We detail the challenges of reconciling native math library dependencies like OpenBLAS and oneDNN with RISC-V Vector (RVV) extensions, alongside the work required to upstream these accelerations to ensure sustainable, long-term performance.

The talk also addresses the critical "last mile" of the Python ecosystem: ensuring that the broader dependency tree—including NumPy, SciPy, and ONNX—is natively available and performant on the architecture. Finally, we examine the primary bottleneck for official support: CI infrastructure. We outline the roadmap for transitioning from tagged cross-compilation to a native testing pool, discussing the logistics of maintaining a reliable hardware fleet to meet the high-volume validation standards required for the PyTorch master branch and pull request workflows.
Speakers

Ludovic Henry

Software Engineering Lead, Rivos
Ludovic works at the intersection of open-source software and emerging hardware. He is a key contributor to the RISC-V ecosystem, focusing on the performance and stability of the AI stack. His recent work involves optimizing native dependencies like OpenBLAS and oneDNN and establishing... Read More →
Wednesday April 8, 2026 13:30 - 13:55 CEST
Junior Stage

14:00 CEST

Lightning Talk: Pluggable PyTorch LLM Inference Architecture With vLLM and AWS Neuron Backends - Yahav Biran, Annapurna Labs & Maen Suleiman, Amazon
Wednesday April 8, 2026 14:00 - 14:10 CEST
As PyTorch-based LLM serving matures, the challenge shifts from monolithic inference stacks to integrating diverse hardware accelerators efficiently. This session explores how modular plugin architectures enable PyTorch models to run optimally across backends—demonstrating AWS Trainium integration into vLLM through standardized interfaces.

We'll examine how vLLM's Hardware Plugin architecture uses Python's entry_points for automatic platform detection, allowing hardware vendors to extend PyTorch inference without fragmenting the codebase. This delivers automatic device detection, modular feature development, and seamless integration with PyTorch's model loading patterns.

Technical deep-dive includes NeuronWorker and NeuronxDistributedModelRunner extending vLLM base classes, NKI kernels for attention and MoE, and continuous batching with prefill/decode separation. We'll demo HuggingFace models loading through standard vLLM APIs and executing on Trainium without hardware-specific code.

Attendees learn how plugin architectures enable hardware vendors to join PyTorch inference while maintaining standard workflow compatibility.
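To illustrate the entry_points mechanism mentioned above, here is a sketch of how an out-of-tree platform package might register itself; the "vllm.platform_plugins" group name follows vLLM's plugin documentation, while the package and module names here are hypothetical.

```python
# setup.py sketch for a hypothetical vLLM hardware plugin package.
from setuptools import setup

setup(
    name="vllm-myaccel-plugin",
    version="0.1.0",
    packages=["vllm_myaccel"],
    entry_points={
        # vLLM discovers platform plugins via this entry point group; the target
        # function returns the fully qualified platform class name when the
        # hardware is present, or None otherwise.
        "vllm.platform_plugins": ["myaccel = vllm_myaccel:register"],
    },
)
```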
Speakers

Maen Suleiman

Product Manager, Amazon

Yahav Biran

Principal Architect, Amazon
Yahav Biran is a Principal Architect at AWS, focusing on large-scale AI workloads. He contributes to open-source projects and publishes in AWS blogs and academic journals, including the AWS compute and AI blogs and the Journal of Systems Engineering. He frequently delivers technical... Read More →
Wednesday April 8, 2026 14:00 - 14:10 CEST
Junior Stage

14:15 CEST

Lightning Talk: Distributed AI Without the Infrastructure Tax - Yahav Biran, Annapurna Labs & Maen Suleiman, Amazon
Wednesday April 8, 2026 14:15 - 14:25 CEST
Running distributed AI workloads in production requires solving three problems: package compatibility, hardware abstraction, and network configuration. AWS Neuron Deep Learning Containers (DLCs) address all three by providing open-source, production-ready images for Trainium and Inferentia.
This lightning talk shows how DLCs eliminate common failure modes. We'll cover three layers: First, how DLCs solve dependency hell by versioning PyTorch, Neuron SDK, XLA backend, and PyTorch PrivateUse1 dispatcher together as a tested contract. Second, how Dynamic Resource Allocation (DRA) in Kubernetes abstracts hardware complexity—enabling Neuron core slicing, multi-tenant workloads, and topology-aware scheduling without manual device mapping. Third, how pre-configured EFA driver settings ensure zero-copy data movement, avoiding silent performance degradation that can cost 10x throughput.
We'll demonstrate scaling from laptop to 32-node cluster using the same container image and simple Kubernetes manifests.
Attendees will learn how to eliminate weeks of setup time, achieve 65-80% cluster utilization, and deploy workloads confidently. We'll share the GitHub repository and extension patterns.
Speakers

Maen Suleiman

Product Manager, Amazon

Yahav Biran

Principal Architect, Amazon
Yahav Biran is a Principal Architect at AWS, focusing on large-scale AI workloads. He contributes to open-source projects and publishes in AWS blogs and academic journals, including the AWS compute and AI blogs and the Journal of Systems Engineering. He frequently delivers technical... Read More →
Wednesday April 8, 2026 14:15 - 14:25 CEST
Junior Stage

14:30 CEST

Lightning Talk: Torch-Spyre: Compiling To a Multi-core Dataflow Accelerator With Inductor - David Grove & Olivier Tardieu, IBM
Wednesday April 8, 2026 14:30 - 14:40 CEST
Torch-Spyre (https://github.com/torch-spyre/torch-spyre) is an open source project that provides a PyTorch PrivateUse1 device with OpenReg, including an Inductor backend, for the IBM Spyre Accelerator. IBM Spyre is a high-performance energy-efficient AI accelerator featuring 32 AI-optimized compute cores each with on-chip interconnect and compiler-managed scratchpad memory.

Our goal in this session is to describe how we evolved the Spyre software stack to fully leverage Inductor. This enabled the elimination of a significant fraction of our proprietary compiler code base resulting in improved compilation time and operation coverage without loss of inference performance. We will highlight several technical challenges in compiling for Spyre-like accelerators and describe how we adapted and extended Inductor to tackle them. In particular, we will discuss our extensions to Inductor to support device-specific tiled Tensor memory layouts, and new compiler optimization passes for core-level work division and scratchpad management. We hope to engage the community in evolving the PyTorch ecosystem to more fully support them.
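For readers new to out-of-tree devices, the Python-facing side of a PrivateUse1 backend is sketched below; the real torch-spyre stack registers its kernels and runtime from a compiled extension, and "spyre" appears here only to illustrate the renaming hooks.

```python
# Python side of a PrivateUse1 out-of-tree device (illustrative; the compiled
# extension that registers kernels and the Inductor backend is not shown).
import torch

torch.utils.rename_privateuse1_backend("spyre")
# Adds helpers such as Tensor.is_spyre and Module.spyre() for the renamed backend.
torch.utils.generate_methods_for_privateuse1_backend()

# With the extension loaded, tensors could then be allocated with device="spyre:0"
# and torch.compile could route graphs to the custom Inductor backend.
```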
Speakers

Dave Grove

Distinguished Research Scientist, IBM
David Grove is a Distinguished Research Scientist at IBM T.J. Watson, NY, USA. He has been a software systems researcher at IBM since 1998, specializing in programming language implementation and scalable runtime systems. He has authored more than sixty peer-reviewed publications... Read More →

Olivier Tardieu

Principal Research Scientist, Manager, IBM
Dr. Olivier Tardieu is a Principal Research Scientist and Manager at IBM T.J. Watson, NY, USA. He joined IBM Research in 2007. His current research focuses on cloud-related technologies, including Serverless Computing and Kubernetes, as well as their application to Machine Learning... Read More →
Wednesday April 8, 2026 14:30 - 14:40 CEST
Junior Stage
  Frameworks & Compilers

14:45 CEST

Lightning Talk: Building a PyTorch‑native vLLM Plugin for IBM Spyre - Thomas Parnell, IBM Research & Thomas Ortner, IBM Research Europe - Zurich
Wednesday April 8, 2026 14:45 - 14:55 CEST
IBM Spyre is an AI accelerator used across IBM Z and Power systems for agentic inference in production. Today, we serve models on Spyre using upstream vLLM together with an out-of-tree platform plugin. While the current plugin delivers crucial functionality for our business, it re-uses relatively little of upstream vLLM’s capabilities, and also carries a high maintenance cost.

In this talk, we will describe our efforts to redesign the Spyre vLLM plugin in a more PyTorch-native fashion. We will describe the architectural evolution of the project and describe how it leverages torch‑spyre, an open‑source extension that enables Spyre support in PyTorch via the PrivateUse1 device interface. We discuss key challenges—such as implementing a custom vLLM attention backend for Spyre—and share lessons learned while aligning vLLM’s execution model with Spyre’s hardware capabilities.

Finally, we will demonstrate a vLLM model running natively on Spyre through the new plugin and highlight areas where the community can work together to improve vLLM’s plugin interface. This talk will be especially relevant for those looking to extend vLLM to a wider variety of accelerators and use cases.
Speakers

Thomas Parnell

Principal Research Scientist, IBM Research
Thomas received his B.Sc. and Ph.D. degrees in mathematics from the University of Warwick. U.K., in 2006 and 2011, respectively. He began his career in the field of EDA, working at Arithmatica and Siglead before joining IBM Research in 2013. During his time at IBM, Thomas has worked... Read More →

Thomas Ortner

Research Scientist, IBM Research Europe - Zurich
Thomas Ortner is a Research Scientist at IBM Research Europe, Switzerland, in the group of Emerging Computing and Circuits. He holds a PhD and a MSc in Computer Science, a MSc degree in Technical Physics and a MSc degree in Software Engineering and Management from Graz University... Read More →
Wednesday April 8, 2026 14:45 - 14:55 CEST
Junior Stage

15:25 CEST

Building Trust for Users and Regulators Alike: A Cost-Efficient PyTorch Path To Compliance-as-Code - Raja Gopal Hari Vijay, Zoho Corporation
Wednesday April 8, 2026 15:25 - 15:50 CEST
Traditional compliance relies on retroactive logs and manually stitched audit trails, while Opacus, CrypTen, and Captum address isolated concerns without providing end-to-end lifecycle traceability. Compliance-as-Code embeds regulatory controls as executable logic within training and inference pipelines, turning compliance into a continuous engineering function and reducing audit costs.

PyTorch’s dynamic execution model enables real-time auditing and compliance gates across the model lifecycle. Features such as the Dispatcher, custom Autograd functions, and the hook system allow logging, constraint checks, and risk controls to be embedded directly into execution. For example, a fairness gate using training hooks can block model export if disparity exceeds thresholds. Dataset initialization can detect imbalance, while dispatcher-level monitoring generates tamper-resistant audit trails linking data, model versions, and outputs. In deployment, metrics and inference hooks track bias drift, accuracy degradation, and human-intervention counts.

The talk presents practical PyTorch patterns for automated documentation, immutable audit trails, and faster certification in regulated AI deployments.
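A small sketch in the spirit of the hook-based gates described above; the audit record format, disparity metric, and threshold are placeholders rather than a prescribed compliance scheme.

```python
# Illustrative compliance gate built from standard PyTorch hooks.
import torch
import torch.nn as nn

audit_log = []

def logging_hook(module, inputs, output):
    # A real pipeline would hash and ship these records to tamper-evident storage.
    audit_log.append({"module": module.__class__.__name__,
                      "out_mean": output.detach().mean().item()})

def fairness_gate(disparity: float, threshold: float = 0.05):
    if disparity > threshold:
        raise RuntimeError(f"Export blocked: disparity {disparity:.3f} exceeds {threshold}")

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
for m in model.modules():
    if isinstance(m, nn.Linear):
        m.register_forward_hook(logging_hook)

model(torch.randn(4, 10))
fairness_gate(disparity=0.02)                     # passes, so export may proceed
torch.save(model.state_dict(), "model_approved.pt")
print(f"{len(audit_log)} audit records captured")
```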
Speakers

Raja Gopal Hari Vijay

Member Leadership Staff, Zoho Corporation
At Zoho, Raja builds large-scale Video AI (CCTV analytics, edge inference, privacy-aware deployments) on PyTorch, drives green computing via custom accelerators and FPGAs, and owns a custom Linux distribution for Zoho products and agentic workflows with security reasoning across LSM... Read More →
Wednesday April 8, 2026 15:25 - 15:50 CEST
Junior Stage

15:55 CEST

Sponsored Session: Fault-Tolerant Training: How We Build Reliable Clusters for Distributed AI Workloads - Cyril Kondratenko & Maurits de Groot, Nebius
Wednesday April 8, 2026 15:55 - 16:20 CEST
Large-scale distributed AI training is highly sensitive to infrastructure failures, where even a single node disruption can halt progress and waste substantial compute. This talk presents Nebius’s approach to fault-tolerant training, combining reliability metrics such as goodput, MTBF, and MTTR with automated infrastructure practices including health checks, workload isolation, node replacement, state recovery, and observability. Drawing on production cluster results, the presentation shows how these techniques reduce interruptions, accelerate recovery, and improve the stability and efficiency of long-running AI workloads.
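One ingredient of the recovery story, checkpoint-and-resume, can be sketched with plain PyTorch as below; the health checks, node replacement, and observability discussed in this session sit around such a loop, and the interval and paths here are placeholders.

```python
# Checkpoint/resume skeleton (illustrative): training picks up from the last
# durable checkpoint after a node interruption.
import os
import torch

model = torch.nn.Linear(128, 128)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
ckpt_path = "checkpoint.pt"
start_step = 0

if os.path.exists(ckpt_path):                    # resume after an interruption
    state = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["optimizer"])
    start_step = state["step"] + 1

for step in range(start_step, 1000):
    loss = model(torch.randn(32, 128)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:                          # periodic durable checkpoint
        torch.save({"model": model.state_dict(),
                    "optimizer": opt.state_dict(),
                    "step": step}, ckpt_path)
```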
Speakers

Cyril Kondratenko

AI/ML Specialist Solutions Architect, Nebius

Maurits de Groot

AI/ML Specialist Solutions Architect, Nebius
Wednesday April 8, 2026 15:55 - 16:20 CEST
Junior Stage
 