7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."
Wednesday, April 8
 

09:00 CEST

Keynote: PyTorch CTO - Matt White, Global CTO of AI, Linux Foundation
Wednesday April 8, 2026 09:00 - 09:10 CEST
Matt White, Global CTO of AI and CTO at the PyTorch Foundation, will provide an update on technical strategy, the ecosystem, projects, and working groups.
Speakers
avatar for Matt White

Matt White

Global CTO of AI, The Linux Foundation
Matt White is the Executive Director of the PyTorch Foundation and GM of AI at the Linux Foundation. He is also the Director of the Generative AI Commons. Matt has years of experience in applied research and standards in AI and data across the telecom, media, and gaming industries. Matt is... Read More →
Wednesday April 8, 2026 09:00 - 09:10 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:10 CEST

Keynote: vLLM & Ray Updates - Tyler Michael Smith, Chief Architect - Inference Engineering, Red Hat & Artur Niederfahrenhorst, Member of Technical Staff, Anyscale
Wednesday April 8, 2026 09:10 - 09:25 CEST

Speakers
avatar for Tyler Michael Smith

Tyler Michael Smith

Chief Architect - Inference Engineering, Red Hat
Tyler received a PhD in Computer Science at The University of Texas at Austin, studying high-performance dense linear algebra: microkernels, parallelism, and theoretical lower bounds on data movement. After a postdoc at ETH Zürich, he joined Neural Magic, first working on a graph... Read More →
avatar for Artur Niederfahrenhorst

Artur Niederfahrenhorst

Member of Technical Staff, Anyscale
Artur is a member of the technical staff at Anyscale, the company that recently donated Ray to the Linux Foundation. He has been contributing to Ray since early 2022, where his main contributions have been in distributed reinforcement learning. Artur majored in Computer Science at... Read More →
Wednesday April 8, 2026 09:10 - 09:25 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:25 CEST

Keynote: The Hub as Infrastructure: From Open PyTorch Models to a Safe and Performant Distribution Hub - Lysandre Debut, Chief Open-Source Officer, Hugging Face
Wednesday April 8, 2026 09:25 - 09:40 CEST

Speakers
avatar for Lysandre Debut

Lysandre Debut

Chief Open-Source Officer, Hugging Face
Lysandre is the Chief Open-Source Officer at Hugging Face, ensuring that the ecosystem is as well supported as possible across the ML lifecycle with open-source tools.

He has been at Hugging Face for the past six years and was its first open-source employee, working on transformers and the entire stack of Hugging Face open-source libraries since then... Read More →
Wednesday April 8, 2026 09:25 - 09:40 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:45 CEST

Sponsored Keynote: Open Source Infrastructure for the AI Native Era - Jonathan Bryce, Executive Director, Cloud Native Computing Foundation
Wednesday April 8, 2026 09:45 - 09:50 CEST
AI adoption will not be limited by model ideas alone. It will be limited by how fast we can deploy, secure, observe, and scale AI systems in production. Inference is where AI becomes real for most organizations. As AI moves from frontier labs into mainstream production, the operational challenges start to look increasingly cloud native: orchestration, autoscaling, routing, security, policy, and observability. This keynote explores why the next phase of AI adoption will move faster if PyTorch and cloud native communities work together to extend proven open source patterns.
Speakers
avatar for Jonathan Bryce

Jonathan Bryce

Executive Director, Cloud and Infrastructure, The Linux Foundation
Jonathan Bryce is the Executive Director of Cloud & Infrastructure at the Linux Foundation, where he leads both the Cloud Native Computing Foundation (CNCF) and the OpenInfra Foundation—two of the largest and most influential open source communities in the world. With over... Read More →
Wednesday April 8, 2026 09:45 - 09:50 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:50 CEST

Keynote: Gemma 4: Compacting Intelligence for the Edge - Léonard Hussenot, Research Scientist, Google DeepMind
Wednesday April 8, 2026 09:50 - 10:05 CEST
This talk explores the philosophy and engineering behind Gemma 4, arguing that the future of AI isn't only about size, but about "intelligence per byte."
We will dive into why compacting intelligence—maximizing the reasoning and instruction following ability of every single token—is the ultimate bottleneck for truly useful AI. By optimizing for token efficiency and memory footprints, we unlock a new class of applications that are faster, private, and more accessible.
Speakers
avatar for Léonard Hussenot

Léonard Hussenot

Research Scientist, Google DeepMind
I am a Research Scientist at Google DeepMind, where I lead the Gemma post-training team focused on developing the most useful compact models for on-device applications. Since joining Google Brain, I have contributed to the evolution of Bard, Gemini, and Gemma, specializing in scaling... Read More →
Wednesday April 8, 2026 09:50 - 10:05 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any

10:05 CEST

Meet the vLLM Maintainers
Wednesday April 8, 2026 10:05 - 10:35 CEST
Meet the core maintainers of vLLM at this session! Come and discuss use cases, features, and the roadmap with us, or just learn how vLLM development happens under the hood.
Speakers
avatar for Tyler Michael Smith

Tyler Michael Smith

Chief Architect - Inference Engineering, Red Hat
Tyler received a PhD in Computer Science at The University of Texas at Austin, studying high-performance dense linear algebra: microkernels, parallelism, and theoretical lower bounds on data movement. After a postdoc at ETH Zürich, he joined Neural Magic, first working on a graph... Read More →
avatar for Nicolò Lucchesi

Nicolò Lucchesi

Senior Machine Learning Engineer, Red Hat
Nicolò is a Senior Machine Learning Engineer at Red Hat with a background in Deep Learning and Computer Vision. He works on Inference Optimization for vLLM, where he is a maintainer.
Wednesday April 8, 2026 10:05 - 10:35 CEST
Open Platform
  Meet the Developers
  • Audience Level Any

10:35 CEST

Lightning Talk: Monarch: An API To Your Supercomputer - Marius Eriksen, Meta
Wednesday April 8, 2026 10:35 - 10:45 CEST
The training systems driving today’s most advanced AIs are distributed, dynamic, and complex. Pre-training relies on layered parallelism and careful fault isolation. Post-training RL spans thousands of GPUs while coordinating verifiers, compilers, and code execution.

Systems complexity pulls focus away from the core algorithms: developers are forced to assemble systems from schedulers, RPC stacks, container orchestrators, observability tooling, service discovery, and app frameworks just to begin work.

Monarch is a distributed programming framework for PyTorch that makes the cluster programmable through a single-program Python API. It exposes the supercomputer as a coherent, directly controllable system—bringing the experience of local development to large-scale training; handling fault tolerance, orchestration, tooling integration, etc.

In this talk, we will demonstrate how Monarch enables developers to focus on training logic rather than glue, extend systems easily, and supervise and debug distributed systems through a unified programming interface.

Attendees will leave with a clear model for building robust, scalable and customizable distributed PyTorch systems using Monarch.
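The Monarch API itself is not reproduced here, but the single-controller idea the abstract describes (one Python program addressing a mesh of workers as ordinary objects) can be sketched with a stdlib-only toy. Every name below (Actor, ActorMesh, train_step) is invented for illustration and is not Monarch's actual interface:

```python
# Toy single-controller pattern: one driver script fans calls out to a
# mesh of addressable workers and gathers per-rank results.

class Actor:
    """One worker holding local state, addressable by rank."""
    def __init__(self, rank):
        self.rank = rank
        self.steps = 0

    def train_step(self, batch):
        # Stand-in for real work: each actor processes its view of the batch.
        self.steps += 1
        return sum(batch) * (self.rank + 1)

class ActorMesh:
    """The controller's handle to every actor; calls fan out to all ranks."""
    def __init__(self, n):
        self.actors = [Actor(r) for r in range(n)]

    def call(self, method, *args):
        # Broadcast a method call and gather per-rank results, like a collective.
        return [getattr(a, method)(*args) for a in self.actors]

mesh = ActorMesh(4)                        # "spawn" a 4-rank mesh
results = mesh.call("train_step", [1, 2, 3])
print(results)                             # per-rank results, gathered at the controller
```

The point of the pattern is that fan-out, gathering, and supervision live behind one handle, so the driver script reads like local code.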
Speakers
avatar for Marius Eriksen

Marius Eriksen

Software Engineer, Meta
Marius Eriksen is a software engineer at Meta, where he works on infrastructure for large-scale training systems.
Wednesday April 8, 2026 10:35 - 10:45 CEST
Master Stage

11:05 CEST

Lightning Talk: KV-Cache Centric Inference: Building a State-Aware Serving Platform With llm-d and vLLM - Maroon Ayoub & Martin Hickey, IBM Research
Wednesday April 8, 2026 11:05 - 11:15 CEST
We’ve spent years optimizing LLM inference around compute - faster kernels, better batching, smarter parallelism. But in production, the bottleneck increasingly isn’t FLOPs. It’s state. Specifically, the KV-cache: the attention state that makes the difference between a 4-second prefill and a sub-second cache hit. Lose it to eviction, isolate it on a single node, or fail to route to it - and you’re paying the full compute cost again for work already done.

KV-cache centric inference flips the design priority. Instead of treating cache as a byproduct, it becomes the organizing principle of the serving platform. This means tiered memory management - offloading KV blocks from GPU to CPU to shared storage so capacity scales beyond any single node. It means cross-replica visibility - so cached state computed on one instance is reusable by any other. And it means cache-aware scheduling - routing requests to where their prefix already lives.

We cover how llm-d and vLLM implement each layer, how they compose into a coherent system, and what it looks like in practice - with benchmarks, deployment patterns, and lessons from building a KV-cache centric platform in the open.
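As a stdlib-only illustration of the cache-aware scheduling idea (not llm-d's actual router), one can score each replica by how many leading token blocks of a request it already holds, then route to the best match:

```python
# Cache-aware routing sketch: send a request to the replica whose KV-cache
# already contains the longest prefix of the request's token blocks.

def longest_cached_prefix(blocks, cached):
    """Count leading token blocks of `blocks` already present on a replica."""
    n = 0
    for b in blocks:
        if b not in cached:
            break
        n += 1
    return n

def route(request_blocks, replicas):
    """Pick the replica index with the most reusable prefix; ties go to the lowest index."""
    scores = [longest_cached_prefix(request_blocks, c) for c in replicas]
    return max(range(len(replicas)), key=lambda i: scores[i])

replicas = [
    {"sys", "doc1"},    # replica 0 cached the system prompt + doc1
    {"sys"},            # replica 1 cached only the system prompt
]
print(route(["sys", "doc1", "q7"], replicas))   # -> 0: two blocks reusable there
```

In a real system the "blocks" would be hashes of fixed-size token chunks and replica cache contents would be advertised through cross-replica visibility, but the routing decision reduces to this prefix-match score.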
Speakers
avatar for Martin Hickey

Martin Hickey

Senior Technical Staff Member, IBM Research
Martin Hickey is an STSM at IBM Research, focused on Open Source, Cloud Native Computing, and AI. Martin has notable contributions to open source projects like vLLM, LMCache, Kubernetes, Helm, OpenTelemetry and OpenStack. Martin is a core maintainer for LMCache and an emeritus core... Read More →
avatar for Maroon Ayoub

Maroon Ayoub

Research Scientist & Architect, IBM Research
Maroon Ayoub is a systems engineer at IBM Research focused on distributed AI infrastructure. He co-leads development of llm-d and specializes in scaling LLM inference with Kubernetes-native architectures, performance efficiency, and open source integrations.
Wednesday April 8, 2026 11:05 - 11:15 CEST
Central Room

11:05 CEST

Bringing PyTorch Monarch to AMD GPUs: Single-Controller Distributed Training on ROCm - Liz Li & Zachary Streeter, AMD
Wednesday April 8, 2026 11:05 - 11:30 CEST
PyTorch Monarch introduces a new distributed programming paradigm that enables developers to orchestrate entire GPU clusters from a single Python program. With its actor-based runtime, process mesh abstraction, and asynchronous execution model, Monarch simplifies large-scale distributed training and enables complex workflows that combine training, evaluation, and reinforcement learning within one unified script.

In this talk, we present our work enabling PyTorch Monarch on AMD Instinct GPUs with ROCm, expanding the single-controller model beyond CUDA environments and bringing this emerging runtime to a broader hardware ecosystem. We describe the engineering effort required to port Monarch’s GPU runtime and distributed communication stack to ROCm, including HIPification of CUDA-specific components, adaptation of memory management and synchronization semantics, and integration with high-performance GPU-to-GPU communication on multi-node clusters through RDMA.

We will share lessons learned from running Monarch workloads on MI300-class clusters, including performance considerations, debugging workflows, and developer experience improvements. Our results demonstrate that Monarch’s architecture can be successfully extended to heterogeneous hardware environments while preserving scalability and ease of use.

This work advances hardware diversity in distributed PyTorch and highlights how portable runtimes can simplify large-scale training while enabling scalable, cluster-wide experimentation across accelerator platforms.
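The HIPification step mentioned above is, at its core, a source-to-source rename of CUDA symbols to their HIP equivalents. Real tools (hipify-perl, hipify-clang) handle vastly more cases; this sketch only shows the idea, using a handful of genuine CUDA-to-HIP name pairs:

```python
# Minimal sketch of HIPification: rewrite CUDA API names to HIP ones.
# The mapping below is a tiny subset of the real CUDA->HIP tables.

CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaStream_t": "hipStream_t",
    "cuda_runtime.h": "hip/hip_runtime.h",
}

def hipify(source: str) -> str:
    """Apply the symbol renames to a source string."""
    for cuda, hip in CUDA_TO_HIP.items():
        source = source.replace(cuda, hip)
    return source

print(hipify("cudaStream_t s; cudaMalloc(&p, n);"))
# -> hipStream_t s; hipMalloc(&p, n);
```

Beyond renames, the port described in the talk also has to handle semantic differences (memory management, synchronization) that no textual rewrite can cover.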
Speakers
avatar for Liz Li

Liz Li

Principal AI engineer, AMD
Liz Li is a Principal AI Engineer in the AMD AI group, specializing in enabling and optimizing cutting-edge AI models on AMD Instinct GPUs for both distributed inference and training. With over 10 years of experience in computer, graphics, and AI architecture, she has previously led... Read More →
avatar for Zachary Streeter

Zachary Streeter

Senior Member of Technical Staff, AMD
I'm a computational physicist who has been working in the field of AI for the past 5 years. I have a wide range of expertise, from mathematics to performance optimization and systems engineering. Feel free to nerd out with me! Please connect with me on LinkedIn.
Wednesday April 8, 2026 11:05 - 11:30 CEST
Founders Cafe
  Training Systems
  • Audience Level Any

13:30 CEST

PyTorch on RISC-V: From Cross-Compilation To Native CI - Ludovic Henry, Meta
Wednesday April 8, 2026 13:30 - 13:55 CEST
As RISC-V matures into a viable architecture for AI and data center workloads, bringing first-class PyTorch support to the ecosystem is a critical milestone. This session provides a technical deep dive into the ongoing efforts to port PyTorch natively to RISC-V, moving beyond experimental cross-compilation toward a stable, tested, and optimized environment. We detail the challenges of reconciling native math library dependencies like OpenBLAS and oneDNN with RISC-V Vector (RVV) extensions, alongside the work required to upstream these accelerations to ensure sustainable, long-term performance.

The talk also addresses the critical "last mile" of the Python ecosystem: ensuring that the broader dependency tree—including NumPy, SciPy, and ONNX—is natively available and performant on the architecture. Finally, we examine the primary bottleneck for official support: CI infrastructure. We outline the roadmap for transitioning from tagged cross-compilation to a native testing pool, discussing the logistics of maintaining a reliable hardware fleet to meet the high-volume validation standards required for the PyTorch master branch and pull request workflows.
Speakers
avatar for Ludovic Henry

Ludovic Henry

Software Engineering Lead, Rivos
Ludovic works at the intersection of open-source software and emerging hardware. He is a key contributor to the RISC-V ecosystem, focusing on the performance and stability of the AI stack. His recent work involves optimizing native dependencies like OpenBLAS and oneDNN and establishing... Read More →
Wednesday April 8, 2026 13:30 - 13:55 CEST
Junior Stage

13:30 CEST

PyTorch Symmetric Memory + NCCL Device APIs: A New Path Towards Multi-GPU Kernels - Ke Wen & Sylvain Jeaugey, NVIDIA
Wednesday April 8, 2026 13:30 - 13:55 CEST
As large models shift toward inference and Mixture-of-Experts (MoE) architectures, small batch sizes and dynamic routing present new scaling challenges. Fused, customized multi-GPU kernels are emerging as the solution, but programming them for high performance remains difficult. This talk introduces a paradigm shift enabled by PyTorch Symmetric Memory and NCCL device APIs.

PyTorch Symmetric Memory provides a unified infrastructure for direct GPU-to-GPU memory access without CPU involvement. By leveraging symmetric tensor allocation and CUDA Graph-compatible signaling, it enables fine-grained, dynamic data exchange while bypassing traditional "send/receive" overhead.

We further demonstrate how NCCL device APIs simplify this model using in-kernel primitives for NVLink and GPU-Initiated Networking (GIN). We will showcase practical examples of compute-communication fusion, such as AllGather-Matmul, and customized patterns like deduplicated expert all-to-all.

These abstractions represent one of the most significant evolutions in the PyTorch and NCCL ecosystems, offering a versatile path to high-performance distributed programming.
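The one-sided access model can be illustrated with a stdlib-only toy. This stands in for peer-mapped GPU memory and is not the PyTorch Symmetric Memory API: every rank allocates at the same symmetric offset, so a peer's data is readable by (rank, offset) alone, with no send/receive handshake:

```python
# Toy model of symmetric memory: any rank reads a peer's buffer directly.

class SymmetricHeap:
    """Shared view standing in for peer-mapped device memory."""
    def __init__(self, world_size, size):
        self.bufs = [[0] * size for _ in range(world_size)]

    def put(self, rank, offset, values):
        self.bufs[rank][offset:offset + len(values)] = values

    def get(self, peer, offset, length):
        # One-sided read: no cooperation from `peer` is needed.
        return self.bufs[peer][offset:offset + length]

heap = SymmetricHeap(world_size=4, size=8)
for rank in range(4):
    heap.put(rank, 0, [rank * 10 + i for i in range(2)])   # each rank writes its shard

# An "all-gather" on rank 0 becomes pure one-sided reads from every peer:
gathered = [v for peer in range(4) for v in heap.get(peer, 0, 2)]
print(gathered)   # -> [0, 1, 10, 11, 20, 21, 30, 31]
```

On real hardware the reads happen inside kernels over NVLink or GPU-initiated networking, which is what makes fused compute-communication patterns like AllGather-Matmul possible.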
Speakers
avatar for Ke Wen

Ke Wen

Principal Software Architect, NVIDIA
Ke Wen works on distributed features, including Symmetric Memory, multi-GPU kernels, Expert Parallelism, inference, pipelining and graph analysis.
avatar for Sylvain Jeaugey

Sylvain Jeaugey

Distinguished Engineer, NVIDIA
Sylvain has been developing the NCCL library since its inception in 2015. He has been working on optimizing communication libraries for large parallel systems for more than 20 years.
Wednesday April 8, 2026 13:30 - 13:55 CEST
Master Stage

14:00 CEST

Deploying PyTorch Models To the Browser and Beyond With Transformers.js - Joshua Lochner, Hugging Face
Wednesday April 8, 2026 14:00 - 14:25 CEST
This session presents a comprehensive engineering roadmap for running Hugging Face Transformers entirely locally in your web browser using Transformers.js. We will explore the end-to-end pipeline required to export, optimize, and deploy PyTorch models to the web, leveraging emerging web technologies like WebGPU for efficient, cross-platform inference.

We will dive into the technical nuances of converting PyTorch models to ONNX using torch.export (Dynamo) and applying runtime-specific optimizations via ONNX Runtime GenAI. This workflow enables the production of highly efficient, quantized model artifacts suitable for browser-based execution. Finally, we will demonstrate how to deploy these optimized models using Transformers.js to create performant, interactive, and visually stunning WebAI experiences.
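As a rough illustration of the quantization step in that pipeline, here is plain affine (scale/zero-point) uint8 math; this is the general technique, not the actual ONNX Runtime GenAI implementation:

```python
# Affine uint8 quantization sketch: map floats to [0, 255] with a scale
# and zero point, then recover approximate values on dequantization.

def quantize(weights, bits=8):
    qmax = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0          # avoid zero scale for constant inputs
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]

w = [-1.0, 0.0, 0.5, 1.0]
q, s, z = quantize(w)
restored = dequantize(q, s, z)
print(max(abs(a - b) for a, b in zip(w, restored)))  # small round-trip error
```

Shrinking weights to one byte each is a large part of what makes browser delivery of these models practical: smaller downloads and less memory pressure, at the cost of the bounded error shown here.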
Speakers
avatar for Joshua Lochner

Joshua Lochner

Creator of Transformers.js, Hugging Face
Bringing the power of machine learning to the web. Currently working on Transformers.js (@huggingface 🤗)
Wednesday April 8, 2026 14:00 - 14:25 CEST
Master Stage

14:15 CEST

Lightning Talk: Inside vLLM's KV Offloading Connector: Async Memory Transfers for Higher Inference Throughput - Nicolò Lucchesi, Red Hat
Wednesday April 8, 2026 14:15 - 14:25 CEST
Every LLM request produces KV-cache state that is expensive to recompute. However, GPU memory is limited in size and when memory fills up, entries are discarded from cache. A natural mitigation is expanding the KV cache to CPU DRAM which is meaningfully larger than GPU memory.
vLLM 0.11.0 introduced the Offloading Connector - an asynchronous, pluggable API for KV-cache offloading which is bundled with a native CPU backend. This new feature executes transfers concurrently with model computation on the GPU cores by using GPU DMA. This solution offers speedy loading of KV data from DRAM and near zero overhead from offloading. Getting here required rethinking vLLM's memory layout. The default per-layer KV fragmentation devastated transfer throughput. A new contiguous block layout, upstreamed in 0.12.0, increased effective block sizes by up to 125× and delivered an order-of-magnitude improvement in offloading performance.
We'll walk through the connector architecture, discuss memory transfer tradeoffs, the memory layout redesign, and practical guidance for enabling CPU offloading in production.
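A back-of-the-envelope sketch of why the contiguous layout matters (all numbers hypothetical, not vLLM's real constants): per-layer fragmentation turns one KV block offload into one small DMA per layer, while a contiguous layout moves the same bytes in a single large transfer, which DMA engines handle far better:

```python
# Compare transfer patterns for offloading one KV block under the two layouts.

def transfer_chunks(num_layers, block_bytes_per_layer, contiguous):
    """Return (number of DMA transfers, bytes per transfer) for one KV block."""
    if contiguous:
        return 1, num_layers * block_bytes_per_layer
    return num_layers, block_bytes_per_layer

layers = 32
per_layer = 16 * 1024   # hypothetical bytes of KV per block per layer

print(transfer_chunks(layers, per_layer, contiguous=False))  # -> (32, 16384)
print(transfer_chunks(layers, per_layer, contiguous=True))   # -> (1, 524288)
```

Fewer, larger transfers amortize per-transfer setup cost, which is the intuition behind the order-of-magnitude offloading improvement described above.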
Speakers
avatar for Nicolò Lucchesi

Nicolò Lucchesi

Senior Machine Learning Engineer, Red Hat
Nicolò is a Senior Machine Learning Engineer at Red Hat with a background in Deep Learning and Computer Vision. He works on Inference Optimization for vLLM, where he is a maintainer.
Wednesday April 8, 2026 14:15 - 14:25 CEST
Central Room
  Inference & Production
  • Audience Level Any
  • Slides Attached Yes

14:45 CEST

Lightning Talk: Full-Stack PyTorch Robotics VLA: From Data To Edge Via ExecuTorch/OpenVINO - Samet Akcay & Dmitriy Pastushenkov, Intel
Wednesday April 8, 2026 14:45 - 14:55 CEST
While research-centric tools have lowered the entry barrier for robotics data collection, transitioning Vision-Language-Action models to production remains challenging due to fragmented edge deployment paths. This session presents a unified, PyTorch-native workflow spanning the full robotics lifecycle, from data capture and curation to optimized edge execution.

We introduce a modular Physical AI pipeline designed to resolve the disconnect between research scripts and real-time hardware. The talk details practical patterns for robotics data capture and policy training in a unified PyTorch ecosystem, followed by concrete steps to export models via ExecuTorch. Using an OpenVINO backend, Quantizer, and AOT compilation, we address latency, accuracy, and operator coverage gaps, and demonstrate efficient on-device VLA inference.

Using a WidowX pick-and-sort task as a case study, we demonstrate how to validate latency and numerical tolerances under physical constraints. Attendees will leave with a reference architecture and a checklist for monitoring, safety gates, and managing dataset drift, providing a roadmap for moving robotics VLA from research to production-grade edge deployment.
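The latency-and-tolerance validation mentioned above can be sketched as a simple gate. Names and thresholds here are invented; a real harness would use per-tensor tolerances and warmed-up latency percentiles:

```python
# Validation gate sketch: an exported model passes only if its outputs match
# the reference within a numerical tolerance AND it meets a latency budget.
import math
import time

def validate(reference, candidate, inputs, atol=1e-2, budget_s=0.05):
    """Return True only if outputs match within atol and average latency fits the budget."""
    start = time.perf_counter()
    outs = [candidate(x) for x in inputs]
    elapsed = (time.perf_counter() - start) / len(inputs)   # mean seconds per call
    refs = [reference(x) for x in inputs]
    numerically_ok = all(math.isclose(a, b, abs_tol=atol) for a, b in zip(refs, outs))
    return numerically_ok and elapsed <= budget_s

reference = lambda x: x * 2.0
exported = lambda x: x * 2.0 + 1e-3       # stand-in for a quantized exported model
print(validate(reference, exported, [0.0, 1.0, 2.0]))
```

Gating on both axes at once matters for robotics: a model that is accurate but slow misses control deadlines, and one that is fast but drifted fails the task.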
Speakers
avatar for Dmitriy Pastushenkov

Dmitriy Pastushenkov

AI Software Product Manager, Intel
Dmitriy Pastushenkov is a passionate Software Product Manager at Intel with more than 20 years of comprehensive and international experience in industrial automation, the industrial Internet of Things (IIoT), real-time operating systems, and AI. Dmitriy has held various roles in... Read More →
avatar for Samet Akcay

Samet Akcay

Principal AI Engineer, Intel
Samet Akcay is a Principal AI Engineer at Intel who leads ML R&D efforts across Open Edge Platform libraries, including Intel Geti, Datumaro, Anomalib, Training Extensions, and Inference libraries. His research specializes in self-supervised learning and multi-modal object detection... Read More →
Wednesday April 8, 2026 14:45 - 14:55 CEST
Central Room
  Inference & Production
  • Audience Level Any
  • Slides Attached Yes

14:55 CEST

Meet the Ray Maintainers
Wednesday April 8, 2026 14:55 - 15:25 CEST
Meet the core maintainers of Ray at this session! Come and discuss use cases, features, and the roadmap with us, or just learn how Ray development happens under the hood.
Speakers
avatar for Artur Niederfahrenhorst

Artur Niederfahrenhorst

Member of Technical Staff, Anyscale
Artur is a member of the technical staff at Anyscale, the company that recently donated Ray to the Linux Foundation. He has been contributing to Ray since early 2022, where his main contributions have been in distributed reinforcement learning. Artur majored in Computer Science at... Read More →
Wednesday April 8, 2026 14:55 - 15:25 CEST
Open Platform
  Meet the Developers
  • Audience Level Any

15:25 CEST

Beyond the Theory: What Actually Breaks When You Scale Your Disaggregated Pytorch Models - Ekin Karabulut & Ron Kahn, NVIDIA
Wednesday April 8, 2026 15:25 - 15:50 CEST
As inference demand explodes, new techniques to optimize these deployments have emerged. One such technique is disaggregated inference, which splits inference into differently optimized workloads (e.g. prefill and decode) on separate workers. The theory is straightforward: better GPU utilization, better inference performance, and tighter control over SLAs. The deployment in production is not.

Scaling happens at multiple connected levels. Adding prefill workers for a traffic spike? Those workers belong to a prefill leader and must scale as a unit. But your prefill-to-decode ratio matters too: scale prefill without matching decode capacity and you've moved the bottleneck. Placement also plays a role: place prefill and decode far apart in your network topology and KV-cache transfers will kill your latency. Standard autoscaling treats these as independent components. They're not.

In this talk, we'll share what we've learned running disaggregated vLLM and SGLang deployments on Kubernetes: what broke, what worked, and how we're improving performance. We'll evaluate approaches from standard deployments to specialized APIs like LWS and Grove, and discuss how these integrate with frameworks like llm-d and Dynamo.
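The coupling the abstract describes can be made concrete with a toy planner (group size and ratio are invented): prefill scales in whole leader groups, and decode capacity is derived from the resulting prefill count rather than scaled independently:

```python
# Coupled autoscaling sketch: prefill workers scale as leader-owned groups,
# and decode worker count tracks the prefill count to keep the ratio matched.

PREFILL_GROUP = 2        # prefill workers per leader: scale in whole groups
DECODE_PER_PREFILL = 3   # hypothetical decode workers needed per prefill worker

def plan_scale(target_prefill):
    """Round prefill up to a whole group and size decode capacity to match."""
    groups = -(-target_prefill // PREFILL_GROUP)    # ceiling division
    prefill = groups * PREFILL_GROUP
    decode = prefill * DECODE_PER_PREFILL
    return {"prefill_groups": groups, "prefill": prefill, "decode": decode}

print(plan_scale(5))   # -> {'prefill_groups': 3, 'prefill': 6, 'decode': 18}
```

A scaler that only saw "add 5 prefill workers" would violate both constraints at once, which is exactly the failure mode independent autoscaling produces.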
Speakers
avatar for Ekin Karabulut

Ekin Karabulut

AI/ML Developer Advocate, NVIDIA
Ekin is a Developer Advocate at NVIDIA, following the acquisition of Run:ai. Prior to that, she specialized in the privacy implications of federated learning systems with DNNs in distributed environments as a data scientist. Currently, she is exploring the efficient usage of large... Read More →
avatar for Ron Kahn

Ron Kahn

Senior Software Engineer, NVIDIA
Ron Kahn is a Senior Software Engineer in the NVIDIA Run:ai platform team. Ron works on the design and implementation of workload management systems that abstract Kubernetes complexity for AI practitioners. When not simplifying AI training jobs, Ron can be found cooking something... Read More →
Wednesday April 8, 2026 15:25 - 15:50 CEST
Central Room
  Inference & Production
  • Audience Level Any
  • Slides Attached Yes

15:25 CEST

Building Trust for Users and Regulators Alike: A Cost-Efficient PyTorch Path To Compliance-as-Code - Raja Gopal Hari Vijay, Zoho Corporation
Wednesday April 8, 2026 15:25 - 15:50 CEST
Traditional compliance relies on retroactive logs and manually stitched audit trails, while Opacus, CrypTen, and Captum address isolated concerns without providing end-to-end lifecycle traceability. Compliance-as-Code embeds regulatory controls as executable logic within training and inference pipelines, turning compliance into a continuous engineering function and reducing audit costs.

PyTorch’s dynamic execution model enables real-time auditing and compliance gates across the model lifecycle. Features such as the Dispatcher, custom Autograd functions, and the hook system allow logging, constraint checks, and risk controls to be embedded directly into execution. For example, a fairness gate using training hooks can block model export if disparity exceeds thresholds. Dataset initialization can detect imbalance, while dispatcher-level monitoring generates tamper-resistant audit trails linking data, model versions, and outputs. In deployment, metrics and inference hooks track bias drift, accuracy degradation, and human-intervention counts.

The talk presents practical PyTorch patterns for automated documentation, immutable audit trails, and faster certification in regulated AI deployments.
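The fairness-gate idea can be sketched in a few lines. The metric, names, and threshold here are invented for illustration; a real gate would hang off PyTorch training hooks as the abstract describes:

```python
# Compliance-as-code sketch: a gate that blocks model export when the
# disparity between per-group positive rates exceeds a configured limit.

class ComplianceError(RuntimeError):
    """Raised to block export; an orchestrator would catch and log this."""

def fairness_gate(group_positive_rates, max_disparity=0.1):
    """Raise if any two groups' positive rates differ by more than max_disparity."""
    rates = list(group_positive_rates.values())
    disparity = max(rates) - min(rates)
    if disparity > max_disparity:
        raise ComplianceError(f"disparity {disparity:.2f} exceeds {max_disparity}")
    return disparity

# Passing case: the two groups are within 0.1 of each other, export proceeds.
print(fairness_gate({"group_a": 0.62, "group_b": 0.58}))
```

Because the check is executable logic in the pipeline rather than a retroactive log review, the audit trail is simply the record of this function running and its outcome.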
Speakers
avatar for Raja Gopal Hari Vijay

Raja Gopal Hari Vijay

Member Leadership Staff, Zoho Corporation
At Zoho, Raja builds large-scale Video AI (CCTV analytics, edge inference, privacy-aware deployments) on PyTorch, drives green computing via custom accelerators and FPGAs, and owns a custom Linux distribution for Zoho products and agentic workflows with security reasoning across LSM... Read More →
Wednesday April 8, 2026 15:25 - 15:50 CEST
Junior Stage
 