PyTorch Conference Europe 2026: Full Schedule

7-8 April, 2025
Paris, France
View More Details & Registration
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."

arrow_back View All Dates

09:00 CEST

Keynote: PyTorch CTO - Matt White, Global CTO of AI, Linux Foundation

Wednesday April 8, 2026 09:00 - 09:10 CEST

Master Stage

Matt White, Global CTO of AI and CTO at PyTorch Foundation will provide an update on technical strategy, ecosystem and projects and working groups

Speakers

Matt White

Global CTO of AI, Linux Foundation, The Linux Foundation

Matt White is the Executive Director of the PyTorch Foundation and GM of AI at the Linux Foundation. He is also the Director of the Generative AI Commons. Matt has years of experience in applied research and standards in AI and data in telecom, media and gaming industries. Matt is... Read More →

1. Matt White pdf

Wednesday April 8, 2026 09:00 - 09:10 CEST
Master Stage

Keynote Sessions

Audience Level Any
Slides Attached Yes

09:10 CEST

Keynote: vLLM & Ray Updates - Tyler Michael Smith, Chief Architect - Inference Engineering, Red Hat & Artur Niederfahrenhorst, Member of Technical Staff,Anyscale

Wednesday April 8, 2026 09:10 - 09:25 CEST

Master Stage

Speakers

Tyler Michael Smith

Chief Architect - Inference Engineering, Red Hat

Tyler received a PhD in Computer Science at The University of Texas at Austin, studying high performance dense linear algebra - microkernels, parallelism, and theoretical lower bounds on data movement.. After a postdoc at ETH Zürich, he joined Neural Magic, first working on a graph... Read More →

Artur Niederfahrenhorst

Member of Technical Staff, Anyscale

Artur is a member of the technical staff at Anyscale, the company that recently donated Ray to the Linux Foundation. He has been contributing to Ray since early 2022, where his main contributions have been in distributed reinforcement learning. Artur majored in Computer Science at... Read More →

3. Artur Niederfahrenhorst pdf

2. Tyler Michael Smith pdf

Wednesday April 8, 2026 09:10 - 09:25 CEST
Master Stage

Keynote Sessions

Audience Level Any
Slides Attached Yes

09:25 CEST

Keynote: The Hub as Infrastructure. From Open PyTorch Models, to a Safe and Performant Distribution Hub - Lysandre Debut, Chief Open-Source Officer, Hugging Face

Wednesday April 8, 2026 09:25 - 09:40 CEST

Master Stage

Speakers

Lysandre Debut

Chief Open-Source Officer, Hugging Face

Lysandre is the Chief Open-Source Officer at Hugging Face; ensuring that the ecosystem is as well supported as possible in the ML lifecycle, with open-source tools.

He has been at Hugging Face for the past six years and was the first open-source employee at Hugging Face; working on transformers and the entire stack of Hugging Face open-source libraries since then... Read More →

4. Lysandre Debut pdf

Wednesday April 8, 2026 09:25 - 09:40 CEST
Master Stage

Keynote Sessions

Audience Level Any
Slides Attached Yes

09:45 CEST

Sponsored Keynote: Open Source Infrastructure for the AI Native Era - Jonathan Bryce, Executive Director, Cloud Native Computing Foundation

Wednesday April 8, 2026 09:45 - 09:50 CEST

Master Stage

AI adoption will not be limited by model ideas alone. It will be limited by how fast we can deploy, secure, observe, and scale AI systems in production. Inference is where AI becomes real for most organizations. As AI moves from frontier labs into mainstream production, the operational challenges start to look increasingly cloud native: orchestration, autoscaling, routing, security, policy, and observability. This keynote explores why the next phase of AI adoption will move faster if PyTorch and cloud native communities work together to extend proven open source patterns.

Speakers

Jonathan Bryce

Executive Director, Cloud and Infrastructure, The Linux Foundation

Jonathan Bryce is the Executive Director of Cloud & Infrastructure at the Linux Foundation, where he leads both the Cloud Native Computing Foundation (CNCF) and the OpenInfra Foundation—two of the largest and most influential open source communities in the world. With over... Read More →

5. Jonathan Bryce pdf

Wednesday April 8, 2026 09:45 - 09:50 CEST
Master Stage

Keynote Sessions

Audience Level Any
Slides Attached Yes

09:50 CEST

Keynote: Gemma 4: Compacting Intelligence for the Edge - Léonard Hussenot, Research Scientist, Google Deepmind

Wednesday April 8, 2026 09:50 - 10:05 CEST

Master Stage

This talk explores the philosophy and engineering behind Gemma 4, arguing that the future of AI isn't only about size, but about "intelligence per byte."
We will dive into why compacting intelligence—maximizing the reasoning and instruction following ability of every single token—is the ultimate bottleneck for truly useful AI. By optimizing for token efficiency and memory footprints, we unlock a new class of applications that are faster, private, and more accessible.

Speakers

Leonard Hussenot

Research Scientist, Google Deepmind

I am a Research Scientist at Google DeepMind, where I lead the Gemma post-training team focused on developing the most useful compact models for on-device applications. Since joining Google Brain, I have contributed to the evolution of Bard, Gemini, and Gemma, specializing in scaling... Read More →

Wednesday April 8, 2026 09:50 - 10:05 CEST
Master Stage

Keynote Sessions

Audience Level Any

10:35 CEST

Lightning Talk: Monarch: An API To Your Supercomputer - Marius Eriksen, Meta

Wednesday April 8, 2026 10:35 - 10:45 CEST

Master Stage

The training systems driving today’s most advanced AIs are distributed, dynamic, and complex. Pre-training relies on layered parallelism and careful fault isolation. Post-training RL spans thousands of GPUs while coordinating verifiers, compilers, and code execution.

Systems complexity pulls focus away from the core algorithms: developers are forced to assemble systems from schedulers, RPC stacks, container orchestrators, observability tooling, service discovery, and app frameworks just to begin work.

Monarch is a distributed programming framework for PyTorch that makes the cluster programmable through a single-program Python API. It exposes the supercomputer as a coherent, directly controllable system—bringing the experience of local development to large-scale training; handling fault tolerance, orchestration, tooling integration, etc.

In this talk, we will demonstrate how Monarch enables developers to focus on training logic rather than glue, extend systems easily, and supervise and debug distributed systems through a unified programming interface.

Attendees will leave with a clear model for building robust, scalable and customizable distributed PyTorch systems using Monarch.

Speakers

Marius Eriksen

Software Engineer, Meta

Marius Eriksen is a software engineer at Meta, where he works on infrastructure for large-scale training systems.

Wednesday April 8, 2026 10:35 - 10:45 CEST
Master Stage

Frameworks & Compilers

Audience Level Any

10:50 CEST

Lightning Talk: Achieving SOTA GEMM Performance: A CuTeDSL Backend for PyTorch Inductor - Nikhil Patel, Meta

Wednesday April 8, 2026 10:50 - 11:00 CEST

Master Stage

Matrix multiplication is a central compute primitive in modern deep learning, but achieving SOTA performance on novel architectures like NVIDIA Blackwell has become a bottleneck. Existing Triton-based kernels in torch.compile struggle to keep pace with rapid hardware evolution, often forcing users to hand-write custom, architecture-specific kernels - a growing gap as hardware feature velocity accelerates.

We present a new CuTeDSL GEMM backend in PyTorch Inductor that integrates NVIDIA’s kernel implementations directly into torch.compile. Built using the Cutlass API for kernel discovery, this backend allows PyTorch to expose first-class support for NVIDIA-authored GEMMs and automatically leverage new architectural features as NVIDIA updates their kernels.

The backend currently supports standard GEMM, grouped GEMM, and block-scaled MXFP8 GEMM, along with pointwise epilogue fusions (with reductions forthcoming). We present early end-to-end results from vLLM inference and TorchTitan training, demonstrating how this approach enables PyTorch to achieve high-performance GEMMs on Blackwell and beyond, while eliminating the need for users or developers to maintain handwritten kernels.

Speakers

Nikhil Patel

Software Engineer, Meta

Nikhil is a software engineer on the PyTorch Inductor team at Meta Superintelligence Labs, where he works on Inductor’s CuTeDSL GEMM backend. His work sits at the boundary between compiler code generation and hardware-native GPU features, optimizing large-scale training and inference... Read More →

CuTeDSL Inductor Backend pdf

Wednesday April 8, 2026 10:50 - 11:00 CEST
Master Stage

Frameworks & Compilers

Audience Level Intermediate
Slides Attached Yes

11:05 CEST

Fp8 Training From Hopper To Blackwell - Luca Wehrstedt, Meta

Wednesday April 8, 2026 11:05 - 11:30 CEST

Master Stage

The Hopper generation of NVIDIA GPUs first enabled the use of low-precision float8 data types for training via TensorCore acceleration. However, the recipe to best leverage it was far from settled. Practitioners had to find their way through many entangled decisions around accuracy-vs-efficiency, precision-vs-range, overflows-vs-underflows, and more. The frontier was further push forward by the DeepSeek release, and then by the micro-scaling formats introduced by Blackwell. In this talk we will go through all these approaches, comparing their pros and cons, thus guiding researchers in finding the options that work best for them.

Speakers

Luca Wehrstedt

Software Engineer, Meta

Research Engineer in Meta's Fundamental AI Research team (FAIR). At the intersection of research and infrastructure, Luca specialized in training efficiency and distributed communication. Regular contributor to PyTorch.

Fp8 at PT conf (1) pdf

Wednesday April 8, 2026 11:05 - 11:30 CEST
Master Stage

Training Systems

Audience Level Advanced
Slides Attached Yes

11:35 CEST

Portable High‑Performance LLM Serving: A Triton Backend for VLLM - Burkhard Ringlein, IBM Research & Jan van Lunteren, IBM

Wednesday April 8, 2026 11:35 - 12:00 CEST

Master Stage

Today, vLLM is the de-facto industry standard for serving Large Language Models and is widely adopted in production.

However, for most of the past, vLLM’s state-of-the-art performance was largely dependent on hand-written CUDA or HIP kernels. These kernels have typically been carefully optimized for a specific GPU platform and may pose a serious obstacle to the portability of vLLM across different hardware.

Leveraging Triton, we introduced a “Triton attention backend” to vLLM that produces highly competitive performance across GPU platforms with a single code base, without involving hand-written CUDA or HIP kernels. The Triton attention backend became the default for AMD GPUs and is used in scenarios where other attention backends have missing features. Additionally, this backend automatically selects appropriate specialized kernels based on model type or request length.

In this talk, we will present our recent advances that consistently deliver high performance on both NVIDIA and AMD GPUs with a single Triton-only code-base. We will present the engineering and science behind this Triton-only backend, including system aspects, kernel improvements, and launch grid optimizations.

Speakers

Jan van Lunteren

Senior Research Scientist, IBM Research

Jan van Lunteren is a Senior Research Scientist at IBM Research Zurich holding MSc and PhD degrees in Electrical Engineering. His research has covered a broad range of topics, including high‑speed networking, near‑memory computing, and high‑performance machine‑learning inference... Read More →

Burkhard Ringlein

Research Staff Member, IBM Research

Dr. Burkhard Ringlein is a Research Staff Member in the AI Platform team of IBM Research, based in Zurich. He is an accomplished AI systems researcher and designs, builds, debugs, and optimizes practical systems for low-latency, high-throughput machine learning applications. Currently... Read More →

2026 04 08 vanLunteren Ringlein PyTorch Paris pub pdf

Wednesday April 8, 2026 11:35 - 12:00 CEST
Master Stage

Inference & Production

Audience Level Intermediate

13:30 CEST

PyTorch Symmetric Memory + NCCL Device APIs: A New Path Towards Multi-GPU Kernels - Ke Wen & Sylvain Jeaugey, NVIDIA

Wednesday April 8, 2026 13:30 - 13:55 CEST

Master Stage

As large models shift toward inference and Mixture-of-Experts (MoE) architectures, small batch sizes and dynamic routing present new scaling challenges. Fused, customized multi-GPU kernels are emerging as the solution, but programming them for high performance remains difficult. This talk introduces a paradigm shift enabled by PyTorch Symmetric Memory and NCCL device APIs.

PyTorch Symmetric Memory provides a unified infrastructure for direct GPU-to-GPU memory access without CPU involvement. By leveraging symmetric tensor allocation and CUDA Graph-compatible signaling, it enables fine-grained, dynamic data exchange while bypassing traditional "send/receive" overhead.

We further demonstrate how NCCL device APIs simplify this model using in-kernel primitives for NVLink and GPU-Initiated Networking (GIN). We will showcase practical examples of compute-communication fusion, such as AllGather-Matmul, and customized patterns like deduplicated expert all-to-all.

These abstractions represent one of the most significant evolutions in the PyTorch and NCCL ecosystems, offering a versatile path to high-performance distributed programming.

Speakers

Ke Wen

Principal Software Architect, NVIDIA

Ke Wen works on distributed features, including Symmetric Memory, multi-GPU kernels, Expert Parallelism, inference, pipelining and graph analysis.

Sylvain Jeaugey

Distinguished Engineer, NVIDIA

Sylvain has been developing the NCCL library since its inception in 2015. He has been working on optimizing communication libraries for large parallel systems for more than 20 years.

PyTorch Symmetric Memory and NCCL Device APIs pdf

Wednesday April 8, 2026 13:30 - 13:55 CEST
Master Stage

Frameworks & Compilers

Audience Level Any

14:00 CEST

Deploying PyTorch Models To the Browser and Beyond With Transformers.js - Joshua Lochner, Hugging Face

Wednesday April 8, 2026 14:00 - 14:25 CEST

Master Stage

This session presents a comprehensive engineering roadmap for running Hugging Face Transformers entirely locally in your web browser using Transformers.js. We will explore the end-to-end pipeline required to export, optimize, and deploy PyTorch models to the web, leveraging emerging web technologies like WebGPU for efficient, cross-platform inference.

We will dive into the technical nuances of converting PyTorch models to ONNX using torch.export (Dynamo) and applying runtime-specific optimizations via ONNX Runtime GenAI. This workflow enables the production of highly efficient, quantized model artifacts suitable for browser-based execution. Finally, we will demonstrate how to deploy these optimized models using Transformers.js to create performant, interactive, and visually stunning WebAI experiences.

Speakers

Joshua Lochner

Creator of Transformers.js, Hugging Face

Bringing the power of machine learning to the web. Currently working on Transformers.js (@huggingface 🤗)

Wednesday April 8, 2026 14:00 - 14:25 CEST
Master Stage

Inference & Production

Audience Level Any

14:30 CEST

Seamless Integration: Custom Kernels in the Torch.compile Stack Without Graphbreaks - Kshiteej Kalambarkar, Masaki Kozuki & Pawel Gadzinski, NVIDIA

Wednesday April 8, 2026 14:30 - 14:55 CEST

Master Stage

Custom kernels are essential for high-performance PyTorch workflows, but their integration often comes with a hidden cost. While torch.compile promises speedups, calling custom operations typically triggers graph-breaks: fallbacks to Eager mode that introduce overhead and negate your performance gains.

In this session, we provide a practical roadmap for making your extensions "compiler-aware". Using the Transformer Engine project as a case study, we will show how to utilize the custom_op extension point to bridge the gap between high-performance kernels and the torch.compile stack.

What you will learn:
• Identifying the Friction: How to profile and detect graph-breaks caused by custom extensions.
• The Registration Path: A walkthrough of the custom_op registration process for torch.compile.
• Solving the "Hard Parts": Strategies for handling complex Python-side logic that disrupts graph capture.
• Real-World Impact: How these integrations function within the Transformer Engine to maintain peak throughput.

Who should join: This talk is designed for developers building custom PyTorch extensions who want to understand how advanced operations fit into the compiled stack.

Speakers

Kshiteej Kalambarkar

Software Engineer Frameworks, NVIDIA

Kshiteej Kalambarkar is a software engineer at NVIDIA specializing in PyTorch and compiler technologies, with experience in torch.compile and custom kernel integration

Masaki Kozuki

Software Engineer, NVIDIA

Masaki Kozuki is working at NVIDIA on PyTorch.

Pawel Gadzinski

Senior Performance Engineer - Deep Learning, NVIDIA

Pawel Gadzinski is a Deep Learning Performance Engineer at NVIDIA, where he works on the Transformer Engine library, enabling state-of-the-art techniques for accelerating transformer models on NVIDIA GPUs, with a focus on low-precision training.

Wednesday April 8, 2026 14:30 - 14:55 CEST
Master Stage

Frameworks & Compilers

Audience Level Intermediate

15:25 CEST

Bridging the Hardware Gap With Code Harnesses on the Hugging Face Kernels Hub - Ben Burtenshaw, Hugging Face

Wednesday April 8, 2026 15:25 - 15:50 CEST

Master Stage

What: We share experiments and tooling to standardise kernel writing for agentic coding.

We present an end-to-end experiment benchmarking 6 harnesses across 10 models on CUDA and Metal kernel writing. We compare agent cost, kernel latency, VRAM usage, and end inference performance, and show how the Kernels Hub enables distribution at scale.

We demo two tools:

Kernels Hub: Infrastructure for writing, maintaining, and distributing reproducible kernels in the PyTorch ecosystem.

HF Skills: A library for defining and evaluating agent skills for ML tasks like kernel writing.

Why: Beyond agentic hype, kernel writing is a fundamental problem requiring robust evaluation to scale the community. High-performance kernels demand rare expertise in memory coalescing, warp-level primitives, and hardware-specific optimization. In practice, builders optimize for the highest market-share hardware, leaving a massive matrix of model×hardware combinations unserved, For example: edge inference with ExecuTorch, local LLMs on Metal via vLLM, classic ML at scale on Intel. This talk is technical, intended for kernel writers and PyTorch builders who want to use agents robustly.

Speakers

Ben Burtenshaw

Community, Hugging Face

Ben Burtenshaw is an MLE in the Hugging Face open source community team, specializing in agents, LLMs, and fine-tuning. He leads the development of open-source educational initiatives like the Agents Course, the MCP Course, and the LLM Course, which bridge the gap between complex... Read More →

Wednesday April 8, 2026 15:25 - 15:50 CEST
Master Stage

Agents & Interop

Audience Level Intermediate

15:55 CEST

From Gradients To Governance: Making PyTorch Lineage-Aware - Kateryna Romashko & Clodagh Walsh, Red Hat

Wednesday April 8, 2026 15:55 - 16:20 CEST

Master Stage

PyTorch was built to track how models learn, but not whether they should have. As AI systems increasingly operate on regulated, jurisdiction bound, and sovereign data, lineage and policy can no longer live outside the runtime. This talk explores data sovereignty as a first class constraint and argues that lineage is the missing primitive in modern ML frameworks. Building on PyTorch’s dynamic graphs and autograd system, we outline how tensors could carry origin, consent, and policy metadata through training and inference. The goal is not compliance tooling, but a lineage aware PyTorch that enables trustworthy, auditable, and deployable AI across edge, federated, and European AI ecosystems.

Speakers

Kateryna Romashko

Associate Software Engineer, RedHat

Kateryna Romashko is a Software Engineer and a Master’s student in Computer Science, currently working in the Emerging Technology team at Red Hat. Her work focuses on ML systems, data lineage, and event-driven architectures, with hands-on experience across ML platforms, distributed... Read More →

Clodagh Walsh

Software Engineer, Red Hat

Clodagh is a software engineer at Red Hat working on the Emerging Technologies team under the office of the CTO. She has experience working with cloud native technologies. She is currently working on a range of AI related projects focused on topics such as MLOps and dLLMs.

Final Deck From Gradients to Governance Making PyTorch Lineage Aware pdf

Wednesday April 8, 2026 15:55 - 16:20 CEST
Master Stage

Responsible AI & Compliance

Audience Level Intermediate
Slides Attached Yes

09:00 CEST

09:10 CEST

09:25 CEST

09:45 CEST

09:50 CEST

10:35 CEST

10:50 CEST

11:05 CEST

11:35 CEST

13:30 CEST

14:00 CEST

14:30 CEST

15:25 CEST

15:55 CEST

Get help with the event