BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:pytorchconferenceeu2026
X-WR-CALDESC:Event Calendar
METHOD:PUBLISH
CALSCALE:GREGORIAN
PRODID:-//Sched.com PyTorch Conference Europe 2026//EN
X-WR-TIMEZONE:UTC
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T053000Z
DTEND:20260407T163500Z
SUMMARY:Community Expo
DESCRIPTION:\n
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:fb86e83285ba58020c1b70229af2a858
URL:http://pytorchconferenceeu2026.sched.com/event/fb86e83285ba58020c1b70229af2a858
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T053000Z
DTEND:20260407T160000Z
SUMMARY:Registration & Badge Pick-Up
DESCRIPTION:\n
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Lobby\, Paris\, France
SEQUENCE:0
UID:10f78a619b5238ec41fe32b1e00cc76d
URL:http://pytorchconferenceeu2026.sched.com/event/10f78a619b5238ec41fe32b1e00cc76d
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T070000Z
DTEND:20260407T071000Z
SUMMARY:Keynote: Co-Evolution: How the Open Source Intelligence Stack Compounds - Mark Collier\, Executive Director\, PyTorch Foundation\, General Manager\, AI & Infrastructure\, Linux Foundation
DESCRIPTION:Agentic coding systems have crossed a threshold from experimentation to measurable economic impact. Their rapid adoption reveals a deeper shift: modern AI capability emerges from the co-evolution of models\, training frameworks\, inference engines\, reinforcement systems\, hardware\, and cloud infrastructure\, with open source enabling the flow of code\, research\, and operational knowledge across the stack. As performance gaps narrow and costs fall\, this compounding intelligence system accelerates innovation and spreads capability across companies\, industries\, and hardware platforms\, raising a simple question for the community: how fast do we want to evolve?
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:5ec9e9dc8299c28c6cd041cf069866bf
URL:http://pytorchconferenceeu2026.sched.com/event/5ec9e9dc8299c28c6cd041cf069866bf
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T071000Z
DTEND:20260407T073000Z
SUMMARY:Keynote: PyTorch Updates - Edward Yang\, Research Engineer\, Meta
DESCRIPTION:\n
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:4a264d6e24434869c2b67b381ba10998
URL:http://pytorchconferenceeu2026.sched.com/event/4a264d6e24434869c2b67b381ba10998
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T073500Z
DTEND:20260407T074500Z
SUMMARY:Keynote: Community Led Open Source RL - Joe Spisak\, VP of Product & Head of Open Source\, Reflection AI
DESCRIPTION:\n
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:003187aa7e0b7b40dcecc3758ab89350
URL:http://pytorchconferenceeu2026.sched.com/event/003187aa7e0b7b40dcecc3758ab89350
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T074500Z
DTEND:20260407T075000Z
SUMMARY:Sponsored Keynote: From One Node to Distributed Training and Inference. How the PyTorch Ecosystem Changed AI - Ramine Roane\, Corporate Vice President of AI Product Management and Ecosystem Development\, AMD
DESCRIPTION:PyTorch has evolved from a research framework into a distributed-first platform powering production AI at massive scale. As models grow to hundreds of billions of parameters\, this talk explores the challenges of scaling inference across nodes and the emerging ecosystem\, from Monarch and TorchTitan to open\, hardware-agnostic systems\, that makes it possible.
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:97433bb8d9630bb1b709a7d77a05c84e
URL:http://pytorchconferenceeu2026.sched.com/event/97433bb8d9630bb1b709a7d77a05c84e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T075500Z
DTEND:20260407T081000Z
SUMMARY:Keynote: Stream Everything - Moving from Request input to Streaming input - Patrick von Platen\, Research Engineer\, Mistral AI
DESCRIPTION:\n
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:745ca9643287de26860f43a0de5ba43a
URL:http://pytorchconferenceeu2026.sched.com/event/745ca9643287de26860f43a0de5ba43a
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T081000Z
DTEND:20260407T081500Z
SUMMARY:Sponsored Keynote: Any [ Agent | Model | Accelerator | Cloud ]. Open Source AI Unlocks the World's Potential - Maryam Tahhan\, Principal Engineer & Nicolò Lucchesi\, Senior Machine Learning Engineer\, Red Hat
DESCRIPTION:Red Hat is shaping an open future for AI\, delivering on the promise of 'Any Agent\, Any Model\, Any Accelerator\, Any Cloud.' Discover the community advancements contributed to the PyTorch Foundation that empower enterprises to rapidly enable\, test\, and seamlessly scale AI workloads across their choice of infrastructure.
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:25541f2004bb38d4293d93fab98a515f
URL:http://pytorchconferenceeu2026.sched.com/event/25541f2004bb38d4293d93fab98a515f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T081500Z
DTEND:20260407T082500Z
SUMMARY:Keynote: The Unbearable Lightness of (Agentic) Evaluations - Besmira Nushi\, Senior Manager\, AI Research\, NVIDIA
DESCRIPTION:The discipline of evaluating large language models underwent a major transformation with the rise of general AI capabilities. Today\, the field is undergoing yet another challenging transformation following the groundbreaking improvements in agentic tasks\, which expect models and systems to plan and take autonomous actions in the real world. Measuring how well models and systems perform in such tasks is however still i) fragile from a methodological perspective\, and ii) difficult to scale and generalize across different domains. This talk will first discuss common challenges in reproducing agentic evaluations\, including differences in reference implementation\, error handling\, trajectory post processing\, and tooling definitions. Next\, it will cover infrastructural requirements that need to be addressed for such evaluations to run efficiently at scale. Finally\, we will conclude with a set of (still nascent) best practices that can help alleviate “lightness” and build more consistent measurement pipelines.
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:14345be931a5e5e5eeed676671c7f74c
URL:http://pytorchconferenceeu2026.sched.com/event/14345be931a5e5e5eeed676671c7f74c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T083000Z
DTEND:20260407T090000Z
SUMMARY:Birds of a Feather: Engineering for the EU AI Act: What Should PyTorch Expose Natively? - Roy Saurabh\, AffectLog
DESCRIPTION:The EU AI Act introduces concrete technical obligations for ML systems: traceability\, risk management\, monitoring\, and auditability. Today\, most of this burden is handled outside the ML framework—through ad-hoc tooling\, documentation\, or bespoke infrastructure.\n \n This Birds of a Feather session is an open\, practitioner-driven discussion on a forward-looking question:\n What primitives\, hooks\, or abstractions should PyTorch expose natively to better support AI accountability and regulatory readiness?\n \n Topics for discussion may include:\n - Native support for provenance\, lineage\, and training/inference traces\n - Standardized hooks for fairness\, robustness\, and drift monitoring\n - Model and dataset metadata as first-class PyTorch objects\n - Privacy-preserving logging and zero-retention execution patterns\n - Gaps between regulatory requirements (e.g. EU AI Act) and current ML frameworks\n \n The goal is not consensus\, but shared understanding and concrete ideas that can inform community practices\, tooling\, and potential upstream contributions. This BoF is intended for PyTorch users\, maintainers\, researchers\, and infra engineers interested in the future of responsible\, production-grade ML.
CATEGORIES:BIRDS OF A FEATHER
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:f77cb7c7493f090561a47121e002819c
URL:http://pytorchconferenceeu2026.sched.com/event/f77cb7c7493f090561a47121e002819c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T083000Z
DTEND:20260407T090000Z
SUMMARY:Coffee Break
DESCRIPTION:Menu:\n-Apple and pecan nut cake (Vegan\, Vegetarian)\n-Granola bar (Gluten Free\, Vegetarian)\n-Seasonal fruits (Vegan\, GF\, Vegetarian)\n-Egg sandwich (Vegetarian)\n-Dry fruits and dry grapes mix (Vegan\, GF\, Vegetarian)
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:05cdef11b6c93406f6273cf38529ac6a
URL:http://pytorchconferenceeu2026.sched.com/event/05cdef11b6c93406f6273cf38529ac6a
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T083000Z
DTEND:20260407T090000Z
SUMMARY:Meet the Developers: PyTorch Module Maintainers
DESCRIPTION:These sessions give participants an opportunity to meet some of the developers leading PyTorch to foster collaboration\, gather feedback\, and inspire contributions.\n\nPyTorch core modules (e.g. torch.autograd\, torch.optim\, torch.nn) form the foundation for most AI research and development\, either directly through PyTorch or indirectly via higher-level frameworks. The core libraries prioritize API stability\, backward compatibility\, modular design\, and simplicity.
CATEGORIES:MEET THE DEVELOPERS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:7efba8d84881781ed6ffed99d13282aa
URL:http://pytorchconferenceeu2026.sched.com/event/7efba8d84881781ed6ffed99d13282aa
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T090000Z
DTEND:20260407T091000Z
SUMMARY:Lightning Talk: Why Your Forecasting Transformer Isn’t Working (And How To Fix It in Python) - Rosheen Naeem\, Open Climate Fix
DESCRIPTION:Renewable energy is clean — but it’s also inherently variable. Solar PV generation can change dramatically within minutes due to cloud cover and weather conditions\, making accurate short-term forecasts essential for grid stability\, energy trading\, and smart-home optimisation.\n Open Climate Fix builds open and high-impact forecasting tools to accelerate the transition to a low-carbon energy system. One of these projects is Open Quartz Solar Forecast: an open-source model that uses public PV generation data\, site metadata\, and numerical weather prediction variables to forecast solar power for any location.\n In this talk\, I’ll present a real case study from my Google Summer of Code project where I implemented and trained a Temporal Fusion Transformer for multi-horizon solar forecasting. I’ll cover the practical engineering challenges behind making transformer forecasting work in Python: building continuous training windows\, aligning weather forecast steps with observations\, separating static vs time-varying features\, and stabilising training using PyTorch Forecasting and PyTorch Lightning.\n Attendees will leave with reusable patterns for real-world time-series forecasting pipelines.
CATEGORIES:APPLICATIONS & CASE STUDIES
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:63aa6257f18d6ce7cd8b256a215edb7f
URL:http://pytorchconferenceeu2026.sched.com/event/63aa6257f18d6ce7cd8b256a215edb7f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T090000Z
DTEND:20260407T092500Z
SUMMARY:Helion 1.0: A High-Level DSL for Performance Portable Kernels - Oguz Ulgen\, Meta
DESCRIPTION:ML practitioners increasingly author bespoke kernels\, but achieving portable performance demands low-level expertise and repeated manual tuning for each accelerator generation and type. We introduce Helion\, a Python-embedded DSL with a “PyTorch with tiles” programming model that preserves familiar PyTorch APIs while giving developers lower-level control over the generated kernels. Helion integrates tightly with TorchInductor to reuse PyTorch operator lowerings\, automatically manages host/device boundaries\, and provides rich language constructs for tiling\, memory movement\, and synchronization. The language defines an implicit high-dimensional configuration space that our autotuner explores\, shifting the tuning burden from developers to automated search. In this session\, I will cover both the language and what is new since PTC'25\, as well as announcing the official GA launch. This session will be open for both experienced and beginner kernel authors.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:ea4620ce0f810a15307f92b28e2664da
URL:http://pytorchconferenceeu2026.sched.com/event/ea4620ce0f810a15307f92b28e2664da
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T090000Z
DTEND:20260407T092500Z
SUMMARY:Lights\, Camera\, Inference! Video Generation as a Service With VLLM-Omni - Ricardo Noriega\, Red Hat & Doug Smith\, Red Hat\, Inc
DESCRIPTION:LLMs made text generation as a service possible. What does it take to do the same for video?\n We built an experimental Video Generation as a Service stack using vLLM-Omni and the LTX-2 open weights video model to explore how far an open\, multimodal stack can go toward production use. We’ll share what worked\, what broke\, and what it takes to treat generative video as a first-class workload.\n vLLM is known for high-performance autoregressive inference\, and vLLM-Omni extends that foundation to multimodal inputs and outputs. We pushed those capabilities further by adding support for LTX-2\, extending the OpenAI-compatible API surface\, integrating with front ends\, and packaging for scalable deployment. We’re here to walk you through the touch points and get you familiar with just how we put all the Legos together with vLLM-Omni.\n Finally\, we’ll examine the gap between novelty demos and real applications: going from quirky spaghetti-eating videos to generating consistent characters\, personalized media\, customized video game cutscenes\, and interactive storytelling\, and highlight what’s still missing to make generative video truly production-ready.
CATEGORIES:GENAI & MULTIMODAL
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:8fc71bd0fc057864e02da6513c64cbfd
URL:http://pytorchconferenceeu2026.sched.com/event/8fc71bd0fc057864e02da6513c64cbfd
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T090000Z
DTEND:20260407T091000Z
SUMMARY:Lightning Talk: Training Embedding Model Resiliently for Multimodal Model Inference Routing - Huamin Chen\, Red Hat & Haichen Zhang\, AMD
DESCRIPTION:LLM systems increasingly rely on intelligent routing to balance cost\, latency\, and quality tradeoffs. The vLLM Semantic Router\, a vLLM Ecosystem project\, provides both semantic and performance-level routing intelligence for Mixture-of-Multimodal Models (MoM) architectures\, but its effectiveness depends on fast and accurate classifiers.\n\nThis talk presents our end-to-end journey training production-grade embedding and classification models on AMD GPUs using native PyTorch\, achieving high GPU utilization with distributed training optimizations.\n\nWe introduce a multilingual text embedding model with a 32K context window and 2D Matryoshka support\, and multimodal embedding models\, trained on AMD GPUs using PyTorch DDP. The talk covers practical training optimizations for AMD ROCm. All training code uses native PyTorch distributed primitives\, with additional enhancements to improve training stability and pipeline efficiency.\n\nAttendees will learn how to train efficient classifiers for LLM routing systems and integrate these models into production inference pipelines.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:4d911bfcb7710aa1b9f9270497e0aca6
URL:http://pytorchconferenceeu2026.sched.com/event/4d911bfcb7710aa1b9f9270497e0aca6
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T091500Z
DTEND:20260407T092500Z
SUMMARY:Lightning Talk: Deep Learning in the Wild: Embedded PyTorch for Real-World Conservation Bioacoustics - Taraqur Rahman & Owen O'Donnell\, OWL Integrations
DESCRIPTION:Passive acoustic monitoring is a powerful tool for wildlife conservation\, but deploying deep learning models in remote rainforest environments introduces strict constraints on power\, memory\, and compute. In this talk\, we present an end-to-end PyTorch-based pipeline for detecting and analyzing the endangered three-wattled bellbird using embedded deep learning systems.\n \n We cover the full lifecycle from audio preprocessing and model training in PyTorch to optimization and deployment on resource-constrained embedded devices. Topics include model architectures for sparse bioacoustic event detection\, handling extreme class imbalance\, model compression and quantization\, and practical trade-offs between accuracy\, latency\, and power consumption.\n \n The session emphasizes real-world lessons learned deploying machine learning at the edge\, where unreliable connectivity\, noisy signals\, and limited hardware define success more than benchmark metrics. Attendees will gain practical patterns for building and deploying PyTorch models for embedded and edge AI applications with real environmental impact.
CATEGORIES:APPLICATIONS & CASE STUDIES
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:581832097f7b765c6ff54ce76e99f97e
URL:http://pytorchconferenceeu2026.sched.com/event/581832097f7b765c6ff54ce76e99f97e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T091500Z
DTEND:20260407T092500Z
SUMMARY:Lightning Talk: Flexible Deployment of PyTorch Models on MCU-Class Devices Using ExecuTorch - Robert Kalmar & Martin Pavella\, NXP
DESCRIPTION:ExecuTorch has recently matured into a production-ready framework designed specifically for efficient edge deployment of PyTorch models. Its architecture supports a broad spectrum of hardware targets—from low-power\, bare-metal or RTOS-based microcontrollers (MCUs) to higher-performance Linux or Android based microprocessor platforms—while meeting the demanding constraints of memory\, compute\, and power typically found in real-world embedded applications.\n This talk focuses on the deployment flexibility ExecuTorch offers for MCU-class devices\, highlighting how different backends enable efficient execution across heterogeneous compute units. We will explore CPU\, DSP\, and NPU acceleration paths using the Cortex-M\, Cadence\, Ethos-U\, and eIQ Neutron backends\, and discuss how these integrate into typical ML model deployment workflows.\n To make the session practical and application-oriented\, we will present an optimization journey aimed at reducing power consumption—an essential requirement for ML workloads in energy-constrained environments. Attendees will gain insights into backend selection\, performance trade-offs\, and best practices for deploying PyTorch models on edge devices.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:d3dfeb921151e607fc22da07cab20a87
URL:http://pytorchconferenceeu2026.sched.com/event/d3dfeb921151e607fc22da07cab20a87
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T093000Z
DTEND:20260407T094000Z
SUMMARY:Lightning Talk: Coding Agents for Compiler Construction: Beyond the AI Assistant Paradigm - Reza Rahimi\, yasp.ai & Stefan Krassin\, yasp
DESCRIPTION:Modern ML compilers follow a familiar pattern: a frontend lowers models into an intermediate representation\, while a backend applies graph and kernel optimizations before generating code for target accelerators. PyTorch provides strong foundations through nn.Module\, FX\, and graph capture\, but implementing optimized backends remains challenging due to hardware diversity and kernel-level complexity.\n \n Optimizing GPU kernels is hard. Few engineers do it well. Hardware architectures evolve yearly\, and with hyperscalers\, chip makers\, and AI labs building custom silicon\, demand for efficient kernel generation keeps growing. This creates a gap between model developers and hardware capabilities.\n \n This talk explores coding agents as engineering tools for compiler construction\, not general-purpose assistants. We discuss how agents can generate and refine backend components by analyzing model mathematics and hardware specifications to produce optimized kernels tailored to specific targets.\n \n We present a compiler architecture built as a PyTorch add-on that accepts PyTorch models or FX graphs and produces executable artifacts\, demonstrating practical integration with existing PyTorch workflows.
CATEGORIES:AGENTS & INTEROP
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:81982703dd272edf45006495ac47c2dc
URL:http://pytorchconferenceeu2026.sched.com/event/81982703dd272edf45006495ac47c2dc
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T093000Z
DTEND:20260407T094000Z
SUMMARY:Lightning Talk: How DeepInverse Is Solving Imaging in Science and Healthcare With PyTorch - Andrew Wang\, DeepInverse; Minh Hai Nguyen\, Université de Toulouse
DESCRIPTION:Deep learning has revolutionised imaging\, a foundation of science and healthcare. DeepInverse is the PyTorch library for solving imaging problems\, unifying deep learning methods (e.g. diffusion models)\, physics (medical\, optics) and modern tooling. In this talk\, we’ll show how the PyTorch community can get involved in this exciting yet accessible application of open-source AI.\n \n AI methods in imaging must model the imaging physics\, leading to interesting engineering problems e.g. efficient differentiable ops\, physics-informed losses. We’ll show notebooks on real use-cases: accelerating brain MRI\, reducing radiation in CT scans\, imaging black holes.\n \n PyTorch enthusiasts at any level/background can contribute - from training infra for scientific data to high-level generative modelling frameworks - their AI engineering skills can directly impact imaging across multiple fields.\n \n DeepInverse is supported by a growing international user community and proudly rooted in Paris. We’ve joined the PyTorch Ecosystem and received the Prix Science Ouverte in 2024. We’re excited to join the PyTorch Conf to celebrate the vibrant French developer community!
CATEGORIES:APPLICATIONS & CASE STUDIES
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:2c6a866e9b576414704e7848dd1dfdb2
URL:http://pytorchconferenceeu2026.sched.com/event/2c6a866e9b576414704e7848dd1dfdb2
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T093000Z
DTEND:20260407T095500Z
SUMMARY:Tour De Force: LLM Inference Optimization From Simple To Sophisticated - Christin Pohl\, Microsoft
DESCRIPTION:Making your GPUs go brrr is complex. Efficient LLM inference requires navigating a maze of optimization techniques\, each with different trade-offs. This session provides a practical journey through inference optimizations\, clearly categorized by implementation effort. We'll explore techniques across three levels:\n - Model choices (start here): Model selection\, quantization\, smart routing\n - Library-level improvements (using PyTorch-based frameworks like vLLM\, SGLang\, TensorRT-LLM): Continuous batching\, KV-cache management\, tensor parallelism\n - Custom implementations: Speculative decoding with custom draft heads\, disaggregated inference\, fine-tuning smaller models\n The session covers practical trade-offs and key metrics: time to first token\, inter-token latency\, throughput\, and cost per token. Whether deploying your first model or optimizing at scale\, this talk delivers actionable insights into which techniques to prioritize for deeper investigation.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:3edb474f2d6230ac3c839918109b6c2a
URL:http://pytorchconferenceeu2026.sched.com/event/3edb474f2d6230ac3c839918109b6c2a
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T093000Z
DTEND:20260407T095500Z
SUMMARY:Why Classic IAM Collapses for Agents: Rethinking IAM for Agentic Systems - Parul Singh\, Red Hat
DESCRIPTION:Autonomous AI agents increasingly reason\, plan and act across tools\, services and organizational boundaries. In these environments\, traditional Identity and Access Management models begin to fail. Agents are not users and they are not static services. They act on behalf of others\, change context during execution and operate with different levels of autonomy and risk.\n \n This talk examines why classic IAM assumptions like long-lived identities\, static permissions\, and check-once\, trust-always authorization do not hold for agentic systems. We focus on three core questions.\n 1. How should agent identity be defined when tools\, workflows\, and execution context are dynamic?\n 2. How do we preserve accountability when agents act on behalf of users or other agents?\n 3. How should access change as agent behavior and risk evolve during execution?\n \n We will outline a practical migration path from traditional IAM to agent-aware identity workflows. Our insights come from our work collaborating with the COSAI Secure Design for Agentic Systems working group\, focused on creating shared guidance and emerging standards. We will close with current open questions and how you can join us to help solve them.
CATEGORIES:SECURITY & PRIVACY
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:0ad61d38e69359e798fcf5a4345edda0
URL:http://pytorchconferenceeu2026.sched.com/event/0ad61d38e69359e798fcf5a4345edda0
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T094500Z
DTEND:20260407T095500Z
SUMMARY:Lightning Talk: ExecuTorch on Microcontrollers: Deploying PyTorch To the Smallest Edge - RJ Ascani & Matthias Cremon\, Meta
DESCRIPTION:ExecuTorch extends PyTorch's reach to the most resource-constrained devices: microcontrollers\, DSPs\, and specialized neural processing units powering always-on sensors\, wearables\, and embedded systems. In this talk\, we'll share the current state and roadmap for running ExecuTorch on platforms where every kilobyte of memory and milliwatt of power matters.\n \n What you'll learn:\n - How ExecuTorch's design enables deployment from ultra-low-power MCUs to DSP and NPU accelerators\, all from a single PyTorch workflow\n - The state of backend support for Cadence DSPs\, ARM Ethos-U and Cortex-M\n - Practical considerations for deploying models with sub-megabyte footprints and milliwatt power budgets\n - Case studies spanning always-on audio\, embedded vision\, and TinyML applications
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:3bd54cb1bf2a50ba54a71b6a03c68a4c
URL:http://pytorchconferenceeu2026.sched.com/event/3bd54cb1bf2a50ba54a71b6a03c68a4c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T094500Z
DTEND:20260407T095500Z
SUMMARY:Lightning Talk: TorchJD: Jacobian Descent in PyTorch - Pierre Quinton\, EPFL & Valérian Rey\, Simplex Lab
DESCRIPTION:Jacobian descent (JD) is an extension of gradient descent supporting the optimization of vector-valued functions. This algorithm can be used to train neural networks with multiple loss functions (e.g. multi-task learning). JD iteratively updates the parameters of the model using the Jacobian matrix of the vector of losses (the matrix stacking each individual loss' gradient).\n \n To support and extend our research\, we have developed the TorchJD library. With it\, it's easy and efficient to compute the Jacobians with respect to the model parameters\, and to aggregate them into an update direction that is beneficial to every objective. In contrast\, if we had averaged the losses and used gradient descent\, the update would have been beneficial to the average loss\, but may have actually increased one of the individual losses.\n \n In this session\, we will give a quick introduction to the theory behind Jacobian descent\, and then show how to use TorchJD on a variety of use-cases\, beyond multi-task learning.\n \n Library: https://github.com/TorchJD/torchjd\n Paper: https://arxiv.org/abs/2406.16232
CATEGORIES:TRAINING SYSTEMS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:49df846c46735e3d7f39f6b6e2597d88
URL:http://pytorchconferenceeu2026.sched.com/event/49df846c46735e3d7f39f6b6e2597d88
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T100000Z
DTEND:20260407T102500Z
SUMMARY:Parameterized CUDA Graph Launch in PyTorch: CUDA Graphs Without the Pain - Daniel Galvez\, NVIDIA
DESCRIPTION:Modern GPUs are fast enough that CPU kernel launch overhead has become a real bottleneck. CUDA Graphs can eliminate this overhead\, but in practice they are hard to use and easy to get wrong.\n \n When CUDA Graph capture fails\, PyTorch users typically face two choices: fix the code that breaks capture—often with limited guidance—or capture only parts of the workload. Partial capture comes with sharp footguns\, most notably large increases in device memory usage due to CUDA Graphs’ private memory pools.\n \n This talk walks through the most common CUDA Graph capture failures seen in real PyTorch workloads and shows how to diagnose and fix them. It then presents an alternative to CUDA Graph Trees: Parameterized CUDA Graph launch\, which automatically applies CUDA Graphs to only the compatible regions of a workload. All you need to do is make your workload compatible with torch.compile(). This enables CUDA Graph acceleration with minimal user effort and without increased memory usage.\n \n Using this approach\, llama3.1-70B in torchtitan runs with only a 2 GB memory increase over a non-graph baseline\, compared to ~10 GB using traditional CUDA Graph techniques.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:96e2328c80deef96cd89cf2b5d51fceb
URL:http://pytorchconferenceeu2026.sched.com/event/96e2328c80deef96cd89cf2b5d51fceb
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T100000Z
DTEND:20260407T102500Z
SUMMARY:Write Once\, Run Everywhere with Pytorch Transformers - Pedro Cuenca\, Hugging Face
DESCRIPTION:The Hugging Face transformers library is built on pure PyTorch and can be succinctly described as a model-definition framework. It provides a unified\, familiar\, clear and concise interface to multiple machine learning architectures across modalities.\n\nServing and inference optimizations are not its focus.\n\nHowever\, transformers model definitions become the de-facto reference implementations that multiple other projects use. This includes training libraries\, fast deployment engines such as vLLM and SGLang\, and on-device libraries like MLX and llama.cpp.\n\nThis session describes the path towards increasingly simpler downstream integration of transformers models into inference and deployment libraries\, and how transformers and PyTorch core features enable the ecosystem to adopt new models as soon as they are released.\n\nWe'll go through the journey towards easier modeling\, which implies easier downstream porting and adaptation. The end-game is pure interoperability\, where no code changes are required! This is now possible with vLLM and SGLang\, and we'll show how. We'll end by discussing our ideas on upcoming interop features with MLX and llama.cpp.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:c3c769032549ea9c1c6adc4a91a11de2
URL:http://pytorchconferenceeu2026.sched.com/event/c3c769032549ea9c1c6adc4a91a11de2
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T100000Z
DTEND:20260407T101000Z
SUMMARY:Lightning Talk: Ethical\, Privacy and Sustainability Considerations in PyTorch Systems - Paula Mesa Macias\, Pau&Company
DESCRIPTION:PyTorch models are part of larger systems that handle data\, logs\, APIs and other services. Ethical\, privacy\, security and environmental considerations appear not only around the AI itself\, but across the whole system.\n Using the Ethical Software Framework and the Ethical IT Audit\, this session explores practical ways to think about these issues in real workflows. It highlights situations where decisions in data handling\, model deployment\, logging or infrastructure have ethical\, compliance or sustainability implications. It also shows considerations for using AI responsibly\, such as dataset choices\, bias awareness and evaluating risks before deployment. \n The goal is to provide a clear\, structured lens for identifying risks and trade-offs\, making ethical\, privacy\, security\, and sustainability concerns easier to discuss in everyday work.
CATEGORIES:RESPONSIBLE AI & COMPLIANCE
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:c284f3c38eccb5c3443044d1cbd15d62
URL:http://pytorchconferenceeu2026.sched.com/event/c284f3c38eccb5c3443044d1cbd15d62
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T100000Z
DTEND:20260407T101000Z
SUMMARY:Lightning Talk: Bringing Google’s Colossus to PyTorch: Rapid Storage via fsspec to Keep GPUs Busy - Ankita Luthra & Trinadh Kotturu\, Google
DESCRIPTION:As PyTorch models scale to billions of parameters\, the bottleneck has quietly shifted from compute to storage. Modern GPU clusters often sit idle\, "starving" for data while waiting on legacy REST-based protocols. This talk introduces Rapid Storage: a fundamental architectural shift bringing Google’s Colossus stateful protocol (which powers many of Google’s products) to PyTorch via fsspec\, a common Pythonic file interface used by many frameworks within the PyTorch ecosystem.\nBy bypassing REST APIs entirely via persistent gRPC streams to the storage layer\, we eliminate protocol overhead. In this talk\, we also dive into how Rapid achieves <1ms random read/write latency\, 20x faster data access\, and a massive 6 TB/s of aggregate throughput. Crucially\, it delivers up to 10x lower tail latency for random I/O\, preventing the stragglers that often stall distributed training jobs.\nBeyond raw speed\, we will deconstruct the integration with gcsfs and the broader fsspec ecosystem. This ensures that high-performance I/O is available across the entire data stack\, including Dask\, Ray\, HF Datasets\, and vLLM. Join us to learn how to stop wasting GPU cycles and achieve linear scaling in the cloud.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:99d92b9edad02446b1c60f22e5c7d583
URL:http://pytorchconferenceeu2026.sched.com/event/99d92b9edad02446b1c60f22e5c7d583
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T101500Z
DTEND:20260407T102500Z
SUMMARY:Lightning Talk: FlexAttention + FlashAttention-4: Fast and Flexible - Driss Guessous\, Meta
DESCRIPTION:FlexAttention democratized attention research by letting researchers prototype custom attention variants in PyTorch without hand-written CUDA. Over 1\,000 repos have adopted it\, and dozens of papers cite it. But flexibility came at a cost: FlexAttention achieved only ~60% of FlashAttention-3's throughput on Hopper\, and the gap widened dramatically on Blackwell GPUs. We bridged this gap by integrating FlexAttention with FlashAttention-4\, the new CuTeDSL-based implementation optimized for Blackwell's async pipelines and tensor memory. PyTorch's Inductor now generates CuTeDSL score/mask modifications directly\, enabling JIT instantiation of FA4 for arbitrary attention variants. Results: 1.2–3.2× speedups over the Triton backend on compute-bound workloads. On B200\, patterns like ALiBi\, document masking\, and sliding window see up to 2.7× forward and 3× backward speedups. On Hopper\, gains range from 1.3–2× across all sequence lengths. This talk covers the technical integration: how Inductor lowers score mods to CuTeDSL\, how FA4's warp-specialized kernel accommodates block-sparse iteration\, and practical considerations for users adopting the Flash backend today.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:16f0d96ef956f5cc4505a9812ce3852b
URL:http://pytorchconferenceeu2026.sched.com/event/16f0d96ef956f5cc4505a9812ce3852b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T102500Z
DTEND:20260407T115500Z
SUMMARY:Attendee Lunch
DESCRIPTION:Menu | Boxed Lunches:\n\nVegan:\n-Moroccan taboulé\n-Indian vegetable wrap with sesame oil and tandoori spices\n-Chocolate Chip Cookie\n\nGluten-Free:\n-Bowl Niçoise salad (350 g)\n-Potatoes\, green beans\, cherry tomatoes\, tuna\, black olives\, iceberg lettuce\, eggs\, chopped red onions\n-Chocolate cookie\n\nClassic:\n-Bird's tongue pasta salad with baby vegetables (Vegetarian)\n-Round baguette sandwich with sliced chicken and avocado\nOr\n-Tuna mimosa burger with red onions and baby spinach\nOr\n-Semolina burger\, tzatziki\, falafel (Vegetarian)\n-Chocolate and hazelnut muffin (Vegetarian)
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:440cde70acc69d76c9a4e9004399a14a
URL:http://pytorchconferenceeu2026.sched.com/event/440cde70acc69d76c9a4e9004399a14a
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T102500Z
DTEND:20260407T115500Z
SUMMARY:Women & Non-Binary in PyTorch Lunch
DESCRIPTION:We’d like to invite all attendees who identify as women or non-binary to join each other for a networking lunch at the event. We will begin with a brief introduction and then attendees will be free to enjoy lunch and mingle with one another. All attendees must identify as a woman or non-binary and must be registered for the conference to attend.\n\nMenu\n-Burrata with basil pesto (Vegetarian\, Gluten Free) \n-Trecce di mozzarella di bufala (Vegetarian\, Gluten Free) \n-Charcuterie plates (prosciutto\, mortadella\, fennel or truffle sausage\, speck) (Gluten Free) \n-Soft focaccia (Vegetarian\, Vegan) \n-Pizza Al Taglio - Vegetarian\n-Seasonal salad (Vegetarian\, Gluten Free\, Vegan) \n-Risotto dello Chef (Vegetarian\, Gluten Free) + Vegan Option\n \n-Tiramisù (Gluten Free\, Vegetarian) \n-Panna Cotta (Vegan) \n\n-Water\, soft drinks\, and coffee/tea\n-Flavored water bottles
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Biblioteca Room at La Felicità\, 5 Parv. Alan Turing\, 75013 Paris\, France
SEQUENCE:0
UID:65766be4eeb1f22f28f9f5e0ffd00739
URL:http://pytorchconferenceeu2026.sched.com/event/65766be4eeb1f22f28f9f5e0ffd00739
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T114500Z
DTEND:20260407T121000Z
SUMMARY:Bringing ExecuTorch To the Next Frontiers of Edge AI - Mergen Nachin\, Meta
DESCRIPTION:Since the General Availability release of ExecuTorch 1.0 in October 2025\, our team has continued to advance the state of the on-device AI software stack. In this talk\, we will share our upcoming roadmap and present demos that highlight ExecuTorch’s deployment across the next frontiers\, such as AI PCs\, robotics\, TinyML devices\, and the integration of AI agents to improve productivity for on-device deployment.\n \n ExecuTorch is built on open source collaboration\, encouraging community adoption\, contributions from hardware partners\, and interoperability with other ecosystem libraries. We will discuss how these foundations set the stage for the next phase of edge AI with ExecuTorch.
CATEGORIES:APPLICATIONS & CASE STUDIES
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:0c1f5fd06b52f853cddfc227d91f0c86
URL:http://pytorchconferenceeu2026.sched.com/event/0c1f5fd06b52f853cddfc227d91f0c86
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T114500Z
DTEND:20260407T121000Z
SUMMARY:Teaching PyTorch To Read Your Worst PDFs With Docling - Mingxuan Zhao & Peter Staar\, IBM & Carol Chen\, Red Hat
DESCRIPTION:Building production RAG pipelines starts with a problem most teams underestimate: getting clean\, structured data out of real-world documents. PDFs lose table structure\, figures get separated from captions\, and multi-column layouts become unreadable. Before your PyTorch models even see your data\, crucial information is already lost.\n Docling is an open-source\, MIT-licensed document parsing library that uses PyTorch-based deep learning models to understand documents the way humans read them. It preserves hierarchy\, extracts structured data from tables and figures\, and supports over ten common file formats through a consistent API. Because everything runs locally\, it integrates cleanly into PyTorch-native workflows with low latency and no data leaving your infrastructure.\n In this talk\, I'll walk through Docling's PyTorch-powered architecture and show how to build document processing pipelines for RAG and other GenAI applications. I'll also share the architecture of real-world applications of Docling and how it has improved workflows. You'll leave with practical patterns for connecting Docling to your own PyTorch-based GenAI stack.
CATEGORIES:GENAI & MULTIMODAL
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:7c4302085d5234958c7046eaf74124b2
URL:http://pytorchconferenceeu2026.sched.com/event/7c4302085d5234958c7046eaf74124b2
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T114500Z
DTEND:20260407T121000Z
SUMMARY:Why WideEP Inference Needs Data-Parallel-Aware Scheduling - Maroon Ayoub\, IBM; Tyler Michael Smith\, Red Hat
DESCRIPTION:WideEP (wide expert parallelism) fails not because experts are expensive\, but because routing ignores where state already lives. In PyTorch LLM serving with vLLM\, WideEP fans tokens across many experts while KV caches accumulate unevenly across data-parallel replicas. When routing is unaware of KV placement and per-replica load\, requests land on replicas that cannot reuse cache or make progress efficiently\, and latency spikes as expert fan-out grows.\n The fix is not reshaping expert parallelism\, but making routing data-parallel aware using signals vLLM already exposes. In this talk\, we show how llm-d extends its router to leverage KV-cache locality and load awareness when routing WideEP flows. Rather than treating replicas as interchangeable\, the router prefers replicas with warm KV state and available capacity\, aligning routing decisions with vLLM’s execution reality and reducing cache fragmentation.\n This session walks through how KV-aware\, data-parallel routing changes WideEP inference in practice: which signals matter\, how routing behavior evolves\, and where the gains come from. Attendees leave with a clear mental model for when KV- and load-aware routing unlocks higher throughput.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:fc9773f3fdc75545d1176c438e86aa32
URL:http://pytorchconferenceeu2026.sched.com/event/fc9773f3fdc75545d1176c438e86aa32
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T114500Z
DTEND:20260407T115500Z
SUMMARY:Lightning Talk: From Pretrained To Personal: Privacy-First Fine-Tuning on AI PCs - Daniel Holanda Noronha & Iswarya Alex\, AMD
DESCRIPTION:PyTorch on AI PCs has crossed a threshold: local hardware can now support meaningful model fine-tuning\, not just inference. This unlocks a new class of enterprise workflows where sensitive data never leaves the device\, yet models can still be personalized and adapted using PyTorch.\n \n In this session\, we’ll show how to design on-device fine-tuning pipelines for AI PCs\, focusing on enterprise scenarios where privacy is non-negotiable: regulated healthcare data\, government and public-sector workloads\, financial services\, and proprietary enterprise systems. We’ll walk through key decisions such as selecting efficient pre-trained models\, and how the right PyTorch optimizations enable effective personalization on large private datasets.\n \n We'll also showcase practical fine-tuning techniques such as supervised fine-tuning (SFT)\, LoRA\, and QLoRA\, and show how mixed-precision training and correct use of training vs. evaluation modes make these approaches efficient and practical on AI PCs while preserving privacy. The result is a cloud-free\, privacy-first fine-tuning blueprint that turns AI PCs into secure personalization engines for enterprise AI.
CATEGORIES:SECURITY & PRIVACY
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:7be87e67e2acb461e7ca3464fc67d323
URL:http://pytorchconferenceeu2026.sched.com/event/7be87e67e2acb461e7ca3464fc67d323
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T121500Z
DTEND:20260407T124000Z
SUMMARY:Sponsored Session: TorchTPU: Expanding TPU Programmability to PyTorch - Kat Ko & Claudio Basile\, Google; Jana van Greunen\, Meta
DESCRIPTION:Google Tensor Processing Units (TPUs) are designed for ML at massive scale\, offering significant benefits in performance\, energy\, and cost. While TPUs have historically been associated with the TensorFlow and JAX ecosystems\, we introduce TorchTPU: a new Google effort to expand TPU programmability to PyTorch.\n\nThis talk charts TorchTPU’s evolution\, from the initial RFC to establishing a native\, eager-first PyTorch backend. We will outline the core technical challenges overcome during this transition—particularly the complexities of translating dynamic\, eager execution into highly optimized TPU computations.\n\nWe’ll highlight current milestones\, including native integration with torch.compile\, DTensor\, and robust support for the latest Ironwood (TPU v7) architecture. These advancements collectively enable multi-billion parameter models to run on TPUs with minimal code changes\, while retaining the ability for users to apply model-specific optimizations (e.g.\, custom kernels\, quantization\, sharding) to reach peak performance. Finally\, we’ll provide a sneak peek at our roadmap for 2026.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:86698aa4c89b893c92dfd6cefebf2f0a
URL:http://pytorchconferenceeu2026.sched.com/event/86698aa4c89b893c92dfd6cefebf2f0a
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T121500Z
DTEND:20260407T122500Z
SUMMARY:Lightning Talk: Accelerating On-Device ML Inference With ExecuTorch and Arm SME2 - Jason Zhu\, Arm
DESCRIPTION:As on-device AI workloads grow in complexity\, achieving low-latency inference within mobile power constraints remains a central challenge. We examine how ExecuTorch\, combined with Arm’s Scalable Matrix Extension 2 (SME2)\, enables efficient CPU deployments of production AI workloads. We present a case study of SqueezeSAM\, a segmentation model deployed in real-world mobile applications. Using ExecuTorch with XNNPACK delegation and SME2-optimized kernels\, we evaluate INT8 and FP16 inference on a flagship smartphone. Moving beyond aggregate latency\, we apply operator-level profiling to decompose runtime across convolution\, GEMM\, elementwise\, and data movement operators\, showing how hardware acceleration reshapes bottlenecks in the execution stack. SME2 delivers up to 3.9x end-to-end speedup on a single CPU core\, materially altering runtime composition and revealing data movement as the primary post-acceleration bottleneck. This session presents a practical workflow for deploying\, profiling\, and systematically optimizing on-device PyTorch models\, demonstrating how SME2 expands the viable design space for interactive mobile AI.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:1256fd21e8b2c1242d8e9ef48ade6208
URL:http://pytorchconferenceeu2026.sched.com/event/1256fd21e8b2c1242d8e9ef48ade6208
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T121500Z
DTEND:20260407T124000Z
SUMMARY:The Token Slice: Implementing Preemptive Scheduling Via Chunked Decoding - Maroon Ayoub\, IBM & Kellen Swain\, Google
DESCRIPTION:Production LLM serving faces a critical trade-off: while continuous batching maximizes throughput\, it often sacrifices SLAs due to Head-of-Line (HoL) blocking. When long-context requests hijack the engine\, tail latencies spike. Without fine-grained preemption\, guaranteeing priority or fairness remains nearly impossible.\n \n We propose a solution: Chunked Decoding. By treating a fixed number of tokens as a "time slice\," we bring 50 years of OS scheduling wisdom to inference. This technique decouples generation from completion\, enabling a preemptive multitasking environment for LLMs.\n \n In this talk\, we present a sidecar implementation for PyTorch-based servers (like vLLM) that orchestrates decoding in manageable chunks. This allows the system to pause\, hold\, or swap requests mid-stream without discarding the KV cache. We will share early evaluation results\, discussing how varying chunk sizes impact priority handling and tail latency. Attendees will learn how a sidecar approach enables sophisticated scheduling while keeping the core engine lean—offering a blueprint for integrating preemptive scheduling into the next generation of model servers.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:3d44679d1e85fbc49b779fe5e35fb02f
URL:http://pytorchconferenceeu2026.sched.com/event/3d44679d1e85fbc49b779fe5e35fb02f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T123000Z
DTEND:20260407T124000Z
SUMMARY:Lightning Talk: Combo Kernels: Horizontal Fusion Optimization in Torch.compile - Karthick Panner Selvam & Elias Ellison\, Meta
DESCRIPTION:Combo kernels are a compiler optimization in PyTorch Inductor that horizontally fuses multiple independent operations into a single Triton kernel launch\, reducing GPU kernel launch overhead and improving memory locality.\n \n The Problem: Models generate many small\, independent operations like weight preprocessing and tensor copies. Each launch incurs overhead. For models with many such operations\, this becomes a bottleneck.\n \n The Solution: Combo kernels combine multiple operations into one kernel using a dispatch mechanism. A single program ID routes execution to the appropriate subkernel based on cumulative block boundaries. This eliminates redundant launches while preserving correctness.\n \n Key Innovations:\n \n Per-subkernel block dimensions: Each subkernel gets its own optimized block size instead of sharing one size across all\, enabling better autotuning.\n \n Flattened grid dispatch: We collapse the multi-dimensional block grid into a single dimension.\n \n Results: On H100 GPUs\, combo kernels deliver geomean speedups of +7.38% for HuggingFace\, and +5.97% for TorchBench. The optimization is enabled by default in the vLLM repository for LLM inference acceleration.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:2fdbee5756f9b8ff68a3a6c0d928d642
URL:http://pytorchconferenceeu2026.sched.com/event/2fdbee5756f9b8ff68a3a6c0d928d642
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T124500Z
DTEND:20260407T131000Z
SUMMARY:Brevitas Quantization Library - Pablo Monteagudo Lago\, AMD
DESCRIPTION:Brevitas is an open-source PyTorch library from AMD designed to support the research of state-of-the-art quantization methods\, including Qronos (ICLR 2026) and MixQuant (arXiv). Built for flexibility and composability\, it offers modular components for exploring reduced-precision data paths and accuracy-preserving techniques.\n As generative models scale\, post-training quantization (PTQ) has become the preferred strategy for maintaining quality without retraining\, yet PTQ methods are often applied in isolation due to fragmented tooling. Brevitas provides a unified environment for modern PTQ algorithms—including Qronos\, SpinQuant and AutoRound—enabling practitioners to combine complementary techniques effectively.\n Brevitas leverages the latest PyTorch features\, like Dynamo for tracing and selectively modifying compute graphs—for example\, by inserting rotation ops to mitigate outliers. It integrates with frameworks like transformers and supports export flows including vLLM and GGUF\, ensuring a smooth transition from experimentation to deployment.\n This talk shows how to use Brevitas for an end-to-end quantization flow\, showcasing how its flexibility enables new research directions.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:6b49f49f3d02a5d5c94af4da5e54a001
URL:http://pytorchconferenceeu2026.sched.com/event/6b49f49f3d02a5d5c94af4da5e54a001
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T124500Z
DTEND:20260407T125500Z
SUMMARY:Lightning Talk: Implementing Single-Dim Strategies With Sharding Validator - Anshul Sinha\, Meta
DESCRIPTION:DTensor sharding propagation is a major bottleneck to full operator coverage: adding or fixing an op strategy is complex\, bug-prone\, and gaps often surface as unexpected resharding and extra collectives. A key source of complexity is that today’s rules conflate (1) semantic correctness—valid input/output sharding combinations for an operator—with (2) search-space pruning to avoid combinatorial blowups on N-dimensional meshes.\n \n This talk presents a landed prototype that separates these concerns via Single Mesh Dim Strategies: each operator specifies valid placement combinations for one mesh dimension\, while infra expands/composes them across the full mesh and selects low-cost strategies. For contributors\, this provides a clear path to refactor existing op_strategies into single-dim rules that are easier to review and extend. We also introduce a Truth Table-style sharding validator that systematically tests shapes and sharding specs to check soundness/completeness and to flag unnecessary redistribution/collectives caused by missing cases.\n \n The goal of this presentation is faster\, higher-confidence contributions that improve correctness and expand DTensor operator coverage.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:29f08dca48e5164763d70e9758f1398f
URL:http://pytorchconferenceeu2026.sched.com/event/29f08dca48e5164763d70e9758f1398f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T124500Z
DTEND:20260407T131000Z
SUMMARY:Model-Changing Transforms With Torch.compile - Thomas Viehmann\, Lightning AI
DESCRIPTION:torch.compile is the go-to mechanism for increasing the performance of PyTorch models of all shapes and forms.\n \n While it is widely understood how to change the computation by manipulating the FX trace representation\, it becomes a much more general tool by also transforming model and input expectations (the guards):\n This enables model-changing transformations like quantization and distributed execution without needing to adapt the model to them.\n \n We take a deep dive into the torch.compile internals to see what's going on under the hood and how we can hook into the gears to enable distributed execution (starting from a single-GPU model) and quantization.\n In this quest\, marvel at the interplay between PyTorch's Python code\, the Python interpreter\, and PyTorch's C++ code that together enable the Dynamo frontend of torch.compile\, and then watch us take a big hammer to use it in unexpected ways. Building on our experience with Lightning Thunder\, an experimental compiler for PyTorch models\, we propose a transform mechanism taking care of compute\, model\, and weights.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:314f181f63d7540d7f475d41a3f9184e
URL:http://pytorchconferenceeu2026.sched.com/event/314f181f63d7540d7f475d41a3f9184e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T124500Z
DTEND:20260407T131000Z
SUMMARY:The Science and Practice of Open and Scalable LLM Evaluations - Grzegorz Chlebus\, NVIDIA
DESCRIPTION:Rapid advances in AI have expanded the range of capabilities required for successful real-world deployment. Understanding where we are in this multi-dimensional frontier is essential for accelerating innovation through effective quality assurance. Rigorous evaluation is increasingly difficult to scale as development requires testing many checkpoints across numerous benchmarks. Model comparison is further complicated by limited transparency of reported results. This talk explores challenges\, best practices\, and open-source tools that elevate evaluation to a core component of LLM development\, delivering continuous signals across the model lifecycle.\n We discuss principles for standardizing evaluation methods and improving consistency through practical patterns and anti-patterns\, and examples of integrating the science of evaluation directly into model development. Using Nemo-Evaluator\, an open-source scalable evaluation tool\, we demonstrate modular architectures that enable transparent\, reproducible measurement. Finally\, we show how Nemo-Evaluator supports reproducible evaluation for the Nemotron model family\, helping enable one of the most open development processes in modern AI.
CATEGORIES:GENAI & MULTIMODAL
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:959bb2cb9de38f49223dd6535c39ff28
URL:http://pytorchconferenceeu2026.sched.com/event/959bb2cb9de38f49223dd6535c39ff28
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T130000Z
DTEND:20260407T131000Z
SUMMARY:Lightning Talk: Jigsaw: Domain and Tensor Parallelism for High-Resolution Input Training - Deifilia Kieckhefen\, Karlsruhe Institute of Technology
DESCRIPTION:Distributed neural network training frameworks typically optimize for specific architectures while minimizing communication overhead. Transformer layers can be efficiently parallelized\, but other operations such as convolutions often remain inefficient. This creates bottlenecks for complex model architectures.\n Moreover\, existing tensor parallelism strategies typically replicate input data across all processes\, creating redundant I/O that scales poorly with input size. In applications with heavy I/O demands (weather forecasting\, medical imaging\, or video processing)\, unsharded input data creates additional data-loading bottlenecks that could benefit from parallelization.\n Jigsaw is a PyTorch library that shards both model weights and input data across parallel processes. It maintains a PyTorch-like interface while parallelizing activations\, convolutions\, linear layers\, and attention through a distributed matrix multiplication backend. We demonstrate the usability of Jigsaw across a wide range of model architectures\, show performance when scaling multi-billion-parameter models sharded across up to 8 processes\, and compare the scalability to DDP\, FSDP\, and Megatron-LM approaches.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:8d5e110714f752a73a65a05fbbb84995
URL:http://pytorchconferenceeu2026.sched.com/event/8d5e110714f752a73a65a05fbbb84995
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T131000Z
DTEND:20260407T134000Z
SUMMARY:Coffee Break
DESCRIPTION:Menu:\n-Chocolate cake\n-Red fruits yogurt (and its wooden spoon) (Gluten Free\, Vegetarian)\n-Seasonal fruits (Gluten Free\, Vegan) \n-Hummus and vegetable brioche roll\n-Dry fruits and dry grapes mix\n-Chocolate Cookie (Gluten Free\, Vegan)
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:2cf4c3bd046c3d41856d383ff52b70bb
URL:http://pytorchconferenceeu2026.sched.com/event/2cf4c3bd046c3d41856d383ff52b70bb
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T131000Z
DTEND:20260407T134000Z
SUMMARY:Meet the Developers of Helion
DESCRIPTION:This session offers a unique opportunity to connect with the core developers of Helion (https://github.com/pytorch/helion)—ask questions\, share feedback\, and explore collaboration opportunities with the team.\n \n About Helion\n At PTC 2025\, we launched Helion (in Beta)\, a PyTorch-native kernel authoring DSL designed to deliver portable performance across heterogeneous hardware. Since then\, Helion has outperformed expert-tuned Triton and CuTe DSL kernels and seen meaningful adoption across research labs\, production teams\, and OSS frameworks like vLLM.\n \n At PyTorch Conference Europe 2026\, we are excited to announce Helion 1.0 (General Availability). Join us to learn how Helion works under the hood and discover what's new in the GA release.\n \n Core Developers\n Jason Ansel: Research Scientist\, creator of PyTorch Compiler and Helion\n Oguz Ulgen: Software Engineer\, creator of PyTorch Compiler cache\, working on Helion\n Will Feng: Software Engineer working on TorchInductor and Helion\n Markus Hoehnerbach: Software Engineer focusing on Helion development and kernel authoring\n \n Drop in for an informal discussion\, share your experiences\, and explore opportunities to collaborate with the team!
CATEGORIES:MEET THE DEVELOPERS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:ce21612129fa51f735f1cb48a0d305ba
URL:http://pytorchconferenceeu2026.sched.com/event/ce21612129fa51f735f1cb48a0d305ba
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T134000Z
DTEND:20260407T140500Z
SUMMARY:Enabling State-of-the-art Asynchronous Execution in Torch.compile With CUDA Streams - Michael Lazos\, Meta
DESCRIPTION:CUDA streams are a widely-used method for parallelizing GPU computation on NVIDIA GPUs. They have long been requested by our users and enable multiple key capabilities - overlapping communication and compute kernels\, training on multiple batches in parallel\, and parallelizing kernels\, all of which are needed for achieving SOTA training performance. Another key capability is activation offloading - this can be applied to any model to prevent OOMs by asynchronously storing activations in CPU memory until they are needed by the model. \n \n Before this work\, torch.compile would graph break on CUDA stream contexts\, which can be costly for models that utilize streams. Although workarounds exist (e.g. wrapping stream manipulation into custom ops)\, these solutions add complexity and create friction in the user experience. By enabling seamless CUDA stream support in PT2\, we allow our users to leverage the familiar eager APIs for stream assignment and synchronization directly within torch.compile. This not only simplifies the workflow but also ensures that models using custom streaming patterns can run efficiently out-of-the-box without manual intervention or code restructuring.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:853b99748b21cf7dc84240b5a2714115
URL:http://pytorchconferenceeu2026.sched.com/event/853b99748b21cf7dc84240b5a2714115
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T134000Z
DTEND:20260407T135000Z
SUMMARY:Lightning Talk: Graph Based Pipeline Parallelism - Sanket Purandare\, Meta & Simon Fan\, Meta PyTorch
DESCRIPTION:Pipeline parallelism is vital for large models\, but advanced schedules for SOTA LLMs are difficult to express in current PyTorch. MoE communication dominates the critical path\, making latency hiding essential. Leading systems use fw-bw overlapping\; fw-fw and bw-bw overlapping further boost throughput.\n \n Schedules like ZeroBubbleV and DualPipeV rely on dI-dW backward splitting for fine-grained overlap. However\, eager-mode implementations require a patchwork of fragile integrations (multi-threading\, custom autograd functions\, activation checkpointing\, etc.) that rely on implicit behavior and hand-written logic with poor torch.compile compatibility and upstream composability. \n \n We present Graph-Based PP: stages are compiled to reusable FX graphs executed via an explicit schedule language. Users write standard PyTorch code while specifying schedules at varying granularity\; all manipulations run as graph passes\, abstracting complexity away from user code and into the compiler/runtime\, allowing for greater composability.\n \n We have integrated Graph-PP into TorchTitan and AutoParallel on real MoE workloads\, targeting upstream inclusion in torch.distributed.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:853e7eaca941b189e91771295f46a1d7
URL:http://pytorchconferenceeu2026.sched.com/event/853e7eaca941b189e91771295f46a1d7
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T134000Z
DTEND:20260407T140500Z
SUMMARY:torch.compile and Diffusers: A Hands-On Guide to Peak Performance - Sayak Paul\, Hugging Face
DESCRIPTION:This session shows how to use torch.compile with the Diffusers library to speed up diffusion models like Flux-1-Dev.\n\nYou'll learn practical techniques for both model authors and users. For authors\, we cover how to make models compiler-friendly using fullgraph=True. For users\, we explain regional compilation (which cuts compile time by 7x while keeping the same runtime gains) and how to avoid recompilations with dynamic=True.\n\nWe also cover real-world scenarios: running on memory-constrained GPUs using CPU offloading and quantization\, and swapping LoRA adapters without triggering recompilation.\n\nKey takeaways:\n- Compiling just the Diffusion Transformer (DiT) delivers ~1.5x speedup on H100\n- Regional compilation reduces cold-start compile time from 67s to 9.6s\n- NF4 quantization cuts memory from 33GB to 15GB\n- Combining quantization + offloading drops memory to 12.2GB\n- LoRA hot-swap lets you switch adapters without recompiling\n\nWhether you're building diffusion models or using them\, this guide helps you get the best performance with minimal effort.
CATEGORIES:GENAI & MULTIMODAL
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:b747b481c160b5f353c7639079de4f57
URL:http://pytorchconferenceeu2026.sched.com/event/b747b481c160b5f353c7639079de4f57
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T134000Z
DTEND:20260407T135000Z
SUMMARY:Lightning Talk: Cross-Region Model Serving: PyTorch Inference\, Observability & LLMOps - Suraj Muraleedharan\, Amazon Web Services
DESCRIPTION:As PyTorch models move to production\, organizations face a critical challenge: deploying\, monitoring\, and operating inference at scale across multiple regions. Single-region serving is well understood\, but multi-region LLMOps—model distribution\, observability\, failover\, and cost management—remains ad hoc and challenging for many organizations.\n \n This session presents production-tested architectures for multi-region PyTorch inference and LLMOps workflows. We cover:\n \n Serving: Multi-region TorchServe/KServe on Kubernetes with latency-based routing\, blue-green deployments\, model versioning\, and automated failover with circuit breakers.\n \n Observability: OpenTelemetry distributed tracing\, Prometheus/Grafana dashboards for latency\, throughput\, GPU utilization\, and LLM-specific metrics like time-to-first-token and KV-cache hit rate.\n \n LLMOps: CI/CD pipelines for cross-region model deployment with automated rollback\, drift detection\, and SLO-based alerting.\n \n Attendees leave with serving architectures\, dashboards\, and deployment pipelines using open-source tooling.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:b19cdf7637d0c3e9084acde62cb749e7
URL:http://pytorchconferenceeu2026.sched.com/event/b19cdf7637d0c3e9084acde62cb749e7
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T135500Z
DTEND:20260407T140500Z
SUMMARY:Lightning Talk: Running ExecuTorch Applications With Silicon Acceleration\, in Ultra-low Power - George Gekov\, Arm; Aki Makkonen\, Alif Semiconductor
DESCRIPTION:Efficient deployment of ML models on low-power embedded systems has been a significant challenge for a number of years. At the same time\, these embedded SoCs are all around us—from everyday appliances to the latest smart glasses.\n \n ExecuTorch is a PyTorch-native framework for deploying neural networks on resource-constrained systems. In this session\, we show how to build an end-to-end speech recognition application using PyTorch and ExecuTorch—from training a Transformer-based neural network in PyTorch\, through quantization\, all the way to deployment on a low-power embedded device.\n \n We will introduce the key ExecuTorch APIs for quantization and explain how models are transformed and lowered into a form that can run efficiently on a device. The application runs on the Alif Ensemble E8 SoC\, the first implementation of the leading Arm® Ethos-U85 NPU\, which brings native support for Transformer models to the ultra-low power domain. \n \n Join the experts from Arm and Alif Semiconductor to see how we are bridging the gap between PyTorch and embedded deployment—and how you can bring PyTorch models to silicon-accelerated\, ultra-low-power systems.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:e13c82538386e8fdc25f1a95fb06c998
URL:http://pytorchconferenceeu2026.sched.com/event/e13c82538386e8fdc25f1a95fb06c998
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T135500Z
DTEND:20260407T140500Z
SUMMARY:Lightning Talk: Beyond Generic Spans: Distributed Tracing for Actionable LLM Observability - Sally O'Malley & Greg Pereira\, Red Hat
DESCRIPTION:End-to-end observability is non-negotiable for production LLMs to track performance\, attribute costs\, and validate optimizations. Generating actionable traces from complex distributed inference remains a significant challenge.\n \n We implemented tracing for llm-d\, a high-performance distributed LLM inference framework. Using manual OpenTelemetry instrumentation with carefully crafted spans at critical paths\, we expose insights that generic tooling can't capture.\n \n This talk explores how distributed tracing illuminates requests through unique inference scenarios:\n \n * Prefix cache-aware routing: Track cache hits and validate whether intelligent scheduling improves TTFT\n * Prefill/decode disaggregation: Analyze why each request chose split vs unified processing based on cache locality.\n * Wide expert-parallelism: Profile MoE models across multi-node deployments \n * Workload autoscaling: Correlate request patterns with scaling decisions\n \n Attendees will learn why LLMOps requires a new approach to distributed tracing\, contrasting it with traditional microservices\, and how to instrument inference stacks effectively. Walk away ready to add meaningful observability to your own deployments.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:aeca8eebe80cfa7ddc5aef2b58412362
URL:http://pytorchconferenceeu2026.sched.com/event/aeca8eebe80cfa7ddc5aef2b58412362
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T141000Z
DTEND:20260407T143500Z
SUMMARY:Build PyTorch to Understand PyTorch - Vijay Janapa Reddi\, Harvard University; Andrea Mattia Garavagno\, University of Genoa
DESCRIPTION:PyTorch's success depends on more than users—it needs engineers who understand what's inside. Engineers who can debug framework issues\, optimize at the systems level\, contribute upstream\, and build what comes next. But ML education today produces practitioners who call APIs without understanding them. They train models without knowing why Adam needs 3× the memory of SGD\, or what happens when they call loss.backward().\n\nTinyTorch is a 20-module open-source curriculum that closes this gap. Students construct PyTorch's core components—tensors\, autograd\, optimizers\, CNNs\, transformers—in pure Python\, building a complete framework where every operation is code they wrote. By the final module\, they don't just use PyTorch\; they understand how to build it.\n\nThe curriculum uses progressive disclosure\, systems-first profiling from Module 01\, and build-to-validate milestones—recreating ML breakthroughs from Perceptron (1958) through Transformers (2017)\, culminating in MLPerf-style benchmarking.\n\nTinyTorch is how we grow the next generation of PyTorch contributors and the engineers who will build what comes after.\n\nOpen source: mlsysbook.ai/tinytorch
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:a7d2b7a9b8ba40aab9cfb42f3bcfc00a
URL:http://pytorchconferenceeu2026.sched.com/event/a7d2b7a9b8ba40aab9cfb42f3bcfc00a
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T141000Z
DTEND:20260407T143500Z
SUMMARY:On-Device LLM Inference on Android With ExecuTorch and Qualcomm QNN - Shivay Lamba & Kartikey Rawat\, Qualcomm
DESCRIPTION:Multimodal models like CLIP are typically deployed in the cloud due to their size and computational demands\, limiting their use in latency-sensitive\, privacy-preserving\, and offline-first applications. This talk demonstrates how one can run fully on-device CLIP inference on Android using ExecuTorch with the Qualcomm QNN backend\, enabling real-time vision–language understanding without server dependency. One can run models like CLIP (ViT-B/32) entirely on edge devices\, leveraging QNN for hardware-accelerated inference. A key focus of the talk is a deep dive into ExecuTorch optimizations for QNN\, including graph lowering\, operator fusion\, quantization strategies\, memory planning\, and backend-specific execution choices that materially impact latency\, memory footprint\, and power consumption. The talk will cover architectural insights\, model export and compilation workflows\, and real-world benchmarks covering latency\, memory usage\, and power efficiency. This talk highlights how large multimodal PyTorch models can be made production-ready on edge devices\, unlocking new classes of private\, offline-capable AI applications.
CATEGORIES:GENAI & MULTIMODAL
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:1a388b6b4356fc571cac5bfe09847702
URL:http://pytorchconferenceeu2026.sched.com/event/1a388b6b4356fc571cac5bfe09847702
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T141000Z
DTEND:20260407T143500Z
SUMMARY:Optimizing Reinforcement Learning at Trillion-Parameter Scale - Songlin Jiang\, Aalto University & Mind Lab
DESCRIPTION:This talk will dive into how we implemented and optimized reinforcement learning on trillion-parameter Mixture-of-Experts reasoning models using veRL\, Megatron-Bridge and vLLM. The session is useful to anyone building large-scale RL training systems.\n\nFor the first part\, I will walk through the system design required to make RL work at this scale using LoRA: how LoRA adapters are implemented for expert layers\, how adapters are sharded and fused under tensor/pipeline/expert parallelism\, and most importantly\, how refit (parameter sync) is implemented for LoRA between the training backend (Megatron) and the rollout engine (vLLM).\n\nThe second part of the talk focuses on training–inference mismatch in MoE RL. I will explain why common mitigations such as clipping and importance sampling can fail\, and how we implement fixed Router Replay (R3) across vLLM\, veRL\, and Megatron to align routing decisions between rollout and training.\n\nThis work was done together with Mind Lab\, and some of the related blog posts are at:\n- https://macaron.im/mindlab/research/building-trillion-parameter-reasoning-rl-with-10-gpus\n- https://macaron.im/mindlab/research/router-replay-r3-why-it-failed-and-how-we-fixed-it
CATEGORIES:TRAINING SYSTEMS
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:0abb97b4ffc5086a0c98b6355ce6f5f6
URL:http://pytorchconferenceeu2026.sched.com/event/0abb97b4ffc5086a0c98b6355ce6f5f6
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T141000Z
DTEND:20260407T143500Z
SUMMARY:TorchStore: What We Learned Building Distributed Storage Solutions for AsyncRL - Lucas Pasqualin\, Danielle Pintz\, Allen Wang\, Amir Afzail\, Meta
DESCRIPTION:Asynchronous Reinforcement Learning (AsyncRL) workloads have unique data sharing requirements: actors must efficiently exchange large tensors across processes and nodes\, often with different sharding configurations—not just at checkpoint time\, but continuously during training for live weight synchronization. This talk presents TorchStore\, an open-source distributed tensor storage system built on Monarch actors that tackles these challenges. We'll share the key lessons learned—from designing pluggable transport backends (RDMA\, shared memory\, RPC) to implementing transparent live DTensor resharding that lets producers and consumers use entirely different parallelism strategies. We'll also discuss the friction we encountered integrating with inference engines like vLLM\, where differing model definitions and integrations present new bottlenecks. Whether you're building actor-based training systems or thinking about disaggregated training-inference architectures\, you'll leave with practical insights on distributed tensor storage design.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:0d9a88754d42a456ca03bf501f26312e
URL:http://pytorchconferenceeu2026.sched.com/event/0d9a88754d42a456ca03bf501f26312e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T144000Z
DTEND:20260407T150500Z
SUMMARY:Optimizing PyTorch on CPU-GPU Coherent Platforms - Matthias Jouanneaux\, Nvidia
DESCRIPTION:In recent years\, both Nvidia and AMD have introduced hardware coherent platforms: GH200\, GB200 and MI300A. These coherent platforms bring many new features\, as well as new challenges\, for PyTorch applications attempting to make the most of the platform.\n This talk will focus on Nvidia's GB200 and walk through techniques to utilize the features of the coherent architecture in PyTorch\, such as the high CPU-GPU interconnect bandwidth and unified memory\, as well as the advantages and caveats of sharing system memory between CPU and GPU.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:32161de6b5060fd56f09d2f3a2d53b7e
URL:http://pytorchconferenceeu2026.sched.com/event/32161de6b5060fd56f09d2f3a2d53b7e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T144000Z
DTEND:20260407T145000Z
SUMMARY:Lightning Talk: TerraKit: Standardising AI-Ready Geospatial Data Preparation for the TorchGeo Ecosystem - Rosie Lickorish & Romeo Kienzler\, IBM
DESCRIPTION:With the advent of geospatial foundation models\, unexplored use cases are emerging that require well-curated datasets. Currently\, no standardised approach exists for creating such AI-ready geospatial datasets. In this session\, we introduce TerraKit: a comprehensive open-source Python library for retrieving and processing geospatial data that seamlessly integrates with upstream geospatial model training libraries such as TorchGeo or TerraTorch.\n\nFrom raster/vector annotations\, TerraKit will match\, download\, process\, align and split the requested data source (e.g.\, EarthData\, CDSE\, Planetary Computer) based on user specifications provided by a simple configuration file. TerraKit also supports spatial train/val splits and exports datasets in standard formats such as TACO datasets. TerraKit streamlines the pipeline from raw EO data to AI-ready datasets\, accelerating the development of custom geospatial applications and ensuring query and processing pipelines are reproducible. By lowering the barrier to entry\, a wider community of TorchGeo and TerraTorch users is empowered to leverage foundation models for Earth observation.
CATEGORIES:GENAI & MULTIMODAL
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:c160c17bd7edfde2a81e31302b277eb2
URL:http://pytorchconferenceeu2026.sched.com/event/c160c17bd7edfde2a81e31302b277eb2
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T144000Z
DTEND:20260407T150500Z
SUMMARY:Securing Agentic AI With PyTorch: Threat Modeling & LLM Red Teaming in Practice - Valeri Milke\, VamiSec GmbH
DESCRIPTION:Agentic AI systems built with PyTorch introduce a new security paradigm: autonomous decision-making\, tool usage\, memory\, and multi-step reasoning significantly expand the attack surface beyond traditional ML pipelines. This session presents a practical\, security-first approach to building and testing agentic AI systems using PyTorch\, combining AI threat modeling and hands-on LLM security testing. We introduce MAESTRO-based AI Threat Modeling to systematically identify risks across prompts\, tools\, memory\, orchestration and model interactions. Building on this foundation\, we demonstrate how the OWASP LLM Top 10 and the OWASP LLM Testing Guide can be applied to real PyTorch-based agent architectures. The session includes a live demo of a prompt injection attack against an agentic workflow\, showing how task delegation and tool invocation can be abused — and how developers can detect\, mitigate and test these risks early in the AI development lifecycle. Attendees will leave with concrete techniques to integrate AI security testing and threat modeling into PyTorch-based systems\, bridging research\, engineering and real-world AI risk.
CATEGORIES:SECURITY & PRIVACY
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:4dfb7f9a7238375604bdbfd039b993c4
URL:http://pytorchconferenceeu2026.sched.com/event/4dfb7f9a7238375604bdbfd039b993c4
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T145500Z
DTEND:20260407T150500Z
SUMMARY:Lightning Talk: Bayesian Neural Networks With Variational Inference in PyTorch - Lars Heyen\, Karlsruhe Institute of Technology\, Scientific Computing Center
DESCRIPTION:Uncertainty quantification is becoming more and more important as neural networks are used for increasingly critical tasks. Bayesian neural networks (BNNs) inherently provide a measure of their own uncertainty\, but can be either hard to implement or inflexible if one uses common frameworks. In this session I discuss how to efficiently implement BNNs using Variational Inference within PyTorch and present torch_blue\, a lightweight open source library that implements these methods with the goal of being easy to pick up\, yet flexible enough for research on BNNs.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:acef11b7b03829360c3d8488e09ed00c
URL:http://pytorchconferenceeu2026.sched.com/event/acef11b7b03829360c3d8488e09ed00c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T150500Z
DTEND:20260407T163000Z
SUMMARY:Flare Party
DESCRIPTION:Wrap up Day 1 of PyTorch Conference Europe 2026 at our official Flare Party. It’s the perfect opportunity to unwind\, network\, and keep the day’s momentum going.\n\nEnjoy complimentary beer\, wine\, and appetizers as you connect with speakers\, core contributors\, and fellow developers. Throughout the evening\, explore the Poster Sessions\, where presenters will be available for live Q&A to spark deeper technical conversations. Be sure to also visit our sponsor booths to discover new tools and pick up conference swag.\n\nJoin us to recharge and celebrate with the PyTorch community!\n\nMenu:\nTHE COLD PIECES\n-Cheese cone: comté and ossau-iraty\, salad\, grapes\, and nuts (Gluten Free\, Vegetarian) \n-Sausage cone\n-Vegetable chips cone (Vegan\, Vegetarian)\n-Bretzel bread (Vegan\, Vegetarian)\n\nTHE HOT PIECES\n-Parmesan cream and artichoke brioche tart with arugula (Vegetarian)\n-Watercress brioche tart (Vegetarian)\n-Green frittata with olives (Gluten Free\, Vegetarian) \n-Flammen truffle (Vegetarian)\n-Red lentil\, ginger\, and coconut milk soup (Vegan\, Vegetarian)\n\nWine\, Beer\, Soft Drinks
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:105f389df22414cc7777f6a9eddba594
URL:http://pytorchconferenceeu2026.sched.com/event/105f389df22414cc7777f6a9eddba594
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T150500Z
DTEND:20260407T163500Z
SUMMARY:Poster Presentations: Applications & Case Studies
DESCRIPTION:LegoLoaderX: a PyTorch DataLoader for Sparse Spatio-Temporal Data - Michelle Audirac\, Harvard University\nStress State Estimation from Deformed Surface Images Using Deep Learning - Bakhtiyar Mammadli\, NOMATEN Centre of Excellence\, National Centre for Nuclear Research
CATEGORIES:POSTER PRESENTATIONS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:5d4ff3ef336652b333a9f6cb9abf0c34
URL:http://pytorchconferenceeu2026.sched.com/event/5d4ff3ef336652b333a9f6cb9abf0c34
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T150500Z
DTEND:20260407T163500Z
SUMMARY:Poster Presentations: Frameworks & Compilers
DESCRIPTION:Automatic Comm-Compute Overlap and Bucketing in torch.compile - Elias Ellison & Ivan Kobzarev\, Meta\nFlexible Custom Operators: custom ops with arbitrary inputs and outputs - Angela Yi & Richard Zou\, Meta\nHow Your Code Becomes a Kernel - Harshita Varma\, Juspay\; Nikita Verma\, Individual\nTorchCodec: The Easy and Efficient Media Decoding Library for PyTorch - Daniel Flores & Molly Xu\, Meta\nTorchDynamo Debugging Tools for Power Users - William Wen\, Meta\nAccelerating GNN Workloads on ARM CPUs with PyTorch Geometric - Akash Agrawal\, Fujitsu Research of India\; Co-Authors: N Maajid Khan & Devang Choudhary\, Fujitsu Research of India\nReaching SOTA Normalization Performance with torch.compile - Paul Zhang & Shunting Zhang\, Meta\n
CATEGORIES:POSTER PRESENTATIONS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:6bc6dfa6fc4e234d2866d894b0299aa2
URL:http://pytorchconferenceeu2026.sched.com/event/6bc6dfa6fc4e234d2866d894b0299aa2
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T150500Z
DTEND:20260407T163500Z
SUMMARY:Poster Presentations: GenAI & Multimodal
DESCRIPTION:Unifying Modalities: Building Efficient Video Flows with PyTorch and Diffusion Transformers - David Brewster\, Red Hat\n
CATEGORIES:POSTER PRESENTATIONS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:597b3e0a5e6cc060bb10722768883afc
URL:http://pytorchconferenceeu2026.sched.com/event/597b3e0a5e6cc060bb10722768883afc
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T150500Z
DTEND:20260407T163500Z
SUMMARY:Poster Presentations: Inference & Production
DESCRIPTION:A Tale of Two DSLs: A Comparative Study of vLLM GPU Performance with cuTile and CuTe DSL - Anil Vishnoi & Matthew Odden\, Red Hat\nBringing BitNet to ExecuTorch via Vulkan - Marcus Edel & Vineet Suryan\, Collabora\nBuilding Production-Grade PyTorch Inference Pipelines for 100K+ Heterogeneous Devices - Samaresh Kumar Singh\, HP Inc.\nFeather: Software Emulated FP8 for Older GPUs - Suriyaa MM\, Indian Institute of Technology Tirupati\nModel Compression API for a fast and precise inference in ExecuTorch - Daniil Liakhov & Aamir Nazir\, Intel\nNear-Lossless MXFP4 Compression for Accelerated LLM Serving: Jointly Tuning Distribution Transforms - Felix Marty\, AMD\nRecursive Language Models (RLMs): Scaling to Infinite Context via Programmatic Decomposition - Rudraksh Karpe\, Simplismart\; Shivay Lamba\, Qualcomm\nWhen CPUs Win: Profiling Small Neural Network Inference Through PyTorch's Compiler and Runtime Stack - Dave Grove & Olivier Tardieu\, IBM\; Co-Authors: Vivek Mankar\, Nethra Khandige\, Pradipta Ghosh\, & Anto Ajay Raj John\, IBM
CATEGORIES:POSTER PRESENTATIONS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:bc117de2aa429083e4e493b7e8bf0938
URL:http://pytorchconferenceeu2026.sched.com/event/bc117de2aa429083e4e493b7e8bf0938
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T150500Z
DTEND:20260407T163500Z
SUMMARY:Poster Presentations: Responsible AI & Compliance
DESCRIPTION:When Models Collaborate but Data Cannot: Explainable Ensemble Learning Under Privacy Constraints - Pavani Rajula\, NeuCorelytix Solutions LLP\n
CATEGORIES:POSTER PRESENTATIONS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:dcbbbde0f4d4e866e7cc39a29b6b4986
URL:http://pytorchconferenceeu2026.sched.com/event/dcbbbde0f4d4e866e7cc39a29b6b4986
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260407T163000Z
DTEND:20260407T190000Z
SUMMARY:Open Source AI Soirée hosted by Label Studio and Docling
DESCRIPTION:Join Label Studio and Docling for an evening of conversation\, connection\, and community during PyTorch Conf EU.\n\nWhether you're working on training pipelines\, document workflows\, evaluation systems\, or production AI infrastructure\, this gathering is a chance to meet peers\, exchange ideas\, and connect with others building real-world AI.\n\nThe evening will bring together the technical founder of Label Studio (CTO of HumanSignal)\, the creator of Docling\, PyTorch Conf attendees\, and the greater open source AI ecosystem. The venue is a short walk from Station F on the Seine.\n\nLocation: Near Station F (PyTorch venue)\, Paris\nDate: Tuesday 7 April 2026\nTime: 18:30 CET\nClick here to register\n
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Paris\, France
SEQUENCE:0
UID:fd89b0baae3c1eca9d8b879062f62122
URL:http://pytorchconferenceeu2026.sched.com/event/fd89b0baae3c1eca9d8b879062f62122
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T060000Z
DTEND:20260408T134000Z
SUMMARY:Community Expo
DESCRIPTION:\n
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:5549b843dd0dbc11fb9a6f0cd9d49f00
URL:http://pytorchconferenceeu2026.sched.com/event/5549b843dd0dbc11fb9a6f0cd9d49f00
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T060000Z
DTEND:20260408T132500Z
SUMMARY:Registration & Badge Pick-Up
DESCRIPTION:\n
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Lobby\, Paris\, France
SEQUENCE:0
UID:e8056d15fb0a7fb6c1007c9693c0ea48
URL:http://pytorchconferenceeu2026.sched.com/event/e8056d15fb0a7fb6c1007c9693c0ea48
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T070000Z
DTEND:20260408T071000Z
SUMMARY:Keynote: PyTorch CTO - Matt White\, Global CTO of AI\, Linux Foundation
DESCRIPTION:Matt White\, Global CTO of AI at the Linux Foundation and CTO of the PyTorch Foundation\, will provide an update on technical strategy\, the ecosystem\, projects\, and working groups.
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:82c6a02d9474dbdecb22e634ed9b5a40
URL:http://pytorchconferenceeu2026.sched.com/event/82c6a02d9474dbdecb22e634ed9b5a40
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T071000Z
DTEND:20260408T072500Z
SUMMARY:Keynote: vLLM & Ray Updates - Tyler Michael Smith\, Chief Architect - Inference Engineering\, Red Hat & Artur Niederfahrenhorst\, Member of Technical Staff\, Anyscale
DESCRIPTION:\n
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:49957940e95837a461c5cbeee946bf32
URL:http://pytorchconferenceeu2026.sched.com/event/49957940e95837a461c5cbeee946bf32
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T072500Z
DTEND:20260408T074000Z
SUMMARY:Keynote: The Hub as Infrastructure. From Open PyTorch Models\, to a Safe and Performant Distribution Hub - Lysandre Debut\, Chief Open-Source Officer\, Hugging Face
DESCRIPTION:\n
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:152fd48bdbc788eb95939ee28270c806
URL:http://pytorchconferenceeu2026.sched.com/event/152fd48bdbc788eb95939ee28270c806
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T074500Z
DTEND:20260408T075000Z
SUMMARY:Sponsored Keynote: Open Source Infrastructure for the AI Native Era - Jonathan Bryce\, Executive Director\, Cloud Native Computing Foundation
DESCRIPTION:AI adoption will not be limited by model ideas alone. It will be limited by how fast we can deploy\, secure\, observe\, and scale AI systems in production. Inference is where AI becomes real for most organizations. As AI moves from frontier labs into mainstream production\, the operational challenges start to look increasingly cloud native: orchestration\, autoscaling\, routing\, security\, policy\, and observability. This keynote explores why the next phase of AI adoption will move faster if PyTorch and cloud native communities work together to extend proven open source patterns.
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:e49e88fc654afb47f253cd08ebc4d499
URL:http://pytorchconferenceeu2026.sched.com/event/e49e88fc654afb47f253cd08ebc4d499
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T075000Z
DTEND:20260408T080500Z
SUMMARY:Keynote: Gemma 4: Compacting Intelligence for the Edge - Léonard Hussenot\, Research Scientist\, Google Deepmind
DESCRIPTION:This talk explores the philosophy and engineering behind Gemma 4\, arguing that the future of AI isn't only about size\, but about "intelligence per byte."\nWe will dive into why compacting intelligence—maximizing the reasoning and instruction following ability of every single token—is the ultimate bottleneck for truly useful AI. By optimizing for token efficiency and memory footprints\, we unlock a new class of applications that are faster\, private\, and more accessible.
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:fd32caed1b091b66cf29e6a7624d2262
URL:http://pytorchconferenceeu2026.sched.com/event/fd32caed1b091b66cf29e6a7624d2262
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T080500Z
DTEND:20260408T083500Z
SUMMARY:Birds of A Feather: Disaggregated Tokenization: Building Toward Tokens-In-Tokens-Out LLM Inference - Maroon Ayoub\, IBM Research; Hang Yin & Xi Ning Wang\, Alibaba Cloud; Nili Guy\, IBM; Hyunkyun Moon\, Moreh
DESCRIPTION:LLMs are token-in\, token-out - but our serving stacks aren't. Tokenization and preprocessing are still locked inside the inference engine\, blocking the cache-aware routing and encode/prefill/decode (E/P/D) disaggregation that production deployments demand. To route smart\, you need tokens before you reach the backend - and with multi-modal inputs requiring heavy encode-stage preprocessing\, this is an architectural imperative\, not just an optimization.\n \n In llm-d\, we learned this the hard way: three tokenization approaches\, three gaps. We're now converging on disaggregated tokenization via vLLM's Renderer API as a gRPC sidecar\, and collaborating with the Gateway API Inference Extension community to define the tokens-in-tokens-out interface. For multi-modal workloads\, disaggregating preprocessing unlocks independent scaling of encode\, prefill\, and decode - each with different compute profiles.\n \n Join us to discuss: How should we standardize tokenization and multi-modal preprocessing outside the engine? How does this shape E/P/D disaggregation? What are your pain points? We'll frame the problem from scheduling\, vLLM\, and gateway perspectives - then open the floor.
CATEGORIES:BIRDS OF A FEATHER
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:92ed6f98a6dff79b6a79ffe9e987804a
URL:http://pytorchconferenceeu2026.sched.com/event/92ed6f98a6dff79b6a79ffe9e987804a
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T080500Z
DTEND:20260408T083500Z
SUMMARY:Coffee Break
DESCRIPTION:Menu:\n-Brioche\n-Granola bar (Gluten Free\, Vegan) \n-Seasonal fruits (Gluten Free\, Vegan) \n-Roasted pumpkin cake\n-Dry fruits and dry grapes mix (Gluten Free\, Vegan)
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:5c1e67c89c1862f965556cbd1d1200fd
URL:http://pytorchconferenceeu2026.sched.com/event/5c1e67c89c1862f965556cbd1d1200fd
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T080500Z
DTEND:20260408T083500Z
SUMMARY:Meet the vLLM Maintainers
DESCRIPTION:Meet the core maintainers of vLLM at this session! Come and discuss use cases\, features\, roadmap with us\, or just learn how the vLLM development happens under the hood.
CATEGORIES:MEET THE DEVELOPERS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:7ecfd997853217b0a62db75b1d4d3b40
URL:http://pytorchconferenceeu2026.sched.com/event/7ecfd997853217b0a62db75b1d4d3b40
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T082500Z
DTEND:20260408T084000Z
SUMMARY:Sponsor Activity - Validating AI on CPUs: The vLLM 3-Phase Evaluation Framework
DESCRIPTION:Stop guessing your hardware capabilities. This automated test engine benchmarks vLLM on CPUs through controlled\, realistic\, and production phases\, delivering precise metrics on throughput\, latency\, and optimal KV cache sizing. Join us for a demo!\n\nSponsor: Red Hat\nLocation: Red Hat within the Community Showcase\n\nIn order to facilitate networking and business relationships at the event\, you may choose to visit a third party's booth or access sponsored content. You are never required to visit third party booths or to access sponsored content. When visiting a booth or participating in sponsored activities\, the third party will receive some of your registration data\, including your first name\, last name\, title\, company\, address\, email\, and answers to standard demographics questions (i.e. job function\, industry). By participating\, you consent to the receipt and use of this data by the third-party recipients\, which will be subject to their own privacy policies.
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:f6d0d833eabe7e8b312810fe9ce476dd
URL:http://pytorchconferenceeu2026.sched.com/event/f6d0d833eabe7e8b312810fe9ce476dd
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T083500Z
DTEND:20260408T090000Z
SUMMARY:Beyond JSON-RPC: Scaling Model Context Protocols With gRPC in the PyTorch Ecosystem - Ashesh Vidyut & Madhav Bissa\, Google
DESCRIPTION:Right now\, MCP mostly relies on HTTP and STDIO. That works for simple scripts\, but if you’re running high-performance PyTorch models in production\, you’re going to hit a wall. When you’re moving large context windows or tensor metadata\, the overhead of JSON-RPC starts to hurt.\n We’re introducing SEP-1352\, which adds gRPC as a native transport for MCP. Since gRPC is already the standard for microservices\, it’s a natural fit for the PyTorch ecosystem. By using Protobuf instead of JSON\, we get much higher throughput and lower latency—essentially making the communication between models and tools as fast as the models themselves.\n In this session\, we’ll cover:\n Why Protobuf matters: Moving away from bulky JSON to keep bandwidth low and speed high.\n Built-in Streaming: How to use gRPC’s streaming to handle long-running model outputs without timeouts.\n Production-ready features: Using the same auth\, load balancing\, and service mesh (mTLS) you already use for your ML microservices.\n Upgrading your stack: How to move from PyTorch MCP HTTP services to MCP gRPC services without throwing away your existing infra.
CATEGORIES:AGENTS & INTEROP
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:50cebd50e935df64f63957e9578e1ea4
URL:http://pytorchconferenceeu2026.sched.com/event/50cebd50e935df64f63957e9578e1ea4
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T083500Z
DTEND:20260408T090000Z
SUMMARY:How To Write C++ Extensions in 2026 - Jane Xu\, Meta & Mikayla Gawarecki\, Meta
DESCRIPTION:Are you writing a C++ custom op extension to PyTorch? It's 2026 and are you still shipping M x N wheels for M CPython versions and N libtorch versions? Did you know you can just ship 1 wheel that works across multiple CPythons and libtorches? If you're curious how\, attend this talk to get the deets on py_limited_api\, APIs like torch::stable::Tensor & TORCH_TARGET_VERSION\, and generally the latest and greatest ways for keeping your code and your release matrix simple. Get your custom kernel enrolling in new features with benefits proven out in FA3\, xformers\, torchao\, torchaudio\, and more in progress! We'll also share some of our vision towards smoother and faster custom ops extensions.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:5b7d44b124d1e28f5e13693513b7f83b
URL:http://pytorchconferenceeu2026.sched.com/event/5b7d44b124d1e28f5e13693513b7f83b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T083500Z
DTEND:20260408T084500Z
SUMMARY:Lightning Talk: Monarch: An API To Your Supercomputer - Marius Eriksen\, Meta
DESCRIPTION:The training systems driving today’s most advanced AIs are distributed\, dynamic\, and complex. Pre-training relies on layered parallelism and careful fault isolation. Post-training RL spans thousands of GPUs while coordinating verifiers\, compilers\, and code execution. Systems complexity pulls focus away from the core algorithms: developers are forced to assemble systems from schedulers\, RPC stacks\, container orchestrators\, observability tooling\, service discovery\, and app frameworks just to begin work. Monarch is a distributed programming framework for PyTorch that makes the cluster programmable through a single-program Python API. It exposes the supercomputer as a coherent\, directly controllable system—bringing the experience of local development to large-scale training\; handling fault tolerance\, orchestration\, tooling integration\, etc. In this talk\, we will demonstrate how Monarch enables developers to focus on training logic rather than glue\, extend systems easily\, and supervise and debug distributed systems through a unified programming interface. Attendees will leave with a clear model for building robust\, scalable and customizable distributed PyTorch systems using Monarch.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:609c629f9cf8c804ff73a37e8f1b1b9a
URL:http://pytorchconferenceeu2026.sched.com/event/609c629f9cf8c804ff73a37e8f1b1b9a
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T083500Z
DTEND:20260408T084500Z
SUMMARY:Lightning Talk: Live Migration of PyTorch GPU Nodes From Azure To European Clouds - Mike Krom\, Acf Cyber Solutions
DESCRIPTION:Many European PyTorch teams run their GPU workloads on hyperscalers like Azure\, AWS\, or GCP—often without realizing that this places their data and models under US jurisdiction. \n \n This lightning talk shows how PyTorch compute nodes can be migrated to European cloud providers while keeping the full ML environment intact. Through a live demo\, we migrate a GPU-enabled PyTorch VM—including CUDA drivers and Jupyter notebooks—from Azure to European infrastructure\, without retraining models or rebuilding environments.\n \n The focus is on practical challenges: GPU compatibility\, reproducibility\, and data movement across clouds. \n \n The migration is demonstrated using DigitalNomadSky\, an open-source Python platform for cross-cloud VM migration\, but the lessons apply broadly to PyTorch teams aiming to reduce jurisdictional risk and vendor lock-in.\n \n Key takeaways\n Why PyTorch workloads on hyperscalers raise sovereignty concerns for EU teams\n What actually breaks (and what doesn’t) when migrating GPU-based ML nodes\n How to regain control over ML infrastructure without rewriting your stack
CATEGORIES:SECURITY & PRIVACY
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:563d3cda4c372ccd5dfe9b55a121eec4
URL:http://pytorchconferenceeu2026.sched.com/event/563d3cda4c372ccd5dfe9b55a121eec4
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T085000Z
DTEND:20260408T090000Z
SUMMARY:Lightning Talk: Achieving SOTA GEMM Performance: A CuTeDSL Backend for PyTorch Inductor - Nikhil Patel\, Meta
DESCRIPTION:Matrix multiplication is a central compute primitive in modern deep learning\, but achieving SOTA performance on novel architectures like NVIDIA Blackwell has become a bottleneck. Existing Triton-based kernels in torch.compile struggle to keep pace with rapid hardware evolution\, often forcing users to hand-write custom\, architecture-specific kernels - a growing gap as hardware feature velocity accelerates.\n\nWe present a new CuTeDSL GEMM backend in PyTorch Inductor that integrates NVIDIA’s kernel implementations directly into torch.compile. Built using the Cutlass API for kernel discovery\, this backend allows PyTorch to expose first-class support for NVIDIA-authored GEMMs and automatically leverage new architectural features as NVIDIA updates their kernels.\n\nThe backend currently supports standard GEMM\, grouped GEMM\, and block-scaled MXFP8 GEMM\, along with pointwise epilogue fusions (with reductions forthcoming). We present early end-to-end results from vLLM inference and TorchTitan training\, demonstrating how this approach enables PyTorch to achieve high-performance GEMMs on Blackwell and beyond\, while eliminating the need for users or developers to maintain handwritten kernels.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:422a3e64d6574e308e91ec577abd0925
URL:http://pytorchconferenceeu2026.sched.com/event/422a3e64d6574e308e91ec577abd0925
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T085000Z
DTEND:20260408T090000Z
SUMMARY:Lightning Talk: Step-Aligned Telemetry for Distributed PyTorch Training (Time & Memory Attribution Across Ranks) - Abhinav Srivastav\, TraceOpt
DESCRIPTION:Distributed PyTorch training often looks healthy in system dashboards\; GPU utilization is high\, memory is stable\, and yet throughput degrades\, steps jitter\, or GPUs go idle intermittently. The core issue is misalignment: most telemetry is sampled by time\, while training progresses by "steps"\, and distributed behavior is dominated by the slowest rank rather than averages.\n \n In this talk I will break down common failure modes in DDP training that standard metrics miss (rank stragglers\, dataloader stalls\, step-time variance\, and memory spikes/creep). We will show how step-aligned\, rank-aware aggregation changes debugging: per-step worst-rank vs median-rank views\, gating to completed steps across ranks\, and how to tie time and memory back to training semantics without relying on heavyweight profilers.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:64919f50577cae20e02577d1b0092af8
URL:http://pytorchconferenceeu2026.sched.com/event/64919f50577cae20e02577d1b0092af8
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T090500Z
DTEND:20260408T091500Z
SUMMARY:Lightning Talk: Accelerating PyTorch Models With Torch.compile's C++ Wrapper Mode - Bin Bao\, Meta
DESCRIPTION:This lightning talk introduces torch.compile's C++ wrapper mode\, a powerful feature that reduces CPU overhead and significantly improves model performance. As modern GPUs become increasingly powerful and compiler optimizations make GPU kernels run faster\, CPU overhead has become more visible as the bottleneck. By generating optimized C++ code instead of Python\, cpp-wrapper mode directly tackles this challenge.\n \n While CUDAGraphs can also reduce CPU overhead\, it is not always applicable—especially with highly dynamic input shapes. In these scenarios\, cpp-wrapper mode provides a robust alternative with significant performance gains. Benchmark results from the OSS Huggingface suite demonstrate that cpp-wrapper mode delivers a 39% speedup over default torch.compile.\n \n Attendees will learn when and how to leverage cpp-wrapper mode to overcome CPU-bound limitations and understand how this feature fits into PyTorch's performance optimization landscape\, enabling them to build faster machine learning applications.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:47c8965c131350e5bee3e4bb1a694f2b
URL:http://pytorchconferenceeu2026.sched.com/event/47c8965c131350e5bee3e4bb1a694f2b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T090500Z
DTEND:20260408T091500Z
SUMMARY:Lightning Talk: KV-Cache Centric Inference: Building a State-Aware Serving Platform With Llm-d and VLLM - Maroon Ayoub & Martin Hickey\, IBM Research
DESCRIPTION:We’ve spent years optimizing LLM inference around compute - faster kernels\, better batching\, smarter parallelism. But in production\, the bottleneck increasingly isn’t FLOPs. It’s state. Specifically\, the KV-cache: the attention state that makes the difference between a 4-second prefill and a sub-second cache hit. Lose it to eviction\, isolate it on a single node\, or fail to route to it - and you’re paying the full compute cost again for work already done.\n \n KV-cache centric inference flips the design priority. Instead of treating cache as a byproduct\, it becomes the organizing principle of the serving platform. This means tiered memory management - offloading KV blocks from GPU to CPU to shared storage so capacity scales beyond any single node. It means cross-replica visibility - so cached state computed on one instance is reusable by any other. And it means cache-aware scheduling - routing requests to where their prefix already lives.\n \n We cover how llm-d and vLLM implement each layer\, how they compose into a coherent system\, and what it looks like in practice - with benchmarks\, deployment patterns\, and lessons from building a KV-cache centric platform in the open.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:6d86b949db3ccb8a41001634a12acffe
URL:http://pytorchconferenceeu2026.sched.com/event/6d86b949db3ccb8a41001634a12acffe
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T090500Z
DTEND:20260408T093000Z
SUMMARY:Bringing PyTorch Monarch to AMD GPUs: Single-Controller Distributed Training on ROCm - Liz Li & Zachary Streeter\, AMD
DESCRIPTION:PyTorch Monarch introduces a new distributed programming paradigm that enables developers to orchestrate entire GPU clusters from a single Python program. With its actor-based runtime\, process mesh abstraction\, and asynchronous execution model\, Monarch simplifies large-scale distributed training and enables complex workflows that combine training\, evaluation\, and reinforcement learning within one unified script.\n\nIn this talk\, we present our work enabling PyTorch Monarch on AMD Instinct GPUs with ROCm\, expanding the single-controller model beyond CUDA environments and bringing this emerging runtime to a broader hardware ecosystem. We describe the engineering effort required to port Monarch’s GPU runtime and distributed communication stack to ROCm\, including HIPification of CUDA-specific components\, adaptation of memory management and synchronization semantics\, and integration with high-performance GPU-to-GPU communication on multi-node clusters through RDMA.\n\nWe will share lessons learned from running Monarch workloads on MI300-class clusters\, including performance considerations\, debugging workflows\, and developer experience improvements. Our results demonstrate that Monarch’s architecture can be successfully extended to heterogeneous hardware environments while preserving scalability and ease of use.\n\nThis work advances hardware diversity in distributed PyTorch and highlights how portable runtimes can simplify large-scale training while enabling scalable\, cluster-wide experimentation across accelerator platforms.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:558e60460191f56da7f86701ef759fdd
URL:http://pytorchconferenceeu2026.sched.com/event/558e60460191f56da7f86701ef759fdd
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T090500Z
DTEND:20260408T093000Z
SUMMARY:Fp8 Training From Hopper To Blackwell - Luca Wehrstedt\, Meta
DESCRIPTION:The Hopper generation of NVIDIA GPUs first enabled the use of low-precision float8 data types for training via TensorCore acceleration. However\, the recipe to best leverage it was far from settled. Practitioners had to find their way through many entangled decisions around accuracy-vs-efficiency\, precision-vs-range\, overflows-vs-underflows\, and more. The frontier was pushed further forward by the DeepSeek release\, and then by the micro-scaling formats introduced by Blackwell. In this talk we will go through all these approaches\, comparing their pros and cons\, thus guiding researchers in finding the options that work best for them.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:c4b1f431dc4d43ca237a36d481674a40
URL:http://pytorchconferenceeu2026.sched.com/event/c4b1f431dc4d43ca237a36d481674a40
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T092000Z
DTEND:20260408T093000Z
SUMMARY:Lightning Talk: Building AI That Ops Teams Actually Trust - Robert King\, Chronosphere / Palo Alto Networks
DESCRIPTION:You've built an AI that identifies root causes of incidents faster than any human could... but there's one problem: no one trusts it.\n \n Ops teams are skeptical by nature. They've been burned by noisy alerts\, black-box tools\, and "intelligent" systems that weren't.\n This talk covers what we learned building AI for incident response across enterprise environments: why technically correct recommendations get ignored\, and how to design for skepticism from day one.\n \n I'll share specific patterns that moved the needle:\n \n - Validating agent responses before they reach users\, catching hallucinations\, weak reasoning\, and overconfident outputs\n - Explainability that fits the operator's mental model\, not the data scientist's\n - Feedback loops that improve the AI and build user trust simultaneously\n - Rollout strategies that let teams build confidence gradually\n \n Whether you're using LLMs\, agents\, or traditional ML for operational tasks\, the trust problem is the same. Ship something wrong during an incident and you've lost your users for months.\n \n You'll leave with a practical framework for validating AI outputs and building the kind of trust that gets recommendations acted on.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:60b69d18d8fd87fedce94e940ab0d20b
URL:http://pytorchconferenceeu2026.sched.com/event/60b69d18d8fd87fedce94e940ab0d20b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T092000Z
DTEND:20260408T093000Z
SUMMARY:Lightning Talk: Not All Tokens Are Equal: Semantic KV-Cache for Agentic LLM Serving - Maroon Ayoub\, IBM Research & Hyunkyun Moon\, moreh
DESCRIPTION:Agentic AI workloads - tree-of-thought exploration\, ReAct loops\, hierarchical swarms - expose a fundamental mismatch in how we serve PyTorch models. Today's inference stacks treat the KV-cache as a flat\, anonymous tensor buffer with blind LRU eviction. This ignores the structural reality of agents: system prompts are durable\, tool definitions are shared\, and reasoning scratchpads are ephemeral. We are currently evicting high-value state to preserve throwaway tokens.\n \n In this talk\, we present Semantic KV-Cache\, an architectural evolution for llm-d and vLLM that replaces anonymous blocks with Typed State.\n \n We demonstrate a runtime that tags blocks as SystemPrompt\, ToolDefinition\, or ReasoningBranch\, applying differentiated policies to each: pinning foundational context\, replicating shared tools\, and eagerly evicting completed thoughts. We show how this "lifecycle-aware" caching reduces recomputation and minimizes the "Agentic Tax" - evolving the PyTorch serving stack from request-centric to workload-aware.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:25a58efd4fc8c993e5e6ba9097d8cef1
URL:http://pytorchconferenceeu2026.sched.com/event/25a58efd4fc8c993e5e6ba9097d8cef1
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T093500Z
DTEND:20260408T100000Z
SUMMARY:Accelerating Complex-Valued Tensors With Torch.compile - Hameer Abbasi\, OpenTeams Inc.
DESCRIPTION:torch.compile has been invaluable in accelerating many machine learning and scientific computing workflows. It has become a one-shot way to get free performance for many kinds of programs and models.\n \n However\, it comes with its own set of limitations. One of these limitations is that\, for a long time\, torch.compile didn't accept complex-valued tensors. These tensors have many uses\, from quantum mechanics to simplifying the physics for world models. Support for such tensors would accelerate many of these workflows.\n \n In this talk\, we will take a journey through the current progress on supporting such tensors in torch.compile\, some of the challenges encountered along the way\, and what we hope to achieve\, including side benefits such as reduced binary size from JIT-compiling kernels on demand.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:bd6bdeb807c9aa734262de45d0dc93fa
URL:http://pytorchconferenceeu2026.sched.com/event/bd6bdeb807c9aa734262de45d0dc93fa
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T093500Z
DTEND:20260408T094500Z
SUMMARY:Lightning Talk: Enabling the Audio Modality for Language Models - Eustache Le Bihan\, Hugging Face
DESCRIPTION:As the maintainer of everything audio in `transformers` (the library)\, I'll share how audio is being integrated into large language models\, grounded in what we observe from the open-source ecosystem. Beginning with a brief overview of the current landscape of Audio LMs\, I'll then highlight emerging trends in how audio is incorporated into pretrained text backbones. In particular\, we examine the growing convergence of architectural choices\, many inspired by VLMs\, as well as newer concepts such as audio tokenization and streaming. The core of the talk focuses on providing the audience with key technical insights: audio encoders vs audio tokenizers\, and their respective advantages and limitations. It covers the motivations behind introducing concepts such as audio tokenizers and audio processors into transformers\, shows how these design choices are reflected in the library\, and explains how PyTorch tooling is leveraged to make audio a standardized modality for the open-source community.
CATEGORIES:GENAI & MULTIMODAL
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:fe78f8aa0b743e078ca602ede7c552cd
URL:http://pytorchconferenceeu2026.sched.com/event/fe78f8aa0b743e078ca602ede7c552cd
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T093500Z
DTEND:20260408T100000Z
SUMMARY:Optimizing Large MoE Inference on NVIDIA Blackwell: NVFP4\, ADP\, and DualPipe Strategies - Julien Demouth\, NVIDIA
DESCRIPTION:Deploying massive Mixture-of-Experts (MoE) architectures like DeepSeek-V3/R1 requires a co-designed approach leveraging NVIDIA Blackwell’s fifth-generation Tensor Cores. This session details the transition to NVFP4 precision for MoE weights to significantly reduce memory load\, coupled with FP4/FP8 KV caching to minimize attention layer footprint and enable higher concurrency.\n We will analyze the architectural shift to Expert Parallelism (EP) for expert layers to maximize FLOPS\, and Attention Data Parallelism (ADP) for attention heads—avoiding redundant KV replication and converting Multi-Head Latent Attention (MLA) into Multi-Query Attention (MQA) via weight absorption. The talk will demonstrate advanced execution strategies\, including DualPipe algorithms to overlap dispatch/combine communication with computation\, and the integration of DeepGEMM and FlashInfer kernels. Finally\, we will cover runtime optimizations using Programmatic Dependent Launch (PDL) and CUDA Graphs to minimize host latency\, alongside Multi-Token Prediction (MTP) for accelerated speculative decoding.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:fd657dbbf96d995e995e10b6b3e8c882
URL:http://pytorchconferenceeu2026.sched.com/event/fd657dbbf96d995e995e10b6b3e8c882
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T093500Z
DTEND:20260408T100000Z
SUMMARY:Portable High-Performance LLM Serving: A Triton Backend for VLLM - Burkhard Ringlein\, IBM Research & Jan van Lunteren\, IBM
DESCRIPTION:Today\, vLLM is the de-facto industry standard for serving Large Language Models and is widely adopted in production. However\, for much of its history\, vLLM’s state-of-the-art performance was largely dependent on hand-written CUDA or HIP kernels. These kernels have typically been carefully optimized for a specific GPU platform and may pose a serious obstacle to the portability of vLLM across different hardware. Leveraging Triton\, we introduced a “Triton attention backend” to vLLM that produces highly competitive performance across GPU platforms with a single code base\, without involving hand-written CUDA or HIP kernels. The Triton attention backend became the default for AMD GPUs and is used in scenarios where other attention backends lack the required features. Additionally\, this backend automatically selects appropriate specialized kernels based on model type or request length. In this talk\, we will present our recent advances that consistently deliver high performance on both NVIDIA and AMD GPUs with a single Triton-only code base\, covering the engineering and science behind this backend\, including system aspects\, kernel improvements\, and launch grid optimizations.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:abe480f77a730dc9ee7c43b3fd469e8d
URL:http://pytorchconferenceeu2026.sched.com/event/abe480f77a730dc9ee7c43b3fd469e8d
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T100000Z
DTEND:20260408T113000Z
SUMMARY:Attendee Lunch
DESCRIPTION:Menu | Boxed Lunch\n\nVegan: (Vegetarian)\n-Organic green lentils from Beauce\, lentil hummus\, and red cabbage pickles\n-Chocolate cookie\n\nGluten-Free: (Vegetarian)\n-Organic Beauce quinoa with dried fruit\, coconut yogurt with herbs\n-Yogurt to drink\n\nClassic:\nBulgur wheat and red lentil salad (Vegetarian)\nCereal bread\, poached salmon\, and vegetables\nOr\nPastrami burger with vegetable caviar and tomato sauce\nOr\nRound baguette with artichoke tapenade\, arugula\, tomato\, and Parmesan cheese (Vegetarian)\nBrownie (Vegetarian)
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:b88033a7b5e6357264f192283c14ae31
URL:http://pytorchconferenceeu2026.sched.com/event/b88033a7b5e6357264f192283c14ae31
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T110000Z
DTEND:20260408T111000Z
SUMMARY:Sponsor Activity - Lobster Trap: OpenClaw in Containers
DESCRIPTION:In this demo\, we containerize OpenClaw with Docker/Podman\, wire up HashiCorp Vault so secrets work identically on a laptop and in a cluster\, and then deploy to K8s. With containers\, one teammate's carefully built agent becomes a deployable team standard.\n\nSponsor: Red Hat\nLocation: Red Hat within the Community Showcase\n\nIn order to facilitate networking and business relationships at the event\, you may choose to visit a third party's booth or access sponsored content. You are never required to visit third party booths or to access sponsored content. If you visit a booth or participate in sponsored activities\, the third party will receive some of your registration data\, including your first name\, last name\, title\, company\, address\, email\, and answers to standard demographics questions (e.g.\, job function\, industry). By participating\, you consent to the receipt and use of such data by the third-party recipients\, which will be subject to their own privacy policies.
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:6d229bfcbae1a09ade2290ce8aeebfda
URL:http://pytorchconferenceeu2026.sched.com/event/6d229bfcbae1a09ade2290ce8aeebfda
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T113000Z
DTEND:20260408T115500Z
SUMMARY:PyTorch on RISC-V: From Cross-Compilation To Native CI - Ludovic Henry\, Meta
DESCRIPTION:As RISC-V matures into a viable architecture for AI and data center workloads\, bringing first-class PyTorch support to the ecosystem is a critical milestone. This session provides a technical deep dive into the ongoing efforts to port PyTorch natively to RISC-V\, moving beyond experimental cross-compilation toward a stable\, tested\, and optimized environment. We detail the challenges of reconciling native math library dependencies like OpenBLAS and oneDNN with RISC-V Vector (RVV) extensions\, alongside the work required to upstream these accelerations to ensure sustainable\, long-term performance. The talk also addresses the critical "last mile" of the Python ecosystem: ensuring that the broader dependency tree—including NumPy\, SciPy\, and ONNX—is natively available and performant on the architecture. Finally\, we examine the primary bottleneck for official support: CI infrastructure. We outline the roadmap for transitioning from tagged cross-compilation to a native testing pool\, discussing the logistics of maintaining a reliable hardware fleet to meet the high-volume validation standards required for the PyTorch master branch and pull request workflows.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:ef7f3cf7c142ce8b6ca5909e038ba6bf
URL:http://pytorchconferenceeu2026.sched.com/event/ef7f3cf7c142ce8b6ca5909e038ba6bf
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T113000Z
DTEND:20260408T115500Z
SUMMARY:PyTorch Symmetric Memory + NCCL Device APIs: A New Path Towards Multi-GPU Kernels - Ke Wen & Sylvain Jeaugey\, NVIDIA
DESCRIPTION:As large models shift toward inference and Mixture-of-Experts (MoE) architectures\, small batch sizes and dynamic routing present new scaling challenges. Fused\, customized multi-GPU kernels are emerging as the solution\, but programming them for high performance remains difficult. This talk introduces a paradigm shift enabled by PyTorch Symmetric Memory and NCCL device APIs.\n \n PyTorch Symmetric Memory provides a unified infrastructure for direct GPU-to-GPU memory access without CPU involvement. By leveraging symmetric tensor allocation and CUDA Graph-compatible signaling\, it enables fine-grained\, dynamic data exchange while bypassing traditional "send/receive" overhead.\n \n We further demonstrate how NCCL device APIs simplify this model using in-kernel primitives for NVLink and GPU-Initiated Networking (GIN). We will showcase practical examples of compute-communication fusion\, such as AllGather-Matmul\, and customized patterns like deduplicated expert all-to-all. \n \n These abstractions represent one of the most significant evolutions in the PyTorch and NCCL ecosystems\, offering a versatile path to high-performance distributed programming.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:1e1044b28c15cd723ecd3ffaf13b619e
URL:http://pytorchconferenceeu2026.sched.com/event/1e1044b28c15cd723ecd3ffaf13b619e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T113000Z
DTEND:20260408T114000Z
SUMMARY:Lightning Talk: From Hugging Face To Handheld: Scaling LLM Deployment With LiteRT Generative API - Cormac Brick & Weiyi Wang\, Google
DESCRIPTION:This session will demonstrate the E2E journey of bringing custom PyTorch-based open-source LLMs to cross-platform devices using LiteRT. We will show developers how to take a custom Hugging Face Transformers checkpoint and convert it for on-device execution\, including:\n -Taking the PyTorch model from conversion to deployment.\n -Automated Optimization: How LiteRT performs automated patching of performance-critical components\, including architecture-specific rewrites for PyTorch models.\n -Seamless Fine-Tuning Integration: How to move from an Unsloth fine-tuning session to a TorchAO-quantized model and LiteRT export without leaving your script.\n -The "0-Day" Enablement Strategy: Well-known architectures are supported out-of-the-box. We’ll share how we enabled the QWEN0.6 (or Liquid AI) model in just 20 minutes.\n -Interactive Validation: Run inference on the exported model directly in the Terminal or Colab to verify numerical correctness before deploying to device.\n This workflow shows a smooth fine-tune-to-deployment story where everything stays within the original PyTorch/Hugging Face ecosystem. Viewers can "vibe code" along using Gemini CLI or other coding agents.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:9cf4a5b4e8de2c2a98b47f8fc11829e0
URL:http://pytorchconferenceeu2026.sched.com/event/9cf4a5b4e8de2c2a98b47f8fc11829e0
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T113000Z
DTEND:20260408T115500Z
SUMMARY:Optimizing CPU LLM Inference in PyTorch: Lessons From VLLM - Crefeda Rodrigues\, Arm Limited & Fadi Arafeh\, Arm
DESCRIPTION:vLLM has emerged as a reference inference stack in the PyTorch ecosystem for high-throughput large language model serving. CPUs continue to play an important role in LLM inference\, supporting cost-sensitive deployments\, hybrid CPU/GPU serving\, and batch or off-peak workloads on general-purpose infrastructure.\n \n In this talk\, we examine CPU-based LLM inference through the lens of PyTorch internals\, using vLLM as a case study. We describe how vLLM interacts with PyTorch’s operator stack\, including tensor layout management\, backend dispatch\, and threading behaviour\, and highlight common sources of overhead such as repeated weight repacking and poor threading behaviour. \n \n We present runtime and kernel-level optimizations that reduce this overhead\, including CPU paged-attention kernel tuning with vectorized softmax\, specialized Q–K and P–V GEMM kernels aligned with vLLM’s scheduler\, an ISA-aware BF16 attention\, pre-packed weight layouts for quantized matmul\, SIMD vectorization using PyTorch’s at::vec::Vectorized primitives\, and NUMA-aware scheduling for scalable parallel inference.\n \n Finally\, we conclude with lessons learned from building and upstreaming a high-performance CPU inference engine.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:d6c73bdffdecc01bbf5c3f068bcb9a8b
URL:http://pytorchconferenceeu2026.sched.com/event/d6c73bdffdecc01bbf5c3f068bcb9a8b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T114500Z
DTEND:20260408T115500Z
SUMMARY:Lightning Talk: Slash LLM Cold-Start Times by Pre-distributing GPU Caches - Billy McFall & Maryam Tahhan\, Red Hat
DESCRIPTION:Are your Large Language Model (LLM) deployments stuck waiting for GPU kernels to compile? If you are running distributed inference at scale\, your infrastructure is likely wasting time rebuilding the same GPU kernel cache for every single instance. You may not even realize how much time and how many resources these rebuilds consume. This session is designed for platform engineers and ML practitioners who need to optimize inference scaling and reduce startup latency.\n \n We will demonstrate how to eliminate redundant compilation by pre-distributing GPU kernel caches to all the inference nodes using KServe\, a distributed model inference runtime for Kubernetes. Beyond just the "what\," we will dive into the technical implementation of signing\, verifying\, and mounting cache images to ensure supply-chain security across clusters. Attendees will leave with a practical blueprint for reducing cold-start times and securing GPU-heavy workloads in production.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:ec13cacbff28a65f6748b78d269453a2
URL:http://pytorchconferenceeu2026.sched.com/event/ec13cacbff28a65f6748b78d269453a2
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T120000Z
DTEND:20260408T122500Z
SUMMARY:Deploying PyTorch Models To the Browser and Beyond With Transformers.js - Joshua Lochner\, Hugging Face
DESCRIPTION:This session presents a comprehensive engineering roadmap for running Hugging Face Transformers entirely locally in your web browser using Transformers.js. We will explore the end-to-end pipeline required to export\, optimize\, and deploy PyTorch models to the web\, leveraging emerging web technologies like WebGPU for efficient\, cross-platform inference.\n \n We will dive into the technical nuances of converting PyTorch models to ONNX using torch.export (Dynamo) and applying runtime-specific optimizations via ONNX Runtime GenAI. This workflow enables the production of highly efficient\, quantized model artifacts suitable for browser-based execution. Finally\, we will demonstrate how to deploy these optimized models using Transformers.js to create performant\, interactive\, and visually stunning WebAI experiences.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:8f129e463e471ace1a81049c7fd57346
URL:http://pytorchconferenceeu2026.sched.com/event/8f129e463e471ace1a81049c7fd57346
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T120000Z
DTEND:20260408T121000Z
SUMMARY:Lightning Talk: Pluggable PyTorch LLM Inference Architecture With VLLM and AWS Neuron Backends - Yahav Biran\, Annapurna Labs & Maen Suleiman\, Amazon
DESCRIPTION:As PyTorch-based LLM serving matures\, the challenge shifts from monolithic inference stacks to integrating diverse hardware accelerators efficiently. This session explores how modular plugin architectures enable PyTorch models to run optimally across backends—demonstrating AWS Trainium integration into vLLM through standardized interfaces.\n \n We'll examine how vLLM's Hardware Plugin architecture uses Python's entry_points for automatic platform detection\, allowing hardware vendors to extend PyTorch inference without fragmenting the codebase. This delivers automatic device detection\, modular feature development\, and seamless integration with PyTorch's model loading patterns.\n \n The technical deep dive includes NeuronWorker and NeuronxDistributedModelRunner extending vLLM base classes\, NKI kernels for attention and MoE\, and continuous batching with prefill/decode separation. We'll demo HuggingFace models loading through standard vLLM APIs and executing on Trainium without hardware-specific code.\n \n Attendees will learn how plugin architectures enable hardware vendors to join PyTorch inference while maintaining standard workflow compatibility.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:9c918e4a7e6a1a733b71e7081c600d8f
URL:http://pytorchconferenceeu2026.sched.com/event/9c918e4a7e6a1a733b71e7081c600d8f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T120000Z
DTEND:20260408T121000Z
SUMMARY:Lightning Talk: Backpropagation-Free Optimization in PyTorch - Andrii Krutsylo\, Polish Academy of Sciences
DESCRIPTION:Backpropagation is not the only mechanism for training deep networks. This talk presents a compact\, implementation-driven map of backpropagation-free training methods\, organized around representative algorithms that expose key design trade-offs.\n \n We focus on four families: Difference Target Propagation (target-based credit assignment)\, Direct Feedback Alignment (random feedback without weight transport)\, local loss / greedy layerwise training (strictly local objectives)\, and Forward-Forward learning as a forward-only alternative. Each is treated as a minimal working pattern rather than a full system.\n \n For each representative\, we answer the same practical questions: what learning signal is propagated\, what intermediate state must be stored\, how parameters are updated\, and what limits scalability on modern accelerators. The emphasis is on PyTorch-level mechanics—explicit update loops\, local objectives\, and training without autograd—rather than derivations.\n \n The goal is to give practitioners a clear mental model of the backprop-free design space and concrete patterns for experimenting with these methods in real PyTorch training pipelines.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:5b3826a279c3aa0e77fa4fc804da7763
URL:http://pytorchconferenceeu2026.sched.com/event/5b3826a279c3aa0e77fa4fc804da7763
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T120000Z
DTEND:20260408T121000Z
SUMMARY:Lightning Talk: Debugging the Undebuggable: Introducing Torch.distributed.debug - Tristan Rice\, Meta\, PyTorch
DESCRIPTION:Distributed training in PyTorch enables unprecedented scale\, but it also introduces notoriously difficult debugging challenges. When a job with thousands of ranks hangs or slows down\, identifying the root cause can feel like searching for a needle in a haystack. This lightning talk introduces the new PyTorch Distributed Debug Server\, a powerful\, interactive tool designed to bring clarity and control to the chaos of distributed debugging. We will provide a high-level overview of its architecture and core features\, demonstrating how it provides a unified interface to inspect stack traces\, analyze performance\, and diagnose hangs across all workers simultaneously. Attendees will learn how this extensible server can dramatically reduce debugging time and improve the reliability of large-scale training jobs.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:6b922258abf7a6c3e694c64f10e15f29
URL:http://pytorchconferenceeu2026.sched.com/event/6b922258abf7a6c3e694c64f10e15f29
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T121500Z
DTEND:20260408T122500Z
SUMMARY:Lightning Talk: Distributed AI Without the Infrastructure Tax - Yahav Biran\, Annapurna Labs & Maen Suleiman\, Amazon
DESCRIPTION:Running distributed AI workloads in production requires solving three problems: package compatibility\, hardware abstraction\, and network configuration. AWS Neuron Deep Learning Containers (DLCs) address all three by providing open-source\, production-ready images for Trainium and Inferentia.\n This lightning talk shows how DLCs eliminate common failure modes. We'll cover three layers: First\, how DLCs solve dependency hell by versioning PyTorch\, Neuron SDK\, XLA backend\, and the PyTorch PrivateUse1 dispatcher together as a tested contract. Second\, how Dynamic Resource Allocation (DRA) in Kubernetes abstracts hardware complexity—enabling Neuron core slicing\, multi-tenant workloads\, and topology-aware scheduling without manual device mapping. Third\, how pre-configured EFA driver settings ensure zero-copy data movement\, avoiding silent performance degradation that can cost 10x throughput.\n We'll demonstrate scaling from a laptop to a 32-node cluster using the same container image and simple Kubernetes manifests.\n Attendees will learn how to eliminate weeks of setup time\, achieve 65-80% cluster utilization\, and deploy workloads confidently. We'll share the GitHub repository and extension patterns.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:bbef27346cfe53081f0f34edac9f4f86
URL:http://pytorchconferenceeu2026.sched.com/event/bbef27346cfe53081f0f34edac9f4f86
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T121500Z
DTEND:20260408T122500Z
SUMMARY:Lightning Talk: Inside VLLM's KV Offloading Connector: Async Memory Transfers for Higher Inference Throughput - Nicolò Lucchesi\, Red Hat
DESCRIPTION:Every LLM request produces KV-cache state that is expensive to recompute. However\, GPU memory is limited in size\, and when it fills up\, entries are discarded from the cache. A natural mitigation is expanding the KV cache to CPU DRAM\, which is meaningfully larger than GPU memory.\n vLLM 0.11.0 introduced the Offloading Connector - an asynchronous\, pluggable API for KV-cache offloading\, bundled with a native CPU backend. This feature executes transfers concurrently with model computation by using GPU DMA\, offering fast loading of KV data from DRAM with near-zero offloading overhead. Getting here required rethinking vLLM's memory layout. The default per-layer KV fragmentation devastated transfer throughput. A new contiguous block layout\, upstreamed in 0.12.0\, increased effective block sizes by up to 125× and delivered an order-of-magnitude improvement in offloading performance.\n We'll walk through the connector architecture\, discuss memory transfer tradeoffs and the memory layout redesign\, and offer practical guidance for enabling CPU offloading in production.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:720e54892634252f7b70c2e2a91fc5cd
URL:http://pytorchconferenceeu2026.sched.com/event/720e54892634252f7b70c2e2a91fc5cd
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T121500Z
DTEND:20260408T122500Z
SUMMARY:Lightning Talk: Scaling Recommendation Systems To 2K GPUs and Beyond - Zain Huda\, Meta
DESCRIPTION:TLDR: In this session\, we go over one of the key technologies behind Ads model scaling at Meta: 2D sparse parallelism\, which scales sparse recommendation embedding tables beyond 1k GPUs to 8k GPUs\, enabling the largest Ads model training runs in production at Meta.\n \n Scaling Laws have dominated LLMs and shown the industry we can achieve better model performance through scaling. The same scaling law can be applied to recommendation systems. However\, the path to scaling recommender systems is not the same. The leap from hundreds to thousands of GPUs introduces complex technical challenges\, particularly around handling sparse operations in recommendation models. \n \n In this talk\, we will detail the development of 2D sparse parallelism\, tracing its path from research to production to address sparse scaling challenges. We will demonstrate how we optimize these systems to push performance boundaries\, increasing speed and reducing memory at scale. Participants will walk away with lessons learned from designing 1\,000+ GPU scale systems\, and a deeper understanding of how to implement these solutions efficiently in production.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:c5219f09ad2e53eafe1b8f58c58f4c50
URL:http://pytorchconferenceeu2026.sched.com/event/c5219f09ad2e53eafe1b8f58c58f4c50
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T123000Z
DTEND:20260408T124000Z
SUMMARY:Lightning Talk: Torch-Spyre: Compiling To a Multi-core Dataflow Accelerator With Inductor - David Grove & Olivier Tardieu\, IBM
DESCRIPTION:Torch-Spyre (https://github.com/torch-spyre/torch-spyre) is an open source project that provides a PyTorch PrivateUse1 device with OpenReg\, including an Inductor backend\, for the IBM Spyre Accelerator. IBM Spyre is a high-performance\, energy-efficient AI accelerator featuring 32 AI-optimized compute cores\, each with on-chip interconnect and compiler-managed scratchpad memory. \n \n Our goal in this session is to describe how we evolved the Spyre software stack to fully leverage Inductor. This enabled the elimination of a significant fraction of our proprietary compiler code base\, resulting in improved compilation time and operation coverage without loss of inference performance. We will highlight several technical challenges in compiling for Spyre-like accelerators and describe how we adapted and extended Inductor to tackle them. In particular\, we will discuss our extensions to Inductor to support device-specific tiled Tensor memory layouts\, and new compiler optimization passes for core-level work division and scratchpad management. We hope to engage the community in evolving the PyTorch ecosystem to more fully support such accelerators.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:22c6fc73c8368f75e80bc7be6dd1b000
URL:http://pytorchconferenceeu2026.sched.com/event/22c6fc73c8368f75e80bc7be6dd1b000
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T123000Z
DTEND:20260408T125500Z
SUMMARY:Seamless Integration: Custom Kernels in the Torch.compile Stack Without Graphbreaks - Kshiteej Kalambarkar\, Masaki Kozuki & Pawel Gadzinski\, NVIDIA
DESCRIPTION:Custom kernels are essential for high-performance PyTorch workflows\, but their integration often comes with a hidden cost. While torch.compile promises speedups\, calling custom operations typically triggers graph-breaks: fallbacks to Eager mode that introduce overhead and negate your performance gains.\n \n In this session\, we provide a practical roadmap for making your extensions "compiler-aware". Using the Transformer Engine project as a case study\, we will show how to utilize the custom_op extension point to bridge the gap between high-performance kernels and the torch.compile stack.\n \n What you will learn:\n • Identifying the Friction: How to profile and detect graph-breaks caused by custom extensions.\n • The Registration Path: A walkthrough of the custom_op registration process for torch.compile.\n • Solving the "Hard Parts": Strategies for handling complex Python-side logic that disrupts graph capture.\n • Real-World Impact: How these integrations function within the Transformer Engine to maintain peak throughput.\n \n Who should join: This talk is designed for developers building custom PyTorch extensions who want to understand how advanced operations fit into the compiled stack.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:ccae4a627f0773d289abca05c592d67f
URL:http://pytorchconferenceeu2026.sched.com/event/ccae4a627f0773d289abca05c592d67f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T123000Z
DTEND:20260408T124000Z
SUMMARY:Lightning Talk: Every Millisecond Counts: The Fine-tuning Journey of an Ultra-Efficient PyTorch Model for the Edge - Pavel Macenauer\, NXP Semiconductors
DESCRIPTION:From smart cameras that protect privacy by analyzing video on-device\, to wearables that interpret voice and motion instantly\, to industrial sensors that prevent failures before they happen\, edge AI is shaping our everyday routines and transforming our lives.\n \n Eliminating cloud dependency and making connectivity optional are essential for keeping data local. Without the cloud\, our options are severely limited by the constraints of the devices\, and efficiency drives innovation. Every millisecond and milliwatt can unlock a new use case — or limit one.\n \n This talk will explore optimization techniques for vision\, audio\, and language models that allow them to run on tiny\, resource-constrained devices\, and show how to fine-tune them to the limits of latency\, accuracy\, and power efficiency. We will start with an initial rapid simulation\, and follow up with silicon-level tuning driven by real device profiling feedback.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:343625b07895bf6bb8a24a58da1a9edb
URL:http://pytorchconferenceeu2026.sched.com/event/343625b07895bf6bb8a24a58da1a9edb
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T123000Z
DTEND:20260408T125500Z
SUMMARY:From Responses To Trajectories: Multi-Turn and Multi-Environment Reinforcement Learning - Kashif Rasul & Sergio Paniego Blanco\, Hugging Face
DESCRIPTION:Post-training of LLMs with reinforcement learning is increasingly moving beyond static prompt–response pairs and preference optimization methods such as DPO\, toward trajectory-based optimization. This talk focuses on the latest advances in multi-turn and multi-environment GRPO training\, enabling LLMs to learn from interactive\, agent-like experiences\, including interacting with simulated environments\, using tools\, or completing multi-step reasoning tasks.\n \n We highlight how TRL\, as a PyTorch-native post-training framework\, supports these workflows at scale. Multi-turn\, multi-environment training can leverage simulated environments (e.g.\, coding\, terminals\, browsers) such as OpenEnv\, while GRPO can also be applied to datasets for training LLMs on tool use or multi-step reasoning. Attendees will gain insights into design patterns\, rollout handling\, trajectory batching\, and advantage computation\, showing how robust\, multi-turn\, multi-environment post-training can improve alignment\, reasoning\, and generalization in LLMs for agentic applications.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:22ac02d783e649dfc0bcffe79acadf46
URL:http://pytorchconferenceeu2026.sched.com/event/22ac02d783e649dfc0bcffe79acadf46
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T124500Z
DTEND:20260408T125500Z
SUMMARY:Lightning Talk: Building a PyTorch-native VLLM Plugin for IBM Spyre - Thomas Parnell\, IBM Research & Thomas Ortner\, IBM Research Europe - Zurich
DESCRIPTION:IBM Spyre is an AI accelerator used across IBM Z and Power systems for agentic inference in production. Today\, we serve models on Spyre using upstream vLLM together with an out-of-tree platform plugin. While the current plugin delivers crucial functionality for our business\, it re-uses relatively little of upstream vLLM’s capabilities\, and also carries a high maintenance cost. \n \n In this talk\, we will describe our efforts to redesign the Spyre vLLM plugin in a more PyTorch-native fashion. We will describe the architectural evolution of the project and how it leverages torch-spyre\, an open-source extension that enables Spyre support in PyTorch via the PrivateUse1 device interface. We discuss key challenges—such as implementing a custom vLLM attention backend for Spyre—and share lessons learned while aligning vLLM’s execution model with Spyre’s hardware capabilities.\n \n Finally\, we will demonstrate a vLLM model running natively on Spyre through the new plugin and highlight areas where the community can work together to improve vLLM’s plugin interface. This talk will be especially relevant for those looking to extend vLLM to a wider variety of accelerators and use cases.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:7b9538b790145914af959888be817561
URL:http://pytorchconferenceeu2026.sched.com/event/7b9538b790145914af959888be817561
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T124500Z
DTEND:20260408T125500Z
SUMMARY:Lightning Talk: Full-Stack PyTorch Robotics VLA: From Data To Edge Via ExecuTorch/OpenVINO - Samet Akcay & Dmitriy Pastushenkov\, Intel
DESCRIPTION:While research-centric tools have lowered the entry barrier for robotics data collection\, transitioning Vision-Language-Action models to production remains challenging due to fragmented edge deployment paths. This session presents a unified\, PyTorch-native workflow spanning the full robotics lifecycle\, from data capture and curation to optimized edge execution. We introduce a modular Physical AI pipeline designed to resolve the disconnect between research scripts and real-time hardware. The talk details practical patterns for robotics data capture and policy training in a unified PyTorch ecosystem\, followed by concrete steps to export models via ExecuTorch. Using an OpenVINO backend\, Quantizer\, and AOT compilation\, we address latency\, accuracy\, and operator coverage gaps\, and demonstrate efficient on-device VLA inference. Using a WidowX pick-and-sort task as a case study\, we demonstrate how to validate latency and numerical tolerances under physical constraints. Attendees will leave with a reference architecture and a checklist for monitoring\, safety gates\, and managing dataset drift\, providing a roadmap for moving robotics VLA from research to production-grade edge deployment.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:19cd28c4dde8bb16aa280ba9fb66b675
URL:http://pytorchconferenceeu2026.sched.com/event/19cd28c4dde8bb16aa280ba9fb66b675
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T125500Z
DTEND:20260408T132000Z
SUMMARY:Birds of A Feather: NCCL in the Wild: Scaling Communications To Thousands of GPUs - Jeff Hammond\, Gabrielle Talavera\, Ke Wen & Asma Farjallah\, NVIDIA
DESCRIPTION:We will share the latest updates to NCCL and how they can be used in PyTorch. We invite the community to share their feedback on challenges using NCCL at scale and ways to improve integration of NCCL with PyTorch applications.\n \n Some of the important topics for community discussion include:\n - Symmetric memory support and GPU-initiated networking.\n - Copy-engine collectives and maximizing overlap of communication and computation for better end-to-end performance.\n - Profiling\, debugging and tuning\, as well as resilience (handling failed nodes without a restart).
CATEGORIES:BIRDS OF A FEATHER
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:ecadc81cb8139674f93f49146ed0b25d
URL:http://pytorchconferenceeu2026.sched.com/event/ecadc81cb8139674f93f49146ed0b25d
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T125500Z
DTEND:20260408T132500Z
SUMMARY:Coffee Break
DESCRIPTION:Menu:\n-Lemon cake\n-Caramelized arlette\n-Seasonal fruits (GF\, Vegan)\n-Roasted pumpkin cake\n-Dried fruit and raisin mix (GF\, Vegan)\n-Chocolate Cookie (GF\, Vegan)
CATEGORIES:BREAKS/EXHIBITS/SPECIAL EVENTS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:78e8b8e80e21e5182caa1efd3c8703a3
URL:http://pytorchconferenceeu2026.sched.com/event/78e8b8e80e21e5182caa1efd3c8703a3
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T125500Z
DTEND:20260408T132500Z
SUMMARY:Meet the Ray Maintainers
DESCRIPTION:Meet the core maintainers of Ray at this session! Come discuss use cases\, features\, and the roadmap with us\, or just learn how Ray development happens under the hood.
CATEGORIES:MEET THE DEVELOPERS
LOCATION:Open Platform\, Paris\, France
SEQUENCE:0
UID:ec689940f1c4a7e5f6eb7e88da81fab3
URL:http://pytorchconferenceeu2026.sched.com/event/ec689940f1c4a7e5f6eb7e88da81fab3
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T132500Z
DTEND:20260408T135000Z
SUMMARY:Bridging the Hardware Gap With Code Harnesses on the Hugging Face Kernels Hub - Ben Burtenshaw\, Hugging Face
DESCRIPTION:What: We share experiments and tooling to standardise kernel writing for agentic coding.\n \n We present an end-to-end experiment benchmarking 6 harnesses across 10 models on CUDA and Metal kernel writing. We compare agent cost\, kernel latency\, VRAM usage\, and end-to-end inference performance\, and show how the Kernels Hub enables distribution at scale.\n \n We demo two tools:\n \n Kernels Hub: Infrastructure for writing\, maintaining\, and distributing reproducible kernels in the PyTorch ecosystem.\n \n HF Skills: A library for defining and evaluating agent skills for ML tasks like kernel writing.\n \n Why: Beyond agentic hype\, kernel writing is a fundamental problem requiring robust evaluation to scale the community. High-performance kernels demand rare expertise in memory coalescing\, warp-level primitives\, and hardware-specific optimization. In practice\, builders optimize for the highest market-share hardware\, leaving a massive matrix of model×hardware combinations unserved. For example: edge inference with ExecuTorch\, local LLMs on Metal via vLLM\, classic ML at scale on Intel. This talk is technical\, intended for kernel writers and PyTorch builders who want to use agents robustly.
CATEGORIES:AGENTS & INTEROP
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:66d169f27d7c21c6d209e658ac38216c
URL:http://pytorchconferenceeu2026.sched.com/event/66d169f27d7c21c6d209e658ac38216c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T132500Z
DTEND:20260408T135000Z
SUMMARY:Beyond the Theory: What Actually Breaks When You Scale Your Disaggregated Pytorch Models - Ekin Karabulut & Ron Kahn\, NVIDIA
DESCRIPTION:As inference demand explodes\, new techniques to optimize these deployments have emerged. One such technique is disaggregated inference\, which splits inference into differently optimized workloads (e.g.\, prefill and decode) on separate workers. The theory is straightforward: better GPU utilization\, inference performance\, and tighter control over SLAs. The deployment in production is not.\n Scaling happens at multiple connected levels. Adding prefill workers for a traffic spike? Those workers belong to a prefill leader and must scale as a unit. But your prefill-to-decode ratio matters too: scale prefill without matching decode capacity and you've moved the bottleneck. Placement also plays a role: place prefill and decode far apart in your network topology and KV-cache transfers will kill your latency. Standard autoscaling treats these as independent components. They're not.\n In this talk\, we'll share what we've learned running disaggregated vLLM and SGLang deployments on K8s: what broke\, what worked\, and how we're improving performance. We'll evaluate approaches from standard deployments to specialized APIs like LWS and Grove\, and discuss how these integrate with frameworks like llm-d and Dynamo.
CATEGORIES:INFERENCE & PRODUCTION
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:5a2f24d75d3e4c9981413b055083e615
URL:http://pytorchconferenceeu2026.sched.com/event/5a2f24d75d3e4c9981413b055083e615
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T132500Z
DTEND:20260408T135000Z
SUMMARY:Building Trust for Users and Regulators Alike: A Cost-Efficient PyTorch Path To Compliance-as-Code - Raja Gopal Hari Vijay\, Zoho Corporation
DESCRIPTION:Traditional compliance relies on retroactive logs and manually stitched audit trails\, while Opacus\, CrypTen\, and Captum address isolated concerns without providing end-to-end lifecycle traceability. Compliance-as-Code embeds regulatory controls as executable logic within training and inference pipelines\, turning compliance into a continuous engineering function and reducing audit costs.\n \n PyTorch’s dynamic execution model enables real-time auditing and compliance gates across the model lifecycle. Features such as the Dispatcher\, custom Autograd functions\, and the hook system allow logging\, constraint checks\, and risk controls to be embedded directly into execution. For example\, a fairness gate using training hooks can block model export if disparity exceeds thresholds. Dataset initialization can detect imbalance\, while dispatcher-level monitoring generates tamper-resistant audit trails linking data\, model versions\, and outputs. In deployment\, metrics and inference hooks track bias drift\, accuracy degradation\, and human-intervention counts.\n \n The talk presents practical PyTorch patterns for automated documentation\, immutable audit trails\, and faster certification in regulated AI deployments.
CATEGORIES:RESPONSIBLE AI & COMPLIANCE
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:7c256c7dc83f7b6ffe6232baa2f9c6e6
URL:http://pytorchconferenceeu2026.sched.com/event/7c256c7dc83f7b6ffe6232baa2f9c6e6
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T132500Z
DTEND:20260408T133500Z
SUMMARY:Lightning Talk: Trinity Large - Torchtitan on 2000+ B300s - Matej Sirovatka\, Prime Intellect
DESCRIPTION:In this talk\, we'll cover how to use torchtitan to scale training of ultra-sparse mixture-of-experts models across over 2\,000 GPUs. We'll walk through the pre-training of Trinity Large\, a 400B mixture-of-experts model trained entirely using torchtitan\, focusing on maximizing throughput and minimizing the impact of hardware-induced failures. Along the way\, we'll discuss challenges like fault tolerance\, large-scale distributed training\, and ensuring determinism - and how we've addressed each of these using torchtitan. Finally\, we'll share insights and common pitfalls to avoid in your own large-scale training runs.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:10a5bd9d98e70f4edfb5843e02a6bf61
URL:http://pytorchconferenceeu2026.sched.com/event/10a5bd9d98e70f4edfb5843e02a6bf61
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T134000Z
DTEND:20260408T135000Z
SUMMARY:Lightning Talk: Faster Than SOTA Kernels in Torch.compile With Subgraph Fusions and Custom Op Autotuning - Elias Ellison & Paul Zhang\, Meta
DESCRIPTION:Unlocking state-of-the-art performance\, this talk reveals how subgraph and custom operator autotuning in torch.compile deliver breakthrough speedups—surpassing previous SOTA for matmul and distributed collective ops.\n \n DecomposeK is a novel subgraph optimization in PyTorch\, designed to accelerate matrix multiplication when the inner dimension (K) is very large. DecomposeK delivers up to 28% speedup over ATen with activation fusion and 10% over ATen without fusion.\n \n Building on subgraph infrastructure\, we introduced Custom Op Autotuning\, which benchmarks and selects the fastest kernel implementations for custom ops. This enables epilogue fusion and the first distributed collective op autotuning in PyTorch. We also introduce Range-based dispatch autotuning\, which enables dynamic selection of optimal implementations based on input shapes\, ensuring performance that closely matches the theoretical best for each range. Our demo shows our autotuned kernels outperform Async TP Fused AG+MM by 9% and the Async TP Fully Fused kernel by 41% across all input ranges.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:c544519f4e04b494be22d0427390e5c2
URL:http://pytorchconferenceeu2026.sched.com/event/c544519f4e04b494be22d0427390e5c2
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T135500Z
DTEND:20260408T142000Z
SUMMARY:From Gradients To Governance: Making PyTorch Lineage-Aware - Kateryna Romashko & Clodagh Walsh\, Red Hat
DESCRIPTION:PyTorch was built to track how models learn\, but not whether they should have. As AI systems increasingly operate on regulated\, jurisdiction-bound\, and sovereign data\, lineage and policy can no longer live outside the runtime. This talk explores data sovereignty as a first-class constraint and argues that lineage is the missing primitive in modern ML frameworks. Building on PyTorch’s dynamic graphs and autograd system\, we outline how tensors could carry origin\, consent\, and policy metadata through training and inference. The goal is not compliance tooling\, but a lineage-aware PyTorch that enables trustworthy\, auditable\, and deployable AI across edge\, federated\, and European AI ecosystems.
CATEGORIES:RESPONSIBLE AI & COMPLIANCE
LOCATION:Master Stage\, Paris\, France
SEQUENCE:0
UID:17020f24f63231870191f61195e09e2c
URL:http://pytorchconferenceeu2026.sched.com/event/17020f24f63231870191f61195e09e2c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T135500Z
DTEND:20260408T142000Z
SUMMARY:DualPipe From Scratch: Implementing DeepSeek's 5D Parallelism in PyTorch - Dev Jadhav\, ING Bank
DESCRIPTION:The DeepSeek-V3 paper describes 5D parallelism and DualPipe at a high level\, but leaves critical implementation details undocumented. This session presents our open-source PyTorch reference implementation that fills those gaps - verified against the original architecture and designed for learning and extension.\n\nWe'll share what we discovered building it from scratch:\n- Why K_pe is shared across heads in decoupled RoPE (not explicit in the paper)\n- The critical timing of bias updates in auxiliary-loss-free load balancing\n- How sigmoid routing separates selection scores from gate values\n- The warmup formula that makes DualPipe achieve 3% bubble overhead\n- Bugs we caught: causal mask position offsets\, EMA initialization\, capacity dropping priority\n\nWhat you'll learn:\n- 5D Parallelism: How TP\, PP\, DP\, EP\, and SP interact at 2\,048+ GPU scale\n- DualPipe: Building the bidirectional scheduler with 55% throughput gain over GPipe\n- Hierarchical All-to-All: Two-level communication reducing MoE dispatch overhead by 4x\n- Teachable abstractions: CapacityMetrics\, ExpertSpecializationTracker\, ScheduleStep enums\n\nPrerequisites: torch.distributed basics.\nCode: github.com/DevJadhav/deepseek-from-scratch
CATEGORIES:TRAINING SYSTEMS
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:dc5572e792322f97891ea21d2dc3ecc3
URL:http://pytorchconferenceeu2026.sched.com/event/dc5572e792322f97891ea21d2dc3ecc3
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T135500Z
DTEND:20260408T140500Z
SUMMARY:Lightning Talk: Why Logging Isn’t Enough: Making PyTorch Training Regressions Visible in Practice - Sahana Venkatesh\, Wayve
DESCRIPTION:PyTorch teams often log rich training metrics\, yet still discover training regressions late after significant developer time and GPU budget have already been spent. In this talk\, I’ll share a practical pattern we used to turn PyTorch training metrics into an operational guardrail for large-model training.\n\nThe approach combines scheduled short and long training runs\, standardized performance and stability metrics (throughput\, memory\, loss\, divergence)\, and simple statistical baselines to automatically surface regressions via alerts without hard gates or complex infrastructure.\n\nI’ll focus on why logging alone is insufficient\, how we chose what to monitor\, and what tradeoffs we encountered (false positives\, alert fatigue\, baseline drift). The goal is not a tool demo\, but a reusable pattern other PyTorch teams can adapt to catch training regressions earlier and make retraining more predictable.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:49665c17184cc8f6bcae98eb492e6351
URL:http://pytorchconferenceeu2026.sched.com/event/49665c17184cc8f6bcae98eb492e6351
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T135500Z
DTEND:20260408T142000Z
SUMMARY:Sponsored Session: Fault-Tolerant Training: How We Build Reliable Clusters for Distributed AI Workloads - Cyril Konkratenko & Maurits de Groot\, Nebius
DESCRIPTION:Large-scale distributed AI training is highly sensitive to infrastructure failures\, where even a single node disruption can halt progress and waste substantial compute. This talk presents Nebius’s approach to fault-tolerant training\, combining reliability metrics such as goodput\, MTBF\, and MTTR with automated infrastructure practices including health checks\, workload isolation\, node replacement\, state recovery\, and observability. Drawing on production cluster results\, the presentation shows how these techniques reduce interruptions\, accelerate recovery\, and improve the stability and efficiency of long-running AI workloads.
CATEGORIES:TRAINING SYSTEMS
LOCATION:Junior Stage\, Paris\, France
SEQUENCE:0
UID:bdd12b5a484d04504fd96396c26c3d7b
URL:http://pytorchconferenceeu2026.sched.com/event/bdd12b5a484d04504fd96396c26c3d7b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T141000Z
DTEND:20260408T142000Z
SUMMARY:Lightning Talk: Ball Tracking and Detection in Soccer Videos - Comparison of VLMs and Traditional Pipelines - Maciej Szymkowski\, Future Processing
DESCRIPTION:Nowadays\, Vision-Language Models (VLMs) have plenty of different applications. However\, we cannot be sure that they are the most accurate and precise solution for every problem\; their capabilities must be compared against other pipelines. In this presentation\, we compare on-premise models – Qwen 3 and InternVL-3.5 – and cloud-based solutions – Gemini 3 and GPT-5 – with a traditional pipeline based on YOLOv11 and image processing techniques. The battlefield will be ball detection and tracking in soccer match recordings (from different angles\, in diverse lighting\, e.g.\, sunny or night\, and weather conditions\, e.g.\, snowy or rainy days) downloaded from the SoccerNet database. We used both broadcast videos and action and replay images\, all marked manually to prepare a ground-truth database. The models must recognize not only the ball but also track it through the whole sequence of images. To give all approaches an equal chance\, we fine-tuned YOLOv11 and provided additional knowledge to the VLMs in the form of a RAG pipeline. The comparison was made with traditional machine learning metrics like accuracy\, precision\, and recall.
CATEGORIES:APPLICATIONS & CASE STUDIES
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:f91c34b7006b6e069655e2ae7bf83ed0
URL:http://pytorchconferenceeu2026.sched.com/event/f91c34b7006b6e069655e2ae7bf83ed0
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T142500Z
DTEND:20260408T143500Z
SUMMARY:Lightning Talk: Bridging the Gap: Engineering Compliant "Glass Box" Medical AI With PyTorch - Muhammad Saqib Hussain\, Neurosonic & Mohaddisa Maryam\, Neurosonic Academy
DESCRIPTION:While state-of-the-art models like NeuroBOLT demonstrate mathematical excellence in EEG-to-fMRI synthesis\, they often remain clinically opaque. With the EU AI Act classifying medical AI as "high-risk\," hospitals cannot deploy "black boxes"\; they require systems that are transparent\, auditable\, and legally compliant.\n This session presents a "Clinical Auditing System" built within the PyTorch ecosystem\, designed to transform opaque deep learning models into transparent "Glass Boxes." I will demonstrate a workflow that backpropagates gradients from high-dimensional 4D fMRI volumes to identify the specific EEG spectral signatures driving those predictions.\n Key Technical Takeaways:\n 1. The Audit Layer: Implementing IntegratedGradients (Captum) to verify model fidelity\, ensuring predictions stem from valid neural oscillations rather than noise artifacts.\n 2. Cross-Modal Reasoning: A technical demonstration of mapping 4D volumetric outputs back to 1D EEG frequency bands\, enabling the model to "reason" through neurovascular coupling.\n This presentation is designed for developers seeking to wrap PyTorch models in safety layers that satisfy the demands of healthcare regulation.
CATEGORIES:APPLICATIONS & CASE STUDIES
LOCATION:Founders Cafe\, Paris\, France
SEQUENCE:0
UID:693e91cb16bc361ec38fc26cbad9fad9
URL:http://pytorchconferenceeu2026.sched.com/event/693e91cb16bc361ec38fc26cbad9fad9
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260414T215423Z
DTSTART:20260408T142500Z
DTEND:20260408T145000Z
SUMMARY:De-mystifying PyTorch for ASICs: When (and Why) To Move Your Development To AI Accelerators - Alpha Romer Coma\, Kollab Philippines
DESCRIPTION:GPU availability and cost are squeezing ML teams\, making ASICs like Google TPUs and AWS Trainium attractive alternatives. But does the software stack hold up? This session moves beyond the datasheets to provide a practical\, code-first reality check on migrating PyTorch workloads to ASICs. We will de-mystify the underlying compiler stacks\, comparing PyTorch/XLA (TPU) and TorchNeuron (Trainium)\, and analyze the 'Compiler Tax' that often surprises developers. Through side-by-side code diffs and real-world benchmarks on fine-tuning Llama 4\, Gemma 3\, and Qwen 3\, and training CNNs and ViTs\, we will answer:\n 1. The Code: How much rewriting is actually required?\n 2. The Performance: Which model architectures thrive on ASICs\, and which ones fail due to dynamic shapes?\n 3. The Debugging: What happens when you hit an OOM or a compilation hang?\n Attendees will leave with a clear 'Migration Decision Matrix' to determine if their specific workload is ready for the ASIC leap.
CATEGORIES:FRAMEWORKS & COMPILERS
LOCATION:Central Room\, Paris\, France
SEQUENCE:0
UID:808ad628e58359f47a415752e5cfa4b7
URL:http://pytorchconferenceeu2026.sched.com/event/808ad628e58359f47a415752e5cfa4b7
END:VEVENT
END:VCALENDAR
