7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."
Tuesday, April 7
 

09:00 CEST

Keynote: Co-Evolution: How the Open Source Intelligence Stack Compounds - Mark Collier, Executive Director, PyTorch Foundation, General Manager, AI & Infrastructure, Linux Foundation
Tuesday April 7, 2026 09:00 - 09:10 CEST
Agentic coding systems have crossed a threshold from experimentation to measurable economic impact. Their rapid adoption reveals a deeper shift: modern AI capability emerges from the co-evolution of models, training frameworks, inference engines, reinforcement systems, hardware, and cloud infrastructure, with open source enabling the flow of code, research, and operational knowledge across the stack. As performance gaps narrow and costs fall, this compounding intelligence system accelerates innovation and spreads capability across companies, industries, and hardware platforms, raising a simple question for the community: how fast do we want to evolve?
Speakers
Mark Collier

Executive Director, PyTorch Foundation, General Manager, AI & Infrastructure, The Linux Foundation

Tuesday April 7, 2026 09:00 - 09:10 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:10 CEST

Keynote: PyTorch Updates - Edward Yang, Research Engineer, Meta
Tuesday April 7, 2026 09:10 - 09:30 CEST

Speakers
Edward Yang

Research Engineer, Meta
Edward Yang has worked on PyTorch at Meta almost since the project began. He currently works on all aspects of PT2, with a particular focus on dynamic shapes support across the stack.
Tuesday April 7, 2026 09:10 - 09:30 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:55 CEST

Keynote: Stream Everything - Moving From Request Input To Streaming Input - Patrick von Platen, Research Engineer, Mistral AI
Tuesday April 7, 2026 09:55 - 10:10 CEST

Speakers
Patrick von Platen

Research Engineer, Mistral AI
Patrick von Platen is a Research Engineer at Mistral AI, focused on natural language processing and scalable AI systems. He contributes to vLLM, is a former core maintainer of Transformers, and created Diffusers.
Tuesday April 7, 2026 09:55 - 10:10 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

11:00 CEST

Lightning Talk: Why Your Forecasting Transformer Isn’t Working (And How To Fix It in Python) - Rosheen Naeem, Open Climate Fix
Tuesday April 7, 2026 11:00 - 11:10 CEST
Renewable energy is clean — but it’s also inherently variable. Solar PV generation can change dramatically within minutes due to cloud cover and weather conditions, making accurate short-term forecasts essential for grid stability, energy trading, and smart-home optimisation.
Open Climate Fix builds open and high-impact forecasting tools to accelerate the transition to a low-carbon energy system. One of these projects is Open Quartz Solar Forecast: an open-source model that uses public PV generation data, site metadata, and numerical weather prediction variables to forecast solar power for any location.
In this talk, I’ll present a real case study from my Google Summer of Code project where I implemented and trained a Temporal Fusion Transformer for multi-horizon solar forecasting. I’ll cover the practical engineering challenges behind making transformer forecasting work in Python: building continuous training windows, aligning weather forecast steps with observations, separating static vs time-varying features, and stabilising training using PyTorch Forecasting and PyTorch Lightning.
Attendees will leave with reusable patterns for real-world time-series forecasting pipelines.
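The windowing steps described above can be sketched in plain PyTorch. This is a minimal illustration, not the speaker's actual PyTorch Forecasting pipeline; the toy data, window lengths, and the static feature are all hypothetical:

```python
import torch

def make_windows(series: torch.Tensor, encoder_len: int, horizon: int):
    """Slice a 1-D series into (past, future) pairs for multi-horizon forecasting."""
    windows = []
    total = encoder_len + horizon
    for start in range(len(series) - total + 1):
        past = series[start : start + encoder_len]
        future = series[start + encoder_len : start + total]
        windows.append((past, future))
    return windows

# Hourly PV output for 48 steps (toy data). Static site metadata (e.g. panel
# tilt) is kept separate from the time-varying signal, mirroring the
# static vs. time-varying feature split the talk describes.
pv = torch.linspace(0.0, 1.0, steps=48)
static_features = torch.tensor([35.0])  # e.g. tilt angle, illustrative only

pairs = make_windows(pv, encoder_len=24, horizon=4)
past, future = pairs[0]
```

In a real pipeline the same windows would be fed to a Temporal Fusion Transformer via a dataset abstraction, with weather-forecast steps aligned to the observation timestamps.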
Speakers
Rosheen Naeem

Software Engineer, Miro
I am a Software Engineer at Miro and a community member at Open Climate Fix. I completed the Erasmus Mundus Master’s in Software Engineering for the Green Deal (SE4GD), a joint degree program across Vrije Universiteit Amsterdam (Netherlands), LUT University (Finland), and Universit...
Tuesday April 7, 2026 11:00 - 11:10 CEST
Central Room
  Applications & Case Studies

11:00 CEST

Lights, Camera, Inference! Video Generation as a Service With vLLM-Omni - Ricardo Noriega, Red Hat & Doug Smith, Red Hat, Inc.
Tuesday April 7, 2026 11:00 - 11:25 CEST
LLMs turned text generation into a service. What does it take to do the same for video?
We built an experimental Video Generation as a Service stack using vLLM-Omni and the LTX-2 open-weights video model to explore how far an open, multimodal stack can go toward production use. We’ll share what worked, what broke, and what it takes to treat generative video as a first-class workload.
vLLM is known for high-performance autoregressive inference, and vLLM-Omni extends that foundation to multimodal inputs and outputs. We pushed those capabilities further by adding support for LTX-2, extending the OpenAI-compatible API surface, integrating with front ends, and packaging for scalable deployment. We’ll walk you through the touch points and show exactly how we put all the pieces together with vLLM-Omni.
Finally, we’ll examine the gap between novelty demos and real applications: going from quirky spaghetti eating videos to generating consistent characters, personalized media, customized video game cutscenes, and interactive storytelling, and highlight what’s still missing to make generative video truly production-ready.
Speakers
Doug Smith

Principal Software Engineer, Red Hat
Doug Smith is a Principal MLOps Engineer at Red Hat, where he works on the AI Inference Server team and contributes upstream to the vLLM project through its CI Special Interest Group. Recently, he's also been looking into contributions to vLLM-Omni. He’s spent years bridging telecom...
Ricardo Noriega

Principal SW Engineer, Red Hat
Ricardo is a Principal Software Engineer in Red Hat's Office of the CTO, in the Emerging Technologies organization. He is currently focused on AI multimodality and researching the benefits of Small Language Models.
He is a former member of the Akraino TSC and PTL of the Kubernetes-Native-Infrastructure blueprint family, and a contributor to Kubernetes, OpenStack, OpenDaylight and OPNFV...
Tuesday April 7, 2026 11:00 - 11:25 CEST
Founders Cafe
  GenAI & Multimodal
  • Audience Level Any
  • Slides Attached Yes

11:15 CEST

Lightning Talk: Deep Learning in the Wild: Embedded PyTorch for Real-World Conservation Bioacoustics - Taraqur Rahman & Owen O'Donnell, OWL Integrations
Tuesday April 7, 2026 11:15 - 11:25 CEST
Passive acoustic monitoring is a powerful tool for wildlife conservation, but deploying deep learning models in remote rainforest environments introduces strict constraints on power, memory, and compute. In this talk, we present an end-to-end PyTorch-based pipeline for detecting and analyzing the endangered three-wattled bellbird using embedded deep learning systems.

We cover the full lifecycle from audio preprocessing and model training in PyTorch to optimization and deployment on resource-constrained embedded devices. Topics include model architectures for sparse bioacoustic event detection, handling extreme class imbalance, model compression and quantization, and practical trade-offs between accuracy, latency, and power consumption.
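One of the compression steps mentioned, post-training quantization, can be sketched with core PyTorch. The model here is an illustrative stand-in, not the speakers' actual bellbird detector:

```python
import torch
import torch.nn as nn

# A small detector head standing in for a bioacoustic event classifier.
# The architecture is illustrative only.
model = nn.Sequential(
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 2),  # e.g. bellbird call vs. background
)

# Post-training dynamic quantization: weights are stored as int8 and
# activations quantized on the fly, shrinking the Linear layers roughly 4x,
# which matters when every kilobyte of flash counts.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

spectrogram_frame = torch.randn(1, 64)
out = quantized(spectrogram_frame)
```

Embedded targets typically go further (static quantization, pruning, operator selection), but the trade-off structure is the same: model size and latency against detection accuracy.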

The session emphasizes real-world lessons learned deploying machine learning at the edge, where unreliable connectivity, noisy signals, and limited hardware define success more than benchmark metrics. Attendees will gain practical patterns for building and deploying PyTorch models for embedded and edge AI applications with real environmental impact.
Speakers
Owen O'Donnell

Embedded Systems and Machine Learning Engineer, OWL Integrations
Owen O'Donnell is a Machine Learning and Embedded Systems Engineer at OWL Integrations. He trains ML models for deployment in remote locations on resource-constrained electronics, which introduces challenges such as needing smaller models and having...
Taraqur Rahman

Chief Data Scientist, OWL Integrations
Taraqur Rahman is Chief Data Scientist and Co-Founder at OWL Integrations and Organizer/Co-Founder of Biased Outliers, where he leads applied machine learning and data science initiatives with real-world impact. He combines deep technical expertise in Python with practical deployment...
Tuesday April 7, 2026 11:15 - 11:25 CEST
Central Room
  Applications & Case Studies
  • Audience Level Any
  • Slides Attached Yes

11:15 CEST

Lightning Talk: Flexible Deployment of PyTorch Models on MCU-Class Devices Using ExecuTorch - Robert Kalmar & Martin Pavella, NXP
Tuesday April 7, 2026 11:15 - 11:25 CEST
ExecuTorch has recently matured into a production-ready framework designed specifically for efficient edge deployment of PyTorch models. Its architecture supports a broad spectrum of hardware targets, from low-power, bare-metal or RTOS-based microcontrollers (MCUs) to higher-performance Linux- or Android-based microprocessor platforms, while meeting the demanding constraints on memory, compute, and power typically found in real-world embedded applications.
This talk focuses on the deployment flexibility ExecuTorch offers for MCU-class devices, highlighting how different backends enable efficient execution across heterogeneous compute units. We will explore CPU, DSP, and NPU acceleration paths using the Cortex-M, Cadence, Ethos-U, and eIQ Neutron backends, and discuss how these integrate into typical ML model deployment workflows.
To make the session practical and application-oriented, we will present an optimization journey aimed at reducing power consumption, an essential requirement for ML workloads in energy-constrained environments. Attendees will gain insights into backend selection, performance trade-offs, and best practices for deploying PyTorch models on edge devices.
Speakers
Robert Kalmar

Principal AI/ML Engineer, NXP Semiconductors
Robert Kalmar is a Principal Machine Learning Engineer at NXP Semiconductors. He received his master’s degree in machine learning and intelligent systems from Brno University of Technology. At NXP he focuses on machine learning solution enablement for embedded and mobile devices...
Martin Pavella

ML SW Engineer, NXP Semiconductors
I hold a Master’s degree in Machine Learning from the Brno University of Technology, graduating with distinction at both bachelor’s and master’s levels. I am a mid-level AI/ML Software Engineer at NXP Semiconductors with 2.5+ years of experience. I won the 2025 iGEM overgraduate...
Tuesday April 7, 2026 11:15 - 11:25 CEST
Junior Stage
  Inference & Production

11:30 CEST

Lightning Talk: Coding Agents for Compiler Construction: Beyond the AI Assistant Paradigm - Reza Rahimi, yasp.ai & Stefan Krassin, yasp
Tuesday April 7, 2026 11:30 - 11:40 CEST
Modern ML compilers follow a familiar pattern: a frontend lowers models into an intermediate representation, while a backend applies graph and kernel optimizations before generating code for target accelerators. PyTorch provides strong foundations through nn.Module, FX, and graph capture, but implementing optimized backends remains challenging due to hardware diversity and kernel-level complexity.

Optimizing GPU kernels is hard. Few engineers do it well. Hardware architectures evolve yearly, and with hyperscalers, chip makers, and AI labs building custom silicon, demand for efficient kernel generation keeps growing. This creates a gap between model developers and hardware capabilities.

This talk explores coding agents as engineering tools for compiler construction, not general-purpose assistants. We discuss how agents can generate and refine backend components by analyzing model mathematics and hardware specifications to produce optimized kernels tailored to specific targets.

We present a compiler architecture built as a PyTorch add-on that accepts PyTorch models or FX graphs and produces executable artifacts, demonstrating practical integration with existing PyTorch workflows.
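As a concrete anchor for the graph-capture side, here is a minimal FX trace of the kind of graph such a compiler frontend would consume. The module is illustrative, not part of the yasp compiler:

```python
import torch
import torch.nn as nn
from torch.fx import symbolic_trace

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.lin(x)) + x

# symbolic_trace produces the FX graph that a backend (or a coding agent
# generating backend components) would analyze before emitting
# target-specific kernels.
gm = symbolic_trace(Block())
ops = [node.op for node in gm.graph.nodes]
```

Each node in `gm.graph` carries the operation and its inputs, which is the level at which kernel generation and fusion decisions are made.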
Speakers
Reza Rahimi

CTO, yasp
Reza Rahimi is a seasoned technologist with a strong background in accelerating engineering software and scaling machine learning systems. With experience leading teams across embedded AI, compiler design, and model optimization, he now serves as CTO of yasp, where he is pioneering...
Stefan Krassin

CEO, yasp.ai
With a background in electrical engineering and a career spanning embedded systems to executive leadership, he combines technical expertise with a vision for scale. After 10+ years of leading companies to outstanding growth, he co-founded yasp in 2023. His mission is to eliminate...
Tuesday April 7, 2026 11:30 - 11:40 CEST
Founders Cafe
  Agents & Interop

11:30 CEST

Lightning Talk: How DeepInverse Is Solving Imaging in Science and Healthcare With PyTorch - Andrew Wang, DeepInverse; Minh Hai Nguyen, Université de Toulouse
Tuesday April 7, 2026 11:30 - 11:40 CEST
Deep learning has revolutionised imaging, a foundation of science and healthcare. DeepInverse is the PyTorch library for solving imaging problems, unifying deep learning methods (e.g. diffusion models), physics (medical, optics) and modern tooling. In this talk, we’ll show how the PyTorch community can get involved in this exciting yet accessible application of open-source AI.

AI methods in imaging must model the imaging physics, leading to interesting engineering problems, e.g. efficient differentiable ops and physics-informed losses. We’ll show notebooks on real use-cases: accelerating brain MRI, reducing radiation in CT scans, and imaging black holes.
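The kind of engineering problem mentioned can be sketched in plain torch: a schematic undersampled-Fourier operator (a stand-in for accelerated MRI, not DeepInverse's actual physics classes) with a differentiable data-fidelity loss:

```python
import torch

# Schematic accelerated-MRI forward operator: keep only a random subset of
# k-space (Fourier) measurements. Illustrative only.
def forward_op(image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    return mask * torch.fft.fft2(image)

torch.manual_seed(0)
true_image = torch.randn(16, 16)
mask = (torch.rand(16, 16) > 0.5).to(torch.complex64)  # ~2x undersampling
measurements = forward_op(true_image, mask)

# Reconstruction variable and a physics-informed data-fidelity loss
# ||A(x) - y||^2, differentiable end to end through the FFT.
x_hat = torch.zeros(16, 16, requires_grad=True)
loss = (forward_op(x_hat, mask) - measurements).abs().pow(2).mean()
loss.backward()
```

Real reconstruction pipelines add learned priors (e.g. a denoiser or diffusion model) on top of exactly this kind of differentiable physics term.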

PyTorch enthusiasts at any level/background can contribute - from training infra for scientific data to high-level generative modelling frameworks - their AI engineering skills can directly impact imaging across multiple fields.

DeepInverse is supported by a growing international user community and proudly rooted in Paris. We’ve joined the PyTorch Ecosystem and received the Prix Science Ouverte in 2024. We’re excited to join the PyTorch Conf to celebrate the vibrant French developer community!
Speakers
Andrew Wang

CTO & Co-founder, Blur Labs
Andrew is a lead developer of DeepInverse as well as the CTO & co-founder of Blur Labs, a startup based in Paris building AI models for imaging. Andrew did his PhD at the University of Edinburgh in magnetic resonance image reconstruction.
Minh Hai Nguyen

PhD candidate, Toulouse University
Tuesday April 7, 2026 11:30 - 11:40 CEST
Central Room
  Applications & Case Studies
  • Audience Level Any
  • Slides Attached Yes

11:30 CEST

Why Classic IAM Collapses for Agents: Rethinking IAM for Agentic Systems - Parul Singh, Red Hat
Tuesday April 7, 2026 11:30 - 11:55 CEST
Autonomous AI agents increasingly reason, plan and act across tools, services and organizational boundaries. In these environments, traditional Identity and Access Management models begin to fail. Agents are not users and they are not static services. They act on behalf of others, change context during execution and operate with different levels of autonomy and risk.

This talk examines why classic IAM assumptions, like long-lived identities, static permissions, and check-once, trust-always authorization, do not hold for agentic systems. We focus on three core questions:
1. How should agent identity be defined when tools, workflows, and execution context are dynamic?
2. How do we preserve accountability when agents act on behalf of users or other agents?
3. How should access change as agent behavior and risk evolve during execution?

We will outline a practical migration path from traditional IAM to agent aware identity workflows. Our insights come from our work collaborating with the COSAI Secure Design for Agentic Systems working group focused on creating shared guidance and emerging standards. We will close with current open questions and how you can join us to help solve them.
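One way to picture an agent-aware alternative to long-lived identities is a short-lived, task-scoped credential that records the on-behalf-of chain. The sketch below is purely illustrative and does not represent COSAI guidance or any standard:

```python
from dataclasses import dataclass
import time

# Illustrative agent-aware credential: narrow scopes, a delegation chain
# back to a human principal, and a short TTL instead of a static identity.
@dataclass
class DelegatedCredential:
    subject: str          # the acting agent
    on_behalf_of: list    # provenance chain for accountability
    scopes: frozenset     # task-specific permissions
    expires_at: float     # short lease, re-evaluated on renewal

    def allows(self, scope, now=None):
        now = time.time() if now is None else now
        return now < self.expires_at and scope in self.scopes

cred = DelegatedCredential(
    subject="planner-agent",
    on_behalf_of=["alice@example.com", "orchestrator-agent"],
    scopes=frozenset({"tickets:read"}),
    expires_at=time.time() + 300,  # 5-minute lease
)
```

The renewal point is where access can change as agent behavior and risk evolve during execution, which is the third question above.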
Speakers
Parul Singh

Principal Software Engineer, Red Hat
Parul is a Principal Software Engineer in Red Hat's Office of the CTO, working on agentic systems and security. Her work focuses on trust, identity, and observability for autonomous AI agents, including delegation, provenance, and zero trust architectures for agentic workflows. She...
  • Slides Attached Yes
Tuesday April 7, 2026 11:30 - 11:55 CEST
Junior Stage
  Security & Privacy

11:45 CEST

Lightning Talk: ExecuTorch on Microcontrollers: Deploying PyTorch To the Smallest Edge - RJ Ascani & Matthias Cremon, Meta
Tuesday April 7, 2026 11:45 - 11:55 CEST
ExecuTorch extends PyTorch's reach to the most resource-constrained devices: microcontrollers, DSPs, and specialized neural processing units powering always-on sensors, wearables, and embedded systems. In this talk, we'll share the current state and roadmap for running ExecuTorch on platforms where every kilobyte of memory and milliwatt of power matters.

What you'll learn:
- How ExecuTorch's design enables deployment from ultra-low-power MCUs to DSP and NPU accelerators, all from a single PyTorch workflow
- The state of backend support for Cadence DSPs, ARM Ethos-U and Cortex-M
- Practical considerations for deploying models with sub-megabyte footprints and milliwatt power budgets
- Case studies spanning always-on audio, embedded vision, and TinyML applications
Speakers
Matthias Cremon

Software Engineering Manager, Meta
Matthias Cremon is a Software Engineering Manager at Meta in the Silicon AI Software Team, working on AI compilers for various edge devices. He focuses on the frontend, graph level optimization side, as well as the integration of low-level, vendor specific implementations to run on...
RJ Ascani

Software Engineer, Meta
RJ Ascani is an embedded software engineer on Meta’s PyTorch Edge team, focusing on advancing ExecuTorch for microcontroller platforms.
Tuesday April 7, 2026 11:45 - 11:55 CEST
Central Room
  Inference & Production
  • Audience Level Any
  • Slides Attached Yes

11:45 CEST

Lightning Talk: TorchJD: Jacobian Descent in PyTorch - Pierre Quinton, EPFL & Valérian Rey, Simplex Lab
Tuesday April 7, 2026 11:45 - 11:55 CEST
Jacobian descent (JD) is an extension of gradient descent supporting the optimization of vector-valued functions. This algorithm can be used to train neural networks with multiple loss functions (e.g. multi-task learning). JD iteratively updates the parameters of the model using the Jacobian matrix of the vector of losses (the matrix stacking each individual loss's gradient).

To support and extend our research, we have developed the TorchJD library. With it, it's easy and efficient to compute the Jacobians with respect to the model parameters, and to aggregate them into an update direction that is beneficial to every objective. In contrast, if we had averaged the losses and used gradient descent, the update would have been beneficial to the average loss, but may have actually increased one of the individual losses.

In this session, we will give a quick introduction to the theory behind Jacobian descent, and then show how to use TorchJD on a variety of use-cases, beyond multi-task learning.

Library: https://github.com/TorchJD/torchjd
Paper: https://arxiv.org/abs/2406.16232
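The core idea can be sketched in plain PyTorch. Note that the aggregation shown here is a simple PCGrad-style projection for illustration, not TorchJD's actual aggregator:

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)

# Two objectives sharing parameters (a toy multi-task setup).
loss1 = (w ** 2).sum()
loss2 = (w - 1.0).pow(2).sum()

# Row i of the Jacobian is the gradient of loss i w.r.t. the parameters.
g1 = torch.autograd.grad(loss1, w, retain_graph=True)[0]
g2 = torch.autograd.grad(loss2, w)[0]
jacobian = torch.stack([g1, g2])

# Illustrative aggregation (not TorchJD's algorithm): remove the component
# of g1 that conflicts with g2 before averaging, so the update does not
# increase either individual loss to first order.
dot = torch.dot(g1, g2)
g1_adj = g1 - torch.clamp(dot, max=0.0) / g2.norm() ** 2 * g2
update = 0.5 * (g1_adj + g2)
```

Averaging the losses and taking a single gradient would instead optimize only the mean, which is exactly the failure mode the abstract describes.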
Speakers
Pierre Quinton

Teacher, EPFL
PhD in Information Theory and Master in Data Science, specializing in fundamental math and multi-objective optimization (MOO). I am the co-author of TorchJD, a PyTorch library for Jacobian Descent developed with Valerian, currently at ~300 GitHub stars. My work aims to translate complex...
Valérian Rey

Research Engineer, Simplex Lab
I graduated from EPFL with a MSc in Data Science in 2021. Since then, I worked as a Data Scientist at Withings and on Jacobian descent, initially as a side project but now as a full-time occupation. I now spend most of my time developing and maintaining TorchJD, and I love...
Tuesday April 7, 2026 11:45 - 11:55 CEST
Founders Cafe
  Training Systems

12:00 CEST

Lightning Talk: Bringing Google’s Colossus to PyTorch: Rapid Storage via fsspec to Keep GPUs Busy - Ankita Luthra & Trinadh Kotturu, Google
Tuesday April 7, 2026 12:00 - 12:10 CEST
As PyTorch models scale to billions of parameters, the bottleneck has quietly shifted from compute to storage. Modern GPU clusters often sit idle, "starving" for data while waiting on legacy REST-based protocols. This talk introduces Rapid Storage: a fundamental architectural shift bringing Google’s Colossus stateful protocol (which powers many Google products) to PyTorch via fsspec, a common Pythonic file interface used by many frameworks in the PyTorch ecosystem.
By bypassing REST APIs entirely via persistent gRPC streams to the storage layer, we eliminate protocol overhead. We also dive into how Rapid achieves <1 ms random read/write latency, 20x faster data access, and a massive 6 TB/s of aggregate throughput. Crucially, it delivers up to 10x lower tail latency for random I/O, preventing the stragglers that often stall distributed training jobs.
Beyond raw speed, we will deconstruct the integration with gcsfs and the broader fsspec ecosystem. This ensures that high-performance I/O is available across the entire data stack, including Dask, Ray, Hugging Face Datasets, and vLLM. Join us to learn how to stop wasting GPU cycles and achieve linear scaling in the cloud.
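The fsspec integration point can be illustrated with the local filesystem backend; in the talk's setting the same calls would target "gs://" paths through gcsfs. The path below is hypothetical:

```python
import os
import tempfile
import fsspec

# fsspec gives one file API across backends; swapping the "file" protocol
# for "gs" (via gcsfs) is the integration point the talk describes.
tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "shard-000.bin")  # hypothetical checkpoint shard

fs = fsspec.filesystem("file")
with fs.open(path, "wb") as f:
    f.write(b"checkpoint bytes")

with fs.open(path, "rb") as f:
    data = f.read()
```

Because Dask, Ray, and Hugging Face Datasets all read through this interface, a faster protocol behind it speeds up the whole stack without application changes.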
Speakers
Ankita Luthra

Senior Software Engineer, Google
Ankita Luthra is a Software Developer at Google, focused on AI/ML infrastructure and scalable data pipelines. Her work with open-source tools like fsspec (gcsfs) and gcsfuse improves how frameworks such as PyTorch and JAX efficiently access data from Google Cloud Storage.
Trinadh Kotturu

Senior Product Manager, Google
Trinadh Kotturu is a Senior Product Manager specializing in AI/ML and analytics client strategy at Google. An alumnus of IIM Bangalore with 12 years of experience, he has a proven track record of shipping v1 products and scaling them into robust platform services. His expertise spans large-scale distributed storage systems, autonomous driving, and system resiliency...
Tuesday April 7, 2026 12:00 - 12:10 CEST
Master Stage
  Training Systems
  • Audience Level Any
  • Slides Attached Yes

13:45 CEST

Lightning Talk: From Pretrained To Personal: Privacy-First Fine-Tuning on AI PCs - Daniel Holanda Noronha & Iswarya Alex, AMD
Tuesday April 7, 2026 13:45 - 13:55 CEST
PyTorch on AI PCs has crossed a threshold: local hardware can now support meaningful model fine-tuning, not just inference. This unlocks a new class of enterprise workflows where sensitive data never leaves the device, yet models can still be personalized and adapted using PyTorch.

In this session, we’ll show how to design on-device fine-tuning pipelines for AI PCs, focusing on enterprise scenarios where privacy is non-negotiable: regulated healthcare data, government and public-sector workloads, financial services, and proprietary enterprise systems. We’ll walk through key decisions such as selecting efficient pre-trained models, and how the right PyTorch optimizations enable effective personalization on large private datasets.

We'll also showcase practical fine-tuning techniques such as supervised fine-tuning (SFT), LoRA, and QLoRA, and show how mixed-precision training and correct use of training vs. evaluation modes make these approaches efficient and practical on AI PCs while preserving privacy. The result is a cloud-free, privacy-first fine-tuning blueprint that turns AI PCs into secure personalization engines for enterprise AI.
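As a concrete anchor for the LoRA technique mentioned, here is a minimal low-rank adapter in plain PyTorch. The rank and scaling values are illustrative, not a recommended configuration:

```python
import torch
import torch.nn as nn

# Minimal LoRA sketch: freeze the pretrained weight, learn a low-rank delta.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # pretrained weights stay frozen
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        # Base output plus the scaled low-rank update (x A^T) B^T.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(16, 16))
out = layer(torch.randn(2, 16))
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
```

Only the small A and B matrices train, which is why this fits in an AI PC's memory budget; QLoRA additionally quantizes the frozen base weights.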
Speakers
Daniel Holanda

Solutions Architect & ML Engineer, AMD
Daniel is a Sr. ML Engineer at AMD, specializing in local AI. He leads the development of local fine-tuning workflows for AI PCs and co-leads several open-source projects where he designs production-grade LLM/VLM tooling to accelerate the AI development lifecycle.

Previously, he was a Machine Learning Engineer at Groq and a contributor to Microsoft’s Project Brainwave. Daniel holds a PhD in AI understanding and hardware architecture from UBC...
Iswarya Alex

ML Engineer, AMD
I am an ML Engineer at AMD focused on enabling high-performance on-device AI experiences. I work on efficiently optimizing and deploying models on AMD's Ryzen AI-powered devices with GPUs and NPUs.
Tuesday April 7, 2026 13:45 - 13:55 CEST
Founders Cafe
  Security & Privacy

13:45 CEST

Bringing ExecuTorch To the Next Frontiers of Edge AI - Mergen Nachin, Meta
Tuesday April 7, 2026 13:45 - 14:10 CEST
Since the General Availability release of ExecuTorch 1.0 in October 2025, our team has continued to advance the state of the on-device AI software stack. In this talk, we will share our upcoming roadmap and present demos that highlight ExecuTorch’s deployment across the next frontiers, such as AI PCs, robotics, TinyML devices, and the integration of AI agents to improve productivity for on-device deployment.

ExecuTorch is built on open source collaboration, encouraging community adoption, contributions from hardware partners, and interoperability with other ecosystem libraries. We will discuss how these foundations set the stage for the next phase of edge AI with ExecuTorch.
Speakers
Mergen Nachin

Software Engineer, Meta
Mergen Nachin is a Software Engineer specializing in creating rich AI experiences on low latency, high performance, and privacy-aware embedded systems. With a background in distributed systems, developer infrastructure, remote sensing, and localization, he brings a versatile skill...
Tuesday April 7, 2026 13:45 - 14:10 CEST
Master Stage
  Applications & Case Studies

14:15 CEST

Lightning Talk: Accelerating On-Device ML Inference With ExecuTorch and Arm SME2 - Jason Zhu, Arm
Tuesday April 7, 2026 14:15 - 14:25 CEST
As on-device AI workloads grow in complexity, achieving low-latency inference within mobile power constraints remains a central challenge. We examine how ExecuTorch, combined with Arm’s Scalable Matrix Extension 2 (SME2), enables efficient CPU deployments of production AI workloads. We present a case study of SqueezeSAM, a segmentation model deployed in real-world mobile applications. Using ExecuTorch with XNNPACK delegation and SME2-optimized kernels, we evaluate INT8 and FP16 inference on a flagship smartphone. Moving beyond aggregate latency, we apply operator-level profiling to decompose runtime across convolution, GEMM, elementwise, and data movement operators, showing how hardware acceleration reshapes bottlenecks in the execution stack. SME2 delivers up to 3.9x end-to-end speedup on a single CPU core, materially altering runtime composition and revealing data movement as the primary post-acceleration bottleneck. This session presents a practical workflow for deploying, profiling, and systematically optimizing on-device PyTorch models, demonstrating how SME2 expands the viable design space for interactive mobile AI.
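The operator-level profiling workflow described can be approximated on a desktop with torch.profiler. The model below is a toy, and on-device profiling for ExecuTorch uses its own tooling:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Toy conv model standing in for a segmentation workload.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 10),
)
x = torch.randn(1, 3, 32, 32)

# Decompose runtime per operator (convolution, GEMM, elementwise, copies),
# the same breakdown the talk uses to locate post-acceleration bottlenecks.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)

table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
op_names = {evt.key for evt in prof.key_averages()}
```

Once matrix-multiply operators are accelerated (e.g. by SME2 kernels), this kind of table is what reveals data movement as the next bottleneck.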
Speakers
Jason Zhihuai Zhu

Senior Principal Engineer, Arm
Jason Zhu is a Senior Principal Engineer at Arm focused on hardware and software co-optimization for AI systems. With a background in quantum physics and experience spanning AI research and product engineering across major technology companies, he works across the full execution stack...
Tuesday April 7, 2026 14:15 - 14:25 CEST
Master Stage
  Inference & Production
  • Audience Level Any
  • Slides Attached Yes

14:15 CEST

Sponsored Session: TorchTPU: Expanding TPU Programmability To PyTorch - Kat Ko & Claudio Basile, Google; Jana van Greunen, Meta
Tuesday April 7, 2026 14:15 - 14:40 CEST
Google Tensor Processing Units (TPUs) are designed for ML at massive scale, offering significant benefits in performance, energy, and cost. While TPUs have historically been associated with the TensorFlow and JAX ecosystems, we introduce TorchTPU: a new Google effort to expand TPU programmability to PyTorch.

This talk charts TorchTPU’s evolution, from the initial RFC to establishing a native, eager-first PyTorch backend. We will outline the core technical challenges overcome during this transition—particularly the complexities of translating dynamic, eager execution into highly optimized TPU computations.

We’ll highlight current milestones, including native integration with torch.compile, DTensor, and robust support for the latest Ironwood (TPU v7) architecture. These advancements collectively enable multi-billion parameter models to run on TPUs with minimal code changes, while retaining the ability for users to apply model-specific optimizations (e.g., custom kernels, quantization, sharding) to reach peak performance. Finally, we’ll provide a sneak peek at our roadmap for 2026.
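The "minimal code changes" contract looks like ordinary torch.compile usage. Since the TorchTPU backend is not publicly installable, the sketch below substitutes the built-in "eager" backend purely to show the shape of the API:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(8, 32), nn.GELU(), nn.Linear(32, 8))

    def forward(self, x):
        return self.net(x)

# The same torch.compile entry point, with the backend string selecting the
# target. "eager" is a stand-in here; the model code itself is unchanged,
# which is the point of a native PyTorch backend.
compiled = torch.compile(MLP(), backend="eager")
out = compiled(torch.randn(2, 8))
```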
Speakers
Jana van Greunen

Director of PyTorch Engineering, Meta
Jana van Greunen is the Director of PyTorch Engineering at Meta, where she leads efforts to ensure PyTorch remains the leading AI/ML framework for researchers and developers worldwide. With deep expertise in distributed systems, large-scale infrastructure, and over 15 years of experience...
Kat Ko

Senior Eng Manager, Google
Kat Ko is a Senior Engineering Manager at Google and a lead on TorchTPU, where she drives the integration of PyTorch with TPU technology to enable high-performance computing at scale. An EECS graduate of UC Berkeley, she brings over 15 years of experience building large-scale systems...
Claudio Basile

Software Engineer, Google
Claudio Basile is a Google Software Engineer and the co-founder and technical lead of TorchTPU. During his tenure at Google, he also authored LiteRT, the company’s new on-device ML framework. With a Ph.D. in ECE from UIUC and over 15 years of experience spanning machine learning...
Tuesday April 7, 2026 14:15 - 14:40 CEST
Founders Cafe
  Frameworks & Compilers
  • Slides Attached Yes

14:30 CEST

Lightning Talk: Combo Kernels: Horizontal Fusion Optimization in Torch.compile - Karthick Panner Selvam & Elias Ellison, Meta
Tuesday April 7, 2026 14:30 - 14:40 CEST
Combo kernels are a compiler optimization in PyTorch Inductor that horizontally fuses multiple independent operations into a single Triton kernel launch, reducing GPU kernel launch overhead and improving memory locality.

The Problem: Models generate many small, independent operations like weight preprocessing and tensor copies. Each launch incurs overhead. For models with many such operations, this becomes a bottleneck.

The Solution: Combo kernels combine multiple operations into one kernel using a dispatch mechanism. A single program ID routes execution to the appropriate subkernel based on cumulative block boundaries. This eliminates redundant launches while preserving correctness.

Key Innovations:

Per-subkernel block dimensions: Each subkernel gets its own optimized block size instead of sharing one size across all, enabling better autotuning.

Flattened grid dispatch: We collapse the multi-dimensional block grid into a single dimension.

Results: On H100 GPUs, combo kernels deliver geomean speedups of +7.38% for HuggingFace and +5.97% for TorchBench. The optimization is enabled by default in the vLLM repository for LLM inference acceleration.
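As a rough illustration of the dispatch mechanism described in the abstract, here is a pure-Python sketch (not Inductor's actual code) of routing a single flattened program ID to the right subkernel via cumulative block boundaries:

```python
from bisect import bisect_right

def make_dispatcher(blocks_per_subkernel):
    """Build a routing function from per-subkernel block counts.

    Returns a function mapping a flattened program ID to
    (subkernel_index, local_block_id), mirroring how a combo
    kernel routes one pid to the right fused subkernel.
    """
    boundaries = []  # cumulative block boundaries
    total = 0
    for n in blocks_per_subkernel:
        total += n
        boundaries.append(total)

    def route(pid):
        if pid < 0 or pid >= total:
            raise IndexError("pid outside the flattened grid")
        k = bisect_right(boundaries, pid)          # which subkernel
        start = boundaries[k - 1] if k > 0 else 0  # that subkernel's first pid
        return k, pid - start

    return route

# Three fused subkernels needing 4, 2, and 3 blocks respectively:
route = make_dispatcher([4, 2, 3])
print(route(0))  # (0, 0): first block of subkernel 0
print(route(5))  # (1, 1): second block of subkernel 1
print(route(8))  # (2, 2): last block of subkernel 2
```

The per-subkernel block counts here stand in for the per-subkernel block dimensions the abstract mentions; a real combo kernel performs this routing inside a single Triton launch.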
Speakers
avatar for Elias Ellison

Elias Ellison

Software Engineer, Meta
Elias has been working on the PyTorch team for four years, most recently on the torch.compile stack
avatar for Karthick Panner Selvam

Karthick Panner Selvam

Software Engineer, Meta
Karthick Panner Selvam is a SWE at Meta Superintelligence Lab, working on the PyTorch compiler team to enhance performance and scalability for large models. He earned his PhD in Machine Learning for Systems at the University of Luxembourg, collaborating with Google DeepMind, ECMWF, and Frontier... Read More →
Tuesday April 7, 2026 14:30 - 14:40 CEST
Master Stage
  Frameworks & Compilers
  • Audience Level Any
  • Slides Attached Yes

14:45 CEST

Lightning Talk: Implementing Single-Dim Strategies With Sharding Validator - Anshul Sinha, Meta
Tuesday April 7, 2026 14:45 - 14:55 CEST
DTensor sharding propagation is a major bottleneck to full operator coverage: adding or fixing an op strategy is complex, bug‑prone, and gaps often surface as unexpected resharding and extra collectives. A key source of complexity is that today’s rules conflate (1) semantic correctness—valid input/output sharding combinations for an operator—with (2) search‑space pruning to avoid combinatorial blowups on N‑dimensional meshes.

This talk presents a landed prototype that separates these concerns via Single Mesh Dim Strategies: each operator specifies valid placement combinations for one mesh dimension, while infra expands/composes them across the full mesh and selects low‑cost strategies. For contributors, this provides a clear path to refactor existing op_strategies into single‑dim rules that are easier to review and extend. We also introduce a Truth Table‑style sharding validator that systematically tests shapes and sharding specs to check soundness/completeness and to flag unnecessary redistribution/collectives caused by missing cases.

The goal of this presentation is faster, higher‑confidence contributions that improve correctness and expand DTensor operator coverage.
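The expansion step can be pictured with a toy sketch: the placement names and rule format below are invented for illustration (DTensor's real placement types and strategy objects differ), but the composition across mesh dimensions is the same Cartesian-product idea.

```python
from itertools import product

# Hypothetical single-mesh-dim rules for a pointwise op:
# each entry maps an input placement to an output placement
# on ONE mesh dimension.
SINGLE_DIM_RULES = [
    ("Replicate", "Replicate"),
    ("Shard(0)", "Shard(0)"),
    ("Shard(1)", "Shard(1)"),
]

def expand_over_mesh(rules, mesh_ndim):
    """Compose single-dim rules into full-mesh strategies by taking
    the Cartesian product across mesh dimensions."""
    strategies = []
    for combo in product(rules, repeat=mesh_ndim):
        inputs = tuple(r[0] for r in combo)
        outputs = tuple(r[1] for r in combo)
        strategies.append((inputs, outputs))
    return strategies

strategies = expand_over_mesh(SINGLE_DIM_RULES, mesh_ndim=2)
print(len(strategies))  # 9: three single-dim rules expanded over a 2-D mesh
```

The key property is that the operator author only writes the three single-dim rules; the nine full-mesh strategies (and cost-based selection among them) are the infrastructure's job.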
Speakers
avatar for Anshul Sinha

Anshul Sinha

Software Engineer, Meta
I graduated from the University of Michigan with a B.S. in Computer Science in December 2024. I joined Meta's PyTorch Distributed as a SWE in June 2025.
Tuesday April 7, 2026 14:45 - 14:55 CEST
Founders Cafe
  Frameworks & Compilers

14:45 CEST

Brevitas Quantization Library - Pablo Monteagudo Lago, AMD
Tuesday April 7, 2026 14:45 - 15:10 CEST
Brevitas is an open‑source PyTorch library from AMD designed to support the research of state‑of‑the‑art quantization methods, including Qronos (ICLR 2026) and MixQuant (arXiv). Built for flexibility and composability, it offers modular components for exploring reduced‑precision data paths and accuracy‑preserving techniques.
As generative models scale, post‑training quantization (PTQ) has become the preferred strategy for maintaining quality without retraining, yet PTQ methods are often applied in isolation due to fragmented tooling. Brevitas provides a unified environment for modern PTQ algorithms—including Qronos, SpinQuant and AutoRound—enabling practitioners to combine complementary techniques effectively.
Brevitas leverages the latest PyTorch features, like Dynamo for tracing and selectively modifying compute graphs—for example, by inserting rotation ops to mitigate outliers. It integrates with frameworks like transformers and supports export flows including vLLM and GGUF, ensuring a smooth transition from experimentation to deployment.
This talk shows how to use Brevitas for an end‑to‑end quantization flow, showcasing how its flexibility enables new research directions.
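To ground the PTQ discussion, here is a generic symmetric per-tensor int8 quantize/dequantize round trip in plain Python; this is the textbook baseline, not Brevitas's API, which composes far richer methods on top of this idea.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: scale = max|x| / 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
# max|x| = 1.27, so scale ~= 0.01 and round-trip error stays below scale/2
print(q)  # [50, -127, 3, 100]
```

Algorithms like Qronos, SpinQuant, and AutoRound all work to shrink the error this round trip introduces, e.g. by rotating activations or tuning rounding, which is why having them composable in one library matters.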
Speakers
avatar for Pablo Monteagudo Lago

Pablo Monteagudo Lago

Research Scientist, AMD
Pablo Monteagudo is a research scientist in AMD Research and Advanced Development, based in Dublin. He specialises in co-design of neural networks and accelerators, in particular, working on topics involving neural network quantization, sparsity and accelerator design.
Tuesday April 7, 2026 14:45 - 15:10 CEST
Junior Stage
  Frameworks & Compilers

14:45 CEST

The Science and Practice of Open and Scalable LLM Evaluations - Grzegorz Chlebus, NVIDIA
Tuesday April 7, 2026 14:45 - 15:10 CEST
Rapid advances in AI have expanded the range of capabilities required for successful real-world deployment. Understanding where we are in this multi-dimensional frontier is essential for accelerating innovation through effective quality assurance. Rigorous evaluation is increasingly difficult to scale as development requires testing many checkpoints across numerous benchmarks. Model comparison is further complicated by limited transparency of reported results. This talk explores challenges, best practices, and open-source tools that elevate evaluation to a core component of LLM development, delivering continuous signals across the model lifecycle.
We discuss principles for standardizing evaluation methods and improving consistency through practical patterns and anti-patterns, and examples of integrating the science of evaluation directly into model development. Using Nemo-Evaluator, an open-source scalable evaluation tool, we demonstrate modular architectures that enable transparent, reproducible measurement. Finally, we show how Nemo-Evaluator supports reproducible evaluation for the Nemotron model family, helping enable one of the most open development processes in modern AI.
Speakers
avatar for Grzegorz Chlebus

Grzegorz Chlebus

Manager R&D, NVIDIA
Grzegorz Chlebus is a Manager at Frontier Model Evaluation at NVIDIA, where he leads tooling and infrastructure efforts for evaluating frontier AI models. He holds a PhD in Medical Sciences from Radboud University Nijmegen, focused on deep learning-based medical image segmentation... Read More →
Tuesday April 7, 2026 14:45 - 15:10 CEST
Central Room
  GenAI & Multimodal

15:00 CEST

Lightning Talk: Jigsaw: Domain and Tensor Parallelism for High-Resolution Input Training - Deifilia Kieckhefen, Karlsruhe Institute of Technology
Tuesday April 7, 2026 15:00 - 15:10 CEST
Distributed neural network training frameworks typically optimize for specific architectures while minimizing communication overhead. Transformer layers can be efficiently parallelized, but other operations such as convolutions often remain inefficient. This creates bottlenecks for complex model architectures.
Moreover, existing tensor parallelism strategies typically replicate input data across all processes, creating redundant I/O that scales poorly with input size. In applications with heavy I/O demands, such as weather forecasting, medical imaging, or video processing, unsharded input data creates additional data-loading bottlenecks that could benefit from parallelization.
Jigsaw is a PyTorch library that shards both model weights and input data across parallel processes. It maintains a PyTorch-like interface while parallelizing activations, convolutions, linear layers, and attention through a distributed matrix multiplication backend. We demonstrate the usability of Jigsaw across a wide range of model architectures, show its performance when scaling multi-billion-parameter models sharded across up to 8 processes, and compare its scalability to DDP, FSDP, and Megatron-LM approaches.
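The core idea of sharding the input rather than replicating it can be sketched in a few lines; this toy version (plain nested lists, no halo exchange or communication) just shows that row-sharded input times a replicated weight reproduces the full result, so each process only needs to load its own slice:

```python
def matmul(a, b):
    """Dense matmul on nested lists: (m x k) @ (k x n)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def shard_rows(x, world_size):
    """Split input rows across processes (batch/spatial sharding)."""
    per = len(x) // world_size
    return [x[r * per:(r + 1) * per] for r in range(world_size)]

# A 4-sample input with 3 features, and a linear layer weight (3 x 2).
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]]
w = [[1, 0], [0, 1], [1, 1]]

full = matmul(x, w)
shards = shard_rows(x, world_size=2)
# Each "process" computes its local rows; concatenating recovers the
# full result, so the input never needs to be replicated.
combined = [row for shard in shards for row in matmul(shard, w)]
assert combined == full
```

Operations with spatial dependencies (convolutions, attention) additionally need communication at shard boundaries, which is exactly the part Jigsaw's distributed matmul backend handles.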
Speakers
avatar for Deifilia Kieckhefen

Deifilia Kieckhefen

Doctoral Researcher, Karlsruhe Institute of Technology
Deifilia Kieckhefen is a doctoral researcher at the Karlsruhe Institute of Technology. She works on scalable and distributed training of neural network architectures.
Tuesday April 7, 2026 15:00 - 15:10 CEST
Founders Cafe
  Training Systems
  • Audience Level Any
  • Slides Attached Yes

15:40 CEST

Lightning Talk: Graph Based Pipeline Parallelism - Sanket Purandare, Meta & Simon Fan, Meta PyTorch
Tuesday April 7, 2026 15:40 - 15:50 CEST
Pipeline parallelism is vital for large models, but advanced schedules for SOTA LLMs are difficult to express in current PyTorch. MoE communication dominates the critical path, making latency hiding essential. Leading systems use fw-bw overlapping; fw-fw and bw-bw overlapping further boost throughput.

Schedules like ZeroBubbleV and DualPipeV rely on dI-dW backward splitting for fine-grained overlap. However, eager-mode implementations require a patchwork of fragile integrations (multi-threading, custom autograd functions, activation checkpointing, etc.) that rely on implicit behavior and hand-written logic with poor torch.compile compatibility and upstream composability.

We present Graph-Based PP: stages are compiled to reusable FX graphs executed via an explicit schedule language. Users write standard PyTorch code while specifying schedules at varying granularity; all manipulations run as graph passes, abstracting complexity away from user code and into the compiler/runtime, allowing for greater composability.

We have integrated Graph-PP into TorchTitan and AutoParallel on real MoE workloads, targeting upstream inclusion in torch.distributed.
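As a point of reference for what an explicit schedule looks like as data, here is the classic 1F1B schedule generated per stage in plain Python; this is a deliberately simple example, not the talk's schedule language, which targets finer-grained patterns such as dI/dW-split ZeroBubbleV and DualPipeV.

```python
def one_f_one_b(stage, num_stages, num_microbatches):
    """Classic 1F1B schedule for one pipeline stage, expressed as an
    explicit list of ('F', microbatch) / ('B', microbatch) actions."""
    warmup = min(num_stages - stage - 1, num_microbatches)
    ops, fwd, bwd = [], 0, 0
    for _ in range(warmup):            # warmup: forwards only
        ops.append(("F", fwd)); fwd += 1
    while fwd < num_microbatches:      # steady state: 1F1B pairs
        ops.append(("F", fwd)); fwd += 1
        ops.append(("B", bwd)); bwd += 1
    while bwd < num_microbatches:      # cooldown: drain backwards
        ops.append(("B", bwd)); bwd += 1
    return ops

# The last of 4 stages with 4 microbatches alternates F/B immediately:
print(one_f_one_b(stage=3, num_stages=4, num_microbatches=4))
```

Once schedules are explicit data like this, manipulating them becomes a graph/list transformation rather than threading and custom autograd tricks, which is the composability argument the talk makes.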
Speakers
avatar for Simon Fan

Simon Fan

Software Engineer, Meta
I work on the PyTorch team at Meta, focusing on distributed training efficiency.
avatar for Sanket Purandare

Sanket Purandare

Research Engineer, Meta
Currently, Sanket serves as a Research Engineer at Meta's SuperIntelligence Lab, in PyTorch Distributed and Compiler team. He specializes in performance optimization of large scale training of LLMs based on Mixture of Experts architectures.

Prior to this he obtained his PhD in A... Read More →
Tuesday April 7, 2026 15:40 - 15:50 CEST
Master Stage
  Frameworks & Compilers

15:40 CEST

Lightning Talk: Cross-Region Model Serving: PyTorch Inference, Observability & LLMOps - Suraj Muraleedharan, Amazon Web Services
Tuesday April 7, 2026 15:40 - 15:50 CEST
As PyTorch models move to production, organizations face a critical challenge: deploying, monitoring, and operating inference at scale across multiple regions. Single-region serving is well understood, but multi-region LLMOps (model distribution, observability, failover, and cost management) remains ad hoc and challenging for many customers.

This session presents production-tested architectures for multi-region PyTorch inference and LLMOps workflows. We cover:

Serving: Multi-region TorchServe/KServe on Kubernetes with latency-based routing, blue-green deployments, model versioning, and automated failover with circuit breakers.

Observability: OpenTelemetry distributed tracing, Prometheus/Grafana dashboards for latency, throughput, GPU utilization, and LLM-specific metrics like time-to-first-token and KV-cache hit rate.

LLMOps: CI/CD pipelines for cross-region model deployment with automated rollback, drift detection, and SLO-based alerting.

Attendees leave with serving architectures, dashboards, and deployment pipelines using open-source tooling.
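The latency-based routing plus circuit-breaker pattern from the serving bullet can be sketched as follows; the region names and the failure threshold are illustrative assumptions, not any specific cloud provider's API.

```python
class RegionRouter:
    """Latency-based routing with a simple failure-count circuit breaker."""

    def __init__(self, latencies_ms, failure_threshold=3):
        self.latencies = dict(latencies_ms)   # region -> observed p50 latency
        self.failures = {r: 0 for r in self.latencies}
        self.threshold = failure_threshold

    def record_failure(self, region):
        self.failures[region] += 1            # trips the breaker at threshold

    def record_success(self, region):
        self.failures[region] = 0             # closes the breaker again

    def pick(self):
        healthy = [r for r in self.latencies
                   if self.failures[r] < self.threshold]
        if not healthy:
            raise RuntimeError("all regions tripped; trigger global failover")
        return min(healthy, key=self.latencies.get)

router = RegionRouter({"eu-west": 12, "us-east": 85, "ap-south": 140})
print(router.pick())          # eu-west: lowest latency wins
for _ in range(3):
    router.record_failure("eu-west")
print(router.pick())          # us-east: eu-west's breaker is now open
```

In production this logic typically lives in a service mesh or DNS layer rather than application code, but the failover semantics are the same.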
Speakers
avatar for Suraj Muraleedharan

Suraj Muraleedharan

Principal Platform Engineer, Amazon Web Services
Principal Engineer driving technical strategy and building mission-critical foundational platforms for AI, HPC, and distributed systems, bridging the gap between infrastructure, AI research, and product organizations.
Tuesday April 7, 2026 15:40 - 15:50 CEST
Founders Cafe
  Inference & Production

15:40 CEST

Enabling State-of-the-art Asynchronous Execution in Torch.compile With CUDA Streams - Michael Lazos, Meta
Tuesday April 7, 2026 15:40 - 16:05 CEST
CUDA streams are a widely used method for parallelizing GPU computation on NVIDIA GPUs. Long requested by our users, they enable multiple key capabilities: overlapping communication and compute kernels, training on multiple batches in parallel, and parallelizing kernels, all of which are needed for achieving SOTA training performance. Another key capability is activation offloading, which can be applied to any model to prevent OOMs by asynchronously storing activations in CPU memory until they are needed by the model.

Before this work, torch.compile would graph break on CUDA stream contexts, which can be costly for models that utilize streams. Although workarounds exist (e.g. wrapping stream manipulation into custom ops), these solutions add complexity and create friction in the user experience. By enabling seamless CUDA stream support in PT2, we allow our users to leverage the familiar eager APIs for stream assignment and synchronization directly within torch.compile. This not only simplifies the workflow but also ensures that models using custom streaming patterns can run efficiently out-of-the-box without manual intervention or code restructuring.
Speakers
avatar for Michael Lazos

Michael Lazos

Software Engineer, Meta
Michael Lazos is a software engineer at Meta where he contributes to torch.compile. His expertise spans both graph extraction with TorchDynamo and generating optimized kernels with the backend compiler TorchInductor. Previously, he was at Microsoft contributing to project Brainwave... Read More →
Tuesday April 7, 2026 15:40 - 16:05 CEST
Central Room
  Frameworks & Compilers

16:10 CEST

Build PyTorch to Understand PyTorch - Vijay Janapa Reddi, Harvard University; Andrea Mattia Garavagno, University of Genoa
Tuesday April 7, 2026 16:10 - 16:35 CEST
PyTorch's success depends on more than users—it needs engineers who understand what's inside. Engineers who can debug framework issues, optimize at the systems level, contribute upstream, and build what comes next. But ML education today produces practitioners who call APIs without understanding them. They train models without knowing why Adam needs 3× the memory of SGD, or what happens when they call loss.backward().

TinyTorch is a 20-module open-source curriculum that closes this gap. Students construct PyTorch's core components—tensors, autograd, optimizers, CNNs, transformers—in pure Python, building a complete framework where every operation is code they wrote. By the final module, they don't just use PyTorch; they understand how to build it.

The curriculum uses progressive disclosure, systems-first profiling from Module 01, and build-to-validate milestones—recreating ML breakthroughs from Perceptron (1958) through Transformers (2017), culminating in MLPerf-style benchmarking.

TinyTorch is how we grow the next generation of PyTorch contributors and the engineers who will build what comes after.

Open source: mlsysbook.ai/tinytorch
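In the build-it-to-understand-it spirit of the talk, here is a micrograd-style scalar autograd node showing what `loss.backward()` does under the hood; this is a generic illustration, not TinyTorch's actual API.

```python
class Value:
    """Minimal scalar autograd node: stores data, an accumulated gradient,
    and (parent, local_gradient) pairs for the chain rule."""

    def __init__(self, data, parents=()):
        self.data, self.grad = data, 0.0
        self._parents = parents

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self, upstream=1.0):
        self.grad += upstream          # accumulate over all paths
        for node, local in self._parents:
            node.backward(upstream * local)

a, b = Value(2.0), Value(3.0)
loss = a * b + a                       # dloss/da = b + 1 = 4, dloss/db = a = 2
loss.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

Note that `a` appears on two paths through the graph and its gradient accumulates across both, which is exactly why real frameworks zero gradients between steps.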
Speakers
avatar for Vijay Janapa Reddi

Vijay Janapa Reddi

Professor, Harvard University
Vijay Janapa Reddi is a Professor at Harvard University, where he leads research at the intersection of machine learning and computer systems. He is the author of the open-source Machine Learning Systems textbook (mlsysbook.ai) and co-founder of MLCommons, the organization behind... Read More →
avatar for Andrea Mattia Garavagno

Andrea Mattia Garavagno

Research Fellow, University of Genoa & Scuola Superiore Sant'Anna
I am a Research Fellow holding a joint position at the University of Genoa and Scuola Superiore Sant'Anna. My research is centered on Edge AI, where I am currently working to automate the design of applications through Hardware-Aware Neural Architecture Search (NAS). By running these... Read More →
Tuesday April 7, 2026 16:10 - 16:35 CEST
Central Room
  Frameworks & Compilers
  • Audience Level Any
  • Slides Attached Yes

16:10 CEST

Optimizing Reinforcement Learning at Trillion-Parameter Scale - Songlin Jiang, Aalto University & Mind Lab
Tuesday April 7, 2026 16:10 - 16:35 CEST
This talk will dive into how we implemented and optimized reinforcement learning on trillion-parameter Mixture-of-Experts reasoning models using veRL, Megatron-Bridge and vLLM. The session is useful to anyone building large-scale RL training systems.

For the first part, I will walk through the system design required to make RL work at this scale using LoRA: how LoRA adapters are implemented for expert layers, how adapters are sharded and fused under tensor/pipeline/expert parallelism, and most importantly, how refit (parameter sync) is implemented for LoRA between training backend (Megatron) and rollout engine (vLLM).

The second part of the talk focuses on training–inference mismatch in MoE RL. I will explain why common mitigations such as clipping and importance sampling can fail, and how we implement fixed Router Replay R3 across vLLM, veRL, and Megatron to align routing decisions between rollout and training.

These works are done together with Mind Lab and some of the related blog posts are at:
- https://macaron.im/mindlab/research/building-trillion-parameter-reasoning-rl-with-10-gpus
- https://macaron.im/mindlab/research/router-replay-r3-why-it-failed-and-how-we-fixed-it
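The weight update that the refit step must sync from Megatron to vLLM is the standard LoRA merge, W' = W + (alpha/r) * B @ A; the plain-list sketch below shows the math only, abstracting away the sharding and fusion the talk covers.

```python
def lora_merge(W, A, B, alpha, rank):
    """Merge a LoRA adapter into a base weight: W' = W + (alpha/rank) * B @ A.

    Shapes: W is (d_out x d_in), B is (d_out x rank), A is (rank x d_in).
    """
    scale = alpha / rank
    d_out, d_in = len(W), len(W[0])
    return [[W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
             for j in range(d_in)] for i in range(d_out)]

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight, 2 x 2
A = [[1.0, 2.0]]               # rank 1, d_in 2
B = [[0.5], [0.25]]            # d_out 2, rank 1
print(lora_merge(W, A, B, alpha=2, rank=1))  # [[2.0, 2.0], [0.5, 2.0]]
```

With expert parallelism, each expert's W, A, and B live on different shards, so refit must perform this merge consistently under tensor/pipeline/expert sharding, which is where most of the engineering effort goes.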
Speakers
avatar for Songlin Jiang

Songlin Jiang

Doctoral Researcher, Aalto University & Mind Lab
I am a doctoral researcher at Aalto University, focusing on reducing training and inference latency for Reinforcement Learning and Large Language Models (LLMs) on High-Performance Computing (HPC) clusters. I am also a passionate free software developer, a maintainer of VeRL, and a... Read More →
Tuesday April 7, 2026 16:10 - 16:35 CEST
Junior Stage
  Training Systems

16:40 CEST

Lightning Talk: TerraKit: Standardising AI-Ready Geospatial Data Preparation for the TorchGeo Ecosystem - Rosie Lickorish & Romeo Kienzler, IBM
Tuesday April 7, 2026 16:40 - 16:50 CEST
With the advent of geospatial foundation models, unexplored use cases are emerging that require well-curated datasets. Currently, no standardised approach exists for creating such AI-ready geospatial datasets. In this session, we introduce TerraKit: a comprehensive open-source Python library for retrieving and processing geospatial data that seamlessly integrates with upstream geospatial model training libraries such as TorchGeo or TerraTorch.

From raster/vector annotations, TerraKit will match, download, process, align and split the requested data source (e.g., EarthData, CDSE, Planetary Computer) based on user specifications provided by a simple configuration file. TerraKit also supports spatial train/val splits and exports datasets in standard formats such as TACO datasets. TerraKit streamlines the pipeline from raw EO data to AI-ready datasets, accelerating the development of custom geospatial applications, and ensuring query and processing pipelines are reproducible. By lowering the barrier to entry, a wider community of TorchGeo and TerraTorch users is empowered to leverage foundation models for Earth observation.
Speakers
avatar for Romeo Kienzler

Romeo Kienzler

AI Research Engineer, IBM
Romeo is a data scientist working for IBM Research and an advocate for ethical machine learning, transparency and privacy
avatar for Rosie Lickorish

Rosie Lickorish

Research Software Engineer, IBM
Rosie is a Research Software Engineer at IBM, specializing in the development of next-generation tools and technologies designed to drastically accelerate solutions for today’s most urgent global challenges. Her technical focus involves leveraging geospatial data, AI models... Read More →
Tuesday April 7, 2026 16:40 - 16:50 CEST
Central Room
  GenAI & Multimodal
  • Audience Level Any
  • Slides Attached Yes

16:40 CEST

Optimizing PyTorch on CPU-GPU Coherent Platforms - Matthias Jouanneaux, Nvidia
Tuesday April 7, 2026 16:40 - 17:05 CEST
In recent years, both NVIDIA and AMD have introduced hardware-coherent platforms: GH200, GB200, and MI300A. These coherent platforms offer many new features, but also new challenges, for PyTorch applications attempting to make the most of them.
This talk will focus on NVIDIA's GB200 and walk through techniques to utilize the features of the coherent architecture in PyTorch, such as the high CPU-GPU interconnect bandwidth and unified memory, as well as the advantages and caveats of sharing system memory between CPU and GPU.
Speakers
avatar for Matthias Jouanneaux

Matthias Jouanneaux

Sr Software Engineer - PyTorch, NVIDIA
After his master's degree, Matthias Jouanneaux worked at Konica Minolta's European research lab on medical image analysis using deep learning for 2 years.
He then joined Nvidia, focusing on optimizing application performance for Nvidia hardware as a Developer Technology enginee... Read More →
Tuesday April 7, 2026 16:40 - 17:05 CEST
Founders Cafe
  Frameworks & Compilers

16:55 CEST

Lightning Talk: Bayesian Neural Networks With Variational Inference in PyTorch - Lars Heyen, Karlsruhe Institute of Technology, Scientific Computing Center
Tuesday April 7, 2026 16:55 - 17:05 CEST
Uncertainty quantification is becoming more and more important as neural networks are used for increasingly critical tasks. Bayesian neural networks (BNNs) inherently provide a measure of their own uncertainty, but can be either hard to implement or inflexible with common frameworks. In this session I discuss how to efficiently implement BNNs using variational inference within PyTorch and present torch_blue, a lightweight open-source library that implements these methods with the goal of being easy to pick up, yet flexible enough for research on BNNs.
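The two ingredients of variational BNNs can be shown in a few lines of plain Python: the closed-form KL regulariser against a standard-normal prior, and the reparameterisation trick for sampling weights; this is a generic sketch, not torch_blue's API.

```python
import math
import random

def kl_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ): the VI regulariser added to the loss."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0) - math.log(sigma)

def sample_weight(mu, rho, rng):
    """Reparameterisation trick: sigma = softplus(rho), w = mu + sigma * eps,
    so gradients can flow through mu and rho rather than the random draw."""
    sigma = math.log1p(math.exp(rho))   # softplus keeps sigma positive
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps

print(kl_normal(0.0, 1.0))  # 0.0: posterior equal to the prior costs nothing
rng = random.Random(0)
w = sample_weight(mu=0.5, rho=-3.0, rng=rng)  # one stochastic weight draw
```

A variational layer stores a (mu, rho) pair per weight, samples fresh weights each forward pass, and adds the summed KL term to the task loss; prediction uncertainty then comes from running several stochastic forward passes.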
Speakers
avatar for Lars Heyen

Lars Heyen

PostDoc, Karlsruhe Institute of Technology
I am a postdoctoral researcher working on uncertainty quantification in the research group "Robust and Efficient AI" at the Scientific Computing Center of the Karlsruhe Institute of Technology. I also coauthored the PyTorch-based library torch_blue for implementing Bayesian neural... Read More →
Tuesday April 7, 2026 16:55 - 17:05 CEST
Central Room
  Frameworks & Compilers
  • Audience Level Any
  • Slides Attached Yes
 