7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."
Tuesday, April 7
 

09:00 CEST

Keynote: Co-Evolution: How the Open Source Intelligence Stack Compounds - Mark Collier, Executive Director, PyTorch Foundation, General Manager, AI & Infrastructure, Linux Foundation
Tuesday April 7, 2026 09:00 - 09:10 CEST
Agentic coding systems have crossed a threshold from experimentation to measurable economic impact. Their rapid adoption reveals a deeper shift: modern AI capability emerges from the co-evolution of models, training frameworks, inference engines, reinforcement systems, hardware, and cloud infrastructure, with open source enabling the flow of code, research, and operational knowledge across the stack. As performance gaps narrow and costs fall, this compounding intelligence system accelerates innovation and spreads capability across companies, industries, and hardware platforms, raising a simple question for the community: how fast do we want to evolve?
Speakers
Mark Collier

Executive Director, PyTorch Foundation, General Manager, AI & Infrastructure, The Linux Foundation

Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:10 CEST

Keynote: PyTorch Updates - Edward Yang, Research Engineer, Meta
Tuesday April 7, 2026 09:10 - 09:30 CEST

Speakers
Edward Yang

Research Engineer, Meta
Edward Yang has worked on PyTorch at Meta since nearly the very beginning. Currently, he works on all aspects of PT2, but with a particular focus on dynamic shapes support across the stack.
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:35 CEST

Keynote: Community Led Open Source RL - Joe Spisak, VP of Product & Head of Open Source, Reflection AI
Tuesday April 7, 2026 09:35 - 09:45 CEST

Speakers
Joe Spisak

VP of Product & Head of Open Source, Reflection AI
Joe Spisak is Product Director for AI at Meta with leadership roles in PyTorch, Llama and FAIR research. A veteran of the AI space with over 10 years of experience, Joe led product teams at Meta/Facebook, Google and Amazon where he focused on open source AI, building developer tools...
Master Stage
  Keynote Sessions
  • Audience Level Any

09:45 CEST

Sponsored Keynote: From One Node to Distributed Training and Inference. How the PyTorch Ecosystem Changed AI - Ramine Roane, Corporate Vice President of AI Product Management and Ecosystem Development, AMD
Tuesday April 7, 2026 09:45 - 09:50 CEST
PyTorch has evolved from a research framework into a distributed-first platform powering production AI at massive scale. As models grow to hundreds of billions of parameters, this talk explores the challenges of scaling inference across nodes and the emerging ecosystem, from Monarch and TorchTitan to open, hardware-agnostic systems, that makes it possible.
Speakers
Ramine Roane

Corporate Vice President of AI Product Management and Ecosystem Development, AMD
Ramine Roane is the Corporate Vice President of AI Product Management and ecosystem development at AMD, based in San Jose, California. Prior to this role, he served as Vice President of Data Center Acceleration within AMD’s Adaptive and Embedded Computing Group in 2022. Before the...
Master Stage
  Keynote Sessions
  • Audience Level Any

09:55 CEST

Keynote: Stream Everything - Moving From Request Input to Streaming Input - Patrick von Platen, Research Engineer, Mistral AI
Tuesday April 7, 2026 09:55 - 10:10 CEST

Speakers
Patrick von Platen

Research Engineer, Mistral AI
Patrick von Platen is a Research Engineer at Mistral AI, focused on natural language processing and scalable AI systems. Currently, he contributes to vLLM, is a former core maintainer of Transformers, and created Diffusers.
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

10:10 CEST

Sponsored Keynote: Any [ Agent | Model | Accelerator | Cloud ]. Open Source AI Unlocks the World's Potential - Maryam Tahhan, Principal Engineer & Nicolò Lucchesi, Senior Machine Learning Engineer, Red Hat
Tuesday April 7, 2026 10:10 - 10:15 CEST
Red Hat is shaping an open future for AI, delivering on the promise of 'Any Agent, Any Model, Any Accelerator, Any Cloud.' Discover the community advancements contributed through the PyTorch Foundation that empower enterprises to rapidly enable, test, and seamlessly scale AI workloads across their choice of infrastructure.
Speakers
Maryam Tahhan

Principal Engineer, Red Hat
Maryam is a Principal Engineer in Red Hat's Office of the CTO, where she focuses on standardising CPU inferencing performance evaluation to help effectively validate and scale ML workloads.
Nicolò Lucchesi

Senior Machine Learning Engineer, Red Hat
Nicolò is a Senior Machine Learning Engineer at Red Hat with a background in Deep Learning and Computer Vision. He works on Inference Optimization for vLLM, where he is a maintainer.
Master Stage
  Keynote Sessions
  • Audience Level Any

10:15 CEST

Keynote: The Unbearable Lightness of (Agentic) Evaluations - Besmira Nushi, Senior Manager, AI Research, NVIDIA
Tuesday April 7, 2026 10:15 - 10:25 CEST
The discipline of evaluating large language models underwent a major transformation with the rise of general AI capabilities. Today, the field is undergoing yet another challenging transformation following the groundbreaking improvements in agentic tasks, which expect models and systems to plan and take autonomous actions in the real world. Measuring how well models and systems perform in such tasks is, however, still i) fragile from a methodological perspective, and ii) difficult to scale and generalize across different domains. This talk will first discuss common challenges in reproducing agentic evaluations, including differences in reference implementations, error handling, trajectory post-processing, and tooling definitions. Next, it will cover infrastructural requirements that need to be addressed for such evaluations to run efficiently at scale. Finally, we will conclude with a set of (still nascent) best practices that can help alleviate “lightness” and build more consistent measurement pipelines.
Speakers
Besmira Nushi

Senior Manager - AI Research, NVIDIA
Besmira Nushi is a Senior AI Research Manager at NVIDIA in Zurich, where she leads research on LLM evaluation, model analysis and generalization, and real-world and agentic AI system measurements. Previously, she spent 7+ years at Microsoft Research advancing responsible AI, model...
Master Stage
  Keynote Sessions
  • Audience Level Any

10:30 CEST

Birds of A Feather: Engineering for the EU AI Act: What Should PyTorch Expose Natively? - Roy Saurabh, AffectLog
Tuesday April 7, 2026 10:30 - 11:00 CEST
The EU AI Act introduces concrete technical obligations for ML systems: traceability, risk management, monitoring, and auditability. Today, most of this burden is handled outside the ML framework—through ad-hoc tooling, documentation, or bespoke infrastructure.

This Birds of a Feather session is an open, practitioner-driven discussion on a forward-looking question:
What primitives, hooks, or abstractions should PyTorch expose natively to better support AI accountability and regulatory readiness?

Topics for discussion may include:
- Native support for provenance, lineage, and training/inference traces
- Standardized hooks for fairness, robustness, and drift monitoring
- Model and dataset metadata as first-class PyTorch objects
- Privacy-preserving logging and zero-retention execution patterns
- Gaps between regulatory requirements (e.g. the EU AI Act) and current ML frameworks

The goal is not consensus, but shared understanding and concrete ideas that can inform community practices, tooling, and potential upstream contributions. This BoF is intended for PyTorch users, maintainers, researchers, and infra engineers interested in the future of responsible, production-grade ML.
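As a conversation starter for the discussion above, here is one hypothetical shape a native provenance primitive could take, sketched in plain Python. No such PyTorch API exists today; the class and field names are invented for illustration only.

```python
import hashlib
import json
import time

# Hypothetical sketch: an append-only, tamper-evident provenance log of the
# kind the BoF asks whether PyTorch should expose natively.
class ProvenanceLog:
    """Records training/inference events; each entry chains a digest over
    the previous one so any later tampering is detectable."""

    def __init__(self):
        self.events = []

    def record(self, kind, payload):
        prev = self.events[-1]["digest"] if self.events else ""
        digest = hashlib.sha256(
            (prev + json.dumps(payload, sort_keys=True)).encode()
        ).hexdigest()
        self.events.append(
            {"time": time.time(), "kind": kind, "payload": payload, "digest": digest}
        )

log = ProvenanceLog()
log.record("dataset", {"name": "train-v1", "rows": 10_000})
log.record("inference", {"model": "demo", "input_shape": [1, 3, 224, 224]})
```

Whether such a primitive belongs in the framework, in ecosystem tooling, or in bespoke infrastructure is exactly the question this session puts up for debate.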
Speakers
Roy Saurabh

President, AffectLog
Roy Saurabh is Founder & CEO of AffectLog and an applied researcher in AI governance, privacy engineering, and accountable ML systems. He has worked with UNESCO, the European Commission, and national governments on operationalising trustworthy AI, and leads EU-funded projects focused...
Open Platform
  Birds of A Feather
  • Audience Level Any

10:30 CEST

Meet the Developers of PyTorch Module Maintainers
Tuesday April 7, 2026 10:30 - 11:00 CEST
These sessions give participants an opportunity to meet some of the developers leading PyTorch, to foster collaboration, gather feedback, and inspire contributions.

PyTorch core modules (e.g. torch.autograd, torch.optim, torch.nn) form the foundation for most AI research and development, either directly through PyTorch or indirectly via higher-level frameworks. The core libraries prioritize API stability, backward compatibility, modular design, and simplicity.
Speakers
Edward Yang

Research Engineer, Meta
Edward Yang has worked on PyTorch at Meta since nearly the very beginning. Currently, he works on all aspects of PT2, but with a particular focus on dynamic shapes support across the stack.
Alban Desmaison

Research Engineer, Meta

Driss Guessous

Machine Learning Engineer, Meta
I am currently a machine learning engineer working on core development of PyTorch. I received my Master's in Computer Science from the University of Illinois at Urbana-Champaign. I received a dual degree in Physics and Applied Mathematics from The Ohio State University. I also won...
Mergen Nachin

Software Engineer, Meta
Mergen Nachin is a Software Engineer specializing in creating rich AI experiences on low latency, high performance, and privacy-aware embedded systems. With a background in distributed systems, developer infrastructure, remote sensing, and localization, he brings a versatile skill...
Natalia Gimelshein

Software Engineer, Meta
Natalia Gimelshein is a software engineer at Meta. She is one of the PyTorch leads, and works on GPU performance and support, including low precision, distributed and symmetric memory.
Jason Ansel

Research Scientist, Meta
Jason Ansel is a Research Scientist at Meta AI and a technical lead for PyTorch compilers. He started the TorchDynamo and TorchInductor projects, which bring flexible graph capture and a high performance compiler to PyTorch 2. He received a Ph.D. from MIT and has over 15 years of...
Open Platform
  Meet the Developers
  • Audience Level Any

11:00 CEST

Lights, Camera, Inference! Video Generation as a Service With vLLM-Omni - Ricardo Noriega, Red Hat & Doug Smith, Red Hat, Inc
Tuesday April 7, 2026 11:00 - 11:25 CEST
LLMs made text generation as a service a reality. What does it take to do the same for video?
We built an experimental Video Generation as a Service stack using vLLM-Omni and the LTX-2 open-weights video model to explore how far an open, multimodal stack can go toward production use. We’ll share what worked, what broke, and what it takes to treat generative video as a first-class workload.
vLLM is known for high-performance autoregressive inference, and vLLM-Omni extends that foundation to multimodal inputs and outputs. We pushed those capabilities further by adding support for LTX-2, extending the OpenAI-compatible API surface, integrating with front ends, and packaging for scalable deployment. We’ll walk you through the touch points and show how we put all the pieces together with vLLM-Omni.
Finally, we’ll examine the gap between novelty demos and real applications: going from quirky spaghetti eating videos to generating consistent characters, personalized media, customized video game cutscenes, and interactive storytelling, and highlight what’s still missing to make generative video truly production-ready.
Speakers
Doug Smith

Principal Software Engineer, Red Hat
Doug Smith is a Principal MLOps Engineer at Red Hat, where he works on the AI Inference Server team and contributes upstream to the vLLM project through its CI Special Interest Group. Recently, he's also been looking into contributions to vLLM-Omni. He’s spent years bridging telecom...
Ricardo Noriega

Principal SW Engineer, Red Hat
Ricardo is a Principal Software Engineer at Red Hat's Office of the CTO in the Emerging Technologies organization. Ricardo is currently focused on AI multimodality and researching the benefits of Small Language Models.
He is a former member of the Akraino TSC and PTL of the Kubernetes-Native-Infrastructure blueprint family, and contributor to Kubernetes, OpenStack, OpenDaylight and OPNFV...
Founders Cafe
  GenAI & Multimodal
  • Audience Level Any
  • Slides Attached Yes

11:15 CEST

Lightning Talk: Deep Learning in the Wild: Embedded PyTorch for Real-World Conservation Bioacoustics - Taraqur Rahman & Owen O'Donnell, OWL Integrations
Tuesday April 7, 2026 11:15 - 11:25 CEST
Passive acoustic monitoring is a powerful tool for wildlife conservation, but deploying deep learning models in remote rainforest environments introduces strict constraints on power, memory, and compute. In this talk, we present an end-to-end PyTorch-based pipeline for detecting and analyzing the endangered three-wattled bellbird using embedded deep learning systems.

We cover the full lifecycle from audio preprocessing and model training in PyTorch to optimization and deployment on resource-constrained embedded devices. Topics include model architectures for sparse bioacoustic event detection, handling extreme class imbalance, model compression and quantization, and practical trade-offs between accuracy, latency, and power consumption.

The session emphasizes real-world lessons learned deploying machine learning at the edge, where unreliable connectivity, noisy signals, and limited hardware define success more than benchmark metrics. Attendees will gain practical patterns for building and deploying PyTorch models for embedded and edge AI applications with real environmental impact.
Speakers
Owen O'Donnell

Embedded Systems and Machine Learning Engineer, OWL Integrations
Owen O'Donnell is a Machine Learning and Embedded Systems Engineer at OWL Integrations. He trains ML models for deployment in remote locations, running on resource-constrained electronics. This introduces challenges such as needing smaller sized models and having...
Taraqur Rahman

Chief Data Scientist, OWL Integrations
Taraqur Rahman is Chief Data Scientist and Co-Founder at OWL Integrations and Organizer/Co-Founder of Biased Outliers, where he leads applied machine learning and data science initiatives with real-world impact. He combines deep technical expertise in Python with practical deployment...
Central Room
  Applications & Case Studies
  • Audience Level Any
  • Slides Attached Yes

11:30 CEST

Lightning Talk: How DeepInverse Is Solving Imaging in Science and Healthcare With PyTorch - Andrew Wang, DeepInverse; Minh Hai Nguyen, Université de Toulouse
Tuesday April 7, 2026 11:30 - 11:40 CEST
Deep learning has revolutionised imaging, a foundation of science and healthcare. DeepInverse is the PyTorch library for solving imaging problems, unifying deep learning methods (e.g. diffusion models), physics (medical, optics) and modern tooling. In this talk, we’ll show how the PyTorch community can get involved in this exciting yet accessible application of open-source AI.

AI methods in imaging must model the imaging physics, leading to interesting engineering problems, e.g. efficient differentiable ops and physics-informed losses. We’ll show notebooks on real use cases: accelerating brain MRI, reducing radiation in CT scans, and imaging black holes.

PyTorch enthusiasts at any level or background can contribute, from training infra for scientific data to high-level generative modelling frameworks; their AI engineering skills can directly impact imaging across multiple fields.

DeepInverse is supported by a growing international user community and proudly rooted in Paris. We’ve joined the PyTorch Ecosystem and received the Prix Science Ouverte in 2024. We’re excited to join the PyTorch Conf to celebrate the vibrant French developer community!
Speakers
Andrew Wang

CTO & Co-founder, Blur Labs
Andrew is a lead developer of DeepInverse as well as the CTO & co-founder of Blur Labs, a startup based in Paris building AI models for imaging. Andrew did his PhD at the University of Edinburgh in magnetic resonance image reconstruction.
Minh Hai Nguyen

PhD candidate, Toulouse University
Central Room
  Applications & Case Studies
  • Audience Level Any
  • Slides Attached Yes

11:45 CEST

Lightning Talk: ExecuTorch on Microcontrollers: Deploying PyTorch To the Smallest Edge - RJ Ascani & Matthias Cremon, Meta
Tuesday April 7, 2026 11:45 - 11:55 CEST
ExecuTorch extends PyTorch's reach to the most resource-constrained devices: microcontrollers, DSPs, and specialized neural processing units powering always-on sensors, wearables, and embedded systems. In this talk, we'll share the current state and roadmap for running ExecuTorch on platforms where every kilobyte of memory and milliwatt of power matters.

What you'll learn:
- How ExecuTorch's design enables deployment from ultra-low-power MCUs to DSP and NPU accelerators, all from a single PyTorch workflow
- The state of backend support for Cadence DSPs, ARM Ethos-U and Cortex-M
- Practical considerations for deploying models with sub-megabyte footprints and milliwatt power budgets
- Case studies spanning always-on audio, embedded vision, and TinyML applications
Speakers
Matthias Cremon

Software Engineering Manager, Meta
Matthias Cremon is a Software Engineering Manager at Meta in the Silicon AI Software Team, working on AI compilers for various edge devices. He focuses on the frontend, graph level optimization side, as well as the integration of low-level, vendor specific implementations to run on...
RJ Ascani

Software Engineer, Meta
RJ Ascani is an embedded software engineer on Meta’s PyTorch Edge team, focusing on advancing ExecuTorch for microcontroller platforms.
Central Room
  Inference & Production
  • Audience Level Any
  • Slides Attached Yes

12:00 CEST

Lightning Talk: Ethical, Privacy and Sustainability Considerations in PyTorch Systems - Paula Mesa Macias, Pau&Company
Tuesday April 7, 2026 12:00 - 12:10 CEST
PyTorch models are part of larger systems that handle data, logs, APIs and other services. Ethical, privacy, security and environmental considerations appear not only around the AI itself, but across the whole system.
Using the Ethical Software Framework and the Ethical IT Audit, this session explores practical ways to think about these issues in real workflows. It highlights situations where decisions in data handling, model deployment, logging or infrastructure have ethical, compliance or sustainability implications. It also shows considerations for using AI responsibly, such as dataset choices, bias awareness and evaluating risks before deployment.
The goal is to provide a clear, structured lens for identifying risks and trade-offs, making ethical, privacy, security, and sustainability concerns easier to discuss in everyday work.
Speakers
Paula Mesa Macias

Founder and Ethical Technology Consultant, Pau&Company
Founder of Pau&Company (https://pau.company/), which offers Ethical IT Audits (https://pau.company/ethical-it-audit/) based on the Ethical Software Framework (https://pau.company/ethical-software-framework/), Paula focuses on ethical considerations in technology. Through Pau&Company...
Founders Cafe

12:00 CEST

Lightning Talk: Bringing Google’s Colossus to PyTorch: Rapid Storage via fsspec to Keep GPUs Busy - Ankita Luthra & Trinadh Kotturu, Google
Tuesday April 7, 2026 12:00 - 12:10 CEST
As PyTorch models scale to billions of parameters, the bottleneck has quietly shifted from compute to storage. Modern GPU clusters often sit idle, "starving" for data while waiting on legacy REST-based protocols. This talk introduces Rapid Storage: a fundamental architectural shift bringing Google’s Colossus stateful protocol (which powers many Google products) to PyTorch via fsspec, a common Pythonic file interface used by many frameworks within the PyTorch ecosystem.
By bypassing REST APIs entirely via persistent gRPC streams to the storage layer, we eliminate protocol overhead. In this talk, we also dive into how Rapid achieves <1ms random read/write latency, 20x faster data access, and a massive 6 TB/s of aggregate throughput. Crucially, it delivers up to 10x lower tail latency for random I/O, preventing the stragglers that often stall distributed training jobs.
Beyond raw speed, we will deconstruct the integration with gcsfs and the broader fsspec ecosystem. This ensures that high-performance I/O is available across the entire data stack, including Dask, Ray, Hugging Face Datasets, and vLLM. Join us to learn how to stop wasting GPU cycles and achieve linear scaling in the cloud.
Speakers
Ankita Luthra

Senior Software Engineer, Google
Ankita Luthra is a Software Developer at Google, focused on AI/ML infrastructure and scalable data pipelines. Her work with open-source tools like fsspec (gcsfs) and gcsfuse improves how frameworks such as PyTorch and JAX efficiently access data from Google Cloud Storage.
Trinadh Kotturu

Senior Product Manager, Google
Trinadh Kotturu is a Senior Product Manager specializing in AI/ML and analytics client strategy at Google. An alumnus of IIM Bangalore with 12 years of experience, he has a proven track record of shipping v1 products and scaling them into robust platform services. His expertise spans large-scale distributed storage systems, autonomous driving, and system resiliency...
Master Stage
  Training Systems
  • Audience Level Any
  • Slides Attached Yes

12:00 CEST

Write Once, Run Everywhere with PyTorch Transformers - Pedro Cuenca, Hugging Face
Tuesday April 7, 2026 12:00 - 12:25 CEST
The Hugging Face transformers library is built on pure PyTorch and can be succinctly described as a model-definition framework. It provides a unified, familiar, clear and concise interface to multiple machine learning architectures across modalities.

Serving and inference optimizations are not its focus.

However, transformers model definitions become the de facto reference implementations that multiple other projects use. This includes training libraries, fast deployment engines such as vLLM and SGLang, and on-device libraries like MLX and llama.cpp.

This session describes the path towards increasingly simpler downstream integration of transformers models into inference and deployment libraries, and how transformers and PyTorch core features enable the ecosystem to adopt new models as soon as they are released.

We'll go through the journey towards easier modeling, which implies easier downstream porting and adaptation. The end game is pure interoperability, where no code changes are required! This is now possible with vLLM and SGLang, and we'll show how. We'll end by discussing our ideas on upcoming interop features with MLX and llama.cpp.
Speakers
Pedro Cuenca

ML Engineer, Hugging Face
Pedro Cuenca is a machine learning engineer at Hugging Face, working in developer advocacy and on-device ML. He has 20+ years of software development experience across internet applications and iOS. He worked on the technology behind Camera+, an iPhone app using custom ML for photography...
Central Room

13:45 CEST

Why WideEP Inference Needs Data-Parallel-Aware Scheduling - Maroon Ayoub, IBM; Tyler Michael Smith, Red Hat
Tuesday April 7, 2026 13:45 - 14:10 CEST
WideEP (wide expert parallelism) fails not because experts are expensive, but because routing ignores where state already lives. In PyTorch LLM serving with vLLM, WideEP fans tokens out across many experts while KV caches accumulate unevenly across data-parallel replicas. When routing is unaware of KV placement and per-replica load, requests land on replicas that cannot reuse cache or make progress efficiently, and latency spikes as expert fan-out grows.
The fix is not reshaping expert parallelism, but making routing data-parallel aware using signals vLLM already exposes. In this talk, we show how llm-d extends its router to leverage KV-cache locality and load awareness when routing WideEP flows. Rather than treating replicas as interchangeable, the router prefers replicas with warm KV state and available capacity, aligning routing decisions with vLLM’s execution reality and reducing cache fragmentation.
This session walks through how KV-aware, data-parallel routing changes WideEP inference in practice: which signals matter, how routing behavior evolves, and where the gains come from. Attendees leave with a clear mental model for when KV- and load-aware routing unlocks higher throughput.
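The routing preference the abstract describes can be illustrated with a toy sketch in plain Python. This is not llm-d's actual router; the replica fields and scoring function are hypothetical, chosen only to show the idea: prefer replicas holding warm KV state for the request, then break ties by available capacity.

```python
# Toy illustration of KV- and load-aware routing (hypothetical fields, not llm-d).
def route(request_prefix, replicas):
    """Pick a replica id. Each replica dict has 'id', 'warm_prefixes'
    (prompt prefixes whose KV cache it already holds), 'load', 'capacity'."""
    def score(r):
        # Cache reuse dominates; spare capacity breaks ties.
        cache_hit = any(request_prefix.startswith(p) for p in r["warm_prefixes"])
        headroom = r["capacity"] - r["load"]
        return (cache_hit, headroom)

    candidates = [r for r in replicas if r["load"] < r["capacity"]]
    return max(candidates, key=score)["id"]

replicas = [
    {"id": "dp0", "warm_prefixes": ["chat:42"], "load": 7, "capacity": 8},
    {"id": "dp1", "warm_prefixes": [], "load": 1, "capacity": 8},
]
# A follow-up turn of conversation 42 prefers the replica with warm KV state,
# even though another replica is far less loaded.
assert route("chat:42:turn3", replicas) == "dp0"
```

A naive load balancer would send the request to the idle replica and recompute the KV cache from scratch; locality-aware scoring is what avoids that.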
Speakers
Maroon Ayoub

Research Scientist & Architect, IBM Research
Maroon Ayoub is a systems engineer at IBM Research focused on distributed AI infrastructure. He co-leads development of llm-d and specializes in scaling LLM inference with Kubernetes-native architectures, performance efficiency, and open source integrations.
Tyler Michael Smith

Chief Architect - Inference Engineering, Red Hat
Tyler received a PhD in Computer Science at The University of Texas at Austin, studying high performance dense linear algebra: microkernels, parallelism, and theoretical lower bounds on data movement. After a postdoc at ETH Zürich, he joined Neural Magic, first working on a graph...
Central Room

14:15 CEST

Lightning Talk: Accelerating On-Device ML Inference With ExecuTorch and Arm SME2 - Jason Zhu, Arm
Tuesday April 7, 2026 14:15 - 14:25 CEST
As on-device AI workloads grow in complexity, achieving low-latency inference within mobile power constraints remains a central challenge. We examine how ExecuTorch, combined with Arm’s Scalable Matrix Extension 2 (SME2), enables efficient CPU deployments of production AI workloads. We present a case study of SqueezeSAM, a segmentation model deployed in real-world mobile applications. Using ExecuTorch with XNNPACK delegation and SME2-optimized kernels, we evaluate INT8 and FP16 inference on a flagship smartphone. Moving beyond aggregate latency, we apply operator-level profiling to decompose runtime across convolution, GEMM, elementwise, and data movement operators, showing how hardware acceleration reshapes bottlenecks in the execution stack. SME2 delivers up to 3.9x end-to-end speedup on a single CPU core, materially altering runtime composition and revealing data movement as the primary post-acceleration bottleneck. This session presents a practical workflow for deploying, profiling, and systematically optimizing on-device PyTorch models, demonstrating how SME2 expands the viable design space for interactive mobile AI.
Speakers
Jason Zhihuai Zhu

Senior Principal Engineer, Arm
Jason Zhu is a Senior Principal Engineer at Arm focused on hardware and software co-optimization for AI systems. With a background in quantum physics and experience spanning AI research and product engineering across major technology companies, he works across the full execution stack...
Master Stage
  Inference & Production
  • Audience Level Any
  • Slides Attached Yes

14:30 CEST

Lightning Talk: Combo Kernels: Horizontal Fusion Optimization in torch.compile - Karthick Panner Selvam & Elias Ellison, Meta
Tuesday April 7, 2026 14:30 - 14:40 CEST
Combo kernels are a compiler optimization in PyTorch Inductor that horizontally fuses multiple independent operations into a single Triton kernel launch, reducing GPU kernel launch overhead and improving memory locality.

The Problem: Models generate many small, independent operations like weight preprocessing and tensor copies. Each launch incurs overhead. For models with many such operations, this becomes a bottleneck.

The Solution: Combo kernels combine multiple operations into one kernel using a dispatch mechanism. A single program ID routes execution to the appropriate subkernel based on cumulative block boundaries. This eliminates redundant launches while preserving correctness.

Key Innovations:

Per-subkernel block dimensions: Each subkernel gets its own optimized block size instead of sharing one size across all, enabling better autotuning.

Flattened grid dispatch: We collapse the multi-dimensional block grid into a single dimension.

Results: On H100 GPUs, combo kernels deliver geomean speedups of +7.38% for HuggingFace, and +5.97% for TorchBench. The optimization is enabled by default in the vLLM repository for LLM inference acceleration.
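The dispatch mechanism described above can be sketched in plain Python. This is a hypothetical simplification, not Inductor's actual codegen: each sub-kernel owns a contiguous range of program IDs in one flattened 1-D grid, and cumulative block boundaries route each ID to its sub-kernel.

```python
from bisect import bisect_right
from itertools import accumulate

# Toy model of combo-kernel dispatch: several independent ops share one
# kernel launch, and a single program ID selects the sub-kernel to run.
def make_dispatcher(subkernels):
    """subkernels: list of (name, num_blocks).
    Returns a function mapping a program ID to (name, local_block_index)."""
    boundaries = list(accumulate(n for _, n in subkernels))  # cumulative block boundaries

    def dispatch(pid):
        idx = bisect_right(boundaries, pid)        # which sub-kernel owns this pid
        start = boundaries[idx - 1] if idx else 0  # first pid of that sub-kernel
        return subkernels[idx][0], pid - start     # local block index within it

    return dispatch

# Three independent ops fused into a single launch of 6 blocks total.
dispatch = make_dispatcher([("copy", 2), ("scale", 3), ("cast", 1)])
assert dispatch(0) == ("copy", 0)
assert dispatch(4) == ("scale", 2)
assert dispatch(5) == ("cast", 0)
```

One launch with 6 program IDs thus replaces three separate launches, which is where the launch-overhead savings come from.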
Speakers
Elias Ellison

Software Engineer, Meta
Elias has been working on the PyTorch team for four years, most recently on the torch.compile stack.
Karthick Panner Selvam

Software Engineer, Meta
Karthick Panner Selvam is a SWE at Meta Superintelligence Lab, working on the PyTorch compiler team to enhance performance and scalability for large models. He earned his PhD in Machine Learning for Systems at the University of Luxembourg, collaborating with Google DeepMind, ECMWF, and Frontier...
Master Stage
  Frameworks & Compilers
  • Audience Level Any
  • Slides Attached Yes

15:00 CEST

Lightning Talk: Jigsaw: Domain and Tensor Parallelism for High-Resolution Input Training - Deifilia Kieckhefen, Karlsruhe Institute of Technology
Tuesday April 7, 2026 15:00 - 15:10 CEST
Distributed neural network training frameworks typically optimize for specific architectures while minimizing communication overhead. Transformer layers can be efficiently parallelized, but other operations such as convolutions often remain inefficient. This creates bottlenecks for complex model architectures.
Moreover, existing tensor parallelism strategies typically replicate input data across all processes, creating redundant I/O that scales poorly with input size. In applications with heavy I/O demands-weather forecasting, medical imaging, or video processing-unsharded input data creates additional data-loading bottlenecks that could benefit from parallelization.
Jigsaw is a PyTorch library that shards both model weights and input data across parallel processes. It maintains a PyTorch-like interface while parallelizing activations, convolutions, linear layers, and attention through a distributed matrix multiplication backend. We demonstrate the usability of Jigsaw across a wide range of model architectures, show its performance when scaling multi-billion-parameter models sharded across up to 8 processes, and compare its scalability to DDP, FSDP, and Megatron-LM approaches.
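The core idea of sharding both inputs and weights can be sketched in-process with plain tensor ops (illustrative only; Jigsaw's actual API and communication backend differ):

```python
import torch

# Sketch of sharded input + sharded weights: each "rank" holds a row-shard of
# the input and a column-shard of the weight. A distributed matmul would
# combine the partial results via collectives; here we simulate it in-process.
world_size = 4
x = torch.randn(8, 16)   # full input
w = torch.randn(16, 32)  # full weight

x_shards = x.chunk(world_size, dim=0)  # shard input rows across ranks
w_shards = w.chunk(world_size, dim=1)  # shard weight columns across ranks

# Each rank computes its row-shard against every weight column-shard; the
# concatenation below stands in for the all-gather of a real distributed run.
out = torch.cat([
    torch.cat([xs @ ws for ws in w_shards], dim=1)
    for xs in x_shards
], dim=0)

assert torch.allclose(out, x @ w, atol=1e-5)
```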
Speakers
avatar for Deifilia Kieckhefen

Deifilia Kieckhefen

Doctoral Researcher, Karlsruhe Institute of Technology
Deifilia Kieckhefen is a doctoral researcher at the Karlsruhe Institute of Technology. She works on scalable and distributed training of neural network architectures.
Tuesday April 7, 2026 15:00 - 15:10 CEST
Founders Cafe
  Training Systems
  • Audience Level Any
  • Slides Attached Yes

15:10 CEST

Meet the Developers of Helion
Tuesday April 7, 2026 15:10 - 15:40 CEST
This session offers a unique opportunity to connect with the core developers of Helion (https://github.com/pytorch/helion)—ask questions, share feedback, and explore collaboration opportunities with the team.

About Helion
At PTC 2025, we launched Helion (in Beta), a PyTorch-native kernel authoring DSL designed to deliver portable performance across heterogeneous hardware. Since then, Helion has outperformed expert-tuned Triton and CuTe DSL kernels and seen meaningful adoption across research labs, production teams, and OSS frameworks like vLLM.

At PyTorch Conference Europe 2026, we are excited to announce Helion 1.0 (General Availability). Join us to learn how Helion works under the hood and discover what's new in the GA release.

Core Developers
Jason Ansel: Research Scientist, creator of PyTorch Compiler and Helion
Oguz Ulgen: Software Engineer, creator of PyTorch Compiler cache, working on Helion
Will Feng: Software Engineer working on TorchInductor and Helion
Markus Hoehnerbach: Software Engineer focusing on Helion development and kernel authoring

Drop in for an informal discussion, share your experiences, and explore opportunities to collaborate with the team!
Speakers
avatar for Will Feng

Will Feng

Software Engineer, Meta
Will Feng is a Software Engineer in PyTorch Compiler team at Meta. He has been working in PyTorch core and ecosystem for the past 7 years. He is now working on and most excited about torch.compile for distributed training performance.
avatar for Oguz Ulgen

Oguz Ulgen

Software Engineer, Meta
I'm a software engineer at Meta where I used to work on the Hack programming language and now work on PyTorch.
avatar for Jason Ansel

Jason Ansel

Research Scientist, Meta
Jason Ansel is a Research Scientist at Meta AI and a technical lead for PyTorch compilers. He started the TorchDynamo and TorchInductor projects, which bring flexible graph capture and a high performance compiler to PyTorch 2. He received a Ph.D. from MIT and has over 15 years of... Read More →
Tuesday April 7, 2026 15:10 - 15:40 CEST
Open Platform
  Meet the Developers
  • Audience Level Any

15:55 CEST

Lightning Talk: Running ExecuTorch Applications With Silicon Acceleration, in Ultra-low Power - George Gekov, Arm; Aki Makkonen, Alif Semiconductor
Tuesday April 7, 2026 15:55 - 16:05 CEST
Efficient deployment of ML models on low-power embedded systems has been a significant challenge for a number of years. At the same time, these embedded SoCs are all around us—from everyday appliances to the latest smart glasses.

ExecuTorch is a PyTorch-native framework for deploying neural networks on resource-constrained systems. In this session, we show how to build an end-to-end speech recognition application using PyTorch and ExecuTorch—from training a Transformer-based neural network in PyTorch, through quantization, all the way to deployment on a low-power embedded device.

We will introduce the key ExecuTorch APIs for quantization and explain how models are transformed and lowered into a form that can run efficiently on a device. The application runs on the Alif Ensemble E8 SoC, the first implementation of the leading Arm® Ethos-U85 NPU, which brings native support for Transformer models to the ultra-low-power domain.

Join the experts from Arm and Alif Semiconductor to see how we are bridging the gap between PyTorch and embedded deployment—and how you can bring PyTorch models to silicon-accelerated, ultra-low-power systems.
Speakers
avatar for George Gekov

George Gekov

ML Engineer, Arm
George Gekov is a Staff Software Engineer in Arm’s Machine Learning team, where he focuses on machine learning inference on embedded systems. He has extensive experience deploying neural networks on resource-constrained devices with Neural Processing Units (NPUs) to enable hardware-accelerated... Read More →
avatar for Aki Makkonen

Aki Makkonen

Senior Staff Application Engineer, Alif Semiconductor
Software engineer with background in telecommunication, medical imaging, robotics and embedded systems.
Tuesday April 7, 2026 15:55 - 16:05 CEST
Founders Cafe

16:10 CEST

Build PyTorch to Understand PyTorch - Vijay Janapa Reddi, Harvard University; Andrea Mattia Garavagno, University of Genoa
Tuesday April 7, 2026 16:10 - 16:35 CEST
PyTorch's success depends on more than users—it needs engineers who understand what's inside. Engineers who can debug framework issues, optimize at the systems level, contribute upstream, and build what comes next. But ML education today produces practitioners who call APIs without understanding them. They train models without knowing why Adam needs 3× the memory of SGD, or what happens when they call loss.backward().

TinyTorch is a 20-module open-source curriculum that closes this gap. Students construct PyTorch's core components—tensors, autograd, optimizers, CNNs, transformers—in pure Python, building a complete framework where every operation is code they wrote. By the final module, they don't just use PyTorch; they understand how to build it.
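The flavor of this build-it-yourself approach can be shown with a minimal scalar autograd in pure Python (illustrative only, not TinyTorch's actual code):

```python
# A minimal scalar autograd: each Value records its parents and a function
# that maps the incoming gradient to gradients for those parents.
class Value:
    def __init__(self, data, parents=(), grad_fn=None):
        self.data, self.grad = data, 0.0
        self._parents, self._grad_fn = parents, grad_fn

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        out._grad_fn = lambda g: (g * other.data, g * self.data)
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        out._grad_fn = lambda g: (g, g)
        return out

    def backward(self, grad=1.0):
        # Accumulate, then push gradients to parents (chain rule).
        self.grad += grad
        if self._grad_fn:
            for p, g in zip(self._parents, self._grad_fn(grad)):
                p.backward(g)

x = Value(3.0)
y = x * x + x      # d/dx (x^2 + x) = 2x + 1 = 7 at x = 3
y.backward()
print(x.grad)      # 7.0
```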

The curriculum uses progressive disclosure, systems-first profiling from Module 01, and build-to-validate milestones—recreating ML breakthroughs from Perceptron (1958) through Transformers (2017), culminating in MLPerf-style benchmarking.

TinyTorch is how we grow the next generation of PyTorch contributors and the engineers who will build what comes after.

Open source: mlsysbook.ai/tinytorch
Speakers
avatar for Vijay Janapa Reddi

Vijay Janapa Reddi

Professor, Harvard University
Vijay Janapa Reddi is a Professor at Harvard University, where he leads research at the intersection of machine learning and computer systems. He is the author of the open-source Machine Learning Systems textbook (mlsysbook.ai) and co-founder of MLCommons, the organization behind... Read More →
avatar for Andrea Mattia Garavagno

Andrea Mattia Garavagno

Research Fellow, University of Genoa & Scuola Superiore Sant'Anna
I am a Research Fellow holding a joint position at the University of Genoa and Scuola Superiore Sant'Anna. My research is centered on Edge AI, where I am currently working to automate the design of applications through Hardware-Aware Neural Architecture Search (NAS). By running these... Read More →
Tuesday April 7, 2026 16:10 - 16:35 CEST
Central Room
  Frameworks & Compilers
  • Audience Level Any
  • Slides Attached Yes

16:10 CEST

On-Device LLM Inference on Android With ExecuTorch and Qualcomm QNN - Shivay Lamba & Kartikey Rawat, Qualcomm
Tuesday April 7, 2026 16:10 - 16:35 CEST
Multimodal models like CLIP are typically deployed in the cloud due to their size and computational demands, limiting their use in latency-sensitive, privacy-preserving, and offline-first applications. This talk demonstrates how one can run fully on-device CLIP inference on Android using ExecuTorch with the Qualcomm QNN backend, enabling real-time vision–language understanding without server dependency.

One can run models like CLIP (ViT-B/32) entirely on edge devices, leveraging QNN for hardware-accelerated inference. A key focus of the talk is a deep dive into ExecuTorch optimizations for QNN, including graph lowering, operator fusion, quantization strategies, memory planning, and backend-specific execution choices that materially impact latency, memory footprint, and power consumption.

The talk will cover architectural insights, model export and compilation workflows, and real-world benchmarks covering latency, memory usage, and power efficiency. This talk highlights how large multimodal PyTorch models can be made production-ready on edge devices, unlocking new classes of private, offline-capable AI applications.
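At inference time, a CLIP-style forward reduces to cosine similarity between normalized embeddings; the sketch below uses random tensors in place of the encoder outputs that the exported ViT-B/32 would produce on the NPU:

```python
import torch

# Random tensors stand in for image/text encoder outputs; in the real
# pipeline these come from the exported, quantized model running via QNN.
img_emb = torch.nn.functional.normalize(torch.randn(1, 512), dim=-1)
txt_emb = torch.nn.functional.normalize(torch.randn(3, 512), dim=-1)  # 3 labels

logits = img_emb @ txt_emb.t()   # cosine similarity scores, shape (1, 3)
probs = logits.softmax(dim=-1)   # probabilities over the 3 labels, summing to 1
print(probs.shape)
```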
Speakers
avatar for Shivay Lamba

Shivay Lamba

Senior ML Engineer, Qualcomm
Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development.

He is an Open Source Enthusiast and has been part of various programs like Google Code In and Google Summer of Code as a Mentor and is currently a MLH Fellow. He has also worked at organizations like Amazon, EY, Genpact. He is a Tensorflow.JS SIG member and community lead from In... Read More →
avatar for Kartikey Rawat

Kartikey Rawat

Senior Developer Advocate, Qualcomm
Senior Developer Advocate at Qualcomm | Google Developer Expert in AI and Google Cloud
Tuesday April 7, 2026 16:10 - 16:35 CEST
Founders Cafe
  GenAI & Multimodal
  • Audience Level Any

16:40 CEST

Lightning Talk: TerraKit: Standardising AI-Ready Geospatial Data Preparation for the TorchGeo Ecosystem - Rosie Lickorish & Romeo Kienzler, IBM
Tuesday April 7, 2026 16:40 - 16:50 CEST
With the advent of geospatial foundation models, unexplored use cases are emerging that require well-curated datasets. Currently, no standardised approach exists for creating such AI-ready geospatial datasets. In this session, we introduce TerraKit: a comprehensive open-source Python library for retrieving and processing geospatial data that seamlessly integrates with upstream geospatial model training libraries such as TorchGeo or TerraTorch.

From raster/vector annotations, TerraKit will match, download, process, align and split the requested data source (e.g., EarthData, CDSE, Planetary Computer) based on user specifications provided by a simple configuration file. TerraKit also supports spatial train/val splits and exports datasets in standard formats such as TACO datasets. TerraKit streamlines the pipeline from raw EO data to AI-ready datasets, accelerating the development of custom geospatial applications, and ensuring query and processing pipelines are reproducible. By lowering the barrier to entry, a wider community of TorchGeo and TerraTorch users are empowered to leverage foundation models for Earth observation.
Speakers
avatar for Romeo Kienzler

Romeo Kienzler

AI Research Engineer, IBM
Romeo is a data scientist working for IBM Research and an advocate for ethical machine learning, transparency and privacy.
avatar for Rosie Lickorish

Rosie Lickorish

Research Software Engineer, IBM
Rosie is a Research Software Engineer at IBM, specializing in the development of next-generation tools and technologies designed to drastically accelerate solutions for today’s most urgent global challenges. Her technical focus involves leveraging geospatial data, AI models... Read More →
Tuesday April 7, 2026 16:40 - 16:50 CEST
Central Room
  GenAI & Multimodal
  • Audience Level Any
  • Slides Attached Yes

16:55 CEST

Lightning Talk: Bayesian Neural Networks With Variational Inference in PyTorch - Lars Heyen, Karlsruhe Institute of Technology, Scientific Computing Center
Tuesday April 7, 2026 16:55 - 17:05 CEST
Uncertainty quantification is becoming more and more important as neural networks are used for increasingly critical tasks. Bayesian neural networks (BNNs) inherently provide a measure of their own uncertainty, but can be either hard to implement or inflexible when using common frameworks. In this session I discuss how to efficiently implement BNNs using Variational Inference within PyTorch and present torch_blue, a lightweight open-source library that implements these methods with the goal of being easy to pick up, yet flexible enough for research on BNNs.
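The reparameterization trick at the heart of variational inference can be sketched as a Bayesian linear layer (illustrative only; torch_blue's actual API may differ):

```python
import torch

# Variational linear layer: weights are distributions N(mu, sigma^2), sampled
# afresh on each forward pass via the reparameterization trick, so gradients
# flow through mu and logvar.
class BayesLinear(torch.nn.Module):
    def __init__(self, in_f, out_f):
        super().__init__()
        self.w_mu = torch.nn.Parameter(torch.zeros(out_f, in_f))
        self.w_logvar = torch.nn.Parameter(torch.full((out_f, in_f), -5.0))

    def forward(self, x):
        eps = torch.randn_like(self.w_mu)
        w = self.w_mu + torch.exp(0.5 * self.w_logvar) * eps  # w ~ N(mu, sigma^2)
        return x @ w.t()

layer = BayesLinear(4, 2)
x = torch.randn(3, 4)
# Two forward passes differ because weights are resampled each call:
print(torch.equal(layer(x), layer(x)))  # typically False
```

Running the layer many times yields a distribution over outputs, whose spread is the model's uncertainty estimate.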
Speakers
avatar for Lars Heyen

Lars Heyen

PostDoc, Karlsruhe Institute of Technology
I am a postdoctoral researcher working on uncertainty quantification in the research group "Robust and Efficient AI" at the Scientific Computing Center of the Karlsruhe Institute of Technology. I also coauthored the PyTorch-based library torch_blue for implementing Bayesian neural... Read More →
Tuesday April 7, 2026 16:55 - 17:05 CEST
Central Room
  Frameworks & Compilers
  • Audience Level Any
  • Slides Attached Yes
 
Wednesday, April 8
 

09:00 CEST

Keynote: PyTorch CTO - Matt White, Global CTO of AI, Linux Foundation
Wednesday April 8, 2026 09:00 - 09:10 CEST
Matt White, Global CTO of AI and CTO at PyTorch Foundation, will provide an update on technical strategy, the ecosystem, projects, and working groups.
Speakers
avatar for Matt White

Matt White

Global CTO of AI, The Linux Foundation
Matt White is the Executive Director of the PyTorch Foundation and GM of AI at the Linux Foundation. He is also the Director of the Generative AI Commons. Matt has years of experience in applied research and standards in AI and data in telecom, media and gaming industries. Matt is... Read More →
Wednesday April 8, 2026 09:00 - 09:10 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:10 CEST

Keynote: vLLM & Ray Updates - Tyler Michael Smith, Chief Architect - Inference Engineering, Red Hat & Artur Niederfahrenhorst, Member of Technical Staff, Anyscale
Wednesday April 8, 2026 09:10 - 09:25 CEST

Speakers
avatar for Tyler Michael Smith

Tyler Michael Smith

Chief Architect - Inference Engineering, Red Hat
Tyler received a PhD in Computer Science at The University of Texas at Austin, studying high-performance dense linear algebra: microkernels, parallelism, and theoretical lower bounds on data movement. After a postdoc at ETH Zürich, he joined Neural Magic, first working on a graph... Read More →
avatar for Artur Niederfahrenhorst

Artur Niederfahrenhorst

Member of Technical Staff, Anyscale
Artur is a member of the technical staff at Anyscale, the company that recently donated Ray to the Linux Foundation. He has been contributing to Ray since early 2022, where his main contributions have been in distributed reinforcement learning. Artur majored in Computer Science at... Read More →
Wednesday April 8, 2026 09:10 - 09:25 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:25 CEST

Keynote: The Hub as Infrastructure: From Open PyTorch Models to a Safe and Performant Distribution Hub - Lysandre Debut, Chief Open-Source Officer, Hugging Face
Wednesday April 8, 2026 09:25 - 09:40 CEST

Speakers
avatar for Lysandre Debut

Lysandre Debut

Chief Open-Source Officer, Hugging Face
Lysandre is the Chief Open-Source Officer at Hugging Face; ensuring that the ecosystem is as well supported as possible in the ML lifecycle, with open-source tools.

He has been at Hugging Face for the past six years and was the first open-source employee at Hugging Face, working on transformers and the entire stack of Hugging Face open-source libraries since then... Read More →
Wednesday April 8, 2026 09:25 - 09:40 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:45 CEST

Sponsored Keynote: Open Source Infrastructure for the AI Native Era - Jonathan Bryce, Executive Director, Cloud Native Computing Foundation
Wednesday April 8, 2026 09:45 - 09:50 CEST
AI adoption will not be limited by model ideas alone. It will be limited by how fast we can deploy, secure, observe, and scale AI systems in production. Inference is where AI becomes real for most organizations. As AI moves from frontier labs into mainstream production, the operational challenges start to look increasingly cloud native: orchestration, autoscaling, routing, security, policy, and observability. This keynote explores why the next phase of AI adoption will move faster if PyTorch and cloud native communities work together to extend proven open source patterns.
Speakers
avatar for Jonathan Bryce

Jonathan Bryce

Executive Director, Cloud and Infrastructure, The Linux Foundation
Jonathan Bryce is the Executive Director of Cloud & Infrastructure at the Linux Foundation, where he leads both the Cloud Native Computing Foundation (CNCF) and the OpenInfra Foundation—two of the largest and most influential open source communities in the world. With over... Read More →
Wednesday April 8, 2026 09:45 - 09:50 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:50 CEST

Keynote: Gemma 4: Compacting Intelligence for the Edge - Léonard Hussenot, Research Scientist, Google DeepMind
Wednesday April 8, 2026 09:50 - 10:05 CEST
This talk explores the philosophy and engineering behind Gemma 4, arguing that the future of AI isn't only about size, but about "intelligence per byte."
We will dive into why compacting intelligence—maximizing the reasoning and instruction following ability of every single token—is the ultimate bottleneck for truly useful AI. By optimizing for token efficiency and memory footprints, we unlock a new class of applications that are faster, private, and more accessible.
Speakers
avatar for Leonard Hussenot

Leonard Hussenot

Research Scientist, Google DeepMind
I am a Research Scientist at Google DeepMind, where I lead the Gemma post-training team focused on developing the most useful compact models for on-device applications. Since joining Google Brain, I have contributed to the evolution of Bard, Gemini, and Gemma, specializing in scaling... Read More →
Wednesday April 8, 2026 09:50 - 10:05 CEST
Master Stage
  Keynote Sessions
  • Audience Level Any

10:05 CEST

Meet the vLLM Maintainers
Wednesday April 8, 2026 10:05 - 10:35 CEST
Meet the core maintainers of vLLM at this session! Come and discuss use cases, features, and the roadmap with us, or just learn how vLLM development happens under the hood.
Speakers
avatar for Tyler Michael Smith

Tyler Michael Smith

Chief Architect - Inference Engineering, Red Hat
Tyler received a PhD in Computer Science at The University of Texas at Austin, studying high-performance dense linear algebra: microkernels, parallelism, and theoretical lower bounds on data movement. After a postdoc at ETH Zürich, he joined Neural Magic, first working on a graph... Read More →
avatar for Nicolò Lucchesi

Nicolò Lucchesi

Senior Machine Learning Engineer, Red Hat
Nicolò is a Senior Machine Learning Engineer at Red Hat with a background in Deep Learning and Computer Vision. He works on Inference Optimization for vLLM, where he is a maintainer.
Wednesday April 8, 2026 10:05 - 10:35 CEST
Open Platform
  Meet the Developers
  • Audience Level Any

10:35 CEST

Lightning Talk: Monarch: An API To Your Supercomputer - Marius Eriksen, Meta
Wednesday April 8, 2026 10:35 - 10:45 CEST
The training systems driving today’s most advanced AIs are distributed, dynamic, and complex. Pre-training relies on layered parallelism and careful fault isolation. Post-training RL spans thousands of GPUs while coordinating verifiers, compilers, and code execution.

Systems complexity pulls focus away from the core algorithms: developers are forced to assemble systems from schedulers, RPC stacks, container orchestrators, observability tooling, service discovery, and app frameworks just to begin work.

Monarch is a distributed programming framework for PyTorch that makes the cluster programmable through a single-program Python API. It exposes the supercomputer as a coherent, directly controllable system—bringing the experience of local development to large-scale training; handling fault tolerance, orchestration, tooling integration, etc.

In this talk, we will demonstrate how Monarch enables developers to focus on training logic rather than glue, extend systems easily, and supervise and debug distributed systems through a unified programming interface.

Attendees will leave with a clear model for building robust, scalable and customizable distributed PyTorch systems using Monarch.
Speakers
avatar for Marius Eriksen

Marius Eriksen

Software Engineer, Meta
Marius Eriksen is a software engineer at Meta, where he works on infrastructure for large-scale training systems.
Wednesday April 8, 2026 10:35 - 10:45 CEST
Master Stage

11:05 CEST

Lightning Talk: KV-Cache Centric Inference: Building a State-Aware Serving Platform With Llm-d and VLLM - Maroon Ayoub & Martin Hickey, IBM Research
Wednesday April 8, 2026 11:05 - 11:15 CEST
We’ve spent years optimizing LLM inference around compute - faster kernels, better batching, smarter parallelism. But in production, the bottleneck increasingly isn’t FLOPs. It’s state. Specifically, the KV-cache: the attention state that makes the difference between a 4-second prefill and a sub-second cache hit. Lose it to eviction, isolate it on a single node, or fail to route to it - and you’re paying the full compute cost again for work already done.

KV-cache centric inference flips the design priority. Instead of treating cache as a byproduct, it becomes the organizing principle of the serving platform. This means tiered memory management - offloading KV blocks from GPU to CPU to shared storage so capacity scales beyond any single node. It means cross-replica visibility - so cached state computed on one instance is reusable by any other. And it means cache-aware scheduling - routing requests to where their prefix already lives.

We cover how llm-d and vLLM implement each layer, how they compose into a coherent system, and what it looks like in practice - with benchmarks, deployment patterns, and lessons from building a KV-cache centric platform in the open.
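Cache-aware scheduling can be illustrated with a toy router that picks the replica holding the longest cached prompt prefix (hypothetical helper functions, not llm-d's actual scheduler):

```python
# Toy sketch of cache-aware routing: send a request to the replica that
# already holds the longest matching prompt prefix, maximizing cache hits.
def longest_prefix(a, b):
    """Length of the common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(prompt_tokens, replica_caches):
    """Pick the replica index with the longest cached prefix for this prompt."""
    scores = [
        max((longest_prefix(prompt_tokens, c) for c in cache), default=0)
        for cache in replica_caches
    ]
    return max(range(len(scores)), key=scores.__getitem__)

caches = [
    [[1, 2, 3]],        # replica 0 cached a 3-token prefix
    [[1, 2, 3, 4, 5]],  # replica 1 cached a longer prefix
]
print(route([1, 2, 3, 4, 5, 6], caches))  # 1
```

A production scheduler would also weigh load and cache-tier placement, but prefix matching is the core signal.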
Speakers
avatar for Martin Hickey

Martin Hickey

Senior Technical Staff Member, IBM Research
Martin Hickey is a STSM at IBM Research, focused on Open Source, Cloud Native Computing, and AI. Martin has notable contributions to open source projects like vLLM, LMCache, Kubernetes, Helm, OpenTelemetry and OpenStack. Martin is a core maintainer for LMCache and an emeritus core... Read More →
avatar for Maroon Ayoub

Maroon Ayoub

Research Scientist & Architect, IBM Research
Maroon Ayoub is a systems engineer at IBM Research focused on distributed AI infrastructure. He co-leads development of llm-d and specializes in scaling LLM inference with Kubernetes-native architectures, performance efficiency, and open source integrations.
Wednesday April 8, 2026 11:05 - 11:15 CEST
Central Room

11:05 CEST

Bringing PyTorch Monarch to AMD GPUs: Single-Controller Distributed Training on ROCm - Liz Li & Zachary Streeter, AMD
Wednesday April 8, 2026 11:05 - 11:30 CEST
PyTorch Monarch introduces a new distributed programming paradigm that enables developers to orchestrate entire GPU clusters from a single Python program. With its actor-based runtime, process mesh abstraction, and asynchronous execution model, Monarch simplifies large-scale distributed training and enables complex workflows that combine training, evaluation, and reinforcement learning within one unified script.

In this talk, we present our work enabling PyTorch Monarch on AMD Instinct GPUs with ROCm, expanding the single-controller model beyond CUDA environments and bringing this emerging runtime to a broader hardware ecosystem. We describe the engineering effort required to port Monarch’s GPU runtime and distributed communication stack to ROCm, including HIPification of CUDA-specific components, adaptation of memory management and synchronization semantics, and integration with high-performance GPU-to-GPU communication on multi-node clusters through RDMA.

We will share lessons learned from running Monarch workloads on MI300-class clusters, including performance considerations, debugging workflows, and developer experience improvements. Our results demonstrate that Monarch’s architecture can be successfully extended to heterogeneous hardware environments while preserving scalability and ease of use.

This work advances hardware diversity in distributed PyTorch and highlights how portable runtimes can simplify large-scale training while enabling scalable, cluster-wide experimentation across accelerator platforms.
Speakers
avatar for Liz Li

Liz Li

Principal AI engineer, AMD
Liz Li is a Principal AI Engineer in the AMD AI group, specializing in enabling and optimizing cutting-edge AI models on AMD Instinct GPUs for both distributed inference and training. With over 10 years of experience in computer, graphics, and AI architecture, she has previously led... Read More →
avatar for Zachary Streeter

Zachary Streeter

Senior Member of Technical Staff, AMD
I'm a computational physicist who has worked in the field of AI for the past 5 years. I have a wide range of expertise from mathematics to performance optimizations and system engineering. Feel free to nerd out with me! Please connect with me on LinkedIn.
Wednesday April 8, 2026 11:05 - 11:30 CEST
Founders Cafe
  Training Systems
  • Audience Level Any

13:30 CEST

PyTorch on RISC-V: From Cross-Compilation To Native CI - Ludovic Henry, Meta
Wednesday April 8, 2026 13:30 - 13:55 CEST
As RISC-V matures into a viable architecture for AI and data center workloads, bringing first-class PyTorch support to the ecosystem is a critical milestone. This session provides a technical deep dive into the ongoing efforts to port PyTorch natively to RISC-V, moving beyond experimental cross-compilation toward a stable, tested, and optimized environment. We detail the challenges of reconciling native math library dependencies like OpenBLAS and oneDNN with RISC-V Vector (RVV) extensions, alongside the work required to upstream these accelerations to ensure sustainable, long-term performance.

The talk also addresses the critical "last mile" of the Python ecosystem: ensuring that the broader dependency tree—including NumPy, SciPy, and ONNX—is natively available and performant on the architecture. Finally, we examine the primary bottleneck for official support: CI infrastructure. We outline the roadmap for transitioning from tagged cross-compilation to a native testing pool, discussing the logistics of maintaining a reliable hardware fleet to meet the high-volume validation standards required for the PyTorch master branch and pull request workflows.
Speakers
avatar for Ludovic Henry

Ludovic Henry

Software Engineering Lead, Rivos
Ludovic works at the intersection of open-source software and emerging hardware. He is a key contributor to the RISC-V ecosystem, focusing on the performance and stability of the AI stack. His recent work involves optimizing native dependencies like OpenBLAS and oneDNN and establishing... Read More →
Wednesday April 8, 2026 13:30 - 13:55 CEST
Junior Stage

13:30 CEST

PyTorch Symmetric Memory + NCCL Device APIs: A New Path Towards Multi-GPU Kernels - Ke Wen & Sylvain Jeaugey, NVIDIA
Wednesday April 8, 2026 13:30 - 13:55 CEST
As large models shift toward inference and Mixture-of-Experts (MoE) architectures, small batch sizes and dynamic routing present new scaling challenges. Fused, customized multi-GPU kernels are emerging as the solution, but programming them for high performance remains difficult. This talk introduces a paradigm shift enabled by PyTorch Symmetric Memory and NCCL device APIs.

PyTorch Symmetric Memory provides a unified infrastructure for direct GPU-to-GPU memory access without CPU involvement. By leveraging symmetric tensor allocation and CUDA Graph-compatible signaling, it enables fine-grained, dynamic data exchange while bypassing traditional "send/receive" overhead.

We further demonstrate how NCCL device APIs simplify this model using in-kernel primitives for NVLink and GPU-Initiated Networking (GIN). We will showcase practical examples of compute-communication fusion, such as AllGather-Matmul, and customized patterns like deduplicated expert all-to-all.
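The AllGather-Matmul fusion mentioned above rests on a simple identity, sketched here on a single process with stand-in tensors (illustrative; real fused kernels overlap the gather of one shard with compute on another):

```python
import torch

# Gathering row-shards of A and then multiplying by B equals concatenating
# the per-shard matmuls. Fused multi-GPU kernels exploit this to overlap
# communication of each shard with compute on the previous one.
world = 4
a_shards = [torch.randn(2, 8) for _ in range(world)]  # one shard per rank
b = torch.randn(8, 16)

unfused = torch.cat(a_shards, dim=0) @ b                    # all-gather, then matmul
fused_equiv = torch.cat([a @ b for a in a_shards], dim=0)   # per-shard matmul, concat

assert torch.allclose(unfused, fused_equiv, atol=1e-5)
```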

These abstractions represent one of the most significant evolutions in the PyTorch and NCCL ecosystems, offering a versatile path to high-performance distributed programming.
Speakers
avatar for Ke Wen

Ke Wen

Principal Software Architect, NVIDIA
Ke Wen works on distributed features, including Symmetric Memory, multi-GPU kernels, Expert Parallelism, inference, pipelining and graph analysis.
avatar for Sylvain Jeaugey

Sylvain Jeaugey

Distinguished Engineer, NVIDIA
Sylvain has been developing the NCCL library since its inception in 2015. He has been working on optimizing communication libraries for large parallel systems for more than 20 years.
Wednesday April 8, 2026 13:30 - 13:55 CEST
Master Stage

14:00 CEST

Deploying PyTorch Models To the Browser and Beyond With Transformers.js - Joshua Lochner, Hugging Face
Wednesday April 8, 2026 14:00 - 14:25 CEST
This session presents a comprehensive engineering roadmap for running Hugging Face Transformers entirely locally in your web browser using Transformers.js. We will explore the end-to-end pipeline required to export, optimize, and deploy PyTorch models to the web, leveraging emerging web technologies like WebGPU for efficient, cross-platform inference.

We will dive into the technical nuances of converting PyTorch models to ONNX using torch.export (Dynamo) and applying runtime-specific optimizations via ONNX Runtime GenAI. This workflow enables the production of highly efficient, quantized model artifacts suitable for browser-based execution. Finally, we will demonstrate how to deploy these optimized models using Transformers.js to create performant, interactive, and visually stunning WebAI experiences.
Speakers
avatar for Joshua Lochner

Joshua Lochner

Creator of Transformers.js, Hugging Face
Bringing the power of machine learning to the web. Currently working on Transformers.js (@huggingface 🤗)
Wednesday April 8, 2026 14:00 - 14:25 CEST
Master Stage

14:15 CEST

Lightning Talk: Inside VLLM's KV Offloading Connector: Async Memory Transfers for Higher Inference Throughput - Nicolò Lucchesi, Red Hat
Wednesday April 8, 2026 14:15 - 14:25 CEST
Every LLM request produces KV-cache state that is expensive to recompute. However, GPU memory is limited in size, and when it fills up, entries are evicted from the cache. A natural mitigation is expanding the KV cache into CPU DRAM, which is meaningfully larger than GPU memory.
vLLM 0.11.0 introduced the Offloading Connector, an asynchronous, pluggable API for KV-cache offloading that is bundled with a native CPU backend. This feature executes transfers via GPU DMA, concurrently with model computation on the GPU cores, offering fast loading of KV data from DRAM with near-zero overhead from offloading. Getting here required rethinking vLLM's memory layout: the default per-layer KV fragmentation devastated transfer throughput. A new contiguous block layout, upstreamed in 0.12.0, increased effective block sizes by up to 125× and delivered an order-of-magnitude improvement in offloading performance.
We'll walk through the connector architecture, discuss memory transfer tradeoffs, the memory layout redesign, and practical guidance for enabling CPU offloading in production.
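A toy cost model (all numbers invented for illustration) shows why the contiguous layout matters: each DMA transfer pays a fixed setup cost, so collapsing many per-layer copies into one per-block copy amortizes that overhead:

```python
# Back-of-envelope model of DMA transfer cost. Every number here is made up;
# the point is the shape of the tradeoff, not the absolute values.
setup_us = 5.0                  # fixed cost per DMA transfer
bandwidth_gbps = 25.0           # sustained copy bandwidth (GB/s)
block_bytes = 48 * 64 * 1024    # one KV block spread over 48 layers

def transfer_time_us(n_transfers, total_bytes):
    """Total time: per-transfer setup latency plus bandwidth-bound copy time."""
    return n_transfers * setup_us + total_bytes / (bandwidth_gbps * 1e3)

fragmented = transfer_time_us(48, block_bytes)  # one small copy per layer
contiguous = transfer_time_us(1, block_bytes)   # one large copy per block
print(f"{fragmented:.1f} us vs {contiguous:.1f} us")
```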
Speakers
avatar for Nicolò Lucchesi

Nicolò Lucchesi

Senior Machine Learning Engineer, Red Hat
Nicolò is a Senior Machine Learning Engineer at Red Hat with a background in Deep Learning and Computer Vision. He works on Inference Optimization for vLLM, where he is a maintainer.
Wednesday April 8, 2026 14:15 - 14:25 CEST
Central Room
  Inference & Production
  • Audience Level Any
  • Slides Attached Yes

14:45 CEST

Lightning Talk: Full-Stack PyTorch Robotics VLA: From Data To Edge Via ExecuTorch/OpenVINO - Samet Akcay & Dmitriy Pastushenkov, Intel
Wednesday April 8, 2026 14:45 - 14:55 CEST
While research-centric tools have lowered the entry barrier for robotics data collection, transitioning Vision-Language-Action models to production remains challenging due to fragmented edge deployment paths. This session presents a unified, PyTorch-native workflow spanning the full robotics lifecycle, from data capture and curation to optimized edge execution.

We introduce a modular Physical AI pipeline designed to resolve the disconnect between research scripts and real-time hardware. The talk details practical patterns for robotics data capture and policy training in a unified PyTorch ecosystem, followed by concrete steps to export models via ExecuTorch. Using an OpenVINO backend, Quantizer, and AOT compilation, we address latency, accuracy, and operator coverage gaps, and demonstrate efficient on-device VLA inference.

Using a WidowX pick-and-sort task as a case study, we demonstrate how to validate latency and numerical tolerances under physical constraints. Attendees will leave with a reference architecture and a checklist for monitoring, safety gates, and managing dataset drift, providing a roadmap for moving robotics VLA from research to production-grade edge deployment.
Speakers
avatar for Dmitriy Pastushenkov

Dmitriy Pastushenkov

AI Software Product Manager, Intel
Dmitriy Pastushenkov is a passionate Software Product Manager at Intel with more than 20 years of comprehensive, international experience in industrial automation, the industrial Internet of Things (IIoT), real-time operating systems, and AI. Dmitriy has held various roles in... Read More →
avatar for Samet Akcay

Samet Akcay

Principal AI Engineer, Intel
Samet Akcay is a Principal AI Engineer at Intel who leads ML R&D efforts across Open Edge Platform libraries, including Intel Geti, Datumaro, Anomalib, Training Extensions, and Inference libraries. His research specializes in self-supervised learning and multi-modal object detection... Read More →
Wednesday April 8, 2026 14:45 - 14:55 CEST
Central Room
  Inference & Production
  • Audience Level Any
  • Slides Attached Yes

14:55 CEST

Meet the Ray Maintainers
Wednesday April 8, 2026 14:55 - 15:25 CEST
Meet the core maintainers of Ray at this session! Come discuss use cases, features, and the roadmap with us, or just learn how Ray development happens under the hood.
Speakers
avatar for Artur Niederfahrenhorst

Artur Niederfahrenhorst

Member of Technical Staff, Anyscale
Artur is a member of the technical staff at Anyscale, the company that recently donated Ray to the Linux Foundation. He has been contributing to Ray since early 2022, where his main contributions have been in distributed reinforcement learning. Artur majored in Computer Science at... Read More →
Wednesday April 8, 2026 14:55 - 15:25 CEST
Open Platform
  Meet the Developers
  • Audience Level Any

15:25 CEST

Beyond the Theory: What Actually Breaks When You Scale Your Disaggregated Pytorch Models - Ekin Karabulut & Ron Kahn, NVIDIA
Wednesday April 8, 2026 15:25 - 15:50 CEST
As inference demand explodes, new techniques to optimize these deployments have emerged. One such technique is disaggregated inference, which splits inference into differently optimized workloads (e.g. prefill and decode) on separate workers. The theory is straightforward: better GPU utilization, better inference performance, and tighter control over SLAs. Deploying it in production is not.
Scaling happens at multiple connected levels. Adding prefill workers for a traffic spike? Those workers belong to a prefill leader and must scale as a unit. But your prefill-to-decode ratio matters too: scale prefill without matching decode capacity and you've merely moved the bottleneck. Placement also plays a role: put prefill and decode far apart in your network topology and KV-cache transfers will kill your latency. Standard autoscaling treats these as independent components. They're not.
In this talk, we'll share what we've learned running disaggregated vLLM and SGLang deployments on Kubernetes: what broke, what worked, and how we're improving performance. We'll evaluate approaches ranging from standard deployments to specialized APIs like LWS and Grove, and discuss how these integrate with frameworks like llm-d and Dynamo.
Speakers
avatar for Ekin Karabulut

Ekin Karabulut

AI/ML Developer Advocate, NVIDIA
Ekin is a Developer Advocate at NVIDIA, following the acquisition of Run:ai. Prior to that, she specialized in the privacy implications of federated learning systems with DNNs in distributed environments as a data scientist. Currently, she is exploring the efficient usage of large... Read More →
avatar for Ron Kahn

Ron Kahn

Senior Software Engineer, NVIDIA
Ron Kahn is a Senior Software Engineer in the NVIDIA Run:ai platform team. Ron works on the design and implementation of workload management systems that abstract Kubernetes complexity for AI practitioners. When not simplifying AI training jobs, Ron can be found cooking something... Read More →
Wednesday April 8, 2026 15:25 - 15:50 CEST
Central Room
  Inference & Production
  • Audience Level Any
  • Slides Attached Yes

15:25 CEST

Building Trust for Users and Regulators Alike: A Cost-Efficient PyTorch Path To Compliance-as-Code - Raja Gopal Hari Vijay, Zoho Corporation
Wednesday April 8, 2026 15:25 - 15:50 CEST
Traditional compliance relies on retroactive logs and manually stitched audit trails, while Opacus, CrypTen, and Captum address isolated concerns without providing end-to-end lifecycle traceability. Compliance-as-Code embeds regulatory controls as executable logic within training and inference pipelines, turning compliance into a continuous engineering function and reducing audit costs.

PyTorch’s dynamic execution model enables real-time auditing and compliance gates across the model lifecycle. Features such as the Dispatcher, custom Autograd functions, and the hook system allow logging, constraint checks, and risk controls to be embedded directly into execution. For example, a fairness gate using training hooks can block model export if disparity exceeds thresholds. Dataset initialization can detect imbalance, while dispatcher-level monitoring generates tamper-resistant audit trails linking data, model versions, and outputs. In deployment, inference hooks and metrics track bias drift, accuracy degradation, and human-intervention counts.

The talk presents practical PyTorch patterns for automated documentation, immutable audit trails, and faster certification in regulated AI deployments.
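The fairness-gate pattern mentioned above can be sketched framework-agnostically (plain Python; the disparity metric, threshold, and class names are illustrative choices, not a prescribed standard or the speaker's implementation):

```python
def demographic_disparity(positive_rates: dict) -> float:
    """Max difference in positive-prediction rate across groups."""
    rates = list(positive_rates.values())
    return max(rates) - min(rates)

class FairnessGateError(RuntimeError):
    """Raised when a compliance gate blocks the export step."""

def export_gate(positive_rates: dict, threshold: float = 0.1) -> float:
    """Block model export when group disparity exceeds the threshold,
    mimicking a hook that runs before checkpointing/export."""
    disparity = demographic_disparity(positive_rates)
    if disparity > threshold:
        raise FairnessGateError(
            f"export blocked: disparity {disparity:.2f} > {threshold}")
    return disparity

# Passes: rates differ by only 0.05, under the 0.1 threshold.
export_gate({"group_a": 0.50, "group_b": 0.55})

# Blocked: rates differ by 0.30, so the gate raises.
try:
    export_gate({"group_a": 0.30, "group_b": 0.60})
except FairnessGateError as e:
    print(e)
```

In a PyTorch pipeline the same check would hang off a training hook, so the gate fires continuously during training rather than as a retroactive audit.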
Speakers
avatar for Raja Gopal Hari Vijay -

Raja Gopal Hari Vijay -

Member Leadership Staff, Zoho Corporation
At Zoho, Raja builds large-scale Video AI (CCTV analytics, edge inference, privacy-aware deployments) on PyTorch, drives green computing via custom accelerators and FPGAs, and owns a custom Linux distribution for Zoho products and agentic workflows with security reasoning across LSM... Read More →
Wednesday April 8, 2026 15:25 - 15:50 CEST
Junior Stage