7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."
Tuesday, April 7
 

07:30 CEST

Registration & Badge Pick-Up
Tuesday April 7, 2026 07:30 - 18:00 CEST
Lobby

07:30 CEST

Community Expo
Tuesday April 7, 2026 07:30 - 18:35 CEST
Open Platform

09:00 CEST

Keynote: Co-Evolution: How the Open Source Intelligence Stack Compounds - Mark Collier, Executive Director, PyTorch Foundation, General Manager, AI & Infrastructure, Linux Foundation
Tuesday April 7, 2026 09:00 - 09:10 CEST
Agentic coding systems have crossed a threshold from experimentation to measurable economic impact. Their rapid adoption reveals a deeper shift: modern AI capability emerges from the co-evolution of models, training frameworks, inference engines, reinforcement systems, hardware, and cloud infrastructure, with open source enabling the flow of code, research, and operational knowledge across the stack. As performance gaps narrow and costs fall, this compounding intelligence system accelerates innovation and spreads capability across companies, industries, and hardware platforms, raising a simple question for the community: how fast do we want to evolve?
Speakers
Mark Collier
Executive Director, PyTorch Foundation; General Manager, AI & Infrastructure, The Linux Foundation
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:10 CEST

Keynote: PyTorch Updates - Edward Yang, Research Engineer, Meta
Tuesday April 7, 2026 09:10 - 09:30 CEST

Speakers
Edward Yang
Research Engineer, Meta
Edward Yang has worked on PyTorch at Meta since nearly the very beginning. Currently, he works on all aspects of PT2, but with a particular focus on dynamic shapes support across the stack.
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

09:35 CEST

Keynote: Community Led Open Source RL - Joe Spisak, VP of Product & Head of Open Source, Reflection AI
Tuesday April 7, 2026 09:35 - 09:45 CEST

Speakers
Joe Spisak
VP of Product & Head of Open Source, Reflection AI
Joe Spisak is Product Director for AI at Meta with leadership roles in PyTorch, Llama and FAIR research. A veteran of the AI space with over 10 years of experience, Joe led product teams at Meta/Facebook, Google and Amazon where he focused on open source AI, building developer tools... Read More →
Master Stage
  Keynote Sessions
  • Audience Level Any

09:45 CEST

Sponsored Keynote: From One Node to Distributed Training and Inference. How the PyTorch Ecosystem Changed AI - Ramine Roane, Corporate Vice President of AI Product Management and Ecosystem Development, AMD
Tuesday April 7, 2026 09:45 - 09:50 CEST
PyTorch has evolved from a research framework into a distributed-first platform powering production AI at massive scale. As models grow to hundreds of billions of parameters, this talk explores the challenges of scaling inference across nodes and the emerging ecosystem from Monarch and TorchTitan to open, hardware-agnostic systems that makes it possible.
Speakers
Ramine Roane
Corporate Vice President of AI Product Management and Ecosystem Development, AMD
Ramine Roane is the Corporate Vice President of AI Product Management and Ecosystem Development at AMD, based in San Jose, California. Prior to this role, he served as Vice President of Data Center Acceleration within AMD’s Adaptive and Embedded Computing Group in 2022. Before the... Read More →
Master Stage
  Keynote Sessions
  • Audience Level Any

09:55 CEST

Keynote: Stream Everything - Moving From Request Input To Streaming Input - Patrick von Platen, Research Engineer, Mistral AI
Tuesday April 7, 2026 09:55 - 10:10 CEST

Speakers
Patrick von Platen
Research Engineer, Mistral AI
Patrick von Platen is a Research Engineer at Mistral AI, focused on natural language processing and scalable AI systems. Currently, he contributes to vLLM, is a former core maintainer of Transformers, and created Diffusers.
Master Stage
  Keynote Sessions
  • Audience Level Any
  • Slides Attached Yes

10:10 CEST

Sponsored Keynote: Any [ Agent | Model | Accelerator | Cloud ]. Open Source AI Unlocks the World's Potential - Maryam Tahhan, Principal Engineer & Nicolò Lucchesi, Senior Machine Learning Engineer, Red Hat
Tuesday April 7, 2026 10:10 - 10:15 CEST
Red Hat is shaping an open future for AI, delivering on the promise of 'Any Agent, Any Model, Any Accelerator, Any Cloud.' Discover the community advancements contributed to the PyTorch Foundation that empower enterprises to rapidly enable, test, and seamlessly scale AI workloads across their choice of infrastructure.
Speakers
Maryam Tahhan
Principal Engineer, Red Hat
Maryam is a Principal Engineer in Red Hat's Office of the CTO, where she focuses on standardising CPU inferencing performance evaluation to help effectively validate and scale ML workloads.
Nicolò Lucchesi
Senior Machine Learning Engineer, Red Hat
Nicolò is a Senior Machine Learning Engineer at Red Hat with a background in Deep Learning and Computer Vision. He works on Inference Optimization for vLLM, where he is a maintainer.
Master Stage
  Keynote Sessions
  • Audience Level Any

10:15 CEST

Keynote: The Unbearable Lightness of (Agentic) Evaluations - Besmira Nushi, Senior Manager, AI Research, NVIDIA
Tuesday April 7, 2026 10:15 - 10:25 CEST
The discipline of evaluating large language models underwent a major transformation with the rise of general AI capabilities. Today, the field is undergoing yet another challenging transformation following the groundbreaking improvements in agentic tasks, which expect models and systems to plan and take autonomous actions in the real world. Measuring how well models and systems perform in such tasks is, however, still i) fragile from a methodological perspective, and ii) difficult to scale and generalize across different domains. This talk will first discuss common challenges in reproducing agentic evaluations, including differences in reference implementations, error handling, trajectory post-processing, and tooling definitions. Next, it will cover the infrastructural requirements that need to be addressed for such evaluations to run efficiently at scale. Finally, we will conclude with a set of (still nascent) best practices that can help alleviate “lightness” and build more consistent measurement pipelines.
Speakers
Besmira Nushi
Senior Manager - AI Research, NVIDIA
Besmira Nushi is a Senior AI Research Manager at NVIDIA in Zurich, where she leads research on LLM evaluation, model analysis and generalization, and real-world and agentic AI system measurements. Previously, she spent 7+ years at Microsoft Research advancing responsible AI, model... Read More →
Master Stage
  Keynote Sessions
  • Audience Level Any

10:30 CEST

Birds of A Feather: Engineering for the EU AI Act: What Should PyTorch Expose Natively? - Roy Saurabh, AffectLog
Tuesday April 7, 2026 10:30 - 11:00 CEST
The EU AI Act introduces concrete technical obligations for ML systems: traceability, risk management, monitoring, and auditability. Today, most of this burden is handled outside the ML framework—through ad-hoc tooling, documentation, or bespoke infrastructure.

This Birds of a Feather session is an open, practitioner-driven discussion on a forward-looking question:
What primitives, hooks, or abstractions should PyTorch expose natively to better support AI accountability and regulatory readiness?

Topics for discussion may include:
- Native support for provenance, lineage, and training/inference traces
- Standardized hooks for fairness, robustness, and drift monitoring
- Model and dataset metadata as first-class PyTorch objects
- Privacy-preserving logging and zero-retention execution patterns
- Gaps between regulatory requirements (e.g. the EU AI Act) and current ML frameworks
The goal is not consensus, but shared understanding and concrete ideas that can inform community practices, tooling, and potential upstream contributions. This BoF is intended for PyTorch users, maintainers, researchers, and infra engineers interested in the future of responsible, production-grade ML.
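As a concrete starting point for the discussion: today's forward hooks already get partway to an audit trail. A minimal sketch, assuming nothing beyond current stable PyTorch — the logging schema here is invented for illustration, not an existing or proposed PyTorch API:

```python
import json
import torch
import torch.nn as nn

# Illustrative audit trail built from today's primitives (forward hooks).
# PyTorch has no native compliance API; the record format is made up.
audit_log = []

def trace_hook(module, inputs, output):
    audit_log.append({
        "module": type(module).__name__,
        "input_shapes": [list(t.shape) for t in inputs],
        "output_shape": list(output.shape),
    })

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
handles = [m.register_forward_hook(trace_hook) for m in model]

model(torch.randn(3, 8))
print(json.dumps(audit_log, indent=2))  # one record per submodule call

for h in handles:
    h.remove()
```

The gap the session asks about is everything this sketch lacks: standardized schemas, provenance linking, and privacy-preserving storage.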
Speakers
Roy Saurabh
President, AffectLog
Roy Saurabh is Founder & CEO of AffectLog and an applied researcher in AI governance, privacy engineering, and accountable ML systems. He has worked with UNESCO, the European Commission, and national governments on operationalising trustworthy AI, and leads EU-funded projects focused... Read More →
Open Platform
  Birds of A Feather
  • Audience Level Any

10:30 CEST

Coffee Break
Tuesday April 7, 2026 10:30 - 11:00 CEST
Menu: 
- Apple and pecan nut cake (Vegan, Vegetarian)
- Granola bar (Gluten Free, Vegetarian)
- Seasonal fruits (Vegan, Gluten Free, Vegetarian)
- Egg sandwich (Vegetarian)
- Dried fruit and raisin mix (Vegan, Gluten Free, Vegetarian)
Open Platform

10:30 CEST

Meet the Developers of PyTorch: Module Maintainers
Tuesday April 7, 2026 10:30 - 11:00 CEST
These sessions give participants an opportunity to meet some of the developers leading PyTorch, fostering collaboration, gathering feedback, and inspiring contributions.

PyTorch core modules (e.g. torch.autograd, torch.optim, torch.nn) form the foundation for most AI research and development, either directly through PyTorch or indirectly via higher-level frameworks. The core libraries prioritize API stability, backward compatibility, modular design, and simplicity.
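The three core modules named above compose in a few lines; a minimal sketch of how they divide the work:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)                               # torch.nn: the model
opt = torch.optim.SGD(model.parameters(), lr=0.1)     # torch.optim: the update rule

x, y = torch.randn(16, 4), torch.randn(16, 1)
first = nn.functional.mse_loss(model(x), y).item()
for _ in range(20):
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()   # torch.autograd fills .grad on each parameter
    opt.step()        # torch.optim consumes the gradients
final = nn.functional.mse_loss(model(x), y).item()
```

After the loop, `final` is lower than `first`: the loss decreases as the three modules interact.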
Speakers
Edward Yang
Research Engineer, Meta
Edward Yang has worked on PyTorch at Meta since nearly the very beginning. Currently, he works on all aspects of PT2, but with a particular focus on dynamic shapes support across the stack.
Alban Desmaison
Research Engineer, Meta
Driss Guessous
Machine Learning Engineer, Meta
I am currently a machine learning engineer working on core development of PyTorch. I received my Masters in Computer Science from the University of Illinois at Urbana-Champaign. I received a dual degree in Physics and Applied Mathematics from The Ohio State University. I also won... Read More →
Mergen Nachin
Software Engineer, Meta
Mergen Nachin is a Software Engineer specializing in creating rich AI experiences on low latency, high performance, and privacy-aware embedded systems. With a background in distributed systems, developer infrastructure, remote sensing, and localization, he brings a versatile skill... Read More →
Natalia Gimelshein
Software Engineer, Meta
Natalia Gimelshein is a software engineer at Meta. She is one of the PyTorch leads, and works on GPU performance and support, including low precision, distributed and symmetric memory.
Jason Ansel
Research Scientist, Meta
Jason Ansel is a Research Scientist at Meta AI and a technical lead for PyTorch compilers. He started the TorchDynamo and TorchInductor projects, which bring flexible graph capture and a high performance compiler to PyTorch 2. He received a Ph.D. from MIT and has over 15 years of... Read More →
Open Platform
  Meet the Developers
  • Audience Level Any

11:00 CEST

Lightning Talk: Why Your Forecasting Transformer Isn’t Working (And How To Fix It in Python) - Rosheen Naeem, Open Climate Fix
Tuesday April 7, 2026 11:00 - 11:10 CEST
Renewable energy is clean — but it’s also inherently variable. Solar PV generation can change dramatically within minutes due to cloud cover and weather conditions, making accurate short-term forecasts essential for grid stability, energy trading, and smart-home optimisation.
Open Climate Fix builds open and high-impact forecasting tools to accelerate the transition to a low-carbon energy system. One of these projects is Open Quartz Solar Forecast: an open-source model that uses public PV generation data, site metadata, and numerical weather prediction variables to forecast solar power for any location.
In this talk, I’ll present a real case study from my Google Summer of Code project where I implemented and trained a Temporal Fusion Transformer for multi-horizon solar forecasting. I’ll cover the practical engineering challenges behind making transformer forecasting work in Python: building continuous training windows, aligning weather forecast steps with observations, separating static vs time-varying features, and stabilising training using PyTorch Forecasting and PyTorch Lightning.
Attendees will leave with reusable patterns for real-world time-series forecasting pipelines.
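As a taste of the windowing step mentioned above, "continuous training windows" can be sketched in plain PyTorch — the window lengths here are illustrative, not the project's actual configuration:

```python
import torch

def make_windows(series, history, horizon):
    """Slice a 1-D series into (past, future) training pairs with a
    stride of 1 - the sliding-window setup used for multi-horizon
    forecasting."""
    pairs = []
    for start in range(len(series) - history - horizon + 1):
        past = series[start : start + history]
        future = series[start + history : start + history + horizon]
        pairs.append((past, future))
    return pairs

# One week of hourly PV output (toy data): 48 h of history -> 4 h ahead.
series = torch.rand(168)
windows = make_windows(series, history=48, horizon=4)
print(len(windows))  # 168 - 48 - 4 + 1 = 117 training pairs
```

Real pipelines add the alignment and feature-separation steps the talk covers, but every multi-horizon setup starts from pairs like these.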
Speakers
Rosheen Naeem
Software Engineer, Miro
I am a Software Engineer at Miro and a community member at Open Climate Fix. I completed the Erasmus Mundus Master’s in Software Engineering for the Green Deal (SE4GD), a joint degree program across Vrije Universiteit Amsterdam (Netherlands), LUT University (Finland), and Universit... Read More →
Central Room
  Applications & Case Studies

11:00 CEST

Lightning Talk: Training Embedding Model Resiliently for Multimodal Model Inference Routing - Huamin Chen, Red Hat & Haichen Zhang, AMD
Tuesday April 7, 2026 11:00 - 11:10 CEST
LLM systems increasingly rely on intelligent routing to balance cost, latency, and quality tradeoffs. The vLLM Semantic Router, a vLLM Ecosystem project, provides both semantic and performance level routing intelligence for Mixture-of-Multimodal Models (MoM) architectures, but its effectiveness depends on fast and accurate classifiers.

This talk presents our end-to-end journey training production-grade embedding and classification models on AMD GPUs using native PyTorch, achieving high GPU utilization with distributed training optimizations.

We introduce a multilingual text embedding model with a 32K context window and 2D Matryoshka support, and multimodal embedding models, trained on AMD GPUs using PyTorch DDP. The talk covers practical training optimizations for AMD ROCm. All training code uses native PyTorch distributed primitives, with additional enhancements to improve training stability and pipeline efficiency.

Attendees will learn how to train efficient classifiers for LLM routing systems and integrate these models into production inference pipelines.
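For readers unfamiliar with Matryoshka training, the core idea fits in a few lines of plain PyTorch: apply the same contrastive objective to nested prefixes of the embedding so truncated embeddings remain usable. The nested dimensions and temperature below are assumptions for illustration, not the talk's actual recipe:

```python
import torch
import torch.nn.functional as F

def matryoshka_loss(emb_a, emb_b, labels, dims=(64, 128, 256)):
    """Illustrative Matryoshka-style objective: the same InfoNCE-like
    loss is applied to nested prefixes of the embedding. The dims and
    the 0.05 temperature are assumed values."""
    total = 0.0
    for d in dims:
        a = F.normalize(emb_a[:, :d], dim=-1)
        b = F.normalize(emb_b[:, :d], dim=-1)
        logits = a @ b.T / 0.05
        total = total + F.cross_entropy(logits, labels)
    return total / len(dims)

torch.manual_seed(0)
emb_a, emb_b = torch.randn(8, 256), torch.randn(8, 256)
labels = torch.arange(8)   # matching pairs sit on the diagonal
loss = matryoshka_loss(emb_a, emb_b, labels)
```

A router can then use whichever prefix length fits its latency budget at inference time.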
Speakers
Huamin Chen
Technical Advisor, Microsoft
Dr. Huamin Chen is a passionate developer. He co-founded the Semantic Router project under the vLLM community. His recent contributions to the CNCF ecosystem include Project Kepler, TAG Environmental Sustainability, and Cloud Native AI WG. He is also one of the founding members... Read More →
Haichen Zhang
Senior AI Software Engineer, AMD
Haichen is a Senior AI Engineer in the AMD AI Group, specializing in accelerating training and inference for large language models, recommender systems, computer vision (CV), and natural language processing (NLP) tailored to internet customers. Before joining AMD, Haichen worked at... Read More →
Junior Stage

11:00 CEST

Helion 1.0: A High-Level DSL for Performance Portable Kernels - Oguz Ulgen, Meta
Tuesday April 7, 2026 11:00 - 11:25 CEST
ML practitioners increasingly author bespoke kernels, but achieving portable performance demands low-level expertise and repeated manual tuning for each accelerator generation and type. We introduce Helion, a Python-embedded DSL with a “PyTorch with tiles” programming model that preserves familiar PyTorch APIs while giving developers lower-level control over the generated kernels. Helion integrates tightly with TorchInductor to reuse PyTorch operator lowerings, automatically manages host/device boundaries, and provides rich language constructs for tiling, memory movement, and synchronization. The language defines an implicit high-dimensional configuration space that our autotuner explores, shifting the tuning burden from developers to automated search.

In this session, I will cover both the language and what is new since PTC'25, and announce the official GA launch. This session is open to both experienced and beginner kernel authors.
Speakers
Oguz Ulgen
Software Engineer, Meta
I'm a software engineer at Meta where I used to work on the Hack programming language and now work on PyTorch.
Master Stage

11:00 CEST

Lights, Camera, Inference! Video Generation as a Service With VLLM-Omni - Ricardo Noriega, Red Hat & Doug Smith, Red Hat, Inc
Tuesday April 7, 2026 11:00 - 11:25 CEST
LLMs made text generation as a service mainstream. What does it take to do the same for video?
We built an experimental Video Generation as a Service stack using vLLM-Omni and the LTX-2 open-weights video model to explore how far an open, multimodal stack can go toward production use. We’ll share what worked, what broke, and what it takes to treat generative video as a first-class workload.
vLLM is known for high-performance autoregressive inference, and vLLM-Omni extends that foundation to multimodal inputs and outputs. We pushed those capabilities further by adding support for LTX-2, extending the OpenAI-compatible API surface, integrating with front ends, and packaging for scalable deployment. We’ll walk you through the touch points of how we put all the Legos together with vLLM-Omni.
Finally, we’ll examine the gap between novelty demos and real applications: going from quirky spaghetti eating videos to generating consistent characters, personalized media, customized video game cutscenes, and interactive storytelling, and highlight what’s still missing to make generative video truly production-ready.
Speakers
Doug Smith
Principal Software Engineer, Red Hat
Doug Smith is a Principal MLOps Engineer at Red Hat, where he works on the AI Inference Server team and contributes upstream to the vLLM project through its CI Special Interest Group. Recently, he's also been looking into contributions to vLLM-Omni. He’s spent years bridging telecom... Read More →
Ricardo Noriega
Principal SW Engineer, Red Hat
Ricardo is a Principal Software Engineer working in Red Hat's Office of the CTO in the Emerging Technologies organization. Ricardo is currently focused on AI multimodality and researching the benefits of Small Language Models.
He is a former member of the Akraino TSC and PTL of the Kubernetes-Native-Infrastructure blueprint family, and contributor to Kubernetes, OpenStack, OpenDaylight and OPNFV... Read More →
Founders Cafe
  GenAI & Multimodal
  • Audience Level Any
  • Slides Attached Yes

11:15 CEST

Lightning Talk: Deep Learning in the Wild: Embedded PyTorch for Real-World Conservation Bioacoustics - Taraqur Rahman & Owen O'Donnell, OWL Integrations
Tuesday April 7, 2026 11:15 - 11:25 CEST
Passive acoustic monitoring is a powerful tool for wildlife conservation, but deploying deep learning models in remote rainforest environments introduces strict constraints on power, memory, and compute. In this talk, we present an end-to-end PyTorch-based pipeline for detecting and analyzing the endangered three-wattled bellbird using embedded deep learning systems.

We cover the full lifecycle from audio preprocessing and model training in PyTorch to optimization and deployment on resource-constrained embedded devices. Topics include model architectures for sparse bioacoustic event detection, handling extreme class imbalance, model compression and quantization, and practical trade-offs between accuracy, latency, and power consumption.

The session emphasizes real-world lessons learned deploying machine learning at the edge, where unreliable connectivity, noisy signals, and limited hardware define success more than benchmark metrics. Attendees will gain practical patterns for building and deploying PyTorch models for embedded and edge AI applications with real environmental impact.
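One of the compression steps mentioned, post-training quantization, can be sketched with PyTorch's built-in dynamic quantization — the tiny detector model below is a placeholder, not the deployed architecture:

```python
import torch
import torch.nn as nn

# Placeholder detector head; a real bioacoustic model would sit here.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))

# Post-training dynamic quantization: Linear weights stored as int8,
# activations quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)   # e.g. one frame of spectrogram features
out = quantized(x)
```

Static quantization and pruning go further but require calibration data and retraining, which is part of the trade-off space the talk discusses.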
Speakers
Owen O'Donnell
Embedded Systems and Machine Learning Engineer, OWL Integrations
Owen O'Donnell is a Machine Learning and Embedded Systems Engineer at OWL Integrations. He works on training ML models to deploy in remote locations that will be running on resource-constrained electronics. This introduces challenges such as needing smaller sized models and having... Read More →
Taraqur Rahman
Chief Data Scientist, OWL Integrations
Taraqur Rahman is Chief Data Scientist and Co-Founder at OWL Integrations and Organizer/Co-Founder of Biased Outliers, where he leads applied machine learning and data science initiatives with real-world impact. He combines deep technical expertise in Python with practical deployment... Read More →
Central Room
  Applications & Case Studies
  • Audience Level Any
  • Slides Attached Yes

11:15 CEST

Lightning Talk: Flexible Deployment of PyTorch Models on MCU-Class Devices Using ExecuTorch - Robert Kalmar & Martin Pavella, NXP
Tuesday April 7, 2026 11:15 - 11:25 CEST
ExecuTorch has recently matured into a production-ready framework designed specifically for efficient edge deployment of PyTorch models. Its architecture supports a broad spectrum of hardware targets—from low-power, bare-metal or RTOS-based microcontrollers (MCUs) to higher-performance Linux- or Android-based microprocessor platforms—while meeting the demanding constraints on memory, compute, and power typically found in real-world embedded applications.
This talk focuses on the deployment flexibility ExecuTorch offers for MCU-class devices, highlighting how different backends enable efficient execution across heterogeneous compute units. We will explore CPU, DSP, and NPU acceleration paths using the Cortex-M, Cadence, Ethos-U, and eIQ Neutron backends, and discuss how these integrate into typical ML model deployment workflows.
To make the session practical and application-oriented, we will present an optimization journey aimed at reducing power consumption—an essential requirement for ML workloads in energy-constrained environments. Attendees will gain insights into backend selection, performance trade-offs, and best practices for deploying PyTorch models on edge devices.
Speakers
Robert Kalmar
Principal AI/ML Engineer, NXP Semiconductors
Robert Kalmar is a Principal Machine Learning Engineer at NXP Semiconductors. He received his master’s degree in machine learning and intelligent systems from Brno University of Technology. At NXP he focuses on machine learning solution enablement for embedded and mobile devices... Read More →
Martin Pavella
ML SW Engineer, NXP Semiconductors
I hold a Master’s degree in Machine Learning from the Brno University of Technology, graduating with distinction at both bachelor’s and master’s levels. I am a mid-level AI/ML Software Engineer at NXP Semiconductors with 2.5+ years of experience. I won the 2025 iGEM overgraduate... Read More →
Junior Stage
  Inference & Production

11:30 CEST

Lightning Talk: Coding Agents for Compiler Construction: Beyond the AI Assistant Paradigm - Reza Rahimi, yasp.ai & Stefan Krassin, yasp
Tuesday April 7, 2026 11:30 - 11:40 CEST
Modern ML compilers follow a familiar pattern: a frontend lowers models into an intermediate representation, while a backend applies graph and kernel optimizations before generating code for target accelerators. PyTorch provides strong foundations through nn.Module, FX, and graph capture, but implementing optimized backends remains challenging due to hardware diversity and kernel-level complexity.

Optimizing GPU kernels is hard. Few engineers do it well. Hardware architectures evolve yearly, and with hyperscalers, chip makers, and AI labs building custom silicon, demand for efficient kernel generation keeps growing. This creates a gap between model developers and hardware capabilities.

This talk explores coding agents as engineering tools for compiler construction, not general-purpose assistants. We discuss how agents can generate and refine backend components by analyzing model mathematics and hardware specifications to produce optimized kernels tailored to specific targets.

We present a compiler architecture built as a PyTorch add-on that accepts PyTorch models or FX graphs and produces executable artifacts, demonstrating practical integration with existing PyTorch workflows.
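The FX graph capture the talk builds on looks like this in plain PyTorch — a toy module, not the speakers' compiler:

```python
import torch
import torch.fx as fx

class Block(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) * 2.0

# Graph capture with torch.fx: a compiler frontend can walk this IR
# before lowering it to a hardware backend.
gm = fx.symbolic_trace(Block())
ops = [node.op for node in gm.graph.nodes]
print(gm.graph)   # placeholder -> relu -> mul -> output
```

An FX `GraphModule` like `gm` is exactly the kind of artifact the described compiler accepts as input.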
Speakers
Reza Rahimi
CTO, yasp
Reza Rahimi is a seasoned technologist with a strong background in accelerating engineering software and scaling machine learning systems. With experience leading teams across embedded AI, compiler design, and model optimization, he now serves as CTO of yasp, where he is pioneering... Read More →
Stefan Krassin
CEO, yasp.ai
With a background in electrical engineering and a career spanning embedded systems to executive leadership, Stefan combines technical expertise with a vision for scale. After 10+ years of leading companies to outstanding growth, he co-founded yasp in 2023. His mission is to eliminate... Read More →
Founders Cafe
  Agents & Interop

11:30 CEST

Lightning Talk: How DeepInverse Is Solving Imaging in Science and Healthcare With PyTorch - Andrew Wang, DeepInverse; Minh Hai Nguyen, Université de Toulouse
Tuesday April 7, 2026 11:30 - 11:40 CEST
Deep learning has revolutionised imaging, a foundation of science and healthcare. DeepInverse is the PyTorch library for solving imaging problems, unifying deep learning methods (e.g. diffusion models), physics (medical, optics) and modern tooling. In this talk, we’ll show how the PyTorch community can get involved in this exciting yet accessible application of open-source AI.

AI methods in imaging must model the imaging physics, leading to interesting engineering problems e.g. efficient differentiable ops, physics-informed losses. We’ll show notebooks on real use-cases: accelerating brain MRI, reducing radiation in CT scans, imaging black holes.

PyTorch enthusiasts at any level/background can contribute - from training infra for scientific data to high-level generative modelling frameworks - their AI engineering skills can directly impact imaging across multiple fields.

DeepInverse is supported by a growing international user community and proudly rooted in Paris. We’ve joined the PyTorch Ecosystem and received the Prix Science Ouverte in 2024. We’re excited to join the PyTorch Conf to celebrate the vibrant French developer community!
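The "efficient differentiable ops and physics-informed losses" point follows a familiar pattern; here is a generic plain-PyTorch sketch of an inverse problem (illustrative only, not DeepInverse's actual API — the operator and shapes are made up):

```python
import torch

# Generic inverse problem: recover x from y = A x + noise by gradient
# descent on a differentiable data-fidelity loss. A stands in for a
# physics operator (blur, MRI sampling, CT projection, ...).
torch.manual_seed(0)
A = torch.randn(30, 20)
x_true = torch.randn(20)
y = A @ x_true + 0.01 * torch.randn(30)

x = torch.zeros(20, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(500):
    loss = (A @ x - y).pow(2).mean()   # physics-informed data fidelity
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Libraries in this space package the operator `A`, the loss, and learned priors so users do not write this loop by hand.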
Speakers
Andrew Wang
CTO & Co-founder, Blur Labs
Andrew is a lead developer of DeepInverse as well as the CTO & co-founder of Blur Labs, a startup based in Paris building AI models for imaging. Andrew did his PhD at the University of Edinburgh in magnetic resonance image reconstruction.
Minh Hai Nguyen
PhD candidate, Université de Toulouse
Central Room
  Applications & Case Studies
  • Audience Level Any
  • Slides Attached Yes

11:30 CEST

Tour De Force: LLM Inference Optimization From Simple To Sophisticated - Christin Pohl, Microsoft
Tuesday April 7, 2026 11:30 - 11:55 CEST
Making your GPUs go brrr is complex. Efficient LLM inference requires navigating a maze of optimization techniques each with different trade-offs. This session provides a practical journey through inference optimizations, clearly categorized by implementation effort.

We'll explore techniques across three levels:

- Model choices (start here): Model selection, quantization, smart routing

- Library-level improvements (using PyTorch-based frameworks like vLLM, SGLang, TensorRT-LLM): Continuous batching, KV-cache management, tensor parallelism

- Custom implementations: Speculative decoding with custom draft heads, disaggregated inference, fine-tuning smaller models

The session covers practical trade-offs and key metrics: time to first token, inter-token latency, throughput, and cost per token.

Whether deploying your first model or optimizing at scale, this talk delivers actionable insights into which techniques to prioritize for deeper investigation.
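Of the library-level techniques listed, KV-cache reuse is easy to illustrate in plain PyTorch — the toy single-head attention below is a sketch, not vLLM's paged implementation:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 16
wq, wk, wv = (torch.randn(d, d) for _ in range(3))

def attend(q, K, V):
    scores = (q @ K.T) / d ** 0.5
    return F.softmax(scores, dim=-1) @ V

# Decode token by token, appending each step's key/value to the cache
# instead of recomputing K and V for the whole prefix every step.
K_cache = torch.empty(0, d)
V_cache = torch.empty(0, d)
for step in range(5):
    x = torch.randn(d)   # current token's hidden state
    K_cache = torch.cat([K_cache, (x @ wk).unsqueeze(0)])
    V_cache = torch.cat([V_cache, (x @ wv).unsqueeze(0)])
    out = attend(x @ wq, K_cache, V_cache)
```

Production engines layer paging, continuous batching, and eviction policies on top of this basic cache-append pattern.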
Speakers
Christin Pohl
Global Black Belt Solution Engineer AI Infrastructure, Microsoft
Christin Pohl is a Global Black Belt Solution Engineer for AI Infrastructure at Microsoft (Switzerland), now in her third year. After building her first chatbot in 2018 and 5+ years at SAP, she helps enterprises worldwide choose the right GPU, run LLM training and inference end-to-end... Read More →
Master Stage

11:30 CEST

Why Classic IAM Collapses for Agents: Rethinking IAM for Agentic Systems - Parul Singh, Red Hat
Tuesday April 7, 2026 11:30 - 11:55 CEST
Autonomous AI agents increasingly reason, plan and act across tools, services and organizational boundaries. In these environments, traditional Identity and Access Management models begin to fail. Agents are not users and they are not static services. They act on behalf of others, change context during execution and operate with different levels of autonomy and risk.

This talk examines why classic IAM assumptions, such as long-lived identities, static permissions, and check-once, trust-always authorization, do not hold for agentic systems. We focus on three core questions.
1. How should agent identity be defined when tools, workflows, and execution context are dynamic?
2. How do we preserve accountability when agents act on behalf of users or other agents?
3. How should access change as agent behavior and risk evolve during execution?

We will outline a practical migration path from traditional IAM to agent aware identity workflows. Our insights come from our work collaborating with the COSAI Secure Design for Agentic Systems working group focused on creating shared guidance and emerging standards. We will close with current open questions and how you can join us to help solve them.
Speakers
Parul Singh
Principal Software Engineer, Red Hat
Parul is a Principal Software Engineer in Red Hat's Office of the CTO, working on agentic systems and security. Her work focuses on trust, identity, and observability for autonomous AI agents, including delegation, provenance, and zero trust architectures for agentic workflows. She... Read More →
slides pdf
Junior Stage
  Security & Privacy

11:45 CEST

Lightning Talk: ExecuTorch on Microcontrollers: Deploying PyTorch To the Smallest Edge - RJ Ascani & Matthias Cremon, Meta
Tuesday April 7, 2026 11:45 - 11:55 CEST
ExecuTorch extends PyTorch's reach to the most resource-constrained devices: microcontrollers, DSPs, and specialized neural processing units powering always-on sensors, wearables, and embedded systems. In this talk, we'll share the current state and roadmap for running ExecuTorch on platforms where every kilobyte of memory and milliwatt of power matters.

What you'll learn:
- How ExecuTorch's design enables deployment from ultra-low-power MCUs to DSP and NPU accelerators, all from a single PyTorch workflow
- The state of backend support for Cadence DSPs, ARM Ethos-U and Cortex-M
- Practical considerations for deploying models with sub-megabyte footprints and milliwatt power budgets
- Case studies spanning always-on audio, embedded vision, and TinyML applications
Speakers
avatar for Matthias Cremon

Matthias Cremon

Software Engineering Manager, Meta
Matthias Cremon is a Software Engineering Manager at Meta in the Silicon AI Software Team, working on AI compilers for various edge devices. He focuses on the frontend, graph-level optimization side, as well as the integration of low-level, vendor-specific implementations to run on... Read More →
avatar for RJ Ascani

RJ Ascani

Software Engineer, Meta
RJ Ascani is an embedded software engineer on Meta’s PyTorch Edge team, focusing on advancing ExecuTorch for microcontroller platforms.
Tuesday April 7, 2026 11:45 - 11:55 CEST
Central Room
  Inference & Production
  • Audience Level Any
  • Slides Attached Yes

11:45 CEST

Lightning Talk: TorchJD: Jacobian Descent in PyTorch - Pierre Quinton, EPFL & Valérian Rey, Simplex Lab
Tuesday April 7, 2026 11:45 - 11:55 CEST
Jacobian descent (JD) is an extension of gradient descent supporting the optimization of vector-valued functions. This algorithm can be used to train neural networks with multiple loss functions (e.g. multi-task learning). JD iteratively updates the parameters of the model using the Jacobian matrix of the vector of losses (the matrix stacking each individual loss's gradient).

To support and extend our research, we have developed the TorchJD library. With it, it's easy and efficient to compute the Jacobians with respect to the model parameters, and to aggregate them into an update direction that is beneficial to every objective. In contrast, if we had averaged the losses and used gradient descent, the update would have been beneficial to the average loss, but may have actually increased one of the individual losses.

In this session, we will give a quick introduction to the theory behind Jacobian descent, and then show how to use TorchJD on a variety of use-cases, beyond multi-task learning.

Library: https://github.com/TorchJD/torchjd
Paper: https://arxiv.org/abs/2406.16232
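The core idea can be sketched in a few lines of plain PyTorch (this is an illustrative sketch of Jacobian descent, not the TorchJD API; the aggregation rule here is a deliberately toy one, whereas TorchJD provides principled aggregators):

```python
import torch

# Toy Jacobian descent sketch: stack each loss's gradient into a Jacobian,
# then aggregate the rows into one update direction instead of averaging
# the losses first.
model = torch.nn.Linear(4, 2)
x = torch.randn(8, 4)
losses = [model(x)[:, 0].pow(2).mean(), model(x)[:, 1].abs().mean()]

params = list(model.parameters())
rows = []
for loss in losses:
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    rows.append(torch.cat([g.flatten() for g in grads]))
jacobian = torch.stack(rows)  # one row per loss

# Toy aggregator: take the mean gradient, but discard the update if it
# conflicts with (i.e. would increase) any individual loss.
mean_dir = jacobian.mean(dim=0)
if (jacobian @ mean_dir < 0).any():
    mean_dir = torch.zeros_like(mean_dir)

# Apply the aggregated direction as a parameter update.
offset = 0
with torch.no_grad():
    for p in params:
        n = p.numel()
        p -= 0.1 * mean_dir[offset:offset + n].view_as(p)
        offset += n
```

The conflict check is what distinguishes this from averaging the losses: a plain mean of losses can make one objective worse even while the average improves.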
Speakers
avatar for Pierre Quinton

Pierre Quinton

Teacher, EPFL
PhD in Information Theory and Master in Data Science, specializing in fundamental math and multi-objective optimization (MOO). I am the co-author of TorchJD, a PyTorch library for Jacobian Descent developed with Valerian, currently at ~300 GitHub stars. My work aims to translate complex... Read More →
avatar for Valérian Rey

Valérian Rey

Research Engineer, Simplex Lab
I graduated from EPFL with a MSc in Data Science in 2021. Since then, I have worked as a Data Scientist at Withings, and I worked on Jacobian descent, initially as a side-project, but now as a full-time occupation. I now spend most of my time developing and maintaining TorchJD, and I love... Read More →
Tuesday April 7, 2026 11:45 - 11:55 CEST
Founders Cafe
  Training Systems

12:00 CEST

Lightning Talk: Ethical, Privacy and Sustainability Considerations in PyTorch Systems - Paula Mesa Macias, Pau&Company
Tuesday April 7, 2026 12:00 - 12:10 CEST
PyTorch models are part of larger systems that handle data, logs, APIs and other services. Ethical, privacy, security and environmental considerations appear not only around the AI itself, but across the whole system.
Using the Ethical Software Framework and the Ethical IT Audit, this session explores practical ways to think about these issues in real workflows. It highlights situations where decisions in data handling, model deployment, logging or infrastructure have ethical, compliance or sustainability implications. It also shows considerations for using AI responsibly, such as dataset choices, bias awareness and evaluating risks before deployment.
The goal is to provide a clear, structured lens for identifying risks and trade-offs, making ethical, privacy, security, and sustainability concerns easier to discuss in everyday work.
Speakers
avatar for Paula Mesa Macias

Paula Mesa Macias

Founder and Ethical Technology Consultant, Pau&Company
Founder of Pau&Company (https://pau.company/), which offers Ethical IT Audits (https://pau.company/ethical-it-audit/) based on the Ethical Software Framework (https://pau.company/ethical-software-framework/), Paula focuses on ethical considerations in technology. Through Pau&Company... Read More →
Tuesday April 7, 2026 12:00 - 12:10 CEST
Founders Cafe

12:00 CEST

Lightning Talk: Bringing Google’s Colossus to PyTorch: Rapid Storage via fsspec to Keep GPUs Busy - Ankita Luthra & Trinadh Kotturu, Google
Tuesday April 7, 2026 12:00 - 12:10 CEST
As PyTorch models scale to billions of parameters, the bottleneck has quietly shifted from compute to storage. Modern GPU clusters often sit idle, "starving" for data while waiting on legacy REST-based protocols. This talk introduces Rapid Storage: a fundamental architectural shift bringing Google’s Colossus stateful protocol (which powers many of Google’s products) to PyTorch via fsspec, a common Pythonic file interface used by many frameworks within the PyTorch ecosystem.
By bypassing REST APIs entirely via persistent gRPC streams to the storage layer, we eliminate protocol overhead. In this talk, we also dive into how Rapid achieves <1ms random read/write latency, 20x faster data access, and a massive 6 TB/s of aggregate throughput. Crucially, it delivers up to 10x lower tail latency for random I/O, preventing the stragglers that often stall distributed training jobs.
Beyond raw speed, we will deconstruct the integration with gcsfs and the broader fsspec ecosystem. This ensures that high-performance I/O is available across the entire data stack, including Dask, Ray, HF Datasets, and vLLM. Join us to learn how to stop wasting GPU cycles and achieve linear scaling in the cloud.
Speakers
avatar for Ankita Luthra

Ankita Luthra

Senior Software Engineer, Google
Ankita Luthra is a Software Developer at Google, focused on AI/ML infrastructure and scalable data pipelines. Her work with open-source tools like fsspec (gcsfs) and gcsfuse improves how frameworks such as PyTorch and JAX efficiently access data from Google Cloud Storage.
avatar for Trinadh Kotturu

Trinadh Kotturu

Senior Product Manager, Google
Trinadh Kotturu is a Senior Product Manager specializing in AI/ML and analytics client strategy at Google. An alumnus of IIM Bangalore with 12 years of experience, he has a proven track record of shipping v1 products and scaling them into robust platform services. His expertise spans large-scale distributed storage systems, autonomous driving, and system resiliency... Read More →
Tuesday April 7, 2026 12:00 - 12:10 CEST
Master Stage
  Training Systems
  • Audience Level Any
  • Slides Attached Yes

12:00 CEST

Parameterized CUDA Graph Launch in PyTorch: CUDA Graphs Without the Pain - Daniel Galvez, NVIDIA
Tuesday April 7, 2026 12:00 - 12:25 CEST
Modern GPUs are fast enough that CPU kernel launch overhead has become a real bottleneck. CUDA Graphs can eliminate this overhead, but in practice they are hard to use and easy to get wrong.

When CUDA Graph capture fails, PyTorch users typically face two choices: fix the code that breaks capture—often with limited guidance—or capture only parts of the workload. Partial capture comes with sharp footguns, most notably large increases in device memory usage due to CUDA Graphs’ private memory pools.

This talk walks through the most common CUDA Graph capture failures seen in real PyTorch workloads and shows how to diagnose and fix them. It then presents an alternative to CUDA Graph Trees: Parameterized CUDA Graph launch, which automatically applies CUDA Graphs to only the compatible regions of a workload. All you need to do is make your workload compatible with torch.compile(). This enables CUDA Graph acceleration with minimal user effort and without increased memory usage.

Using this approach, llama3.1-70B in torchtitan runs with only a 2 GB memory increase over a non-graph baseline, compared to ~10 GB using traditional CUDA Graph techniques.
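As background for the failure modes discussed, manual capture with the public torch.cuda.graph API looks roughly like the sketch below (graphed_step is a hypothetical helper for illustration, not part of the talk; it degrades to None on CPU-only machines since capture needs a CUDA device):

```python
import torch

# Sketch of manual CUDA Graph capture using the standard torch.cuda.graph
# API. graphed_step is a hypothetical illustrative helper.
def graphed_step(model, example):
    if not torch.cuda.is_available():
        return None  # capture requires a CUDA device
    model = model.cuda()
    static_in = example.cuda().clone()

    # Warm up on a side stream before capture, as the PyTorch docs require.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        model(static_in)
    torch.cuda.current_stream().wait_stream(s)

    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        static_out = model(static_in)

    def run(new_input):
        static_in.copy_(new_input.cuda())  # graphs replay on fixed buffers
        g.replay()
        return static_out

    return run

runner = graphed_step(torch.nn.Linear(3, 3), torch.randn(2, 3))
```

Everything inside the capture must stay on-device, allocation-stable, and shape-stable; those are exactly the restrictions that break capture in real workloads and that a parameterized launch aims to handle automatically for torch.compile-compatible code.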
Speakers
avatar for Daniel Galvez

Daniel Galvez

Manager, NVIDIA
Daniel Galvez is an AI developer technology engineer working on speech recognition and natural language processing inference and training. He has contributed to software like PyTorch, NeMo, Megatron, ESPNet, vLLM, and TRT-LLM. He is currently working on reducing CPU overheads in CUDA... Read More →
Tuesday April 7, 2026 12:00 - 12:25 CEST
Junior Stage

12:00 CEST

Write Once, Run Everywhere with Pytorch Transformers - Pedro Cuenca, Hugging Face
Tuesday April 7, 2026 12:00 - 12:25 CEST
The Hugging Face transformers library is built on pure PyTorch and can be succinctly described as a model-definition framework. It provides a unified, familiar, clear and concise interface to multiple machine learning architectures across modalities.

Serving and inference optimizations are not its focus.

However, transformers model definitions have become the de facto reference implementations that multiple other projects use. These include training libraries, fast deployment engines such as vLLM and SGLang, and on-device libraries like MLX and llama.cpp.

This session describes the path towards increasingly simpler downstream integration of transformers models into inference and deployment libraries, and how transformers and PyTorch core features enable the ecosystem to use new models as soon as they are released.

We'll go through the journey towards easier modeling, which implies easier downstream porting and adaptation. The end-game is pure interoperability, where no code changes are required! This is now possible with vLLM and SGLang, and we'll show how. We'll end up discussing our ideas on upcoming interop features with MLX and llama.cpp.
Speakers
avatar for Pedro Cuenca

Pedro Cuenca

ML Engineer, Hugging Face
Pedro Cuenca is a machine learning engineer at Hugging Face, working in developer advocacy and on-device ML. He has 20+ years of software development experience across internet applications and iOS. He worked on the technology behind Camera+, an iPhone app using custom ML for photography... Read More →
Tuesday April 7, 2026 12:00 - 12:25 CEST
Central Room

12:15 CEST

Lightning Talk: FlexAttention + FlashAttention-4: Fast and Flexible - Driss Guessous, Meta
Tuesday April 7, 2026 12:15 - 12:25 CEST
FlexAttention democratized attention research by letting researchers prototype custom attention variants in PyTorch without hand-written CUDA. Over 1,000 repos have adopted it, and dozens of papers cite it. But flexibility came at a cost: FlexAttention achieved only ~60% of FlashAttention-3's throughput on Hopper, and the gap widened dramatically on Blackwell GPUs.

We bridged this gap by integrating FlexAttention with FlashAttention-4, the new CuTeDSL-based implementation optimized for Blackwell's async pipelines and tensor memory. PyTorch's Inductor now generates CuTeDSL score/mask modifications directly, enabling JIT instantiation of FA4 for arbitrary attention variants.

Results: 1.2–3.2× speedups over the Triton backend on compute-bound workloads. On B200, patterns like ALiBi, document masking, and sliding window see up to 2.7× forward and 3× backward speedups. On Hopper, gains range from 1.3–2× across all sequence lengths.

This talk covers the technical integration: how Inductor lowers score mods to CuTeDSL, how FA4's warp-specialized kernel accommodates block-sparse iteration, and practical considerations for users adopting the Flash backend today.
Speakers
avatar for Driss Guessous

Driss Guessous

Machine Learning Engineer, Meta
I am currently a machine learning engineer working on core development of PyTorch. I received my Masters in Computer Science from the University of Illinois at Urbana-Champaign. I received a dual degree in Physics and Applied Mathematics from The Ohio State University. I also won... Read More →
Tuesday April 7, 2026 12:15 - 12:25 CEST
Master Stage

12:25 CEST

Attendee Lunch
Tuesday April 7, 2026 12:25 - 13:55 CEST
Menu | Boxed Lunches:

Vegan:
-Moroccan tabbouleh
-Indian vegetable wrap with sesame oil and tandoori spices
-Chocolate Chip Cookie

Gluten-Free:
-Bowl Niçoise salad (350 g)
-Potatoes, green beans, cherry tomatoes, tuna, black olives, iceberg lettuce, eggs, chopped red onions
-Chocolate cookie

Classic:
-Bird's tongue pasta salad with baby vegetables (Vegetarian)
-Choice of: round baguette sandwich with sliced chicken and avocado; tuna mimosa burger with red onions and baby spinach; or semolina burger with tzatziki and falafel (Vegetarian)
-Chocolate and hazelnut muffin (Vegetarian)
Tuesday April 7, 2026 12:25 - 13:55 CEST
Open Platform

12:25 CEST

Women & Non-Binary in PyTorch Lunch
Tuesday April 7, 2026 12:25 - 13:55 CEST
We’d like to invite all attendees who identify as women or non-binary to join each other for a networking lunch at the event. We will begin with a brief introduction and then attendees will be free to enjoy lunch and mingle with one another. All attendees must identify as a woman or non-binary and must be registered for the conference to attend.

Menu
-Burrata with basil pesto (Vegetarian, Gluten Free)
-Trecce di mozzarella di bufala (Vegetarian, Gluten Free)
-Charcuterie plates (prosciutto, mortadella, fennel or truffle sausage, speck) (Gluten Free)
-Soft focaccia (Vegetarian, Vegan)
-Pizza Al Taglio - Vegetarian
-Seasonal salad (Vegetarian, Gluten Free, Vegan)
-Risotto dello Chef (Vegetarian, Gluten Free) + Vegan Option

-Tiramisù (Gluten Free, Vegetarian)
-Panna Cotta (Vegan)

-Water, soft drinks, and coffee/tea
-Flavored water bottles
Tuesday April 7, 2026 12:25 - 13:55 CEST
Biblioteca Room at La Felicità 5 Parv. Alan Turing, 75013 Paris, France

13:45 CEST

Lightning Talk: From Pretrained To Personal: Privacy-First Fine-Tuning on AI PCs - Daniel Holanda Noronha & Iswarya Alex, AMD
Tuesday April 7, 2026 13:45 - 13:55 CEST
PyTorch on AI PCs has crossed a threshold: local hardware can now support meaningful model fine-tuning, not just inference. This unlocks a new class of enterprise workflows where sensitive data never leaves the device, yet models can still be personalized and adapted using PyTorch.

In this session, we’ll show how to design on-device fine-tuning pipelines for AI PCs, focusing on enterprise scenarios where privacy is non-negotiable: regulated healthcare data, government and public-sector workloads, financial services, and proprietary enterprise systems. We’ll walk through key decisions such as selecting efficient pre-trained models, and how the right PyTorch optimizations enable effective personalization on large private datasets.

We'll also showcase practical fine-tuning techniques such as supervised fine-tuning (SFT), LoRA, and QLoRA, and show how mixed-precision training and correct use of training vs. evaluation modes make these approaches efficient and practical on AI PCs while preserving privacy. The result is a cloud-free, privacy-first fine-tuning blueprint that turns AI PCs into secure personalization engines for enterprise AI.
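As background on the adapter approach, a LoRA layer can be sketched in a few lines of plain PyTorch (illustrative only, not AMD's or any particular library's implementation; LoRALinear and its rank/alpha defaults are made up for this example):

```python
import torch

# Minimal LoRA sketch: freeze the pretrained weight and learn a low-rank
# update B @ A on top of it. Only A and B are trained.
class LoRALinear(torch.nn.Module):
    def __init__(self, base: torch.nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        self.A = torch.nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus the scaled low-rank correction.
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

layer = LoRALinear(torch.nn.Linear(16, 8))
out = layer(torch.randn(2, 16))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
```

Because B starts at zero, the adapted model initially matches the pretrained one exactly, and the trainable parameter count scales with the rank rather than the full weight matrix, which is what makes on-device fine-tuning feasible.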
Speakers
avatar for Daniel Holanda

Daniel Holanda

Solutions Architect & ML Engineer, AMD
Daniel is a Sr. ML Engineer at AMD, specializing in local AI. He leads the development of local fine-tuning workflows for AI PCs and co-leads several open-source projects where he designs production-grade LLM/VLM tooling to accelerate the AI development lifecycle.

Previously, he was a Machine Learning Engineer at Groq and a contributor to Microsoft’s Project Brainwave. Daniel holds a PhD in AI understanding and hardware architecture from UBC... Read More →
avatar for Iswarya Alex

Iswarya Alex

ML Engineer, AMD
I am an ML Engineer at AMD focused on enabling high-performance on-device AI experiences. I work on efficiently optimizing and deploying models on AMD's Ryzen AI powered devices with GPUs and NPUs.
Tuesday April 7, 2026 13:45 - 13:55 CEST
Founders Cafe
  Security & Privacy

13:45 CEST

Bringing ExecuTorch To the Next Frontiers of Edge AI - Mergen Nachin, Meta
Tuesday April 7, 2026 13:45 - 14:10 CEST
Since the General Availability release of ExecuTorch 1.0 in October 2025, our team has continued to advance the state of the on-device AI software stack. In this talk, we will share our upcoming roadmap and present demos that highlight ExecuTorch’s deployment across the next frontiers, such as AI PCs, robotics, TinyML devices, and the integration of AI agents to improve productivity for on-device deployment.

ExecuTorch is built on open source collaboration, encouraging community adoption, contributions from hardware partners, and interoperability with other ecosystem libraries. We will discuss how these foundations set the stage for the next phase of edge AI with ExecuTorch.
Speakers
avatar for Mergen Nachin

Mergen Nachin

Software Engineer, Meta
Mergen Nachin is a Software Engineer specializing in creating rich AI experiences on low latency, high performance, and privacy-aware embedded systems. With a background in distributed systems, developer infrastructure, remote sensing, and localization, he brings a versatile skill... Read More →
Tuesday April 7, 2026 13:45 - 14:10 CEST
Master Stage
  Applications & Case Studies

13:45 CEST

Teaching PyTorch To Read Your Worst PDFs With Docling - Mingxuan Zhao & Peter Staar, IBM & Carol Chen, Red Hat
Tuesday April 7, 2026 13:45 - 14:10 CEST
Building production RAG pipelines starts with a problem most teams underestimate: getting clean, structured data out of real-world documents. PDFs lose table structure, figures get separated from captions, and multi-column layouts become unreadable. Before your PyTorch models even see your data, crucial information is already lost.
Docling is an open-source, MIT-licensed document parsing library that uses PyTorch-based deep learning models to understand documents the way humans read them. It preserves hierarchy, extracts structured data from tables and figures, and supports over ten common file formats through a consistent API. Because everything runs locally, it integrates cleanly into PyTorch-native workflows with low latency and no data leaving your infrastructure.
In this talk, I'll walk through Docling's PyTorch-powered architecture and show how to build document processing pipelines for RAG and other GenAI applications. I'll also share the architecture of real-world applications of Docling and how it has improved workflows. You'll leave with practical patterns for connecting Docling to your own PyTorch-based GenAI stack.
Speakers
avatar for Carol Chen

Carol Chen

Principal AI Community Architect, Red Hat
Carol Chen is a Community Architect at Red Hat, having led several upstream communities including InstructLab, Ansible and ManageIQ. She has been actively involved in open source communities while working for Jolla and Nokia previously. In addition, she also has experiences in software... Read More →
avatar for Mingxuan Zhao

Mingxuan Zhao

Software Developer/Developer Advocate, IBM
Ming Zhao is an open source developer and Developer Advocate at IBM Research, where he helps IBM leverage open technologies while building impactful tools and growing vibrant open-source communities. He’s passionate about making open tech accessible to all and ensuring developers... Read More →
Tuesday April 7, 2026 13:45 - 14:10 CEST
Junior Stage

13:45 CEST

Why WideEP Inference Needs Data-Parallel-Aware Scheduling - Maroon Ayoub, IBM; Tyler Michael Smith, Red Hat
Tuesday April 7, 2026 13:45 - 14:10 CEST
WideEP (wide expert parallelism) fails not because experts are expensive, but because routing ignores where state already lives. In PyTorch LLM serving with vLLM, WideEP fans tokens across many experts while KV caches accumulate unevenly across data-parallel replicas. When routing is unaware of KV placement and per-replica load, requests land on replicas that cannot reuse cache or make progress efficiently, and latency spikes as expert fan-out grows.
The fix is not reshaping expert parallelism, but making routing data-parallel aware using signals vLLM already exposes. In this talk, we show how llm-d extends its router to leverage KV-cache locality and load awareness when routing WideEP flows. Rather than treating replicas as interchangeable, the router prefers replicas with warm KV state and available capacity, aligning routing decisions with vLLM’s execution reality and reducing cache fragmentation.
This session walks through how KV-aware, data-parallel routing changes WideEP inference in practice: which signals matter, how routing behavior evolves, and where the gains come from. Attendees leave with a clear mental model for when KV- and load-aware routing unlocks higher throughput.
Speakers
avatar for Maroon Ayoub

Maroon Ayoub

Research Scientist & Architect, IBM Research
Maroon Ayoub is a systems engineer at IBM Research focused on distributed AI infrastructure. He co-leads development of llm-d and specializes in scaling LLM inference with Kubernetes-native architectures, performance efficiency, and open source integrations.
avatar for Tyler Michael Smith

Tyler Michael Smith

Chief Architect - Inference Engineering, Red Hat
Tyler received a PhD in Computer Science at The University of Texas at Austin, studying high performance dense linear algebra - microkernels, parallelism, and theoretical lower bounds on data movement. After a postdoc at ETH Zürich, he joined Neural Magic, first working on a graph... Read More →
Tuesday April 7, 2026 13:45 - 14:10 CEST
Central Room

14:15 CEST

Lightning Talk: Accelerating On-Device ML Inference With ExecuTorch and Arm SME2 - Jason Zhu, Arm
Tuesday April 7, 2026 14:15 - 14:25 CEST
As on-device AI workloads grow in complexity, achieving low-latency inference within mobile power constraints remains a central challenge. We examine how ExecuTorch, combined with Arm’s Scalable Matrix Extension 2 (SME2), enables efficient CPU deployments of production AI workloads.

We present a case study of SqueezeSAM, a segmentation model deployed in real-world mobile applications. Using ExecuTorch with XNNPACK delegation and SME2-optimized kernels, we evaluate INT8 and FP16 inference on a flagship smartphone. Moving beyond aggregate latency, we apply operator-level profiling to decompose runtime across convolution, GEMM, elementwise, and data movement operators, showing how hardware acceleration reshapes bottlenecks in the execution stack. SME2 delivers up to 3.9x end-to-end speedup on a single CPU core, materially altering runtime composition and revealing data movement as the primary post-acceleration bottleneck.

This session presents a practical workflow for deploying, profiling, and systematically optimizing on-device PyTorch models, demonstrating how SME2 expands the viable design space for interactive mobile AI.
Speakers
avatar for Jason Zhihuai Zhu

Jason Zhihuai Zhu

Senior Principal Engineer, Arm
Jason Zhu is a Senior Principal Engineer at Arm focused on hardware and software co-optimization for AI systems. With a background in quantum physics and experience spanning AI research and product engineering across major technology companies, he works across the full execution stack... Read More →
Tuesday April 7, 2026 14:15 - 14:25 CEST
Master Stage
  Inference & Production
  • Audience Level Any
  • Slides Attached Yes

14:15 CEST

Sponsored Session: TorchTPU: Expanding TPU Programmability to Pytorch - Kat Ko & Claudio Basile, Google; Jana van Greunen, Meta
Tuesday April 7, 2026 14:15 - 14:40 CEST
Google Tensor Processing Units (TPUs) are designed for ML at massive scale, offering significant benefits in performance, energy, and cost. While TPUs have historically been associated with the TensorFlow and JAX ecosystems, we introduce TorchTPU: a new Google effort to expand TPU programmability to PyTorch.

This talk charts TorchTPU’s evolution, from the initial RFC to establishing a native, eager-first PyTorch backend. We will outline the core technical challenges overcome during this transition—particularly the complexities of translating dynamic, eager execution into highly optimized TPU computations.

We’ll highlight current milestones, including native integration with torch.compile, DTensor, and robust support for the latest Ironwood (TPU v7) architecture. These advancements collectively enable multi-billion parameter models to run on TPUs with minimal code changes, while retaining the ability for users to apply model-specific optimizations (e.g., custom kernels, quantization, sharding) to reach peak performance. Finally, we’ll provide a sneak peek at our roadmap for 2026.
Speakers
avatar for Jana van Greunen

Jana van Greunen

Director of PyTorch Engineering, Meta
Jana van Greunen is the Director of PyTorch Engineering at Meta, where she leads efforts to ensure PyTorch remains the leading AI/ML framework for researchers and developers worldwide. With deep expertise in distributed systems, large-scale infrastructure, and over 15 years of experience... Read More →
avatar for Kat Ko

Kat Ko

Senior Eng Manager, Google
Kat Ko is a Senior Engineering Manager at Google and a lead on TorchTPU, where she drives the integration of PyTorch with TPU technology to enable high-performance computing at scale. An EECS graduate of UC Berkeley, she brings over 15 years of experience building large-scale systems... Read More →
avatar for Claudio Basile

Claudio Basile

Software Engineer, Google
Claudio Basile is a Google Software Engineer and the co-founder and technical lead of TorchTPU. During his tenure at Google, he also authored LiteRT, the company’s new on-device ML framework. With a Ph.D. in ECE from UIUC and over 15 years of experience spanning machine learning... Read More →
Tuesday April 7, 2026 14:15 - 14:40 CEST
Founders Cafe
  Frameworks & Compilers
  • Slides Attached Yes

14:15 CEST

The Token Slice: Implementing Preemptive Scheduling Via Chunked Decoding - Maroon Ayoub, IBM & Kellen Swain, Google
Tuesday April 7, 2026 14:15 - 14:40 CEST
Production LLM serving faces a critical trade-off: while continuous batching maximizes throughput, it often sacrifices SLAs due to Head-of-Line (HoL) blocking. When long-context requests hijack the engine, tail latencies spike. Without fine-grained preemption, guaranteeing priority or fairness remains nearly impossible.

We propose a solution: Chunked Decoding. By treating a fixed number of tokens as a "time slice," we bring 50 years of OS scheduling wisdom to inference. This technique decouples generation from completion, enabling a preemptive multitasking environment for LLMs.

In this talk, we present a sidecar implementation for PyTorch-based servers (like vLLM) that orchestrates decoding in manageable chunks. This allows the system to pause, hold, or swap requests mid-stream without discarding the KV cache. We will share early evaluation results, discussing how varying chunk sizes impact priority handling and tail latency. Attendees will learn how a sidecar approach enables sophisticated scheduling while keeping the core engine lean—offering a blueprint for integrating preemptive scheduling into the next generation of model servers.
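The time-slice analogy can be made concrete with a toy scheduler (plain Python; this is an illustrative sketch, not the llm-d or vLLM implementation, and the schedule/chunk names are hypothetical):

```python
from collections import deque

# Toy sketch of chunked decoding as preemptive scheduling: each request
# decodes at most `chunk` tokens per turn, then yields the engine to the
# next request in the queue, keeping its KV cache for later resumption.
def schedule(requests, chunk=4):
    """requests: dict of request id -> total tokens to generate."""
    queue = deque(requests.items())
    timeline = []  # which request held the engine for each slice
    while queue:
        rid, remaining = queue.popleft()
        step = min(chunk, remaining)
        timeline.append((rid, step))  # decode one "time slice" of tokens
        if remaining - step > 0:
            queue.append((rid, remaining - step))  # requeue, KV cache kept
    return timeline

tl = schedule({"long": 10, "short": 3})
```

In this run the short request finishes after the long request's first slice instead of waiting behind all ten of its tokens, which is exactly the head-of-line-blocking relief the talk describes.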
Speakers
avatar for Maroon Ayoub

Maroon Ayoub

Research Scientist & Architect, IBM Research
Maroon Ayoub is a systems engineer at IBM Research focused on distributed AI infrastructure. He co-leads development of llm-d and specializes in scaling LLM inference with Kubernetes-native architectures, performance efficiency, and open source integrations.
avatar for Kellen Swain

Kellen Swain

Senior Software Engineer, Google
Kellen is a Senior Engineer at Google, and is a maintainer of both the llm-d and Inference Gateway projects.
Tuesday April 7, 2026 14:15 - 14:40 CEST
Central Room

14:30 CEST

Lightning Talk: Combo Kernels: Horizontal Fusion Optimization in Torch.compile - Karthick Panner Selvam & Elias Ellison, Meta
Tuesday April 7, 2026 14:30 - 14:40 CEST
Combo kernels are a compiler optimization in PyTorch Inductor that horizontally fuses multiple independent operations into a single Triton kernel launch, reducing GPU kernel launch overhead and improving memory locality.

The Problem: Models generate many small, independent operations like weight preprocessing and tensor copies. Each launch incurs overhead. For models with many such operations, this becomes a bottleneck.

The Solution: Combo kernels combine multiple operations into one kernel using a dispatch mechanism. A single program ID routes execution to the appropriate subkernel based on cumulative block boundaries. This eliminates redundant launches while preserving correctness.

Key Innovations:

Per-subkernel block dimensions: Each subkernel gets its own optimized block size instead of sharing one size across all, enabling better autotuning.

Flattened grid dispatch: We collapse the multi-dimensional block grid into a single dimension.

Results: On H100 GPUs, combo kernels deliver geomean speedups of +7.38% for HuggingFace, and +5.97% for TorchBench. The optimization is enabled by default in the vLLM repository for LLM inference acceleration.
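The cumulative-block-boundary dispatch can be modeled in plain Python (a toy model of the idea, not the actual Inductor/Triton code; subkernel_blocks and dispatch are illustrative names):

```python
import bisect

# Toy model of combo-kernel dispatch: several subkernels are packed into
# one launch, and a flat program id is routed to the owning subkernel via
# cumulative block boundaries.
subkernel_blocks = [3, 5, 2]          # blocks needed by each fused op
bounds = []
total = 0
for n in subkernel_blocks:
    total += n
    bounds.append(total)              # cumulative boundaries: [3, 8, 10]

def dispatch(pid):
    k = bisect.bisect_right(bounds, pid)       # which subkernel owns this pid
    local = pid - (bounds[k - 1] if k else 0)  # block index within it
    return k, local

routed = [dispatch(p) for p in range(total)]
```

A single launch of `total` programs thus covers all three ops, with each program recovering its subkernel and local block index from its flat id, which is how one launch replaces several.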
Speakers
avatar for Elias Ellison

Elias Ellison

Software Engineer, Meta
Elias has been working on the PyTorch team for four years, most recently on the torch.compile stack
avatar for Karthick Panner Selvam

Karthick Panner Selvam

Software Engineer, Meta
Karthick Panner Selvam is a SWE at Meta Superintelligence Lab, working on the PyTorch compiler team to enhance performance and scalability for large models. He earned his PhD in Machine Learning for Systems at the University of Luxembourg, collaborating with Google DeepMind, ECMWF, and Frontier... Read More →
Tuesday April 7, 2026 14:30 - 14:40 CEST
Master Stage
  Frameworks & Compilers
  • Audience Level Any
  • Slides Attached Yes

14:45 CEST

Lightning Talk: Implementing Single-Dim Strategies With Sharding Validator - Anshul Sinha, Meta
Tuesday April 7, 2026 14:45 - 14:55 CEST
DTensor sharding propagation is a major bottleneck to full operator coverage: adding or fixing an op strategy is complex, bug‑prone, and gaps often surface as unexpected resharding and extra collectives. A key source of complexity is that today’s rules conflate (1) semantic correctness—valid input/output sharding combinations for an operator—with (2) search‑space pruning to avoid combinatorial blowups on N‑dimensional meshes.

This talk presents a landed prototype that separates these concerns via Single Mesh Dim Strategies: each operator specifies valid placement combinations for one mesh dimension, while infra expands/composes them across the full mesh and selects low‑cost strategies. For contributors, this provides a clear path to refactor existing op_strategies into single‑dim rules that are easier to review and extend. We also introduce a Truth Table‑style sharding validator that systematically tests shapes and sharding specs to check soundness/completeness and to flag unnecessary redistribution/collectives caused by missing cases.

The goal of this presentation is faster, higher‑confidence contributions that improve correctness and expand DTensor operator coverage.
Speakers
avatar for Anshul Sinha

Anshul Sinha

Software Engineer, Meta
I graduated from the University of Michigan with a B.S. in Computer Science in December 2024. I joined Meta's PyTorch Distributed team as a SWE in June 2025.
Tuesday April 7, 2026 14:45 - 14:55 CEST
Founders Cafe
  Frameworks & Compilers

14:45 CEST

Brevitas Quantization Library - Pablo Monteagudo Lago, AMD
Tuesday April 7, 2026 14:45 - 15:10 CEST
Brevitas is an open‑source PyTorch library from AMD designed to support the research of state‑of‑the‑art quantization methods, including Qronos (ICLR 2026) and MixQuant (arXiv). Built for flexibility and composability, it offers modular components for exploring reduced‑precision data paths and accuracy‑preserving techniques.
As generative models scale, post‑training quantization (PTQ) has become the preferred strategy for maintaining quality without retraining, yet PTQ methods are often applied in isolation due to fragmented tooling. Brevitas provides a unified environment for modern PTQ algorithms—including Qronos, SpinQuant and AutoRound—enabling practitioners to combine complementary techniques effectively.
Brevitas leverages the latest PyTorch features, like Dynamo for tracing and selectively modifying compute graphs—for example, by inserting rotation ops to mitigate outliers. It integrates with frameworks like transformers and supports export flows including vLLM and GGUF, ensuring a smooth transition from experimentation to deployment.
This talk shows how to use Brevitas for an end‑to‑end quantization flow, showcasing how its flexibility enables new research directions.
Speakers
avatar for Pablo Monteagudo Lago

Pablo Monteagudo Lago

Research Scientist, AMD
Pablo Monteagudo is a research scientist in AMD Research and Advanced Development, based in Dublin. He specialises in co-design of neural networks and accelerators, in particular, working on topics involving neural network quantization, sparsity and accelerator design.
Tuesday April 7, 2026 14:45 - 15:10 CEST
Junior Stage
  Frameworks & Compilers

14:45 CEST

Model-Changing Transforms With Torch.compile - Thomas Viehmann, Lightning AI
Tuesday April 7, 2026 14:45 - 15:10 CEST
torch.compile is the go-to mechanism for increasing the performance of PyTorch models of all shapes and forms.

While it is widely understood how to change the computation by manipulating the FX trace representation, torch.compile becomes a much more general tool when we also transform the model and its input expectations (the guards):
This enables model-changing transformations like quantization and distributed execution without needing to adapt the model to them.

We take a deep dive into the torch.compile internals to see what's going on under the hood and how we can hook into the gears to enable distributed (starting from a single-GPU model) and quantization.
In this quest, marvel at the interplay between PyTorch's Python code, the Python interpreter, and PyTorch's C++ code that together enable the Dynamo frontend of torch.compile, and then take a big hammer to use it in unexpected ways. Building on our experience with Lightning Thunder, an experimental compiler for PyTorch models, we propose a transform mechanism taking care of compute, model, and weights.
Speakers
avatar for Thomas Viehmann

Thomas Viehmann

Thunder, Lightning AI
Thomas Viehmann does PyTorch and Optimization at Lightning AI, has been a PyTorch contributor since 2017, founded MathInf GmbH in 2018, and co-authored "Deep Learning with PyTorch" in 2020.
Tuesday April 7, 2026 14:45 - 15:10 CEST
Master Stage

14:45 CEST

The Science and Practice of Open and Scalable LLM Evaluations - Grzegorz Chlebus, NVIDIA
Tuesday April 7, 2026 14:45 - 15:10 CEST
Rapid advances in AI have expanded the range of capabilities required for successful real-world deployment. Understanding where we are in this multi-dimensional frontier is essential for accelerating innovation through effective quality assurance. Rigorous evaluation is increasingly difficult to scale as development requires testing many checkpoints across numerous benchmarks. Model comparison is further complicated by limited transparency of reported results. This talk explores challenges, best practices, and open-source tools that elevate evaluation to a core component of LLM development, delivering continuous signals across the model lifecycle.
We discuss principles for standardizing evaluation methods and improving consistency through practical patterns and anti-patterns, and examples of integrating the science of evaluation directly into model development. Using Nemo-Evaluator, an open-source scalable evaluation tool, we demonstrate modular architectures that enable transparent, reproducible measurement. Finally, we show how Nemo-Evaluator supports reproducible evaluation for the Nemotron model family, helping enable one of the most open development processes in modern AI.
Speakers
avatar for Grzegorz Chlebus

Grzegorz Chlebus

Manager R&D, NVIDIA
Grzegorz Chlebus is a Manager at Frontier Model Evaluation at NVIDIA, where he leads tooling and infrastructure efforts for evaluating frontier AI models. He holds a PhD in Medical Sciences from Radboud University Nijmegen, focused on deep learning-based medical image segmentation... Read More →
Tuesday April 7, 2026 14:45 - 15:10 CEST
Central Room
  GenAI & Multimodal

15:00 CEST

Lightning Talk: Jigsaw: Domain and Tensor Parallelism for High-Resolution Input Training - Deifilia Kieckhefen, Karlsruhe Institute of Technology
Tuesday April 7, 2026 15:00 - 15:10 CEST
Distributed neural network training frameworks typically optimize for specific architectures while minimizing communication overhead. Transformer layers can be efficiently parallelized, but other operations such as convolutions often remain inefficient. This creates bottlenecks for complex model architectures.
Moreover, existing tensor parallelism strategies typically replicate input data across all processes, creating redundant I/O that scales poorly with input size. In applications with heavy I/O demands (weather forecasting, medical imaging, or video processing), unsharded input data creates additional data-loading bottlenecks that could benefit from parallelization.
Jigsaw is a PyTorch library that shards both model weights and input data across parallel processes. It maintains a PyTorch-like interface while parallelizing activations, convolutions, linear layers, and attention through a distributed matrix multiplication backend. We demonstrate the usability of Jigsaw across a wide range of model architectures, show its performance when scaling multi-billion-parameter models sharded across up to 8 processes, and compare its scalability to DDP, FSDP, and Megatron-LM approaches.
Speakers
avatar for Deifilia Kieckhefen

Deifilia Kieckhefen

Doctoral Researcher, Karlsruhe Institute of Technology
Deifilia Kieckhefen is a doctoral researcher at the Karlsruhe Institute of Technology. She works on scalable and distributed training of neural network architectures.
Tuesday April 7, 2026 15:00 - 15:10 CEST
Founders Cafe
  Training Systems
  • Audience Level Any
  • Slides Attached Yes

15:10 CEST

Coffee Break
Tuesday April 7, 2026 15:10 - 15:40 CEST
Menu: 
-Chocolate cake
-Red fruits yogurt (and its wooden spoon) (Gluten Free, Vegetarian)
-Seasonal fruits (GF, Vegan)
-Hummus and vegetable brioche roll
-Dry fruits and dry grapes mix
-Chocolate Cookie (GF, Vegan)
Tuesday April 7, 2026 15:10 - 15:40 CEST
Open Platform

15:10 CEST

Meet the Developers of Helion
Tuesday April 7, 2026 15:10 - 15:40 CEST
This session offers a unique opportunity to connect with the core developers of Helion (https://github.com/pytorch/helion)—ask questions, share feedback, and explore collaboration opportunities with the team.

About Helion
At PTC 2025, we launched Helion (in Beta), a PyTorch-native kernel authoring DSL designed to deliver portable performance across heterogeneous hardware. Since then, Helion has outperformed expert-tuned Triton and CuTe DSL kernels and seen meaningful adoption across research labs, production teams, and OSS frameworks like vLLM.

At PyTorch Conference Europe 2026, we are excited to announce Helion 1.0 (General Availability). Join us to learn how Helion works under the hood and discover what's new in the GA release.

Core Developers
Jason Ansel: Research Scientist, creator of PyTorch Compiler and Helion
Oguz Ulgen: Software Engineer, creator of PyTorch Compiler cache, working on Helion
Will Feng: Software Engineer working on TorchInductor and Helion
Markus Hoehnerbach: Software Engineer focusing on Helion development and kernel authoring

Drop in for an informal discussion, share your experiences, and explore opportunities to collaborate with the team!
Speakers
avatar for Will Feng

Will Feng

Software Engineer, Meta
Will Feng is a Software Engineer in PyTorch Compiler team at Meta. He has been working in PyTorch core and ecosystem for the past 7 years. He is now working on and most excited about torch.compile for distributed training performance.
avatar for Oguz Ulgen

Oguz Ulgen

Software Engineer, Meta
I'm a software engineer at Meta where I used to work on the Hack programming language and now work on PyTorch.
avatar for Jason Ansel

Jason Ansel

Research Scientist, Meta
Jason Ansel is a Research Scientist at Meta AI and a technical lead for PyTorch compilers. He started the TorchDynamo and TorchInductor projects, which bring flexible graph capture and a high performance compiler to PyTorch 2. He received a Ph.D. from MIT and has over 15 years of... Read More →
Tuesday April 7, 2026 15:10 - 15:40 CEST
Open Platform
  Meet the Developers
  • Audience Level Any

15:40 CEST

Lightning Talk: Graph Based Pipeline Parallelism - Sanket Purandare, Meta & Simon Fan, Meta PyTorch
Tuesday April 7, 2026 15:40 - 15:50 CEST
Pipeline parallelism is vital for large models, but advanced schedules for SOTA LLMs are difficult to express in current PyTorch. MoE communication dominates the critical path, making latency hiding essential. Leading systems use fw-bw overlapping; fw-fw and bw-bw overlapping further boost throughput.

Schedules like ZeroBubbleV and DualPipeV rely on dI-dW backward splitting for fine-grained overlap. However, eager-mode implementations require a patchwork of fragile integrations (multi-threading, custom autograd functions, activation checkpointing, etc.) that rely on implicit behavior and hand-written logic with poor torch.compile compatibility and upstream composability.

We present Graph-Based PP: stages are compiled to reusable FX graphs executed via an explicit schedule language. Users write standard PyTorch code while specifying schedules at varying granularity; all manipulations run as graph passes, abstracting complexity away from user code and into the compiler/runtime, allowing for greater composability.

We have integrated Graph-PP into TorchTitan and AutoParallel on real MoE workloads, targeting upstream inclusion in torch.distributed.
Speakers
avatar for Simon Fan

Simon Fan

Software Engineer, Meta
I work on the PyTorch team at Meta, focusing on distributed training efficiency.
avatar for Sanket Purandare

Sanket Purandare

Research Engineer, Meta
Currently, Sanket serves as a Research Engineer at Meta's SuperIntelligence Lab, in PyTorch Distributed and Compiler team. He specializes in performance optimization of large scale training of LLMs based on Mixture of Experts architectures.

Prior to this he obtained his PhD in A... Read More →
Tuesday April 7, 2026 15:40 - 15:50 CEST
Master Stage
  Frameworks & Compilers

15:40 CEST

Lightning Talk: Cross-Region Model Serving: PyTorch Inference, Observability & LLMOps - Suraj Muraleedharan, Amazon Web Services
Tuesday April 7, 2026 15:40 - 15:50 CEST
As PyTorch models move to production, organizations face a critical challenge: deploying, monitoring, and operating inference at scale across multiple regions. Single-region serving is well-understood, but multi-region LLMOps—model distribution, observability, failover, and cost management—remains ad hoc and challenging for many customers.

This session presents production-tested architectures for multi-region PyTorch inference and LLMOps workflows. We cover:

Serving: Multi-region TorchServe/KServe on Kubernetes with latency-based routing, blue-green deployments, model versioning, and automated failover with circuit breakers.

Observability: OpenTelemetry distributed tracing, Prometheus/Grafana dashboards for latency, throughput, GPU utilization, and LLM-specific metrics like time-to-first-token and KV-cache hit rate.

LLMOps: CI/CD pipelines for cross-region model deployment with automated rollback, drift detection, and SLO-based alerting.

Attendees leave with serving architectures, dashboards, and deployment pipelines using open-source tooling.
Speakers
avatar for Suraj Muraleedharan

Suraj Muraleedharan

Principal Platform Engineer, Amazon Web Services
Principal Engineer driving technical strategy and building mission-critical foundational platforms for AI, HPC, and distributed systems, bridging the gap between infrastructure, AI research, and product organizations.
Tuesday April 7, 2026 15:40 - 15:50 CEST
Founders Cafe
  Inference & Production

15:40 CEST

Enabling State-of-the-art Asynchronous Execution in Torch.compile With CUDA Streams - Michael Lazos, Meta
Tuesday April 7, 2026 15:40 - 16:05 CEST
CUDA streams are a widely used method for parallelizing GPU computation on NVIDIA GPUs. They have long been requested by our users and enable multiple key capabilities - overlapping communication and compute kernels, training on multiple batches in parallel, and parallelizing kernels - all of which are needed for achieving SOTA training performance. Another key capability is activation offloading: it can be applied to any model to prevent OOMs by asynchronously storing activations in CPU memory until they are needed by the model.

Before this work, torch.compile would graph-break on CUDA stream contexts, which can be costly for models that utilize streams. Although workarounds exist (e.g. wrapping stream manipulation into custom ops), these solutions add complexity and create friction in the user experience. By enabling seamless CUDA stream support in PT2, we allow our users to leverage the familiar eager APIs for stream assignment and synchronization directly within torch.compile. This not only simplifies the workflow but also ensures that models using custom streaming patterns can run efficiently out-of-the-box without manual intervention or code restructuring.
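As a rough illustration of the eager pattern this unlocks (a minimal sketch using the standard torch.cuda.Stream APIs, not code from the talk; it assumes an NVIDIA GPU and a PyTorch build where stream support in torch.compile has landed):

```python
import torch

def two_branch(x, y):
    # Run one matmul on a side stream so it can overlap with work on
    # the default stream. These are the ordinary eager stream APIs
    # that torch.compile used to graph-break on.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        a = x @ x
    b = y @ y
    # Make the default stream wait for the side stream before mixing results.
    torch.cuda.current_stream().wait_stream(side)
    return a + b

if torch.cuda.is_available():
    compiled = torch.compile(two_branch)
    x = torch.randn(64, 64, device="cuda")
    out = compiled(x, x)
```

The explicit `wait_stream` calls are what keep the two branches correctly ordered; without them the `a + b` could read `a` before the side stream has produced it.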
Speakers
avatar for Michael Lazos

Michael Lazos

Software Engineer, Meta
Michael Lazos is a software engineer at Meta where he contributes to torch.compile. His expertise spans both graph extraction with TorchDynamo and generating optimized kernels with the backend compiler TorchInductor. Previously, he was at Microsoft contributing to project Brainwave... Read More →
Tuesday April 7, 2026 15:40 - 16:05 CEST
Central Room
  Frameworks & Compilers

15:40 CEST

torch.compile and Diffusers: A Hands-On Guide to Peak Performance - Sayak Paul, Hugging Face
Tuesday April 7, 2026 15:40 - 16:05 CEST
This session shows how to use torch.compile with the Diffusers library to speed up diffusion models like Flux-1-Dev.

You'll learn practical techniques for both model authors and users. For authors, we cover how to make models compiler-friendly using fullgraph=True. For users, we explain regional compilation (which cuts compile time by 7x while keeping the same runtime gains) and how to avoid recompilations with dynamic=True.

We also cover real-world scenarios: running on memory-constrained GPUs using CPU offloading and quantization, and swapping LoRA adapters without triggering recompilation.

Key takeaways:
- Compiling just the Diffusion Transformer (DiT) delivers ~1.5x speedup on H100
- Regional compilation reduces cold-start compile time from 67s to 9.6s
- NF4 quantization cuts memory from 33GB to 15GB
- Combining quantization + offloading drops memory to 12.2GB
- LoRA hot-swap lets you switch adapters without recompiling

Whether you're building diffusion models or using them, this guide helps you get the best performance with minimal effort.
Speakers
avatar for Sayak Paul

Sayak Paul

Research Engineer, Hugging Face
I am a Research Engineer at Hugging Face, working on image and video generation. My day-to-day includes maintaining the Diffusers library, training, and babysitting models. When I am not working, I can be found either watching Suits for the n-th time or playing the guitar.
Tuesday April 7, 2026 15:40 - 16:05 CEST
Junior Stage

15:55 CEST

Lightning Talk: Running ExecuTorch Applications With Silicon Acceleration, in Ultra-low Power - George Gekov, Arm; Aki Makkonen, Alif Semiconductor
Tuesday April 7, 2026 15:55 - 16:05 CEST
Efficient deployment of ML models on low-power embedded systems has been a significant challenge for a number of years. At the same time, these embedded SoCs are all around us—from everyday appliances to the latest smart glasses.

ExecuTorch is a PyTorch-native framework for deploying neural networks on resource-constrained systems. In this session, we show how to build an end-to-end speech recognition application using PyTorch and ExecuTorch—from training a Transformer-based neural network in PyTorch, through quantization, all the way to deployment on a low-power embedded device.

We will introduce the key ExecuTorch APIs for quantization and explain how models are transformed and lowered into a form that can run efficiently on a device. The application runs on the Alif Ensemble E8 SoC, the first implementation of the leading Arm® Ethos-U85 NPU, which brings native support for Transformer models to the ultra-low-power domain.

Join the experts from Arm and Alif Semiconductor to see how we are bridging the gap between PyTorch and embedded deployment—and how you can bring PyTorch models to silicon-accelerated, ultra-low-power systems.
Speakers
avatar for George Gekov

George Gekov

ML Engineer, Arm
George Gekov is a Staff Software Engineer in Arm’s Machine Learning team, where he focuses on machine learning inference on embedded systems. He has extensive experience deploying neural networks on resource-constrained devices with Neural Processing Units (NPUs) to enable hardware-accelerated... Read More →
avatar for Aki Makkonen

Aki Makkonen

Senior Staff Application Engineer, Alif Semiconductor
Software engineer with background in telecommunication, medical imaging, robotics and embedded systems.
Tuesday April 7, 2026 15:55 - 16:05 CEST
Founders Cafe

15:55 CEST

Lightning Talk: Beyond Generic Spans: Distributed Tracing for Actionable LLM Observability - Sally O'Malley & Greg Pereira, Red Hat
Tuesday April 7, 2026 15:55 - 16:05 CEST
End-to-end observability is non-negotiable for production LLMs to track performance, attribute costs, and validate optimizations. Generating actionable traces from complex distributed inference remains a significant challenge.

We implemented tracing for llm-d, a high-performance distributed LLM inference framework. Using manual OpenTelemetry instrumentation with carefully crafted spans at critical paths, we expose insights that generic tooling can't capture.

This talk explores how distributed tracing illuminates requests through unique inference scenarios:

* Prefix cache-aware routing: Track cache hits and validate whether intelligent scheduling improves TTFT
* Prefill/decode disaggregation: Analyze why each request chose split vs unified processing based on cache locality
* Wide expert-parallelism: Profile MoE models across multi-node deployments
* Workload autoscaling: Correlate request patterns with scaling decisions

Attendees will learn why LLMOps requires a new approach to distributed tracing, contrasting it with traditional microservices, and how to instrument inference stacks effectively. Walk away ready to add meaningful observability to your own deployments.
Speakers
avatar for Greg Pereira

Greg Pereira

Sr. Machine Learning Engineer, Red Hat
Greg began his career as an SRE focusing on CI/CD and automation in the Emerging Technologies org at Red Hat. After transferring to the platform and services team, he started from the ground up, refocusing on AI-centric software development. Three years later he has been involved in building... Read More →
avatar for Sally O'Malley

Sally O'Malley

Principal Software Engineer, Red Hat

Tuesday April 7, 2026 15:55 - 16:05 CEST
Master Stage

16:10 CEST

Build PyTorch to Understand PyTorch - Vijay Janapa Reddi, Harvard University; Andrea Mattia Garavagno, University of Genoa
Tuesday April 7, 2026 16:10 - 16:35 CEST
PyTorch's success depends on more than users—it needs engineers who understand what's inside. Engineers who can debug framework issues, optimize at the systems level, contribute upstream, and build what comes next. But ML education today produces practitioners who call APIs without understanding them. They train models without knowing why Adam needs 3× the memory of SGD, or what happens when they call loss.backward().

TinyTorch is a 20-module open-source curriculum that closes this gap. Students construct PyTorch's core components—tensors, autograd, optimizers, CNNs, transformers—in pure Python, building a complete framework where every operation is code they wrote. By the final module, they don't just use PyTorch; they understand how to build it.

The curriculum uses progressive disclosure, systems-first profiling from Module 01, and build-to-validate milestones—recreating ML breakthroughs from Perceptron (1958) through Transformers (2017), culminating in MLPerf-style benchmarking.

TinyTorch is how we grow the next generation of PyTorch contributors and the engineers who will build what comes after.

Open source: mlsysbook.ai/tinytorch
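The Adam-vs-SGD memory claim above is easy to verify in a few lines of stock PyTorch: after one step, Adam holds two extra state tensors per parameter (exp_avg and exp_avg_sq), so parameters plus optimizer state occupy roughly 3x what plain SGD keeps.

```python
import torch

model = torch.nn.Linear(10, 10)
opt = torch.optim.Adam(model.parameters())

# One forward/backward pass and one step populates the optimizer state.
model(torch.randn(4, 10)).sum().backward()
opt.step()

# Each parameter now carries two moment buffers of its own size.
extra = sum(
    s["exp_avg"].numel() + s["exp_avg_sq"].numel()
    for s in opt.state.values()
)
n_params = sum(p.numel() for p in model.parameters())
assert extra == 2 * n_params  # optimizer state alone is 2x the parameters
```

SGD without momentum keeps no per-parameter state at all, which is exactly the kind of systems-level fact the curriculum makes students discover by building the optimizer themselves.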
Speakers
avatar for Vijay Janapa Reddi

Vijay Janapa Reddi

Professor, Harvard University
Vijay Janapa Reddi is a Professor at Harvard University, where he leads research at the intersection of machine learning and computer systems. He is the author of the open-source Machine Learning Systems textbook (mlsysbook.ai) and co-founder of MLCommons, the organization behind... Read More →
avatar for Andrea Mattia Garavagno

Andrea Mattia Garavagno

Research Fellow, University of Genoa & Scuola Superiore Sant'Anna
I am a Research Fellow holding a joint position at the University of Genoa and Scuola Superiore Sant'Anna. My research is centered on Edge AI, where I am currently working to automate the design of applications through Hardware-Aware Neural Architecture Search (NAS). By running these... Read More →
Tuesday April 7, 2026 16:10 - 16:35 CEST
Central Room
  Frameworks & Compilers
  • Audience Level Any
  • Slides Attached Yes

16:10 CEST

On-Device LLM Inference on Android With ExecuTorch and Qualcomm QNN - Shivay Lamba & Kartikey Rawat, Qualcomm
Tuesday April 7, 2026 16:10 - 16:35 CEST
Multimodal models like CLIP are typically deployed in the cloud due to their size and computational demands, limiting their use in latency-sensitive, privacy-preserving, and offline-first applications. This talk demonstrates how one can run fully on-device CLIP inference on Android using ExecuTorch with the Qualcomm QNN backend, enabling real-time vision–language understanding without server dependency.

One can run models like CLIP (ViT-B/32) entirely on edge devices, leveraging QNN for hardware-accelerated inference. A key focus of the talk is a deep dive into ExecuTorch optimizations for QNN, including graph lowering, operator fusion, quantization strategies, memory planning, and backend-specific execution choices that materially impact latency, memory footprint, and power consumption.

The talk will cover architectural insights, model export and compilation workflows, and real-world benchmarks covering latency, memory usage, and power efficiency. This talk highlights how large multimodal PyTorch models can be made production-ready on edge devices, unlocking new classes of private, offline-capable AI applications.
Speakers
avatar for Shivay Lamba

Shivay Lamba

Senior ML Engineer, Qualcomm
Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development.

He is an Open Source Enthusiast and has been part of various programs like Google Code In and Google Summer of Code as a Mentor and is currently a MLH Fellow. He has also worked at organizations like Amazon, EY, Genpact. He is a Tensorflow.JS SIG member and community lead from In... Read More →
avatar for Kartikey Rawat

Kartikey Rawat

Senior Developer Advocate, Qualcomm
Senior Developer Advocate at Qualcomm | Google Developer Expert in AI and Google Cloud
Tuesday April 7, 2026 16:10 - 16:35 CEST
Founders Cafe
  GenAI & Multimodal
  • Audience Level Any

16:10 CEST

Optimizing Reinforcement Learning at Trillion-Parameter Scale - Songlin Jiang, Aalto University & Mind Lab
Tuesday April 7, 2026 16:10 - 16:35 CEST
This talk will dive into how we implemented and optimized reinforcement learning on trillion-parameter Mixture-of-Experts reasoning models using veRL, Megatron-Bridge and vLLM. The session is useful to anyone building large-scale RL training systems.

For the first part, I will walk through the system design required to make RL work at this scale using LoRA: how LoRA adapters are implemented for expert layers, how adapters are sharded and fused under tensor/pipeline/expert parallelism, and most importantly, how refit (parameter sync) is implemented for LoRA between training backend (Megatron) and rollout engine (vLLM).

The second part of the talk focuses on training–inference mismatch in MoE RL. I will explain why common mitigations such as clipping and importance sampling can fail, and how we implement fixed Router Replay R3 across vLLM, veRL, and Megatron to align routing decisions between rollout and training.

These works are done together with Mind Lab and some of the related blog posts are at:
- https://macaron.im/mindlab/research/building-trillion-parameter-reasoning-rl-with-10-gpus
- https://macaron.im/mindlab/research/router-replay-r3-why-it-failed-and-how-we-fixed-it
Speakers
avatar for Songlin Jiang

Songlin Jiang

Doctoral Researcher, Aalto University & Mind Lab
I am a doctoral researcher at Aalto University, focusing on reducing training and inference latency for Reinforcement Learning and Large Language Models (LLMs) on High-Performance Computing (HPC) clusters. I am also a passionate free software developer, a maintainer of VeRL, and a... Read More →
Tuesday April 7, 2026 16:10 - 16:35 CEST
Junior Stage
  Training Systems

16:10 CEST

TorchStore: What We Learned Building Distributed Storage Solutions for AsyncRL - Lucas Pasqualin, Danielle Pintz, Allen Wang & Amir Afzali, Meta
Tuesday April 7, 2026 16:10 - 16:35 CEST
Asynchronous Reinforcement Learning (AsyncRL) workloads have unique data sharing requirements: actors must efficiently exchange large tensors across processes and nodes, often with different sharding configurations—not just at checkpoint time, but continuously during training for live weight synchronization. This talk presents TorchStore, an open-source distributed tensor storage system built on Monarch actors that tackles these challenges. We'll share the key lessons learned—from designing pluggable transport backends (RDMA, shared memory, RPC) to implementing transparent live DTensor resharding that lets producers and consumers use entirely different parallelism strategies. We'll also discuss the friction we encountered integrating with inference engines like vLLM, where differing model definitions and integrations present new bottlenecks. Whether you're building actor-based training systems or thinking about disaggregated training-inference architectures, you'll leave with practical insights on distributed tensor storage design.
Speakers
avatar for Lucas Pasqualin

Lucas Pasqualin

ML Engineer, PyTorch (Meta)
Lucas has been developing Machine Learning Applications and Machine Learning infrastructure at scale for years, and has recently been focused on extending the product offering of PyTorch's Distributed Checkpointing stack.
AW

Allen Wang

Software Engineer, Meta
avatar for Danielle Pintz

Danielle Pintz

Software Engineer, Meta
Danielle is a software engineer working on PyTorch, currently focused on TorchStore and Async RL. She previously worked on the Llama Research team.
avatar for Amir Afzali

Amir Afzali

Software Engineer, Meta
Software engineer working on PyTorch distributed infra and large-scale training.
Tuesday April 7, 2026 16:10 - 16:35 CEST
Master Stage

16:40 CEST

Lightning Talk: TerraKit: Standardising AI-Ready Geospatial Data Preparation for the TorchGeo Ecosystem - Rosie Lickorish & Romeo Kienzler, IBM
Tuesday April 7, 2026 16:40 - 16:50 CEST
With the advent of geospatial foundation models, unexplored use cases are emerging that require well-curated datasets. Currently, no standardised approach exists for creating such AI-ready geospatial datasets. In this session, we introduce TerraKit: a comprehensive open-source Python library for retrieving and processing geospatial data that seamlessly integrates with upstream geospatial model-training libraries such as TorchGeo or TerraTorch.

From raster/vector annotations, TerraKit will match, download, process, align and split the requested data source (e.g., EarthData, CDSE, Planetary Computer) based on user specifications provided by a simple configuration file. TerraKit also supports spatial train/val splits and exports datasets in standard formats such as TACO datasets. TerraKit streamlines the pipeline from raw EO data to AI-ready datasets, accelerating the development of custom geospatial applications, and ensuring query and processing pipelines are reproducible. By lowering the barrier to entry, a wider community of TorchGeo and TerraTorch users are empowered to leverage foundation models for Earth observation.
Speakers
avatar for Romeo Kienzler

Romeo Kienzler

AI Research Engineer, IBM
Romeo is a data scientist working for IBM Research and an advocate for ethical machine learning, transparency and privacy.
avatar for Rosie Lickorish

Rosie Lickorish

Research Software Engineer, IBM
Rosie is a Research Software Engineer at IBM, specializing in the development of next-generation tools and technologies designed to drastically accelerate solutions for today’s most urgent global challenges. Her technical focus involves leveraging geospatial data, AI models... Read More →
Tuesday April 7, 2026 16:40 - 16:50 CEST
Central Room
  GenAI & Multimodal
  • Audience Level Any
  • Slides Attached Yes

16:40 CEST

Optimizing PyTorch on CPU-GPU Coherent Platforms - Matthias Jouanneaux, Nvidia
Tuesday April 7, 2026 16:40 - 17:05 CEST
In recent years, both Nvidia and AMD have introduced hardware-coherent platforms: GH200, GB200 and MI300A. These coherent platforms offer many new features, but also new challenges, for PyTorch applications attempting to make the most of the hardware.
This talk will focus on Nvidia's GB200 and walk through techniques for utilizing the features of the coherent architecture in PyTorch, such as the high CPU-GPU interconnect bandwidth and unified memory, as well as the advantages and caveats of sharing system memory between CPU and GPU.
Speakers
avatar for Matthias Jouanneaux

Matthias Jouanneaux

Sr Software Engineer - PyTorch, NVIDIA
After his master’s degree, Matthias Jouanneaux worked for 2 years at Konica Minolta's European research lab on medical image analysis using deep learning.
He then joined Nvidia, focusing on optimizing application performance for Nvidia hardware as a Developer Technology enginee... Read More →
Tuesday April 7, 2026 16:40 - 17:05 CEST
Founders Cafe
  Frameworks & Compilers

16:40 CEST

Securing Agentic AI With PyTorch: Threat Modeling & LLM Red Teaming in Practice - Valeri Milke, VamiSec GmbH
Tuesday April 7, 2026 16:40 - 17:05 CEST
Agentic AI systems built with PyTorch introduce a new security paradigm: autonomous decision-making, tool usage, memory, and multi-step reasoning significantly expand the attack surface beyond traditional ML pipelines.

This session presents a practical, security-first approach to building and testing agentic AI systems using PyTorch, combining AI threat modeling and hands-on LLM security testing.

We introduce MAESTRO-based AI Threat Modeling to systematically identify risks across prompts, tools, memory, orchestration and model interactions. Building on this foundation, we demonstrate how the OWASP LLM Top 10 and the OWASP LLM Testing Guide can be applied to real PyTorch-based agent architectures.

The session includes a live demo of a prompt injection attack against an agentic workflow, showing how task delegation and tool invocation can be abused — and how developers can detect, mitigate and test these risks early in the AI development lifecycle.

Attendees will leave with concrete techniques to integrate AI security testing and threat modeling into PyTorch-based systems, bridging research, engineering and real-world AI risk.
Speakers
avatar for Valeri Milke

Valeri Milke

CEO, VamiSec GmbH
Valeri Milke is an AI security and cybersecurity specialist focusing on secure AI and agentic system design. He works at the intersection of PyTorch-based AI engineering, threat modeling and LLM security testing. His work includes AI red teaming, prompt injection analysis and the... Read More →
Tuesday April 7, 2026 16:40 - 17:05 CEST
Junior Stage

16:55 CEST

Lightning Talk: Bayesian Neural Networks With Variational Inference in PyTorch - Lars Heyen, Karlsruhe Institute of Technology, Scientific Computing Center
Tuesday April 7, 2026 16:55 - 17:05 CEST
Uncertainty quantification is becoming increasingly important as neural networks are used for ever more critical tasks. Bayesian neural networks (BNNs) inherently provide a measure of their own uncertainty, but they can be hard to implement, or inflexible, in common frameworks. In this session I discuss how to efficiently implement BNNs using Variational Inference within PyTorch and present torch_blue, a lightweight open source library that implements these methods with the goal of being easy to pick up, yet flexible enough for research on BNNs.
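As a hedged illustration of the core idea (not torch_blue's actual API), the reparameterization trick and the closed-form Gaussian KL term used in variational inference fit in a few lines of plain Python:

```python
import math
import random

# Variational BNNs keep a Gaussian posterior q(w) = N(mu, sigma^2) per
# weight. The reparameterization trick samples w = mu + sigma * eps with
# eps ~ N(0, 1), so gradients can flow through mu and sigma.
def sample_weight(mu, sigma, rng=random):
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps

# Closed-form KL divergence KL(N(mu, sigma^2) || N(0, 1)): the
# complexity term added to the loss alongside the data-fit term.
def kl_to_standard_normal(mu, sigma):
    return 0.5 * (mu * mu + sigma * sigma - 1.0) - math.log(sigma)

# A posterior exactly equal to the standard-normal prior has zero KL.
print(kl_to_standard_normal(0.0, 1.0))  # 0.0
```

In a real BNN layer both functions operate on tensors of parameters and the KL is summed over all weights; the scalar version above only shows the arithmetic.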
Speakers
avatar for Lars Heyen

Lars Heyen

PostDoc, Karlsruhe Institute of Technology
I am a postdoctoral researcher working on uncertainty quantification in the research group "Robust and Efficient AI" at the Scientific Computing Center of the Karlsruhe Institute of Technology. I also coauthored the PyTorch-based library torch_blue for implementing Bayesian neural... Read More →
Tuesday April 7, 2026 16:55 - 17:05 CEST
Central Room
  Frameworks & Compilers
  • Audience Level Any
  • Slides Attached Yes

17:05 CEST

Flare Party
Tuesday April 7, 2026 17:05 - 18:30 CEST
Wrap up Day 1 of PyTorch Conference Europe 2026 at our official Flare Party. It’s the perfect opportunity to unwind, network, and keep the day’s momentum going.

Enjoy complimentary beer, wine, and appetizers as you connect with speakers, core contributors, and fellow developers. Throughout the evening, explore the Poster Sessions, where presenters will be available for live Q&A to spark deeper technical conversations. Be sure to also visit our sponsor booths to discover new tools and pick up conference swag.

Join us to recharge and celebrate with the PyTorch community!

Menu:
THE COLD PIECES
-Cheese cone: Comté and Ossau-Iraty, salad, grapes, and nuts (Gluten Free, Vegetarian)
-Sausage cone
-Vegetable chips cone (Vegan, Vegetarian)
-Bretzel bread (Vegan, Vegetarian)

THE HOT PIECES
-Parmesan cream and artichoke brioche tart with arugula (Vegetarian)
-Watercress brioche tart (Vegetarian)
-Green frittata with olives (Gluten Free, Vegetarian)
-Flammen truffle (Vegetarian)
-Red lentil, ginger, and coconut milk soup (Vegan, Vegetarian)

Wine, Beer, Soft Drinks
Tuesday April 7, 2026 17:05 - 18:30 CEST
Open Platform

17:05 CEST

Poster Presentations: Applications & Case Studies
Tuesday April 7, 2026 17:05 - 18:35 CEST
  1. LegoLoaderX: a PyTorch DataLoader for Sparse Spatio-Temporal Data - Michelle Audirac, Harvard University
  2. Stress State Estimation from Deformed Surface Images Using Deep Learning - Bakhtiyar Mammadli, NOMATEN Centre of Excellence, National Centre for Nuclear Research
Speakers
MA

Michelle Audirac

Senior Data Scientist, Harvard University

avatar for Bakhtiyar Mammadli

Bakhtiyar Mammadli

PhD Student, NOMATEN Centre of Excellence, National Centre for Nuclear Research
Bakhtiyar Mammadli is a PhD student in Mechanical Engineering at the NOMATEN Centre of Excellence (NCBJ, Poland). His research focuses on applying machine learning to experimental mechanics, particularly unsupervised methods for analyzing strain fields from Digital Image Correlation... Read More →
Tuesday April 7, 2026 17:05 - 18:35 CEST
Open Platform

17:05 CEST

Poster Presentations: Frameworks & Compilers
Tuesday April 7, 2026 17:05 - 18:35 CEST
  1. Automatic Comm-Compute Overlap and Bucketing in torch.compile - Elias Ellison & Ivan Kobzarev, Meta
  2. Flexible Custom Operators: custom ops with arbitrary inputs and outputs - Angela Yi & Richard Zou, Meta
  3. How Your Code Becomes a Kernel - Harshita Varma, Juspay; Nikita Verma, Individual
  4. TorchCodec: The Easy and Efficient Media Decoding Library for PyTorch - Daniel Flores & Molly Xu, Meta
  5. TorchDynamo Debugging Tools for Power Users - William Wen, Meta
  6. Accelerating GNN Workloads on ARM CPUs with PyTorch Geometric - Akash Agrawal, Fujitsu Research of India; Co-Authors: N Maajid Khan & Devang Choudhary, Fujitsu Research of India
  7. Reaching SOTA Normalization Performance with torch.compile -  Paul Zhang & Shunting Zhang, Meta

Speakers
avatar for Angela Yi

Angela Yi

Software Engineer, Meta
Angela has been on the PyTorch Compiler team for the past 3 years, working on torch.export and AOTInductor.
avatar for Richard Zou

Richard Zou

Software Engineer, Meta
I work on PyTorch.
avatar for William Wen

William Wen

Software Engineer, Meta
William works on the torch.compile team, specializing in TorchDynamo.
avatar for Nikita Verma

Nikita Verma

Cloud Native Developer, Indian Institute of Technology Bhubaneswar
Nikita Verma is an active contributor to the open-source community with a strong focus on Kubernetes and cloud-native technologies. She worked on developing forest growth simulations, automating configuration generation, and integrating CI/CD workflows. Nikita has volunteered at KubeCon... Read More →
avatar for Harshita Varma

Harshita Varma

Product Manager, Juspay
Harshita Varma is a contributor to the Kubernetes project, actively involved in the SIG Contributor Experience community, with a focus on enhancing the contributor journey. She began her open-source journey by contributing to the Thanos project, sparking her passion for open source... Read More →
avatar for Daniel Flores

Daniel Flores

Software Engineer, Meta
Daniel is a Software Engineer at Meta working on TorchCodec. Previously, Daniel studied computer science at Brown University.
avatar for Ivan Kobzarev

Ivan Kobzarev

Software Engineer, Meta

avatar for Elias Ellison

Elias Ellison

Software Engineer, Meta
Elias has been working on the PyTorch team for four years, most recently on the torch.compile stack.
avatar for Paul Zhang

Paul Zhang

Software Engineer, Meta
Paul Zhang is currently a software engineer working on PyTorch and Triton at Meta, ensuring that PyTorch and PT2 best utilizes the hardware it is run on. Previous to this, Paul has done extensive work on recommendation systems for training and inference, optimizing performance and... Read More →
MX

Molly Xu

Software Engineer, Meta

avatar for Akash Agrawal

Akash Agrawal

Software Engineer - II, Fujitsu Research of India Private Limited
Akash is a Software Engineer in Fujitsu Research of India, working actively on AI Framework Software Stack Optimization and Open-Source Software developments for FUJITSU-MONAKA – a 2 nanometer Armv9-A architecture-based CPU, for handling AI/HPC workloads and energy efficient co... Read More →
Tuesday April 7, 2026 17:05 - 18:35 CEST
Open Platform

17:05 CEST

Poster Presentations: GenAI & Multimodal
Tuesday April 7, 2026 17:05 - 18:35 CEST
  1. Unifying Modalities: Building Efficient Video Flows with PyTorch and Diffusion Transformers - David Brewster, Red Hat

Speakers
DB

David Brewster

Principal Software Engineer, Red Hat

Tuesday April 7, 2026 17:05 - 18:35 CEST
Open Platform

17:05 CEST

Poster Presentations: Inference & Production
Tuesday April 7, 2026 17:05 - 18:35 CEST
  1. A Tale of Two DSLs: A Comparative Study of vLLM GPU Performance with cuTile and CuTe DSL - Anil Vishnoi & Matthew Odden, Red Hat
  2. Bringing BitNet to ExecuTorch via Vulkan - Marcus Edel & Vineet Suryan, Collabora
  3. Building Production-Grade PyTorch Inference Pipelines for 100K+ Heterogeneous Devices - Samaresh Kumar Singh, HP Inc.
  4. Feather: Software Emulated FP8 for Older GPUs - Suriyaa MM, Indian Institute of Technology Tirupati
  5. Model Compression API for fast and precise inference in ExecuTorch - Daniil Liakhov & Aamir Nazir, Intel
  6. Near-Lossless MXFP4 Compression for Accelerated LLM Serving: Jointly Tuning Distribution Transforms - Felix Marty, AMD
  7. Recursive Language Models (RLMs): Scaling to Infinite Context via Programmatic Decomposition - Rudraksh Karpe, Simplismart; Shivay Lamba, Qualcomm
  8. When CPUs Win: Profiling Small Neural Network Inference Through PyTorch's Compiler and Runtime Stack - Dave Grove & Olivier Tardieu, IBM; Co-Authors: Vivek Mankar, Nethra Khandige, Pradipta Ghosh, & Anto Ajay Raj John, IBM
Speakers
avatar for Dave Grove

Dave Grove

Distinguished Research Scientist, IBM
David Grove is a Distinguished Research Scientist at IBM T.J. Watson, NY, USA. He has been a software systems researcher at IBM since 1998, specializing in programming language implementation and scalable runtime systems. He has authored more than sixty peer-reviewed publications... Read More →
avatar for Olivier Tardieu

Olivier Tardieu

Principal Research Scientist, Manager, IBM
Dr. Olivier Tardieu is a Principal Research Scientist and Manager at IBM T.J. Watson, NY, USA. He joined IBM Research in 2007. His current research focuses on cloud-related technologies, including Serverless Computing and Kubernetes, as well as their application to Machine Learning... Read More →
avatar for Shivay Lamba

Shivay Lamba

Senior ML Engineer, Qualcomm
Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development.

He is an Open Source Enthusiast and has been part of various programs like Google Code In and Google Summer of Code as a Mentor and is currently a MLH Fellow. He has also worked at organizations like Amazon, EY, Genpact. He is a Tensorflow.JS SIG member and community lead from In... Read More →
avatar for Marcus Edel

Marcus Edel

Machine Learning Lead, Collabora
Marcus Edel is the machine-learning lead at Collabora, where he leads the effort to optimise and apply deep networks for inference, with a focus on embedded devices. Marcus completed his graduate studies in 2020 with a focus on fast algorithms for core machine learning tasks applied... Read More →
avatar for Anil Vishnoi

Anil Vishnoi

Principal Software Engineer, RedHat Inc
Anil has been doing research, design and development of software networking products for more than 15 years at RedHat and his prior employers. Most of his career he has been working in Software Defined Networks, Data Center Networking, Network Virtualization and Cloud Networking domain... Read More →
avatar for Rudraksh Karpe

Rudraksh Karpe

Forward Deployed Engineer, Simplismart
Rudraksh is an FDE at Simplismart, where he builds solutions focused on high-performance AI inference. He previously worked as an AI Engineer at ZS Associates. He was a two-time Google Summer of Code participant with the openSUSE Project.

He has presented internationally at events including OpenSearch Korea, openSUSE Conference, Early Adopter Tech Summit, PyCon US, PyCon Japan, and openSUSE Asia Summit, focusing on GenAI, open source, and cloud-native technologies... Read More →
avatar for Samaresh Kumar Singh

Samaresh Kumar Singh

Principal Engineer, HP Inc.
Samaresh Kumar Singh is an engineering principal at HP Inc. with more than 21 years of experience in designing and implementing large-scale distributed systems, cloud native platform systems, and edge AI / ML systems. His expertise includes agentic AI systems, GenAI / LLMs, Edge AI... Read More →
SM

Suriyaa MM

Student, Indian Institute of Technology Tirupati

avatar for Daniil Lyakhov

Daniil Lyakhov

AI Research Engineer/Scientist, Intel Corporation

avatar for Felix Marty

Felix Marty

Senior Software Engineer, AMD
Felix Marty is a software engineer specializing in deep learning model compression, working on the AMD Quark open-source model compression toolkit and contributing to algorithms, evaluations, hardware deployment, and open-source integrations. Prior to AMD, he worked at Hugging Face... Read More →
avatar for Vineet Suryan

Vineet Suryan

Senior Software Engineer, Collabora


avatar for Aamir Nazir

Aamir Nazir

Research Engineer, Intel

MO

Mathew Odden

Principal Software Engineer, Red Hat

Tuesday April 7, 2026 17:05 - 18:35 CEST
Open Platform

17:05 CEST

Poster Presentations: Responsible AI & Compliance
Tuesday April 7, 2026 17:05 - 18:35 CEST
  1. When Models Collaborate but Data Cannot: Explainable Ensemble Learning Under Privacy Constraints - Pavani Rajula, NeuCorelytix Solutions LLP

Speakers
avatar for Pavani Rajula

Pavani Rajula

AI Developer, NeuCorelytix Solutions LLP

I’m Pavani Rajula, a Data Science and AI Developer with Data Migration International AG, currently working remotely from India. I have nearly six years of experience in data engineering, machine learning and artificial intelligence, including two years of professional experience... Read More →
Tuesday April 7, 2026 17:05 - 18:35 CEST
Open Platform

18:30 CEST

Open Source AI Soirée hosted by Label Studio and Docling
Tuesday April 7, 2026 18:30 - 21:00 CEST
Join Label Studio and Docling for an evening of conversation, connection, and community during PyTorch Conf EU.

Whether you're working on training pipelines, document workflows, evaluation systems, or production AI infrastructure, this gathering is a chance to meet peers, exchange ideas, and connect with others building real-world AI.

The evening will bring together the technical founder of Label Studio (CTO of HumanSignal), the creator of Docling, PyTorch Conf attendees, and the greater open source AI ecosystem. The venue is a short walk from Station F on the Seine.

Location: Near Station F (PyTorch venue), Paris
Date: Tuesday 7 April 2026
Time: 18:30 CEST
Click here to register

Tuesday April 7, 2026 18:30 - 21:00 CEST
TBA
 