7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."
Venue: Founders Cafe
Tuesday, April 7
 

11:00 CEST

Lights, Camera, Inference! Video Generation as a Service With vLLM-Omni - Ricardo Noriega, Red Hat & Doug Smith, Red Hat, Inc
Tuesday April 7, 2026 11:00 - 11:25 CEST
LLMs made for text generation as a service. What does it take to do the same for video?
We built an experimental Video Generation as a Service stack using vLLM-Omni and the LTX-2 open weights video model to explore how far an open, multimodal stack can go toward production use. We’ll share what worked, what busted, and what it takes to treat generative video as a first-class workload.
vLLM is known for high-performance autoregressive inference, and vLLM-Omni extends that foundation to multimodal inputs and outputs. We pushed those capabilities further by adding support for LTX-2, extending the OpenAI-compatible API surface, integrating with front ends, and packaging for scalable deployment. We’ll walk you through the touch points and show how we put all the Legos together with vLLM-Omni.
Finally, we’ll examine the gap between novelty demos and real applications: going from quirky spaghetti eating videos to generating consistent characters, personalized media, customized video game cutscenes, and interactive storytelling, and highlight what’s still missing to make generative video truly production-ready.
Speakers

Doug Smith

Principal Software Engineer, Red Hat
Doug Smith is a Principal MLOps Engineer at Red Hat, where he works on the AI Inference Server team and contributes upstream to the vLLM project through its CI Special Interest Group. Recently, he's also been looking into contributions to vLLM-Omni. He’s spent years bridging telecom...

Ricardo Noriega

Principal SW Engineer, Red Hat
Ricardo is a Principal Software Engineer working at Red Hat's Office of the CTO in the Emerging Technologies organization. Ricardo is currently focused on AI multimodality and researching the benefits of Small Language Models.
He is a former member of the Akraino TSC and PTL of the Kubernetes-Native-Infrastructure blueprint family, and contributor to Kubernetes, OpenStack, OpenDaylight and OPNFV...
Founders Cafe
  GenAI & Multimodal
  • Audience Level Any
  • Slides Attached Yes

11:30 CEST

Lightning Talk: Coding Agents for Compiler Construction: Beyond the AI Assistant Paradigm - Reza Rahimi, yasp.ai & Stefan Krassin, yasp
Tuesday April 7, 2026 11:30 - 11:40 CEST
Modern ML compilers follow a familiar pattern: a frontend lowers models into an intermediate representation, while a backend applies graph and kernel optimizations before generating code for target accelerators. PyTorch provides strong foundations through nn.Module, FX, and graph capture, but implementing optimized backends remains challenging due to hardware diversity and kernel-level complexity.

Optimizing GPU kernels is hard. Few engineers do it well. Hardware architectures evolve yearly, and with hyperscalers, chip makers, and AI labs building custom silicon, demand for efficient kernel generation keeps growing. This creates a gap between model developers and hardware capabilities.

This talk explores coding agents as engineering tools for compiler construction, not general-purpose assistants. We discuss how agents can generate and refine backend components by analyzing model mathematics and hardware specifications to produce optimized kernels tailored to specific targets.

We present a compiler architecture built as a PyTorch add-on that accepts PyTorch models or FX graphs and produces executable artifacts, demonstrating practical integration with existing PyTorch workflows.
Speakers

Reza Rahimi

CTO, yasp
Reza Rahimi is a seasoned technologist with a strong background in accelerating engineering software and scaling machine learning systems. With experience leading teams across embedded AI, compiler design, and model optimization, he now serves as CTO of yasp, where he is pioneering...

Stefan Krassin

CEO, yasp.ai
With a background in electrical engineering and a career spanning embedded systems to executive leadership, he combines technical expertise with a vision for scale. After 10+ years of leading companies to outstanding growth, he co-founded yasp in 2023. His mission is to eliminate...
Founders Cafe
  Agents & Interop

11:45 CEST

Lightning Talk: TorchJD: Jacobian Descent in PyTorch - Pierre Quinton, EPFL & Valérian Rey, Simplex Lab
Tuesday April 7, 2026 11:45 - 11:55 CEST
Jacobian descent (JD) is an extension of gradient descent supporting the optimization of vector-valued functions. This algorithm can be used to train neural networks with multiple loss functions (e.g. multi-task learning). JD iteratively updates the parameters of the model using the Jacobian matrix of the vector of losses (the matrix stacking each individual loss' gradient).

To support and extend our research, we have developed the TorchJD library. With it, it's easy and efficient to compute the Jacobians with respect to the model parameters, and to aggregate them into an update direction that is beneficial to every objective. In contrast, if we had averaged the losses and used gradient descent, the update would have been beneficial to the average loss, but may have actually increased one of the individual losses.
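The contrast between averaging the losses and aggregating the Jacobian can be illustrated with a small sketch in plain Python. This is not TorchJD's API and not the aggregator from the paper: the projection-based rule below is a simpler stand-in for the two-objective case, and the Jacobian values are made up for illustration. A descent step moves against the chosen direction, so a loss decreases (to first order) only when its gradient has a non-negative inner product with that direction.

```python
# Illustrative sketch only: a simple projection-based aggregator for two
# objectives. It is NOT TorchJD's API; all names and values are hypothetical.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mean_direction(jacobian):
    """Gradient of the averaged loss: the column-wise mean of the Jacobian rows."""
    n = len(jacobian)
    return [sum(col) / n for col in zip(*jacobian)]

def projected_direction(jacobian):
    """Average the two rows after removing their mutual conflict.

    If the gradients conflict (negative inner product), each one is projected
    onto the orthogonal complement of the other before averaging.
    """
    g1, g2 = jacobian
    inner = dot(g1, g2)
    if inner < 0:
        p1 = [a - inner / dot(g2, g2) * b for a, b in zip(g1, g2)]
        p2 = [b - inner / dot(g1, g1) * a for a, b in zip(g1, g2)]
    else:
        p1, p2 = g1, g2
    return [(a + b) / 2 for a, b in zip(p1, p2)]

# Each row is the gradient of one loss with respect to two parameters.
jacobian = [[3.0, 0.0], [-1.0, 1.0]]

mean_dir = mean_direction(jacobian)
proj_dir = projected_direction(jacobian)

for name, direction in [("mean", mean_dir), ("projected", proj_dir)]:
    alignments = [dot(g, direction) for g in jacobian]
    print(name, direction, "alignment with each gradient:", alignments)
```

With these values the averaged gradient has a negative inner product with the second loss's gradient, so a step along it would increase that loss, while the projected direction is non-conflicting with both rows.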

In this session, we will give a quick introduction to the theory behind Jacobian descent, and then show how to use TorchJD on a variety of use-cases, beyond multi-task learning.

Library: https://github.com/TorchJD/torchjd
Paper: https://arxiv.org/abs/2406.16232
Speakers

Pierre Quinton

Teacher, EPFL
PhD in Information Theory and Master in Data Science, specializing in fundamental math and multi-objective optimization (MOO). I am the co-author of TorchJD, a PyTorch library for Jacobian Descent developed with Valerian, currently at ~300 GitHub stars. My work aims to translate complex...

Valérian Rey

Research Engineer, Simplex Lab
I graduated from EPFL with an MSc in Data Science in 2021. Since then, I worked as a Data Scientist at Withings, and I worked on Jacobian descent, initially as a side-project, but now as a full-time occupation. I now spend most of my time developing and maintaining TorchJD, and I love...
Founders Cafe
  Training Systems

12:00 CEST

Lightning Talk: Ethical, Privacy and Sustainability Considerations in PyTorch Systems - Paula Mesa Macias, Pau&Company
Tuesday April 7, 2026 12:00 - 12:10 CEST
PyTorch models are part of larger systems that handle data, logs, APIs and other services. Ethical, privacy, security and environmental considerations appear not only around the AI itself, but across the whole system.
Using the Ethical Software Framework and the Ethical IT Audit, this session explores practical ways to think about these issues in real workflows. It highlights situations where decisions in data handling, model deployment, logging or infrastructure have ethical, compliance or sustainability implications. It also shows considerations for using AI responsibly, such as dataset choices, bias awareness and evaluating risks before deployment.
The goal is to provide a clear, structured lens for identifying risks and trade-offs, making ethical, privacy, security, and sustainability concerns easier to discuss in everyday work.
Speakers

Paula Mesa Macias

Founder and Ethical Technology Consultant, Pau&Company
Founder of Pau&Company (https://pau.company/), which offers Ethical IT Audits (https://pau.company/ethical-it-audit/) based on the Ethical Software Framework (https://pau.company/ethical-software-framework/), Paula focuses on ethical considerations in technology. Through Pau&Company...
Founders Cafe

13:45 CEST

Lightning Talk: From Pretrained To Personal: Privacy-First Fine-Tuning on AI PCs - Daniel Holanda Noronha & Iswarya Alex, AMD
Tuesday April 7, 2026 13:45 - 13:55 CEST
PyTorch on AI PCs has crossed a threshold: local hardware can now support meaningful model fine-tuning, not just inference. This unlocks a new class of enterprise workflows where sensitive data never leaves the device, yet models can still be personalized and adapted using PyTorch.

In this session, we’ll show how to design on-device fine-tuning pipelines for AI PCs, focusing on enterprise scenarios where privacy is non-negotiable: regulated healthcare data, government and public-sector workloads, financial services, and proprietary enterprise systems. We’ll walk through key decisions such as selecting efficient pre-trained models, and how the right PyTorch optimizations enable effective personalization on large private datasets.

We'll also showcase practical fine-tuning techniques such as supervised fine-tuning (SFT), LoRA, and QLoRA, and show how mixed-precision training and correct use of training vs. evaluation modes make these approaches efficient and practical on AI PCs while preserving privacy. The result is a cloud-free, privacy-first fine-tuning blueprint that turns AI PCs into secure personalization engines for enterprise AI.
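To make concrete why LoRA suits memory-constrained local hardware: it freezes a weight matrix of shape d_out × d_in and trains only two low-rank factors of shapes d_out × r and r × d_in. The sketch below counts trainable parameters for one layer; the layer size and rank are hypothetical values chosen for illustration, not figures from the talk.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, adapter) trainable-parameter counts for one linear layer.

    full:    every entry of the d_out x d_in weight matrix is trained.
    adapter: only the LoRA factors B (d_out x rank) and A (rank x d_in) are trained.
    """
    full = d_out * d_in
    adapter = rank * (d_out + d_in)
    return full, adapter

# Hypothetical layer size and rank, for illustration only.
full, adapter = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA adapter:     {adapter:,} trainable parameters")
print(f"fraction trained: {adapter / full:.4%}")
```

Gradients and optimizer state are only needed for the adapter parameters, which is where most of the training-memory saving comes from.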
Speakers

Daniel Holanda

Solutions Architect & ML Engineer, AMD
Daniel is a Sr. ML Engineer at AMD, specializing in local AI. He leads the development of local fine-tuning workflows for AI PCs and co-leads several open-source projects where he designs production-grade LLM/VLM tooling to accelerate the AI development lifecycle.

Previously, he was a Machine Learning Engineer at Groq and a contributor to Microsoft’s Project Brainwave. Daniel holds a PhD in AI understanding and hardware architecture from UBC...

Iswarya Alex

Iswarya Alex, AMD
I am an ML Engineer at AMD focused on enabling high-performance on-device AI experiences. I work on efficiently optimizing and deploying models on AMD's Ryzen AI-powered devices with GPUs and NPUs.
Founders Cafe
  Security & Privacy

14:15 CEST

Sponsored Session: TorchTPU: Expanding TPU Programmability to PyTorch - Kat Ko & Claudio Basile, Google; Jana van Greunen, Meta
Tuesday April 7, 2026 14:15 - 14:40 CEST
Google Tensor Processing Units (TPUs) are designed for ML at massive scale, offering significant benefits in performance, energy, and cost. While TPUs have historically been associated with the TensorFlow and JAX ecosystems, we introduce TorchTPU: a new Google effort to expand TPU programmability to PyTorch.

This talk charts TorchTPU’s evolution, from the initial RFC to establishing a native, eager-first PyTorch backend. We will outline the core technical challenges overcome during this transition—particularly the complexities of translating dynamic, eager execution into highly optimized TPU computations.

We’ll highlight current milestones, including native integration with torch.compile, DTensor, and robust support for the latest Ironwood (TPU v7) architecture. These advancements collectively enable multi-billion parameter models to run on TPUs with minimal code changes, while retaining the ability for users to apply model-specific optimizations (e.g., custom kernels, quantization, sharding) to reach peak performance. Finally, we’ll provide a sneak peek at our roadmap for 2026.
Speakers

Jana van Greunen

Director of PyTorch Engineering, Meta
Jana van Greunen is the Director of PyTorch Engineering at Meta, where she leads efforts to ensure PyTorch remains the leading AI/ML framework for researchers and developers worldwide. With deep expertise in distributed systems, large-scale infrastructure, and over 15 years of experience...

Kat Ko

Senior Eng Manager, Google
Kat Ko is a Senior Engineering Manager at Google and a lead on TorchTPU, where she drives the integration of PyTorch with TPU technology to enable high-performance computing at scale. An EECS graduate of UC Berkeley, she brings over 15 years of experience building large-scale systems...

Claudio Basile

Software Engineer, Google
Claudio Basile is a Google Software Engineer and the co-founder and technical lead of TorchTPU. During his tenure at Google, he also authored LiteRT, the company’s new on-device ML framework. With a Ph.D. in ECE from UIUC and over 15 years of experience spanning machine learning...
Founders Cafe
  Frameworks & Compilers
  • Slides Attached Yes

14:45 CEST

Lightning Talk: Implementing Single-Dim Strategies With Sharding Validator - Anshul Sinha, Meta
Tuesday April 7, 2026 14:45 - 14:55 CEST
DTensor sharding propagation is a major bottleneck to full operator coverage: adding or fixing an op strategy is complex, bug‑prone, and gaps often surface as unexpected resharding and extra collectives. A key source of complexity is that today’s rules conflate (1) semantic correctness—valid input/output sharding combinations for an operator—with (2) search‑space pruning to avoid combinatorial blowups on N‑dimensional meshes.

This talk presents a landed prototype that separates these concerns via Single Mesh Dim Strategies: each operator specifies valid placement combinations for one mesh dimension, while infra expands/composes them across the full mesh and selects low‑cost strategies. For contributors, this provides a clear path to refactor existing op_strategies into single‑dim rules that are easier to review and extend. We also introduce a Truth Table‑style sharding validator that systematically tests shapes and sharding specs to check soundness/completeness and to flag unnecessary redistribution/collectives caused by missing cases.
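The idea of per-mesh-dimension rules that infrastructure expands across the full mesh can be sketched in plain Python. This is a conceptual illustration, not DTensor's actual API: the rule table, the string placement labels, and the function name are all hypothetical, and the cost-based selection step described above is left out.

```python
from itertools import product

# Hypothetical single-mesh-dim rules for a matmul C = A @ B.
# Each rule maps (placement of A, placement of B) to the placement of C on
# ONE mesh dimension. "R" = replicated, "S(i)" = sharded on tensor dim i,
# "P" = partial sum that still needs a reduction.
MATMUL_SINGLE_DIM_RULES = [
    (("R", "R"), "R"),        # nothing sharded
    (("S(0)", "R"), "S(0)"),  # rows of A sharded -> rows of C sharded
    (("R", "S(1)"), "S(1)"),  # columns of B sharded -> columns of C sharded
    (("S(1)", "S(0)"), "P"),  # contraction dim sharded -> partial sums
]

def expand_over_mesh(rules, mesh_ndim):
    """Compose single-dim rules into strategies for an N-dimensional mesh.

    A full-mesh strategy picks one rule per mesh dimension, so the result is
    the Cartesian product of the rule list with itself mesh_ndim times.
    """
    strategies = []
    for combo in product(rules, repeat=mesh_ndim):
        a_placements = tuple(rule[0][0] for rule in combo)
        b_placements = tuple(rule[0][1] for rule in combo)
        out_placements = tuple(rule[1] for rule in combo)
        strategies.append(((a_placements, b_placements), out_placements))
    return strategies

strategies_2d = expand_over_mesh(MATMUL_SINGLE_DIM_RULES, mesh_ndim=2)
print(len(strategies_2d), "candidate strategies on a 2-D mesh")
for (a, b), out in strategies_2d[:4]:
    print(f"A={a} B={b} -> C={out}")
```

The operator author only writes the four single-dimension rules. The number of composed strategies grows as the rule count raised to the number of mesh dimensions, which is the combinatorial growth that pruning has to control.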

The goal of this presentation is faster, higher‑confidence contributions that improve correctness and expand DTensor operator coverage.
Speakers

Anshul Sinha

Software Engineer, Meta
I graduated from the University of Michigan with a B.S. in Computer Science in December 2024. I joined Meta's PyTorch Distributed team as a SWE in June 2025.
Founders Cafe
  Frameworks & Compilers

15:00 CEST

Lightning Talk: Jigsaw: Domain and Tensor Parallelism for High-Resolution Input Training - Deifilia Kieckhefen, Karlsruhe Institute of Technology
Tuesday April 7, 2026 15:00 - 15:10 CEST
Distributed neural network training frameworks typically optimize for specific architectures while minimizing communication overhead. Transformer layers can be efficiently parallelized, but other operations such as convolutions often remain inefficient. This creates bottlenecks for complex model architectures.
Moreover, existing tensor parallelism strategies typically replicate input data across all processes, creating redundant I/O that scales poorly with input size. In applications with heavy I/O demands, such as weather forecasting, medical imaging, or video processing, unsharded input data creates additional data-loading bottlenecks that could benefit from parallelization.
Jigsaw is a PyTorch library that shards both model weights and input data across parallel processes. It maintains a PyTorch-like interface while parallelizing activations, convolutions, linear layers, and attention through a distributed matrix multiplication backend. We demonstrate the usability of Jigsaw across a wide range of model architectures, show its performance when scaling multi-billion-parameter models sharded across up to 8 processes, and compare its scalability to DDP, FSDP, and Megatron-LM approaches.
Speakers

Deifilia Kieckhefen

Doctoral Researcher, Karlsruhe Institute of Technology
Deifilia Kieckhefen is a doctoral researcher at the Karlsruhe Institute of Technology. She works on scalable and distributed training of neural network architectures.
Founders Cafe
  Training Systems
  • Audience Level Any
  • Slides Attached Yes

15:40 CEST

Lightning Talk: Cross-Region Model Serving: PyTorch Inference, Observability & LLMOps - Suraj Muraleedharan, Amazon Web Services
Tuesday April 7, 2026 15:40 - 15:50 CEST
As PyTorch models move to production, organizations face a critical challenge: deploying, monitoring, and operating inference at scale across multiple regions. Single-region serving is well-understood, but multi-region LLMOps—model distribution, observability, failover, and cost management—remains ad-hoc and challenging for multiple customers.

This session presents production-tested architectures for multi-region PyTorch inference and LLMOps workflows. We cover:

Serving: Multi-region TorchServe/KServe on Kubernetes with latency-based routing, blue-green deployments, model versioning, and automated failover with circuit breakers.

Observability: OpenTelemetry distributed tracing, Prometheus/Grafana dashboards for latency, throughput, GPU utilization, and LLM-specific metrics like time-to-first-token and KV-cache hit rate.

LLMOps: CI/CD pipelines for cross-region model deployment with automated rollback, drift detection, and SLO-based alerting.
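The latency-based routing and circuit-breaker failover listed under Serving can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the architecture presented in the session: the class, the region names, the latencies, and the failure threshold are hypothetical, and a production breaker would also need a timed half-open state to probe for recovery.

```python
class RegionBreaker:
    """Minimal circuit breaker: opens after N consecutive failed requests."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    @property
    def is_open(self):
        # An open breaker means traffic should no longer be sent to the region.
        return self.consecutive_failures >= self.failure_threshold

    def record(self, success):
        # Any success resets the count; each failure increments it.
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1

def pick_region(latency_ms, breakers):
    """Route to the lowest-latency region whose breaker is still closed."""
    healthy = [region for region in latency_ms if not breakers[region].is_open]
    if not healthy:
        raise RuntimeError("no healthy region available")
    return min(healthy, key=latency_ms.get)

# Hypothetical regions and measured latencies.
latency_ms = {"eu-west": 18.0, "us-east": 95.0}
breakers = {region: RegionBreaker() for region in latency_ms}

first_choice = pick_region(latency_ms, breakers)

# Three consecutive failures open the breaker for the closest region.
for _ in range(3):
    breakers["eu-west"].record(success=False)

failover_choice = pick_region(latency_ms, breakers)
print(first_choice, "->", failover_choice)
```

Keeping the health state separate from the latency table lets the router fail over without discarding latency measurements for the region that is temporarily out of rotation.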

Attendees leave with serving architectures, dashboards, and deployment pipelines using open-source tooling.
Speakers

Suraj Muraleedharan

Principal Platform Engineer, Amazon Web Services
Principal Engineer driving technical strategy and building mission-critical foundational platforms for AI, HPC, and distributed systems, bridging the gap between infrastructure, AI research, and product organizations.
Founders Cafe
  Inference & Production

15:55 CEST

Lightning Talk: Running ExecuTorch Applications With Silicon Acceleration, in Ultra-low Power - George Gekov, Arm; Aki Makkonen, Alif Semiconductor
Tuesday April 7, 2026 15:55 - 16:05 CEST
Efficient deployment of ML models on low-power embedded systems has been a significant challenge for a number of years. At the same time, these embedded SoCs are all around us—from everyday appliances to the latest smart glasses.

ExecuTorch is a PyTorch-native framework for deploying neural networks on resource-constrained systems. In this session, we show how to build an end-to-end speech recognition application using PyTorch and ExecuTorch—from training a Transformer-based neural network in PyTorch, through quantization, all the way to deployment on a low-power embedded device.

We will introduce the key ExecuTorch APIs for quantization and explain how models are transformed and lowered into a form that can run efficiently on a device. The application is running on the Alif Ensemble E8 SoC, the first implementation of the leading Arm® Ethos-U85 NPU which brings native support for Transformer models to the ultra-low power domain.

Join the experts from Arm and Alif Semiconductor to see how we are bridging the gap between PyTorch and embedded deployment—and how you can bring PyTorch models to silicon-accelerated, ultra-low-power systems.
Speakers

George Gekov

ML Engineer, Arm
George Gekov is a Staff Software Engineer in Arm’s Machine Learning team, where he focuses on machine learning inference on embedded systems. He has extensive experience deploying neural networks on resource-constrained devices with Neural Processing Units (NPUs) to enable hardware-accelerated...

Aki Makkonen

Senior Staff Application Engineer, Alif Semiconductor
Software engineer with background in telecommunication, medical imaging, robotics and embedded systems.
Founders Cafe

16:10 CEST

On-Device LLM Inference on Android With ExecuTorch and Qualcomm QNN - Shivay Lamba & Kartikey Rawat, Qualcomm
Tuesday April 7, 2026 16:10 - 16:35 CEST
Multimodal models like CLIP are typically deployed in the cloud due to their size and computational demands, limiting their use in latency-sensitive, privacy-preserving, and offline-first applications. This talk demonstrates how one can run fully on-device CLIP inference on Android using ExecuTorch with the Qualcomm QNN backend, enabling real-time vision–language understanding without server dependency.

One can run models like CLIP (ViT-B/32) entirely on edge devices, leveraging QNN for hardware-accelerated inference. A key focus of the talk is a deep dive into ExecuTorch optimizations for QNN, including graph lowering, operator fusion, quantization strategies, memory planning, and backend-specific execution choices that materially impact latency, memory footprint, and power consumption.

The talk will cover architectural insights, model export and compilation workflows, and real-world benchmarks covering latency, memory usage, and power efficiency. This talk highlights how large multimodal PyTorch models can be made production-ready on edge devices, unlocking new classes of private, offline-capable AI applications.
Speakers

Shivay Lamba

Senior ML Engineer, Qualcomm
Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development.

He is an Open Source Enthusiast and has been part of various programs like Google Code-in and Google Summer of Code as a Mentor and is currently an MLH Fellow. He has also worked at organizations like Amazon, EY, and Genpact. He is a TensorFlow.js SIG member and community lead from In...

Kartikey Rawat

Senior Developer Advocate, Qualcomm
Senior Developer Advocate at Qualcomm | Google Developer Expert in AI and Google Cloud
Founders Cafe
  GenAI & Multimodal
  • Audience Level Any

16:40 CEST

Optimizing PyTorch on CPU-GPU Coherent Platforms - Matthias Jouanneaux, Nvidia
Tuesday April 7, 2026 16:40 - 17:05 CEST
In recent years, both Nvidia and AMD have introduced hardware-coherent platforms: GH200, GB200 and MI300A. These coherent platforms bring many new features, as well as challenges, for PyTorch applications attempting to make the most of the platform.
This talk will focus on Nvidia's GB200 and walk through techniques to utilize the features of the coherent architecture in PyTorch, such as the high CPU-GPU interconnect bandwidth and unified memory, as well as the advantages and caveats of sharing system memory between CPU and GPU.
Speakers

Matthias Jouanneaux

Sr Software Engineer - PyTorch, NVIDIA
After his master’s degree, Matthias Jouanneaux worked at Konica Minolta's European research lab on medical image analysis using deep learning for 2 years.
He then joined Nvidia, focusing on optimizing application performance for Nvidia hardware as a Developer Technology enginee...
Founders Cafe
  Frameworks & Compilers
 