PyTorch Conference Europe 2026: Full Schedule

7-8 April, 2025
Paris, France
View More Details & Registration
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."

arrow_back View All Dates

11:00 CEST

Lightning Talk: Training Embedding Model Resiliently for Multimodal Model Inference Routing - Huamin Chen, Red Hat & Haichen Zhang, AMD

Tuesday April 7, 2026 11:00 - 11:10 CEST

Junior Stage

LLM systems increasingly rely on intelligent routing to balance cost, latency, and quality tradeoffs. The vLLM Semantic Router, a vLLM Ecosystem project, provides both semantic and performance level routing intelligence for Mixture-of-Multimodal Models (MoM) architectures, but its effectiveness depends on fast and accurate classifiers.

This talk presents our end-to-end journey training production-grade embedding and classification models on AMD GPUs using native PyTorch, achieving high GPU utilization with distributed training optimizations.

We introduce a multilingual text embedding model with 32K context window and 2D Matryoshka support, and multimodal embedding models, trained on AMD GPUs using PyTorch DDP. The talk covers practical training optimizations for AMD ROCm. All training code uses native PyTorch distributed primitives, with additional enhancement to improve training stability and pipeline efficiency.

Attendees will learn how to train efficient classifiers for LLM routing systems and integrate these models into production inference pipelines.

Speakers

Huamin Chen

Technical Advisor, Microsoft

Dr. Huamin Chen is a passionate developer. He co-founded the Semantic Router project under vLLM community. His recent contributions to the CNCF ecosystem include Project Kepler, TAG Environmental Sustainability, and Cloud Native AI WG. He is also one of the founding members... Read More →

Haichen Zhang

Senior AI Software Engineer, AMD

Haichen is the Senior AI Engineer for AMD AI Group, specializing in accelerating training and inference for large language models, recommender systems, computer vision (CV), and natural language processing (NLP) tailored to internet customers. Before joining AMD, Haichen worked at... Read More →

Tuesday April 7, 2026 11:00 - 11:10 CEST
Junior Stage

Training Systems

Audience Level Intermediate

11:45 CEST

Lightning Talk: TorchJD: Jacobian Descent in PyTorch - Pierre Quinton, EPFL & Valérian Rey, Simplex Lab

Tuesday April 7, 2026 11:45 - 11:55 CEST

Founders Cafe

Jacobian descent (JD) is an extension of gradient descent supporting the optimization of vector-valued functions. This algorithm can be used to train neural networks with multiple loss functions (e.g. multi-task learning). JD iteratively updates the parameters of the model using the Jacobian matrix of the vector of losses (the matrix stacking each individual loss' gradient).

To support and extend our research, we have developed the TorchJD library. With it, it's easy and efficient to compute the Jacobians with respect to the model parameters, and to aggregate them into an update direction that is beneficial to every objective. In contrast, if we had averaged the losses and used gradient descent, the update would have been beneficial to the average loss, but may have actually increased one of the individual losses.

In this session, we will give a quick introduction to the theory behind Jacobian descent, and then show how to use TorchJD on a variety of use-cases, beyond multi-task learning.

Library: https://github.com/TorchJD/torchjd
Paper: https://arxiv.org/abs/2406.16232

Speakers

Pierre Quinton

Teacher, EPFL

PhD in Information Theory and Master in Data Science, specializing in fundamental math and multi-objective optimization (MOO). I am the co-author of TorchJD, a PyTorch library for Jacobian Descent developed with Valerian, currently at ~300 GitHub stars. My work aims to translate complex... Read More →

Valérian Rey

Research Engineer, Simplex Lab

I graduated from EPFL with a MSc in Data Science in 2021. Since then, I worked as a Data Scientist as Withings, and I worked on Jacobian descent, initially as a side-project, but now as a full-time occupation. I now spend most of my time developing and maintaining TorchJD, and I love... Read More →

TorchJD pdf

Tuesday April 7, 2026 11:45 - 11:55 CEST
Founders Cafe

Training Systems

Audience Level Intermediate
Slides Attached Yes

12:00 CEST

Lightning Talk: Bringing Google’s Colossus to PyTorch: Rapid Storage via fsspec to Keep GPUs Busy - Ankita Luthra & Trinadh Kotturu, Google

Tuesday April 7, 2026 12:00 - 12:10 CEST

Master Stage

As PyTorch models scale to billions of parameters, the bottleneck has quietly shifted from compute to storage. Modern GPU clusters often sit idle, "starving" for data while waiting on legacy REST-based protocols. This talk introduces Rapid Storage: a fundamental architectural shift bringing Google’s Colossus stateful protocol (that powers many Google’s products) to PyTorch via fsspec , a common Pythonic file interface used by many frameworks within PyTorch ecosystem.
By bypassing REST APIs entirely via persistent gRPC streams to the storage layer, we eliminate protocol overhead. In this talk, we also dive into how Rapid achieves <1ms random read/write latency, 20x faster data access, and a massive 6 TB/s of aggregate throughput. Crucially, it delivers up to 10x lower tail latency for random I/O, preventing the stragglers that often stall distributed training jobs.
Beyond raw speed, we will deconstruct the integration with gcsfs and the broader fsspec ecosystem. This ensures that high-performance I/O is available across the entire data stack including Dask, Ray, HF Datasets and vLLM etc. Join us to learn how to stop wasting GPU cycles and achieve linear scaling in the cloud.

Speakers

Ankita Luthra

Senior Software Engineer, Google

Ankita Luthra is a Software Developer at Google, focused on AI/ML infrastructure and scalable data pipelines. Her work with open-source tools like fsspec(gcsfs) and gcsfuse improves how frameworks such as PyTorch/ JAX efficiently access data from Google Cloud Storage.

Trinadh Kotturu

Senior Product Manager, Google

Trinadh Kotturu is a Senior Product Manager specializing in AI/ML and analytics client strategy at Google. An alumnus of IIM Bangalore with 12 years of experience, he has a proven track record of shipping v1 products and scaling them into robust platform services. His expertise spans large-scale distributed storage systems, autonomous driving, and system resiliency... Read More →

Bringing Rapid Buckets to Pytorch pdf

Tuesday April 7, 2026 12:00 - 12:10 CEST
Master Stage

Training Systems

Audience Level Any
Slides Attached Yes

15:00 CEST

Lightning Talk: Jigsaw: Domain and Tensor Parallelism for High-Resolution Input Training - Deifilia Kieckhefen, Karlsruhe Institute of Technology

Tuesday April 7, 2026 15:00 - 15:10 CEST

Founders Cafe

Distributed neural network training frameworks typically optimize for specific architectures while minimizing communication overhead. Transformer layers can be efficiently parallelized, but other operations such as convolutions often remain inefficient. This creates bottlenecks for complex model architectures.
Moreover, existing tensor parallelism strategies typically replicate input data across all processes, creating redundant I/O that scales poorly with input size. In applications with heavy I/O demands-weather forecasting, medical imaging, or video processing-unsharded input data creates additional data-loading bottlenecks that could benefit from parallelization.
Jigsaw is a PyTorch library that shards both model weights and input data across parallel processes. It maintains a PyTorch-like interface while parallelizing activations, convolutions, linear layers, and attention through a distributed matrix multiplication backend. We demonstrate the usability of Jigsaw across a wide range of model architectures and shows performance when scaling multi-billion-parameter models sharded across up to 8 processes and compares the scalability to DDP, FSDP, and Megatron-LM approaches.

Speakers

Deifilia Kieckhefen

Doctoral Researcher, Karlsruhe Institute of Technology

Deifilia Kieckhefen is a doctoral researcher at the Karlsruhe Institute of Technology. She works on scalable and distributed training of neural network architectures.

PyTorchConf jigsaw Kieckhefen pdf

Tuesday April 7, 2026 15:00 - 15:10 CEST
Founders Cafe

Training Systems

Audience Level Any
Slides Attached Yes

16:10 CEST

Optimizing Reinforcement Learning at Trillion-Parameter Scale - Songlin Jiang, Aalto University & Mind Lab

Tuesday April 7, 2026 16:10 - 16:35 CEST

Junior Stage

This talk will dive into how we implemented and optimized reinforcement learning on trillion-parameter Mixture-of-Experts reasoning models using veRL, Megatron-Bridge and vLLM. The session is useful to anyone building large-scale RL training systems.

For the first part, I will walk through the system design required to make RL work at this scale using LoRA: how LoRA adapters are implemented for expert layers, how adapters are sharded and fused under tensor/pipeline/expert parallelism, and most importantly, how refit (parameter sync) is implemented for LoRA between training backend (Megatron) and rollout engine (vLLM).

The second part of the talk focuses on training–inference mismatch in MoE RL. I will explain why common mitigations such as clipping and importance sampling can fail, and how we implement fixed Router Replay R3 across vLLM, veRL, and Megatron to align routing decisions between rollout and training.

These works are done together with Mind Lab and some of the related blog posts are at:
- https://macaron.im/mindlab/research/building-trillion-parameter-reasoning-rl-with-10-gpus
- https://macaron.im/mindlab/research/router-replay-r3-why-it-failed-and-how-we-fixed-it

Speakers

Songlin Jiang

Doctoral Researcher, Aalto University & Mind Lab

I am a doctoral researcher at Aalto University, focusing on reducing training and inference latency for Reinforcement Learning and Large Language Models (LLMs) on High-Performance Computing (HPC) clusters. I am also a passionate free software developer, a maintainer of VeRL, and a... Read More →

1T RL PTC EU26 Songlin pdf

Tuesday April 7, 2026 16:10 - 16:35 CEST
Junior Stage

Training Systems

Audience Level Intermediate
Slides Attached Yes

16:10 CEST

TorchStore: What We Learned Building Distributed Storage Solutions for AysncRL - Lucas Pasqualin, Danielle Pintz, Allen Wang, Amir Afzail Meta

Tuesday April 7, 2026 16:10 - 16:35 CEST

Master Stage

Asynchronous Reinforcement Learning (AsyncRL) workloads have unique data sharing requirements: actors must efficiently exchange large tensors across processes and nodes, often with different sharding configurations—not just at checkpoint time, but continuously during training for live weight synchronization. This talk presents Torchstore, an open-source distributed tensor storage system built on Monarch actors that tackles these challenges. We'll share the key lessons learned—from designing pluggable transport backends (RDMA, shared memory, RPC) to implementing transparent live DTensor resharding that lets producers and consumers use entirely different parallelism strategies. We'll also discuss the friction we encountered integrating with inference engines like vLLM, where differing model definitions and integrations present new bottlenecks. Whether you're building actor-based training systems or thinking about disaggregated training-inference architectures, you'll leave with practical insights on distributed tensor storage design.

Speakers

Tuesday April 7, 2026 16:10 - 16:35 CEST
Master Stage

Training Systems

Audience Level Intermediate

PyTorch Conference Europe 2026

11:00 CEST

Huamin Chen

Haichen Zhang

11:45 CEST

Pierre Quinton

Valérian Rey

12:00 CEST

Ankita Luthra

Trinadh Kotturu

15:00 CEST

Deifilia Kieckhefen

16:10 CEST

Songlin Jiang

16:10 CEST

Lucas Pasqualin

Allen Wang

Danielle Pintz

Amir Afzali

Get help with the event