7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

All times in this schedule are shown in CEST (UTC+2).
Tuesday April 7, 2026 16:10 - 16:35 CEST


This talk will dive into how we implemented and optimized reinforcement learning on trillion-parameter Mixture-of-Experts reasoning models using veRL, Megatron-Bridge and vLLM. The session is useful to anyone building large-scale RL training systems.

In the first part, I will walk through the system design required to make RL work at this scale using LoRA: how LoRA adapters are implemented for expert layers, how adapters are sharded and fused under tensor/pipeline/expert parallelism, and, most importantly, how refit (parameter sync) is implemented for LoRA between the training backend (Megatron) and the rollout engine (vLLM).
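
As a rough illustration of the "fused" step mentioned above, the sketch below merges a LoRA adapter into a base weight using the standard LoRA update W' = W + (alpha / r) * B A, once per expert. It is a generic, dependency-free example and not the veRL or Megatron-Bridge implementation: the function names, the toy shapes, and the choice to merge before syncing (rather than sending adapters separately) are all assumptions made for illustration, and parallel sharding is not modelled.

```python
# Illustrative sketch only: generic LoRA merge on plain Python lists.
# All names here are hypothetical, not veRL / Megatron-Bridge / vLLM APIs.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A without modifying the inputs."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape: out_features x in_features
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

def merge_expert_adapters(expert_weights, expert_adapters, alpha, rank):
    """Merge one (A, B) adapter pair into each expert's weight matrix."""
    return [merge_lora(w, a, b, alpha, rank)
            for w, (a, b) in zip(expert_weights, expert_adapters)]

# Two toy experts, each with a 2x3 weight (out=2, in=3) and a rank-1 adapter.
experts = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
]
adapters = [
    ([[1.0, 2.0, 3.0]], [[1.0], [0.0]]),   # A is 1x3, B is 2x1
    ([[0.0, 1.0, 0.0]], [[2.0], [-1.0]]),
]
merged = merge_expert_adapters(experts, adapters, alpha=2.0, rank=1)
```

In this toy setup, `merged` holds one dense weight per expert that a rollout engine could load without knowing about LoRA at all; whether a real system syncs merged weights or the adapters themselves is a design choice this sketch does not settle.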

The second part of the talk focuses on training–inference mismatch in MoE RL. I will explain why common mitigations such as clipping and importance sampling can fail, and how we implement a fixed version of Router Replay R3 across vLLM, veRL, and Megatron to align routing decisions between rollout and training.
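
To make the idea of aligning routing decisions concrete, here is a minimal sketch of top-k expert routing with an optional replay path. It assumes a simple setup in which the rollout engine records the chosen expert indices per token and the trainer reuses them while recomputing gate weights from its own logits. The function names and toy logits are hypothetical, and this is not the R3 implementation from vLLM, veRL, or Megatron.

```python
# Illustrative sketch only: top-k MoE routing with replayed expert indices.
# All names and values here are hypothetical.
import math

def top_k_experts(logits, k):
    """Indices of the k largest router logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

def gate_weights(logits, expert_ids):
    """Softmax over the logits of the selected experts only."""
    m = max(logits[i] for i in expert_ids)
    exps = [math.exp(logits[i] - m) for i in expert_ids]
    total = sum(exps)
    return [e / total for e in exps]

def route(logits, k, replay_ids=None):
    """Pick experts from the logits, or reuse indices recorded at rollout time."""
    if replay_ids is not None:
        expert_ids = list(replay_ids)
    else:
        expert_ids = top_k_experts(logits, k)
    return expert_ids, gate_weights(logits, expert_ids)

# Same token, but the two engines produce slightly different router logits.
rollout_logits = [2.00, 1.01, 1.00, -1.0]
train_logits = [2.00, 1.00, 1.01, -1.0]

rollout_ids, _ = route(rollout_logits, k=2)   # recorded during rollout
free_ids, _ = route(train_logits, k=2)        # trainer routes on its own
replayed_ids, replayed_gates = route(train_logits, k=2, replay_ids=rollout_ids)
```

With these toy logits, a tiny numerical difference flips the second expert when the trainer routes on its own (`free_ids` differs from `rollout_ids`), while the replay path keeps the same experts active in both phases.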

This work was done together with Mind Lab. Related blog posts:
- https://macaron.im/mindlab/research/building-trillion-parameter-reasoning-rl-with-10-gpus
- https://macaron.im/mindlab/research/router-replay-r3-why-it-failed-and-how-we-fixed-it
Speakers
Songlin Jiang

Doctoral Researcher, Aalto University & Mind Lab
I am a doctoral researcher at Aalto University, focusing on reducing training and inference latency for Reinforcement Learning and Large Language Models (LLMs) on High-Performance Computing (HPC) clusters. I am also a passionate free software developer, a maintainer of veRL, and a...
Junior Stage
  Training Systems
