Name: From Responses To Trajectories: Multi-Turn and Multi-Environment Reinforcement Learning - Kashif Rasul & Sergio Paniego Blanco, Hugging Face
Start: 2026-04-08T14:30:00+0200
End: 2026-04-08T14:55:00+0200

7-8 April, 2025
Paris, France
View More Details & Registration
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."

From Responses To Trajectories: Multi-Turn and Multi-Environment Reinforcement Learning - Kashif Rasul & Sergio Paniego Blanco, Hugging Face

Wednesday April 8, 2026 14:30 - 14:55 CEST

Founders Cafe

Post-training of LLMs with reinforcement learning is increasingly moving beyond static prompt–response pairs and preference optimization methods such as DPO, toward trajectory-based optimization. This talk focuses on the latest advances in multi-turn and multi-environment GRPO training, enabling LLMs to learn from interactive, agent-like experiences, including interacting with simulated environments, using tools, or completing multi-step reasoning tasks.

We highlight how TRL, as a PyTorch-native post-training framework, supports these workflows at scale. Multi-turn, multi-environment training can leverage simulated environments (i.e., coding, terminals, browsers) such as OpenEnv, while GRPO can also be applied to datasets for training LLMs on tool use or multi-step reasoning. Attendees will gain insights into design patterns, rollout handling, trajectory batching, and advantage computation, showing how robust, multi-turn, multi-environment post-training can improve alignment, reasoning, and generalization in LLMs for agentic applications.

Speakers

Kashif Rasul

Research Scientist, Hugging Face

Kashif has a PhD. in Mathematics from the Freie Universität Berlin. He is passionate about high-performance computing, Reinforcement learning, and has presented at NVIDIA's GTC in 2009 and at StrangeLoop in 2012, and is also contributing to a number of data science and deep learning... Read More →

Sergio Paniego Blanco

Machine Learning Engineer, Hugging Face

Sergio tiene una amplia trayectoria en el ámbito del código abierto y la inteligencia artificial, campo en el que también obtuvo su doctorado. Lleva más de ocho años participando en iniciativas como Google Summer of Code, donde ha contribuido como desarrollador y mentor. Actualmente... Read More →

From Responses To Trajectories Multi Turn and Multi Environment Reinforcement Learning pdf

Wednesday April 8, 2026 14:30 - 14:55 CEST
Founders Cafe

Training Systems

Audience Level Intermediate
Slides Attached Yes

PyTorch Conference Europe 2026

Kashif Rasul

Sergio Paniego Blanco

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Get help with the event