7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

You must be registered for PyTorch Conference Europe 2026 to participate in the sessions.

Wednesday April 8, 2026 14:30 - 14:55 CEST


Post-training of LLMs with reinforcement learning is increasingly moving beyond static prompt–response pairs and preference optimization methods such as DPO, toward trajectory-based optimization. This talk focuses on the latest advances in multi-turn and multi-environment GRPO training, enabling LLMs to learn from interactive, agent-like experiences, including interacting with simulated environments, using tools, or completing multi-step reasoning tasks.

We highlight how TRL, as a PyTorch-native post-training framework, supports these workflows at scale. Multi-turn, multi-environment training can leverage simulated environments (e.g., coding, terminals, browsers), such as OpenEnv, while GRPO can also be applied to datasets for training LLMs on tool use or multi-step reasoning. Attendees will gain insights into design patterns, rollout handling, trajectory batching, and advantage computation, and will see how robust, multi-turn, multi-environment post-training can improve alignment, reasoning, and generalization in LLMs for agentic applications.
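
As a rough illustration of the advantage computation mentioned above, the sketch below normalizes trajectory-level rewards within one group of rollouts and assigns the result only to model-generated tokens. It is a minimal sketch in plain Python, not TRL's implementation: the function names, the `eps` value, the use of population standard deviation, and the example rewards and mask are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize one group's trajectory rewards: (r - mean) / (std + eps).

    `rewards` holds one scalar per rollout of the same task. Population
    standard deviation and the eps value are assumptions of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_model_tokens(advantage, model_token_mask):
    """Assign the trajectory's advantage to model-generated tokens only.

    Tokens that came from the environment or a tool (mask value 0) get 0.0,
    so they would not contribute to the policy loss.
    """
    return [advantage if is_model else 0.0 for is_model in model_token_mask]


# Hypothetical group: four rollouts of the same task, each scored once
# at the end of its multi-turn trajectory.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Hypothetical token mask for the first rollout:
# 1 = token produced by the model, 0 = token returned by a tool.
mask = [1, 1, 0, 0, 1]
per_token = broadcast_to_model_tokens(advantages[0], mask)
```

Rollouts that score above the group mean receive a positive advantage and those below receive a negative one; if every rollout in a group gets the same reward, all advantages are zero and the group carries no learning signal.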
Speakers
Kashif Rasul

Research Scientist, Hugging Face
Kashif has a PhD in Mathematics from the Freie Universität Berlin. He is passionate about high-performance computing and reinforcement learning, has presented at NVIDIA's GTC in 2009 and at StrangeLoop in 2012, and is also contributing to a number of data science and deep learning...
Sergio Paniego Blanco

Machine Learning Engineer, Hugging Face
Sergio has extensive experience in open source and artificial intelligence, the field in which he also earned his PhD. For more than eight years he has taken part in initiatives such as Google Summer of Code, where he has contributed as a developer and mentor. Currently...
Founders Cafe
  Training Systems
