Loading…
7-8 April, 2025
Paris, France
View More Details & Registration
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."
Wednesday April 8, 2026 15:25 - 15:35 CEST


In this talk, we'll cover how to use torchtitan to scale training of ultra-sparse mixture-of-experts models across over 2,000 GPUs. We'll walk through the pre-training of Trinity Large, a 400B mixture-of-experts model trained entirely using torchtitan, focusing on maximizing throughput and minimizing the impact of hardware induced failures. Along the way, we'll discuss challenges like fault tolerance, large-scale distributed training, and ensuring determinism - and how we've addressed each of these using torchtitan. Finally, we'll share insights and common pitfalls to avoid in your own large-scale training runs.
Speakers
avatar for Matej Sirovatka

Matej Sirovatka

Research Engineer, Prime Intellect
Research Engineer at Prime Intellect, mainly focusing on distributed training, performance and scaling.
Wednesday April 8, 2026 15:25 - 15:35 CEST
Founders Cafe
  Training Systems

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Share Modal

Share this link via

Or copy link