The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
This schedule is displayed in CEST (UTC+2).
TL;DR: In this session, we go over one of the key technologies behind Ads model scaling at Meta: 2D sparse parallelism, which scales sparse recommendation embedding tables beyond 1k GPUs to 8k GPUs, enabling the largest Ads model training runs in production at Meta.
Scaling laws have dominated LLMs and shown the industry that we can achieve better model performance through scale. The same scaling laws can be applied to recommendation systems, but the path to scaling recommender systems is not the same. The leap from hundreds to thousands of GPUs introduces complex technical challenges, particularly around handling sparse operations in recommendation models.
In this talk, we will detail the development of 2D sparse parallelism, tracing its path from research to production to address sparse scaling challenges. We will demonstrate how we optimize these systems to push performance boundaries, increasing speed and reducing memory at scale. Participants will walk away with lessons learned from designing 1,000+ GPU scale systems, and a deeper understanding of how to implement these solutions efficiently in production.
Zain works on large-scale training systems for recommender systems at Meta. He works on TorchRec, a library for distributed parallelism for sparse recommender models. He is also one of the authors of 2D sparse parallelism.
Wednesday April 8, 2026 14:15 - 14:25 CEST Founders Cafe