Name: Lightning Talk: Scaling Recommendation Systems To 2K GPUs and Beyond - Zain Huda, Meta
Start: 2026-04-08T14:15:00+0200
End: 2026-04-08T14:25:00+0200

7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

Lightning Talk: Scaling Recommendation Systems To 2K GPUs and Beyond - Zain Huda, Meta

Wednesday April 8, 2026 14:15 - 14:25 CEST

Founders Cafe

TLDR: In this session, we go over 2D sparse parallelism, one of the key technologies behind Ads model scaling at Meta. It scales sparse recommendation embedding tables beyond 1k GPUs, up to 8k GPUs, enabling the largest Ads model training runs in production at Meta.

Scaling laws have dominated LLMs and shown the industry that we can achieve better model performance through scaling. The same scaling laws can be applied to recommendation systems. However, the path to scaling recommender systems is not the same. The leap from hundreds to thousands of GPUs introduces complex technical challenges, particularly around handling sparse operations in recommendation models.

In this talk, we will detail the development of 2D sparse parallelism, tracing its path from research to production to address sparse scaling challenges. We will demonstrate how we optimize these systems to push performance boundaries, increasing speed and reducing memory at scale. Participants will walk away with lessons learned from designing systems at 1,000+ GPU scale and a deeper understanding of how to implement these solutions efficiently in production.

Speakers

Zain Huda

Software Engineer, Meta

Zain works on large-scale training systems for recommender systems at Meta. He works on TorchRec, a library for distributed parallelism for sparse recommender models. He is also one of the authors of 2D sparse parallelism.

Training Systems

Audience Level Intermediate

PyTorch Conference Europe 2026
