7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

All session times are shown in CEST (UTC/GMT +2).
Wednesday April 8, 2026 15:55 - 16:20 CEST


The DeepSeek-V3 paper describes 5D parallelism and DualPipe at a high level, but leaves critical implementation details undocumented. This session presents our open-source PyTorch reference implementation that fills those gaps - verified against the original architecture and designed for learning and extension.

We'll share what we discovered building it from scratch:
Why K_pe is shared across heads in decoupled RoPE (not explicit in paper)
The critical timing of bias updates in auxiliary-loss-free load balancing
How sigmoid routing separates selection scores from gate values
The warmup formula that makes DualPipe achieve 3% bubble overhead
Bugs we caught: causal mask position offsets, EMA initialization, capacity dropping priority
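
As a rough illustration of the routing points above, here is a minimal sketch of sigmoid routing with bias-based (auxiliary-loss-free) load balancing, following our reading of the DeepSeek-V3 paper rather than the speaker's repository. It is written in plain Python instead of PyTorch so it runs without dependencies; the function names `route_token` and `update_bias` and the step size `gamma` are hypothetical. The bias only influences which experts are selected, while the gate values come from the unbiased sigmoid scores, and the bias is assumed to be updated once per training step after expert loads have been counted.

```python
import math


def route_token(logits, bias, top_k):
    """Return (chosen expert ids, gate weights) for a single token."""
    # Sigmoid affinity score per expert.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # Selection ranks experts by score + bias; the bias only steers the choice.
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + bias[i],
                    reverse=True)
    chosen = ranked[:top_k]
    # Gate values use the unbiased scores, normalized over the chosen experts.
    total = sum(scores[i] for i in chosen)
    gates = [scores[i] / total for i in chosen]
    return chosen, gates


def update_bias(bias, load, gamma):
    """Nudge each bias toward balanced load (assumed: once per step)."""
    mean_load = sum(load) / len(load)
    updated = []
    for b, tokens in zip(bias, load):
        if tokens > mean_load:
            updated.append(b - gamma)  # overloaded expert: less attractive
        elif tokens < mean_load:
            updated.append(b + gamma)  # underloaded expert: more attractive
        else:
            updated.append(b)
    return updated
```

With logits `[2.0, 1.0, 0.0, -1.0]` and a strongly negative bias on expert 0, the selection shifts to experts 1 and 2, but their gate weights are still computed from their own sigmoid scores and are unaffected by the bias.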

What you'll learn:

5D Parallelism: How TP, PP, DP, EP, and SP interact at 2,048+ GPU scale
DualPipe: Building the bidirectional scheduler with 55% throughput gain over GPipe
Hierarchical All-to-All: Two-level communication reducing MoE dispatch overhead by 4x
Teachable abstractions: CapacityMetrics, ExpertSpecializationTracker, ScheduleStep enums
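
For context on the GPipe comparison, the sketch below computes the textbook pipeline bubble fraction for a synchronous GPipe-style schedule, assuming every stage takes the same time per microbatch. It is only a baseline: it is not the DualPipe warmup formula, which the abstract does not state, and the function name `gpipe_bubble_fraction` is hypothetical.

```python
def gpipe_bubble_fraction(stages, microbatches):
    """Idle fraction of a GPipe-style schedule with equal-cost stages.

    Each device is busy for `microbatches` time slots out of
    `microbatches + stages - 1` total slots per pass.
    """
    if stages < 1 or microbatches < 1:
        raise ValueError("stages and microbatches must be positive")
    return (stages - 1) / (microbatches + stages - 1)
```

Under these assumptions, 4 stages with 12 microbatches leave 3 of 15 slots idle, and increasing the microbatch count shrinks that fraction.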

Prerequisites: torch.distributed basics.
Code: github.com/DevJadhav/deepseek-from-scratch
Speakers
Dev Jadhav

Tech Lead ML Engineer, ING Bank
Dev Jadhav is a production AI/ML engineer with 10+ years building AI
systems at scale. He currently leads ML engineering at Major Bank,
developing financial-grade AI and large-scale model operations. Dev is
the creator of DeepSeek From Scratch, an open-source implementation of
DeepSe...
Founders Cafe
  Training Systems
