7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is displayed in CEST (UTC/GMT +2).
Tuesday April 7, 2026 16:10 - 16:35 CEST


Asynchronous Reinforcement Learning (AsyncRL) workloads have unique data sharing requirements: actors must efficiently exchange large tensors across processes and nodes, often with different sharding configurations—not just at checkpoint time, but continuously during training for live weight synchronization. This talk presents Torchstore, an open-source distributed tensor storage system built on Monarch actors that tackles these challenges. We'll share the key lessons learned—from designing pluggable transport backends (RDMA, shared memory, RPC) to implementing transparent live DTensor resharding that lets producers and consumers use entirely different parallelism strategies. We'll also discuss the friction we encountered integrating with inference engines like vLLM, where differing model definitions and integrations present new bottlenecks. Whether you're building actor-based training systems or thinking about disaggregated training-inference architectures, you'll leave with practical insights on distributed tensor storage design.
Speakers
Lucas Pasqualin

ML Engineer, PyTorch (Meta)
Lucas has spent years developing machine learning applications and infrastructure at scale, and has recently focused on extending PyTorch's Distributed Checkpointing stack.
Allen Wang

Software Engineer, Meta
Danielle Pintz

Software Engineer, Meta
Danielle is a software engineer working on PyTorch, currently focused on TorchStore and Async RL. She previously worked on the Llama Research team.
Amir Afzali

Software Engineer, Meta
Amir is a software engineer working on PyTorch distributed infrastructure and large-scale training.
Master Stage
