7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."
Wednesday, April 8
 

10:05 CEST

Birds of a Feather: Disaggregated Tokenization: Building Toward Tokens-In-Tokens-Out LLM Inference - Maroon Ayoub, IBM Research; Hang Yin & Xi Ning Wang, Alibaba Cloud; Nili Guy, IBM; Hyunkyun Moon, Moreh
Wednesday April 8, 2026 10:05 - 10:35 CEST
LLMs are token-in, token-out - but our serving stacks aren't. Tokenization and preprocessing are still locked inside the inference engine, blocking the cache-aware routing and encode/prefill/decode (E/P/D) disaggregation that production deployments demand. To route smart, you need tokens before you reach the backend - and with multi-modal inputs requiring heavy encode-stage preprocessing, this is an architectural imperative, not just an optimization.

In llm-d, we learned this the hard way: three tokenization approaches, three gaps. We're now converging on disaggregated tokenization via vLLM's Renderer API as a gRPC sidecar, and collaborating with the Gateway API Inference Extension community to define the tokens-in-tokens-out interface. For multi-modal workloads, disaggregating preprocessing unlocks independent scaling of encode, prefill, and decode - each with different compute profiles.

Join us to discuss: How should we standardize tokenization and multi-modal preprocessing outside the engine? How does this shape E/P/D disaggregation? What are your pain points? We'll frame the problem from scheduling, vLLM, and gateway perspectives - then open the floor.
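To make the routing argument concrete, here is a minimal sketch of the tokens-in-tokens-out idea: tokenization happens in a sidecar before routing, so the gateway can hash a token prefix and send requests that share a prefix (and thus reusable KV cache) to the same backend. This is an illustrative toy, not llm-d's or vLLM's actual implementation; the whitespace tokenizer and all names here stand in for the real Renderer-based sidecar.

```python
import hashlib

# Toy stand-in for tokenization moved out of the engine into a sidecar.
# A real deployment would use the model's actual tokenizer (e.g. via
# vLLM's Renderer API behind gRPC); names here are illustrative only.
class TokenizerSidecar:
    def __init__(self):
        self.vocab = {}

    def encode(self, text: str) -> list[int]:
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab[word] = len(self.vocab)
            ids.append(self.vocab[word])
        return ids

def route(token_ids, backends, prefix_len=4):
    # Cache-aware routing: hash the token-ID prefix so requests sharing
    # a prefix (and hence reusable prefill KV cache) hit the same backend.
    prefix = ",".join(map(str, token_ids[:prefix_len])).encode()
    h = int.from_bytes(hashlib.sha256(prefix).digest()[:8], "big")
    return backends[h % len(backends)]

sidecar = TokenizerSidecar()
backends = ["decode-0", "decode-1", "decode-2"]

a = sidecar.encode("you are a helpful assistant summarize this doc")
b = sidecar.encode("you are a helpful assistant translate this doc")
# Same 4-token prefix, so both land on the same backend and the
# prefill KV cache for the shared prefix can be reused.
assert route(a, backends) == route(b, backends)
```

The point of the sketch is the ordering: the gateway only gets to make this decision because token IDs exist before the request reaches any backend.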
Speakers

Xi Ning Wang

Senior Technical Expert, Alibaba Cloud
Xi Ning Wang is a Senior Technical Expert at Alibaba Cloud, focusing on MaaS/LLM, Kubernetes, service mesh, and other advanced cloud native technical strategies. He previously worked at IBM as a technical architect focusing on SOA/Cloud and served as chairman of the Patent Technology Review…

Hang Yin

Senior R&D Engineer, Alibaba Cloud
Hang Yin is a Senior R&D Engineer at Alibaba Cloud, focusing on Kubernetes, service mesh, the Gateway API Inference Extension, and other cloud native fields. He currently serves on the Alibaba Cloud Container Service for Kubernetes (ACK) team, responsible for developing the ACK Gateway with Inference…

Maroon Ayoub

Research Scientist & Architect, IBM Research
Maroon Ayoub is a systems engineer at IBM Research focused on distributed AI infrastructure. He co-leads development of llm-d and specializes in scaling LLM inference with Kubernetes-native architectures, performance efficiency, and open source integrations.

Nili Guy

Research Manager & Senior Technical Staff Member, IBM Research
Nili is a Research Manager and Senior Technical Staff Member at IBM Research, co-creator of llm-d, and an expert in distributed inference and Kubernetes-native AI systems. She has led key open-source and productized inference initiatives across IBM’s AI platforms.

Hyunkyun Moon

MLOps Engineer, Moreh
Hyunkyun Moon is an ML Platform Engineer at Moreh, focusing on building high-performance LLM inference platforms with llm-d. He is an active contributor to open-source projects, including llm-d and vLLM. With a strong background in large-scale Kubernetes-native infrastructure, he…
Open Platform

10:35 CEST

How To Write C++ Extensions in 2026 - Jane Xu, Meta & Mikayla Gawarecki, Meta
Wednesday April 8, 2026 10:35 - 11:00 CEST
Are you writing a C++ custom op extension to PyTorch? It's 2026: are you still shipping M x N wheels for M CPython versions and N libtorch versions? Did you know you can just ship 1 wheel that works across multiple CPythons and libtorches? If you're curious how, attend this talk to get the deets on py_limited_api, APIs like torch::stable::Tensor & TORCH_TARGET_VERSION, and generally the latest and greatest ways to keep your code and your release matrix simple. Get your custom kernel enrolled in new features, with benefits proven out in FA3, xformers, torchao, torchaudio, and more in progress! We'll also share some of our vision for smoother and faster custom op extensions.
Speakers
avatar for Jane Xu

Jane Xu

PyTorch SWE, Meta
Hi, I'm Jane! Please don't hesitate to come talk to me about your favorite optimizer, fitting models in GPU memory, how to free C++ extensions from libtorch version constraints, and anything that interests you.
avatar for Mikayla Gawarecki

Mikayla Gawarecki

Software Engineer, Meta
Software Engineer on PyTorch
Founders Cafe
  Frameworks & Compilers

11:20 CEST

Lightning Talk: Building AI That Ops Teams Actually Trust - Robert King, Chronosphere / Palo Alto Networks
Wednesday April 8, 2026 11:20 - 11:30 CEST
You've built an AI that identifies root causes of incidents faster than any human could... but there's one problem: no one trusts it.

Ops teams are skeptical by nature. They've been burned by noisy alerts, black-box tools, and "intelligent" systems that weren't.
This talk covers what we learned building AI for incident response across enterprise environments: why technically correct recommendations get ignored, and how to design for skepticism from day one.

I'll share specific patterns that moved the needle:

- Validating agent responses before they reach users, catching hallucinations, weak reasoning, and overconfident outputs
- Explainability that fits the operator's mental model, not the data scientist's
- Feedback loops that improve the AI and build user trust simultaneously
- Rollout strategies that let teams build confidence gradually
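The first pattern above, validating agent responses before they reach users, can be sketched roughly as follows. Everything here is an illustrative assumption (field names, thresholds, the keyword list), not the speaker's actual implementation:

```python
from dataclasses import dataclass, field

# Hypothetical shape of an incident-response agent's output.
@dataclass
class AgentResponse:
    root_cause: str
    confidence: float                 # model-reported score in [0, 1]
    evidence: list = field(default_factory=list)  # e.g. log/metric refs

# Wording that tends to read as overconfident to skeptical operators.
OVERCONFIDENT = ("definitely", "certainly", "guaranteed")

def validate(resp: AgentResponse) -> tuple[bool, list]:
    """Gate a response before it is surfaced; return (ok, problems)."""
    problems = []
    if not resp.evidence:
        problems.append("no supporting evidence cited (possible hallucination)")
    if resp.confidence < 0.6:
        problems.append(f"confidence {resp.confidence:.2f} below threshold")
    if any(w in resp.root_cause.lower() for w in OVERCONFIDENT):
        problems.append("overconfident wording; rephrase before surfacing")
    return (not problems, problems)

ok, why = validate(AgentResponse("Definitely a DNS outage", 0.9, evidence=[]))
# ok is False: no evidence cited, and the wording is overconfident.
```

The design choice worth noting is that the gate reports *why* it blocked a response, which feeds the feedback-loop and explainability patterns rather than silently dropping output.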

Whether you're using LLMs, agents, or traditional ML for operational tasks, the trust problem is the same. Ship something wrong during an incident and you've lost your users for months.

You'll leave with a practical framework for validating AI outputs and building the kind of trust that gets recommendations acted on.
Speakers
avatar for Robert King

Robert King

Senior Sales Engineer, Chronosphere
Robert is Lead Enterprise Solutions Engineer at Chronosphere and an OpenTelemetry contributor. He recently presented on AI Observability with OpenTelemetry at Cloud Native London (https://www.youtube.com/live/qF4wz-pha1w?si=PFzjNcGkbD4pFKnA&t=625) and has spoken at AWS Summit and other…
Junior Stage
  Inference & Production
 