PyTorch Conference Europe 2026
7-8 April, 2026
Paris, France
View More Details & Registration
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."
Wednesday April 8, 2026 10:05 - 10:35 CEST
LLMs are token-in, token-out - but our serving stacks aren't. Tokenization and preprocessing are still locked inside the inference engine, blocking the cache-aware routing and encode/prefill/decode (E/P/D) disaggregation that production deployments demand. To route smart, you need tokens before you reach the backend - and with multi-modal inputs requiring heavy encode-stage preprocessing, this is an architectural imperative, not just an optimization.
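
To make the routing argument concrete, here is a minimal sketch of prefix-cache-aware routing done at the gateway, assuming the router holds a copy of the model's tokenizer (a Hugging Face tokenizer, with gpt2 as a publicly available stand-in) and a per-backend index of cached KV block hashes; the helpers block_hashes and pick_backend are hypothetical, not part of llm-d or vLLM:

```python
# Hypothetical sketch: prefix-cache-aware routing on token IDs, done in the
# gateway instead of inside the inference engine.
from hashlib import blake2b
from transformers import AutoTokenizer

BLOCK = 16  # tokens per KV-cache block (vLLM-style paged attention)

# Assumption: the router runs the same tokenizer as the model it fronts.
tok = AutoTokenizer.from_pretrained("gpt2")

def block_hashes(token_ids: list[int]) -> list[str]:
    """Rolling hash per full block, mirroring how a prefix cache keys KV blocks."""
    hashes, h = [], blake2b(digest_size=8)
    for i in range(0, len(token_ids) - len(token_ids) % BLOCK, BLOCK):
        h.update(str(token_ids[i:i + BLOCK]).encode())
        hashes.append(h.copy().hexdigest())
    return hashes

def pick_backend(prompt: str, cache_index: dict[str, set]) -> str:
    """Route to the backend whose KV cache holds the longest matching prefix."""
    prefix = block_hashes(tok(prompt).input_ids)

    def cached_prefix_len(backend: str) -> int:
        n = 0
        for bh in prefix:
            if bh not in cache_index[backend]:
                break
            n += 1
        return n

    return max(cache_index, key=cached_prefix_len)
```

The point of the sketch: every step before the final max() needs token IDs, which is exactly what an engine-locked tokenizer withholds from the router today.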

In llm-d, we learned this the hard way: three tokenization approaches, three gaps. We're now converging on disaggregated tokenization via vLLM's Renderer API as a gRPC sidecar, and collaborating with the Gateway API Inference Extension community to define the tokens-in-tokens-out interface. For multi-modal workloads, disaggregating preprocessing unlocks independent scaling of encode, prefill, and decode - each with different compute profiles.
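
The Renderer-sidecar interface is still being defined with the Gateway API Inference Extension community, so the following is only a hedged sketch of what a tokens-in-tokens-out contract might look like, written as plain Python for brevity rather than as the actual gRPC service; RenderRequest, RenderResponse, and TokenizerSidecar are hypothetical names, not the real vLLM Renderer API or the Gateway API Inference Extension schema:

```python
# Hypothetical sketch of a tokens-in-tokens-out sidecar contract; in llm-d the
# proposed interface is a gRPC service, modeled here as plain Python.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RenderRequest:
    model: str
    prompt: str                                  # raw text / chat-template output
    multimodal_refs: list = field(default_factory=list)  # e.g. image URIs

@dataclass
class RenderResponse:
    token_ids: list                              # what the gateway routes on
    mm_embeddings_ref: Optional[str] = None      # handle to encode-stage output

class TokenizerSidecar:
    """Runs beside the gateway; the engine then accepts token IDs directly."""
    def __init__(self, tokenizer):
        self._tok = tokenizer                    # any tokenizer with an HF-style call

    def render(self, req: RenderRequest) -> RenderResponse:
        return RenderResponse(token_ids=self._tok(req.prompt).input_ids)
```

A contract of roughly this shape is what lets encode (multi-modal preprocessing), prefill, and decode scale independently: the gateway gets token IDs early enough to route on them, and the engine no longer owns preprocessing.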

Join us to discuss: How should we standardize tokenization and multi-modal preprocessing outside the engine? How does this shape E/P/D disaggregation? What are your pain points? We'll frame the problem from scheduling, vLLM, and gateway perspectives - then open the floor.
Speakers
Xi Ning Wang
Senior Technical Expert, Alibaba Cloud
Wang Xining is a senior technical expert at Alibaba Cloud, focusing on MaaS/LLM, Kubernetes, service mesh, and other advanced cloud native technical strategies. He previously worked at IBM as a tech architect focusing on SOA/Cloud and served as chairman of the Patent Technology Review…
Hang Yin
Senior R&D Engineer, Alibaba Cloud
Hang Yin is a senior engineer at Alibaba Cloud, focusing on Kubernetes, service mesh, Gateway API Inference Extension, and other cloud native fields. He currently serves on the Alibaba Cloud Container Service for Kubernetes (ACK) team, responsible for developing ACK Gateway with Inference…
Maroon Ayoub
Research Scientist & Architect, IBM Research
Maroon Ayoub is a systems engineer at IBM Research focused on distributed AI infrastructure. He co-leads development of llm-d and specializes in scaling LLM inference with Kubernetes-native architectures, performance efficiency, and open source integrations.
Nili Guy
Research Manager & Senior Technical Staff Member, IBM Research
Nili is a Research Manager and Senior Technical Staff Member at IBM Research, co-creator of llm-d, and an expert in distributed inference and Kubernetes-native AI systems. She has led key open-source and productized inference initiatives across IBM’s AI platforms.
Hyunkyun Moon
MLOps Engineer, Moreh
Hyunkyun Moon is an ML Platform Engineer at Moreh, focusing on building high-performance LLM inference platforms with llm-d. He is an active contributor to open-source projects, including llm-d and vLLM. With a strong background in large-scale Kubernetes-native infrastructure, he…
Open Platform
