7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

All times in this schedule are shown in CEST (UTC/GMT +2).
Wednesday April 8, 2026 11:20 - 11:30 CEST


Agentic AI workloads - tree-of-thought exploration, ReAct loops, hierarchical swarms - expose a fundamental mismatch in how we serve PyTorch models. Today's inference stacks treat the KV-cache as a flat, anonymous tensor buffer with blind LRU eviction. This ignores the structural reality of agents: system prompts are durable, tool definitions are shared, and reasoning scratchpads are ephemeral. We are currently evicting high-value state to preserve throwaway tokens.

In this talk, we present Semantic KV-Cache, an architectural evolution for llm-d and vLLM that replaces anonymous blocks with Typed State.

We demonstrate a runtime that tags blocks as SystemPrompt, ToolDefinition, or ReasoningBranch, applying differentiated policies to each: pinning foundational context, replicating shared tools, and eagerly evicting completed thoughts. We show how this "lifecycle-aware" caching reduces recomputation and minimizes the "Agentic Tax" - evolving the PyTorch serving stack from request-centric to workload-aware.
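The typed eviction idea described above can be illustrated with a small, self-contained sketch. It is not the llm-d or vLLM implementation: the class names, the priority table, and the `completed` flag are all hypothetical, chosen only to mirror the three block types named in the abstract. Replication of shared tool definitions is not modeled; tool blocks are simply retained longer than reasoning branches.

```python
from collections import OrderedDict
from dataclasses import dataclass
from enum import Enum


class BlockType(Enum):
    SYSTEM_PROMPT = "SystemPrompt"
    TOOL_DEFINITION = "ToolDefinition"
    REASONING_BRANCH = "ReasoningBranch"


# Hypothetical retention priority: lower values are evicted first.
EVICTION_PRIORITY = {
    BlockType.REASONING_BRANCH: 0,
    BlockType.TOOL_DEFINITION: 1,
}


@dataclass
class Block:
    block_id: str
    block_type: BlockType
    completed: bool = False  # set once a reasoning branch has finished


class TypedKVCache:
    """Toy block table: typed eviction instead of blind LRU (illustrative only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # ordered from least to most recently used

    def touch(self, block_id: str) -> None:
        # Mark a block as most recently used.
        self.blocks.move_to_end(block_id)

    def mark_completed(self, block_id: str) -> None:
        self.blocks[block_id].completed = True

    def lru_victim(self) -> str:
        # What a type-blind LRU policy would evict: the oldest block.
        return next(iter(self.blocks))

    def typed_victim(self):
        # System prompt blocks are pinned and never considered for eviction.
        candidates = [
            b for b in self.blocks.values()
            if b.block_type is not BlockType.SYSTEM_PROMPT
        ]
        if not candidates:
            return None
        age = {bid: i for i, bid in enumerate(self.blocks)}
        # Completed blocks first, then lowest-priority type, then oldest.
        victim = min(
            candidates,
            key=lambda b: (
                not b.completed,
                EVICTION_PRIORITY[b.block_type],
                age[b.block_id],
            ),
        )
        return victim.block_id

    def insert(self, block: Block) -> None:
        while len(self.blocks) >= self.capacity:
            victim = self.typed_victim()
            if victim is None:
                raise MemoryError("cache is full of pinned blocks")
            del self.blocks[victim]
        self.blocks[block.block_id] = block


cache = TypedKVCache(capacity=3)
cache.insert(Block("sys", BlockType.SYSTEM_PROMPT))
cache.insert(Block("tool", BlockType.TOOL_DEFINITION))
cache.insert(Block("thought1", BlockType.REASONING_BRANCH))
cache.mark_completed("thought1")

print(cache.lru_victim())    # blind LRU would pick the system prompt
print(cache.typed_victim())  # typed policy picks the completed thought
```

In this toy setup, a blind LRU policy selects the oldest block, which is the system prompt, while the typed policy selects the completed reasoning branch and leaves the durable context in place.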
Speakers
Maroon Ayoub

Research Scientist & Architect, IBM Research
Maroon Ayoub is a systems engineer at IBM Research focused on distributed AI infrastructure. He co-leads development of llm-d and specializes in scaling LLM inference with Kubernetes-native architectures, performance efficiency, and open source integrations.
Hyunkyun Moon

MLOps Engineer, Moreh
Hyunkyun Moon is an ML Platform Engineer at Moreh, focusing on building high-performance LLM inference platforms with llm-d. He is an active contributor to open-source projects, including llm-d and vLLM. With a strong background in large-scale Kubernetes-native infrastructure, he...
Central Room
