Name: Tour De Force: LLM Inference Optimization From Simple To Sophisticated - Christin Pohl, Microsoft
Start: 2026-04-07T11:30:00+0200
End: 2026-04-07T11:55:00+0200

7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

Tour De Force: LLM Inference Optimization From Simple To Sophisticated - Christin Pohl, Microsoft

Tuesday April 7, 2026 11:30 - 11:55 CEST

Master Stage

Making your GPUs go brrr is complex. Efficient LLM inference requires navigating a maze of optimization techniques, each with different trade-offs. This session provides a practical journey through inference optimizations, clearly categorized by implementation effort.

We'll explore techniques across three levels:

- Model choices (start here): Model selection, quantization, smart routing

- Library-level improvements (using PyTorch-based frameworks like vLLM, SGLang, TensorRT-LLM): Continuous batching, KV-cache management, tensor parallelism

- Custom implementations: Speculative decoding with custom draft heads, disaggregated inference, fine-tuning smaller models

The session covers practical trade-offs and key metrics: time to first token, inter-token latency, throughput, and cost per token.
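
As a rough illustration of how the four metrics named above relate to each other, here is a minimal sketch that derives them from per-token arrival timestamps. It is not taken from the talk: the `RequestTrace` structure, the function names, and the idea of deriving cost from an hourly GPU price are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Timing record for one generation request (hypothetical structure)."""
    sent_at: float            # seconds, when the request was issued
    token_times: List[float]  # seconds, arrival time of each output token


def time_to_first_token(trace: RequestTrace) -> float:
    # Delay between sending the request and receiving the first output token.
    return trace.token_times[0] - trace.sent_at


def mean_inter_token_latency(trace: RequestTrace) -> float:
    # Average gap between consecutive output tokens after the first one.
    times = trace.token_times
    if len(times) < 2:
        return 0.0
    return (times[-1] - times[0]) / (len(times) - 1)


def throughput_tokens_per_second(traces: List[RequestTrace]) -> float:
    # Output tokens produced across all requests, divided by wall-clock time
    # from the earliest request to the last token received.
    start = min(t.sent_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    total_tokens = sum(len(t.token_times) for t in traces)
    return total_tokens / (end - start)


def cost_per_token(gpu_hourly_cost: float, tokens_per_second: float) -> float:
    # Assumes cost is dominated by a flat hourly GPU price (an assumption).
    return gpu_hourly_cost / (tokens_per_second * 3600.0)
```

Time to first token and inter-token latency are measured per request, while throughput and cost per token are aggregate figures, which is one reason an optimization such as larger batches can improve the latter two while worsening the former two.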

Whether deploying your first model or optimizing at scale, this talk delivers actionable insights into which techniques to prioritize for deeper investigation.

Speakers

Christin Pohl

Global Black Belt Solution Engineer AI Infrastructure, Microsoft

Christin Pohl is a Global Black Belt Solution Engineer for AI Infrastructure at Microsoft (Switzerland), now in her third year. After building her first chatbot in 2018 and 5+ years at SAP, she helps enterprises worldwide choose the right GPU, run LLM training and inference end-to-end...

Inference & Production

Audience Level: Intermediate

PyTorch Conference Europe 2026
