The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
Making your GPUs go brrr is complex. Efficient LLM inference requires navigating a maze of optimization techniques, each with different trade-offs. This session provides a practical journey through inference optimizations, clearly categorized by implementation effort.
We'll explore techniques across three levels:
- Model choices (start here): Model selection, quantization, smart routing
The session covers practical trade-offs and key metrics: time to first token, inter-token latency, throughput, and cost per token.
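As a rough illustration of how the four metrics named above relate to each other, here is a minimal sketch that derives them from per-token arrival timestamps. It is not taken from the talk: the `RequestTrace` type, the function names, and the idea of pricing by GPU-hour are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Timing for one streamed request (hypothetical structure)."""
    sent_at: float            # seconds, when the request was sent
    token_times: List[float]  # seconds, arrival time of each output token


def time_to_first_token(trace: RequestTrace) -> float:
    # Delay between sending the request and receiving the first token.
    return trace.token_times[0] - trace.sent_at


def mean_inter_token_latency(trace: RequestTrace) -> float:
    # Average gap between consecutive output tokens.
    gaps = [b - a for a, b in zip(trace.token_times, trace.token_times[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


def throughput_tokens_per_s(traces: List[RequestTrace]) -> float:
    # Output tokens across all requests, divided by the wall-clock window.
    total_tokens = sum(len(t.token_times) for t in traces)
    start = min(t.sent_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    return total_tokens / (end - start)


def cost_per_token(gpu_cost_per_hour: float, tokens_per_s: float) -> float:
    # Assumes cost is dominated by a fixed hourly GPU price.
    return gpu_cost_per_hour / 3600.0 / tokens_per_s


if __name__ == "__main__":
    trace = RequestTrace(sent_at=0.0, token_times=[0.5, 0.6, 0.7, 0.8, 0.9])
    print("TTFT (s):", time_to_first_token(trace))
    print("Mean ITL (s):", mean_inter_token_latency(trace))
    print("Throughput (tok/s):", throughput_tokens_per_s([trace]))
    print("Cost per token:", cost_per_token(3.6, 1000.0))
```

The first two metrics are per-request and describe what a single user experiences, while throughput and cost per token are aggregate and describe how well the hardware is used, which is why optimizing one can degrade another.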
Whether deploying your first model or optimizing at scale, this talk delivers actionable insights into which techniques to prioritize for deeper investigation.
Global Black Belt Solution Engineer AI Infrastructure, Microsoft
Christin Pohl is a Global Black Belt Solution Engineer for AI Infrastructure at Microsoft (Switzerland), now in her third year. After building her first chatbot in 2018 and 5+ years at SAP, she helps enterprises worldwide choose the right GPU, run LLM training and inference end-to-end...
Tuesday April 7, 2026 11:30 - 11:55 CEST Master Stage