7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is displayed in CEST (UTC+2).
Tuesday April 7, 2026 10:15 - 10:25 CEST


The discipline of evaluating large language models underwent a major transformation with the rise of general AI capabilities. Today, the field is undergoing another challenging transformation, driven by groundbreaking improvements on agentic tasks, which require models and systems to plan and take autonomous actions in the real world. Measuring how well models and systems perform on such tasks, however, is still (i) fragile from a methodological perspective and (ii) difficult to scale and generalize across domains. This talk first discusses common challenges in reproducing agentic evaluations, including differences in reference implementations, error handling, trajectory post-processing, and tool definitions. Next, it covers the infrastructure requirements that must be addressed for such evaluations to run efficiently at scale. Finally, it concludes with a set of (still nascent) best practices that can help alleviate “lightness” and build more consistent measurement pipelines.
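As an illustration of why such runs are hard to reproduce, the sketch below records the settings the abstract lists as common sources of divergence (reference implementation, error handling, trajectory post-processing, tool definitions) in one explicit config, then fingerprints and diffs two runs. This is a hypothetical example, not the speaker's method or any real evaluation library: `AgenticEvalConfig`, `fingerprint`, `diff_configs`, and every field value are assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, replace


@dataclass(frozen=True)
class AgenticEvalConfig:
    """Hypothetical record of settings that often differ between agentic eval runs."""
    benchmark: str
    reference_impl: str               # which harness/version produced the baseline numbers
    tool_definitions: tuple[str, ...] # tool names exposed to the agent
    on_tool_error: str                # error-handling policy, e.g. "retry" or "fail"
    max_retries: int
    trajectory_postprocess: str       # how the final answer is extracted from a trajectory

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so identical settings always hash identically.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def diff_configs(a: AgenticEvalConfig, b: AgenticEvalConfig) -> list[str]:
    """Return the names of the settings that differ between two runs."""
    da, db = asdict(a), asdict(b)
    return sorted(k for k in da if da[k] != db[k])


# Two runs that differ only in how tool errors are handled.
baseline = AgenticEvalConfig(
    benchmark="example-agent-bench",
    reference_impl="harness-v1",
    tool_definitions=("search", "python"),
    on_tool_error="retry",
    max_retries=2,
    trajectory_postprocess="last_message",
)
rerun = replace(baseline, on_tool_error="fail")

print(baseline.fingerprint() == rerun.fingerprint())  # False
print(diff_configs(baseline, rerun))                  # ['on_tool_error']
```

Hashing a canonical serialization means two scores are only compared when every recorded setting matches, and the diff names exactly which setting changed when they do not.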
Speakers

Besmira Nushi

Senior Manager - AI Research, NVIDIA
Besmira Nushi is a Senior AI Research Manager at NVIDIA in Zurich, where she leads research on LLM evaluation, model analysis and generalization, and real-world and agentic AI system measurements. Previously, she spent 7+ years at Microsoft Research advancing responsible AI, model...
Master Stage
Keynote Sessions
Audience Level: Any
