Name: The Science and Practice of Open and Scalable LLM Evaluations - Grzegorz Chlebus, NVIDIA
Start: 2026-04-07T14:45:00+0200
End: 2026-04-07T15:10:00+0200

7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

The Science and Practice of Open and Scalable LLM Evaluations - Grzegorz Chlebus, NVIDIA

Tuesday April 7, 2026 14:45 - 15:10 CEST

Central Room

Rapid advances in AI have expanded the range of capabilities required for successful real-world deployment. Understanding where we are in this multi-dimensional frontier is essential for accelerating innovation through effective quality assurance. Rigorous evaluation is increasingly difficult to scale as development requires testing many checkpoints across numerous benchmarks. Model comparison is further complicated by limited transparency of reported results. This talk explores challenges, best practices, and open-source tools that elevate evaluation to a core component of LLM development, delivering continuous signals across the model lifecycle.
We discuss principles for standardizing evaluation methods and improving consistency through practical patterns and anti-patterns, and examples of integrating the science of evaluation directly into model development. Using Nemo-Evaluator, an open-source scalable evaluation tool, we demonstrate modular architectures that enable transparent, reproducible measurement. Finally, we show how Nemo-Evaluator supports reproducible evaluation for the Nemotron model family, helping enable one of the most open development processes in modern AI.

Speakers

Grzegorz Chlebus

Manager R&D, NVIDIA

Grzegorz Chlebus is a Manager at Frontier Model Evaluation at NVIDIA, where he leads tooling and infrastructure efforts for evaluating frontier AI models. He holds a PhD in Medical Sciences from Radboud University Nijmegen, focused on deep learning-based medical image segmentation...

GChlebus The Science and Practice of Open and Scalable LLM Evaluations pdf

GenAI & Multimodal

Audience Level Intermediate
Slides Attached Yes

PyTorch Conference Europe 2026
