Name: Lightning Talk: Slash LLM Cold-Start Times by Pre-distributing GPU Caches - Billy McFall & Maryam Tahhan, Red Hat
Start: 2026-04-08T13:45:00+0200
End: 2026-04-08T13:55:00+0200

7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

Lightning Talk: Slash LLM Cold-Start Times by Pre-distributing GPU Caches - Billy McFall & Maryam Tahhan, Red Hat

Wednesday April 8, 2026 13:45 - 13:55 CEST

Central Room

Are your Large Language Model (LLM) deployments stuck waiting for GPU kernels to compile? If you run distributed inference at scale, your infrastructure is likely wasting time rebuilding the same GPU kernel cache for every instance, and you may not even realize how much time and how many resources that rebuilding consumes. This session is designed for platform engineers and ML practitioners who need to optimize inference scaling and reduce startup latency.

We will demonstrate how to eliminate redundant compilation by pre-distributing GPU kernel caches to all the inference nodes using KServe, a distributed model inference runtime for Kubernetes. Beyond just the "what," we will dive into the technical implementation of signing, verifying, and mounting cache images to ensure supply-chain security across clusters. Attendees will leave with a practical blueprint for reducing cold-start times and securing GPU-heavy workloads in production.

Speakers

Billy McFall

Sr. Principal Software Engineer, Red Hat

Billy McFall has been a software engineer on the Emerging Tech Networking Team within the Office of the CTO at Red Hat for 9+ years. Billy previously worked on Kubernetes/OpenShift networking, including the integration of the NVIDIA DPU into OpenShift. Billy has also been a maintainer of...

Maryam Tahhan

Principal Engineer, Red Hat

Maryam is a Principal Engineer in Red Hat's Office of the CTO, where she focuses on standardising CPU inferencing performance evaluation to help effectively validate and scale ML workloads.

Slides: Slash LLM Cold-Start Times by Pre-distributing GPU Caches (PDF)

Inference & Production

Audience Level: Intermediate
Slides Attached: Yes

PyTorch Conference Europe 2026
