Are your Large Language Model (LLM) deployments stuck waiting for GPU kernels to compile? If you run distributed inference at scale, your infrastructure is likely wasting time rebuilding the same GPU kernel cache on every single instance, and you may not even realize how much time and compute this rebuilding consumes. This session is designed for platform engineers and ML practitioners who need to optimize inference scaling and reduce startup latency.
We will demonstrate how to eliminate redundant compilation by pre-distributing GPU kernel caches to all inference nodes using KServe, a model inference platform for Kubernetes. Beyond just the "what," we will dive into the technical implementation of signing, verifying, and mounting cache images to ensure supply-chain security across clusters. Attendees will leave with a practical blueprint for reducing cold-start times and securing GPU-heavy workloads in production.
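To make the idea concrete, here is a minimal sketch of the cache-warming step, assuming PyTorch's TorchInductor on-disk cache is the kernel cache being distributed; the /cache/inductor path and the fused_op function are hypothetical illustrations, not the speakers' exact tooling:

    import os

    # Pin TorchInductor's on-disk kernel cache to a well-known directory
    # (hypothetical path) so the compiled kernels can be packaged into an
    # image or volume. Must be set before torch is imported.
    os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/cache/inductor"

    import torch

    @torch.compile
    def fused_op(x, y):  # hypothetical stand-in for a real model's graph
        return torch.nn.functional.relu(x @ y)

    # The first call triggers compilation and populates /cache/inductor.
    # Replicas that mount a pre-built copy of this directory reuse the
    # compiled kernels instead of rebuilding them at startup.
    fused_op(torch.randn(64, 64), torch.randn(64, 64))

In the workflow the abstract describes, a directory like this would then be packaged as a cache image, signed and verified, and mounted read-only into each inference pod, so that no replica pays the compilation cost again.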
Billy McFall has been a software engineer at Red Hat for 9+ years and works on the Emerging Tech Networking Team within the Office of the CTO. Billy previously worked on Kubernetes/OpenShift networking, including the integration of the NVIDIA DPU into OpenShift. Billy has also been a maintainer of...
Maryam is a Principal Engineer in Red Hat's Office of the CTO, where she focuses on standardizing CPU inference performance evaluation to help effectively validate and scale ML workloads.