The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
This schedule is automatically displayed in CEST (UTC/GMT +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."
As PyTorch models move to production, organizations face a critical challenge: deploying, monitoring, and operating inference at scale across multiple regions. Single-region serving is well understood, but multi-region LLMOps (model distribution, observability, failover, and cost management) remains ad hoc and difficult for many customers.
This session presents production-tested architectures for multi-region PyTorch inference and LLMOps workflows. We cover:
Serving: Multi-region TorchServe/KServe on Kubernetes with latency-based routing, blue-green deployments, model versioning, and automated failover with circuit breakers.
Observability: OpenTelemetry distributed tracing, plus Prometheus/Grafana dashboards for latency, throughput, GPU utilization, and LLM-specific metrics such as time-to-first-token and KV-cache hit rate.
LLMOps: CI/CD pipelines for cross-region model deployment with automated rollback, drift detection, and SLO-based alerting.
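To make the serving item above concrete, here is a minimal sketch of latency-based routing combined with a per-region circuit breaker. It is not the session's architecture: the class RegionBreaker, the function pick_region, the region names, and the threshold and cooldown values are all hypothetical, and a production setup would typically delegate this logic to a load balancer or service mesh rather than application code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegionBreaker:
    """Circuit-breaker state for one serving region (hypothetical helper)."""
    name: str
    latency_ms: float                  # latest observed latency to the region
    failure_threshold: int = 3         # consecutive failures before opening (assumed value)
    cooldown_s: float = 30.0           # how long the breaker stays open (assumed value)
    failures: int = 0
    opened_at: Optional[float] = None  # None means the breaker is closed

    def available(self, now: float) -> bool:
        # Closed breaker: the region accepts traffic.
        if self.opened_at is None:
            return True
        # Open breaker: admit a trial request once the cooldown has elapsed.
        return now - self.opened_at >= self.cooldown_s

    def record(self, ok: bool, now: float) -> None:
        if ok:
            # A success closes the breaker and resets the failure count.
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                # Open (or re-open) the breaker and restart the cooldown.
                self.opened_at = now


def pick_region(regions: List[RegionBreaker], now: float) -> RegionBreaker:
    """Return the lowest-latency region whose breaker admits traffic."""
    candidates = [r for r in regions if r.available(now)]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: r.latency_ms)
```

In this sketch, three consecutive failures in the nearest region shift traffic to the next-lowest-latency region, and the failed region receives a trial request again only after the cooldown. Passing the clock value in explicitly keeps the failover behavior easy to test.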
Attendees leave with serving architectures, dashboards, and deployment pipelines built on open-source tooling.
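As an illustration of the LLM-specific metrics and SLO-based alerting mentioned in the abstract, the sketch below measures time-to-first-token for a streamed response and checks a 95th-percentile threshold. The function names, the nearest-rank percentile method, and the threshold are assumptions made for this example; in practice these values would usually be exported to Prometheus and evaluated by alerting rules rather than computed in process.

```python
import time
from typing import Callable, Iterable, List, Sequence, Tuple


def stream_with_ttft(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, List[str]]:
    """Consume a token stream; return (time-to-first-token in seconds, tokens)."""
    start = clock()
    ttft = None
    out: List[str] = []
    for tok in tokens:
        if ttft is None:
            # Time elapsed between issuing the request and the first token.
            ttft = clock() - start
        out.append(tok)
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, out


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of the samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def slo_breached(ttft_samples: Sequence[float], threshold_s: float) -> bool:
    """True when p95 time-to-first-token exceeds the (assumed) SLO threshold."""
    return p95(ttft_samples) > threshold_s
```

Any iterator of tokens can be passed to stream_with_ttft, for example a generator wrapping a streaming inference client; the injectable clock exists only to make the measurement deterministic in tests.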
Principal Engineer driving technical strategy and building mission-critical foundational platforms for AI, HPC, and distributed systems, bridging the gap between infrastructure, AI research, and product organizations.