Name: Lightning Talk: From Hugging Face To Handheld: Scaling LLM Deployment With LiteRT Generative API - Cormac Brick & Weiyi Wang, Google
Start: 2026-04-08T13:30:00+0200
End: 2026-04-08T13:40:00+0200

7-8 April, 2026
Paris, France
Note: The schedule is subject to change.


Lightning Talk: From Hugging Face To Handheld: Scaling LLM Deployment With LiteRT Generative API - Cormac Brick & Weiyi Wang, Google

Wednesday April 8, 2026 13:30 - 13:40 CEST

Central Room

This session will demonstrate the end-to-end journey of bringing custom PyTorch-based open-source LLMs to cross-platform devices using LiteRT. We will show developers how to take a custom Hugging Face Transformers checkpoint and convert it for on-device execution, including:
-Taking the PyTorch model from conversion to deployment.
-Automated Optimization: How LiteRT performs automated patching of performance-critical components, including architecture-specific rewrites for PyTorch models.
-Seamless Fine-Tuning Integration: How to move from an Unsloth fine-tuning session to a TorchAO-quantized model and LiteRT export without leaving your script.
-The "0-Day" Enablement Strategy: Well-known architectures are supported out of the box. We’ll share how we enabled the Qwen 0.6B (or Liquid AI) model in just 20 minutes.
-Interactive Validation: Run inference on the exported model directly in the terminal or Colab to verify numerical correctness before deploying to a device.
This workflow shows a smooth fine-tune-to-deployment story where everything stays within the original PyTorch/Hugging Face ecosystem. Attendees can "vibe code" along using Gemini CLI or other coding agents.
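
As a rough illustration of the numerical-correctness check mentioned in the abstract, the sketch below compares reference logits against logits from an exported model. It does not use the LiteRT API: the function name `compare_logits`, the tolerance values, and the sample numbers are all hypothetical. In practice the two vectors would come from the original PyTorch model and the exported model run on the same prompt.

```python
import math


def compare_logits(reference, exported, atol=1e-3, rtol=1e-3):
    """Compare two logit vectors of equal length.

    Returns (max_abs_diff, top1_match, within_tolerance).
    Hypothetical helper for illustration only; not part of any library.
    """
    if len(reference) != len(exported):
        raise ValueError("logit vectors must have the same length")

    # Largest element-wise absolute difference between the two outputs.
    max_abs_diff = max(abs(r - e) for r, e in zip(reference, exported))

    # Do both models pick the same next token under greedy decoding?
    ref_top = max(range(len(reference)), key=reference.__getitem__)
    exp_top = max(range(len(exported)), key=exported.__getitem__)
    top1_match = ref_top == exp_top

    # Element-wise closeness under combined relative/absolute tolerance.
    within_tolerance = all(
        math.isclose(r, e, rel_tol=rtol, abs_tol=atol)
        for r, e in zip(reference, exported)
    )
    return max_abs_diff, top1_match, within_tolerance


# Made-up values standing in for real model outputs.
ref = [1.20, -0.50, 3.10, 0.00]
exp = [1.21, -0.49, 3.08, 0.01]

diff, same_token, ok = compare_logits(ref, exp, atol=0.05)
print(f"max abs diff: {diff:.4f}, same top-1 token: {same_token}, within tolerance: {ok}")
```

A looser tolerance is usually needed for quantized models than for float exports, so checking that the top-1 token matches can be a more useful signal than element-wise closeness alone.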

Speakers

Cormac Brick

Principal Engineer, Google AI Edge, Google

Cormac Brick is a Principal Engineer on the Google AI Edge team, where he specializes in frameworks and on-device AI. He has over 10 years of experience in AI software, silicon, and systems, with work spanning AI frameworks, ecosystems, and compilers down to silicon microarchitecture...

Weiyi Wang

Software Engineer, Google

Weiyi is a lead software engineer on LiteRT/TFLite, focusing on the compiler, NPU, and GenAI stack.


Inference & Production

Audience Level Intermediate

PyTorch Conference Europe 2026

