Name: Lightning Talk: Training Embedding Model Resiliently for Multimodal Model Inference Routing - Huamin Chen, Red Hat & Haichen Zhang, AMD
Start: 2026-04-07T11:00:00+0200
End: 2026-04-07T11:10:00+0200

7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Tuesday April 7, 2026 11:00 - 11:10 CEST

Junior Stage

LLM systems increasingly rely on intelligent routing to balance cost, latency, and quality trade-offs. The vLLM Semantic Router, a vLLM Ecosystem project, provides both semantic-level and performance-level routing intelligence for Mixture-of-Multimodal Models (MoM) architectures, but its effectiveness depends on fast and accurate classifiers.
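One common way to implement semantic-level routing is to embed the incoming prompt and send it to the model whose "prototype" embedding is most similar. The abstract does not describe the vLLM Semantic Router's actual mechanism or API, so the following is only a generic sketch under stated assumptions: the model names, the 3-dimensional prototype vectors, and the `min_score` threshold are all hypothetical.

```python
import math

# Hypothetical route table: each candidate model is described by a prototype
# embedding (e.g. the mean embedding of example prompts). Values are made up.
ROUTES = {
    "math-model": [0.9, 0.1, 0.0],
    "code-model": [0.1, 0.9, 0.1],
    "general-model": [0.3, 0.3, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def route(query_embedding, routes=ROUTES, default="general-model", min_score=0.5):
    """Return the model whose prototype is most similar to the query.

    Falls back to `default` when no prototype scores above `min_score`.
    """
    best_model, best_score = default, min_score
    for model, prototype in routes.items():
        score = cosine(query_embedding, prototype)
        if score > best_score:
            best_model, best_score = model, score
    return best_model


print(route([0.8, 0.2, 0.1]))  # → math-model
```

In this sketch the quality of routing is bounded by the quality of the embeddings, which is why a fast, accurate embedding model matters.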

This talk presents our end-to-end journey training production-grade embedding and classification models on AMD GPUs using native PyTorch, achieving high GPU utilization with distributed training optimizations.
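Distributed data-parallel training gives each process a disjoint shard of the dataset. The talk does not say how its data is sharded; as an assumption, the sketch below reproduces in plain Python the default index assignment of PyTorch's `torch.utils.data.DistributedSampler` with `shuffle=False` and `drop_last=False`. The function name `shard_indices` is hypothetical.

```python
import math


def shard_indices(dataset_len, num_replicas, rank):
    """Indices seen by `rank`, mirroring DistributedSampler(shuffle=False, drop_last=False).

    Assumes dataset_len > 0.
    """
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by wrapping around to the start so every rank gets the same number of
    # samples; ranks then run the same number of steps, which DDP's per-step
    # gradient all-reduce expects.
    padding = total_size - dataset_len
    indices += (indices * math.ceil(padding / dataset_len))[:padding]
    # Strided assignment: rank r takes positions r, r + num_replicas, ...
    return indices[rank:total_size:num_replicas]


for r in range(4):
    print(r, shard_indices(10, 4, r))
```

With 10 samples on 4 ranks, two indices are repeated so that each rank sees exactly 3 samples per epoch.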

We introduce a multilingual text embedding model with a 32K-token context window and 2D Matryoshka support, as well as multimodal embedding models, all trained on AMD GPUs using PyTorch DDP. The talk covers practical training optimizations for AMD ROCm. All training code uses native PyTorch distributed primitives, with additional enhancements to improve training stability and pipeline efficiency.
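Matryoshka-style training makes the leading coordinates of an embedding usable on their own, so a consumer can trade accuracy for speed by keeping only a prefix. "2D" Matryoshka typically refers to making intermediate layers usable as well, so both depth and width can be cut. The talk does not describe its model's interface; the sketch below shows only the dimension axis at inference time, and the 8-dimensional vectors are made up rather than outputs of the talk's model.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def truncate(embedding, dim):
    """Keep the first `dim` coordinates and re-normalize (Matryoshka-style)."""
    return l2_normalize(embedding[:dim])


def similarity(a, b, dim):
    """Cosine similarity computed on the first `dim` coordinates only."""
    ta, tb = truncate(a, dim), truncate(b, dim)
    return sum(x * y for x, y in zip(ta, tb))


# Toy vectors (made up). A Matryoshka-trained model front-loads information,
# so small prefixes should still rank documents sensibly.
query = [0.7, 0.5, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
doc = [0.6, 0.6, 0.2, 0.3, 0.0, 0.10, 0.00, 0.02]
for dim in (2, 4, 8):
    print(dim, round(similarity(query, doc, dim), 3))
```

Re-normalizing after truncation keeps cosine similarity well defined at every prefix length; without Matryoshka training, short prefixes of an ordinary embedding usually carry much less signal.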

Attendees will learn how to train efficient classifiers for LLM routing systems and integrate these models into production inference pipelines.
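Once a classifier is trained, integrating it into an inference pipeline can be as simple as mapping its top label to a backend model, with a fallback when the classifier is unsure. This is a hypothetical sketch, not the talk's integration: the labels, model names, and the 0.6 confidence threshold are assumptions, and the classifier is represented only by its output logits.

```python
import math

# Hypothetical mapping from a classifier's intent labels to backend models.
LABEL_TO_MODEL = {"math": "math-model", "code": "code-model", "chat": "small-chat-model"}
FALLBACK_MODEL = "general-model"


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_model(logits, labels, threshold=0.6):
    """Route on the classifier's top label, falling back when it is unsure."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return FALLBACK_MODEL
    return LABEL_TO_MODEL.get(labels[best], FALLBACK_MODEL)


labels = ["math", "code", "chat"]
print(pick_model([3.0, 0.5, 0.1], labels))  # → math-model
print(pick_model([1.0, 0.9, 0.8], labels))  # → general-model
```

The fallback keeps a low-confidence prediction from sending a request to a specialized model that may handle it poorly.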

Speakers

Huamin Chen

Technical Advisor, Microsoft

Dr. Huamin Chen is a passionate developer. He co-founded the Semantic Router project in the vLLM community. His recent contributions to the CNCF ecosystem include Project Kepler, TAG Environmental Sustainability, and the Cloud Native AI WG. He is also one of the founding members...

Haichen Zhang

Senior AI Software Engineer, AMD

Haichen is a Senior AI Engineer in the AMD AI Group, specializing in accelerating training and inference for large language models, recommender systems, computer vision (CV), and natural language processing (NLP) for internet customers. Before joining AMD, Haichen worked at...

Training Systems

Audience Level: Intermediate

PyTorch Conference Europe 2026
