The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference Europe 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
As the maintainer of everything audio in the `transformers` library, I'll share in this talk how audio is being integrated into large language models, grounded in what we observe across the open-source ecosystem.
Beginning with a brief overview of the current landscape of audio LMs, I'll highlight emerging trends in how audio is incorporated into pretrained text backbones. In particular, I'll examine the growing convergence of architectural choices, many inspired by VLMs, as well as newer concepts such as audio tokenization and streaming.
The core of the talk provides the audience with key technical insights: audio encoders versus audio tokenizers, and their respective advantages and limitations. It covers the motivations behind introducing concepts such as audio tokenizers and audio processors into `transformers`, shows how these design choices are reflected in the library, and explains how PyTorch tooling is leveraged to make audio a standardized modality for the open-source community.
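To make the encoder-versus-tokenizer distinction concrete, here is a minimal NumPy sketch; all shapes, names, and the random "codebook" are illustrative toys, not the `transformers` API. An audio encoder maps spectrogram frames to continuous embeddings, while an audio tokenizer quantizes each frame to a discrete id from a codebook, as neural audio codecs do.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "mel spectrogram": 20 frames of 80 mel bins (shapes are illustrative).
mel = rng.standard_normal((20, 80))

# Audio *encoder*: projects each frame to a continuous embedding vector
# (here just a random linear map standing in for a learned encoder).
proj = rng.standard_normal((80, 16))
embeddings = mel @ proj            # shape (20, 16), continuous values

# Audio *tokenizer*: assigns each frame the id of its nearest codebook
# entry (vector quantization), yielding discrete tokens an LM can consume.
codebook = rng.standard_normal((256, 80))
dists = ((mel[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
token_ids = dists.argmin(axis=1)   # shape (20,), ids in [0, 256)

print(embeddings.shape)   # continuous path: one vector per frame
print(token_ids[:5])      # discrete path: one integer id per frame
```

The continuous path preserves fine acoustic detail but requires the LM to accept non-text embeddings; the discrete path lets audio share the LM's token interface at the cost of quantization loss.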
A 2024 MVA graduate, I now work on open-source audio at Hugging Face. My current focus is on standardising audio in the transformers library and strengthening support across models.
Wednesday April 8, 2026 11:35 - 11:45 CEST Founders Cafe