Name: Lightning Talk: Accelerating On-Device ML Inference With ExecuTorch and Arm SME2 - Jason Zhu, Arm
Start: 2026-04-07T14:15:00+0200
End: 2026-04-07T14:25:00+0200

7-8 April, 2026
Paris, France
Note: The schedule is subject to change.

Lightning Talk: Accelerating On-Device ML Inference With ExecuTorch and Arm SME2 - Jason Zhu, Arm

Tuesday April 7, 2026 14:15 - 14:25 CEST

Master Stage

As on-device AI workloads grow in complexity, achieving low-latency inference within mobile power constraints remains a central challenge. We examine how ExecuTorch, combined with Arm’s Scalable Matrix Extension 2 (SME2), enables efficient CPU deployments of production AI workloads. We present a case study of SqueezeSAM, a segmentation model deployed in real-world mobile applications. Using ExecuTorch with XNNPACK delegation and SME2-optimized kernels, we evaluate INT8 and FP16 inference on a flagship smartphone. Moving beyond aggregate latency, we apply operator-level profiling to decompose runtime across convolution, GEMM, elementwise, and data movement operators, showing how hardware acceleration reshapes bottlenecks in the execution stack. SME2 delivers up to 3.9x end-to-end speedup on a single CPU core, materially altering runtime composition and revealing data movement as the primary post-acceleration bottleneck. This session presents a practical workflow for deploying, profiling, and systematically optimizing on-device PyTorch models, demonstrating how SME2 expands the viable design space for interactive mobile AI.
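The bottleneck shift described in the abstract can be illustrated with a small sketch. It groups per-operator timings into the four categories named above, applies a speedup only to the matrix-heavy categories, and compares each category's share of runtime before and after. All operator names, timings, and speedup factors here are hypothetical placeholders chosen for illustration; they are not measurements from the talk, and the helper functions are not part of ExecuTorch.

```python
from collections import defaultdict

# Hypothetical per-operator timings in milliseconds: (operator, category, ms).
# Illustrative values only; not results from the talk.
BASELINE_MS = [
    ("conv2d", "convolution", 60.0),
    ("linear", "gemm", 20.0),
    ("add", "elementwise", 6.0),
    ("relu", "elementwise", 2.0),
    ("transpose", "data_movement", 12.0),
]

# Assumed speedup factors for the categories that benefit from matrix acceleration.
# Categories not listed are left unchanged.
ASSUMED_SPEEDUP = {"convolution": 6.0, "gemm": 5.0}


def breakdown(records):
    """Return (total_ms, {category: share of total runtime})."""
    totals = defaultdict(float)
    for _name, category, ms in records:
        totals[category] += ms
    total = sum(totals.values())
    return total, {category: t / total for category, t in totals.items()}


def accelerate(records, speedup):
    """Divide each operator's time by its category's speedup factor (default 1.0)."""
    return [(name, cat, ms / speedup.get(cat, 1.0)) for name, cat, ms in records]


if __name__ == "__main__":
    before_total, before_share = breakdown(BASELINE_MS)
    after_total, after_share = breakdown(accelerate(BASELINE_MS, ASSUMED_SPEEDUP))
    print(f"end-to-end speedup: {before_total / after_total:.2f}x")
    for category in before_share:
        print(f"{category:>14}: {before_share[category]:.0%} -> {after_share[category]:.0%}")
```

With these placeholder numbers, data movement is a minor share of the baseline but becomes the largest share once convolution and GEMM are accelerated, which is the kind of shift operator-level profiling is meant to expose.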

Speakers

Jason Zhihuai Zhu

Senior Principal Engineer, Arm

Jason Zhu is a Senior Principal Engineer at Arm focused on hardware and software co-optimization for AI systems. With a background in quantum physics and experience spanning AI research and product engineering across major technology companies, he works across the full execution stack...

On device ML SME2 Pytorch Paris 2026 final pdf

Inference & Production

Audience Level Any
Slides Attached Yes

PyTorch Conference Europe 2026
