Meta VP Matt Steiner on Ads Infra, GPUs, MTIA, and LLM-Written Kernels

Matt Steiner, VP of Monetization Infrastructure, Ranking & AI Foundations at Meta, walks through how Meta's ad system actually works, and why the infrastructure behind it differs from what you'd build for LLMs.

We cover Andromeda (retrieval on a custom NVIDIA Grace Hopper SKU Meta co-designed), Lattice (consolidating N ranking models into one), GEM (Meta's Generative Ads Recommendation foundation model), and the adaptive ranking model, a roughly one-trillion-parameter recommender served at sub-second latency.

We get into why recommender workloads aren't embarrassingly parallel like LLMs (the "personalization blob"), what that means for Meta's MTIA custom silicon roadmap, and how LLM-written kernels (KernelEvolve) flipped the economics of running a heterogeneous hardware fleet. As the price of software engineering has come down, demand for it has gone up: Meta now wants ~100x more optimized kernels per chip.

Read the full transcript at https://www.chipstrat.com/p/an-interview-with-meta-vp-matt-steiner

Chapters:
0:00 Intro and scale
0:39 How Meta's ad system works
2:00 Meta Andromeda and the custom NVIDIA SKU
3:30 Lattice: consolidating ranking models
5:00 GEM, Meta's ads foundation model
6:30 Adaptive ranking for power users
8:17 The scale: 3B DAUs at sub-second latency
9:40 Why longer interaction histories matter
10:45 The anniversary gift analogy
12:57 A decade of compute evolution
15:21 Meta's infra as a CP-SAT problem
16:07 Co-designing Grace Hopper with NVIDIA
17:47 Matching compute shape to workload
18:26 Influencing hardware and software roadmaps
20:23 MTIA: why ads aren't LLMs
22:07 The personalization blob and I/O ratios
26:38 One trillion parameters at sub-second latency
28:26 Heterogeneous hardware trade-offs
29:30 KernelEvolve: LLMs writing custom kernels
33:30 GenAI and recommender systems cross-pollination
35:21 The 2-year infrastructure outlook
37:00 Why demand for software engineering is rising
38:53 How Matt stays on top of it all

Relevant reading:
KernelEvolve (Meta Engineering): https://engineering.fb.com/2026/04/02/developer-tools/kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure/

Follow Chipstrat:
Newsletter: https://www.chipstrat.com
X: https://x.com/chipstrat
