The Information Bottleneck

The Hidden Engine of Vision with Peyman Milanfar (Google)

1 hr 24 min · 10 April 2026

How Denoising Secretly Powers Everything in AI

Peyman Milanfar is a Distinguished Scientist at Google, where he leads the Computational Imaging team. He is a member of the National Academy of Engineering, an IEEE Fellow, and one of the key people behind the Pixel camera pipeline. Before joining Google, he was a professor at UC Santa Cruz for 15 years; at Google X, he helped build the imaging pipeline for Google Glass. His work has been cited over 35,000 times.

Peyman makes a provocative case that denoising, long dismissed as a boring cleanup task, is actually one of the most fundamental operations in modern ML, on par with SGD and backprop. Knowing how to remove noise from a signal basically means you have a map of the manifold that signals live on, and that insight connects everything from classical inverse problems to diffusion models.
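
(A standard result makes this claim precise, stated here for context rather than quoted from the episode: for Gaussian noise $y = x + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, Tweedie's formula says the optimal mean-squared-error denoiser is

$$D^{\ast}(y) \;=\; \mathbb{E}[x \mid y] \;=\; y + \sigma^2 \, \nabla_y \log p_\sigma(y),$$

where $p_\sigma$ is the distribution of the noisy observations. The denoising residual $D^{\ast}(y) - y$ is, up to scale, the score of the noisy data distribution: a vector field pointing back toward the manifold, and exactly the quantity diffusion models learn.)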

We go from early patch-based denoisers to his 2010 "Is Denoising Dead?" paper, and then to the question that redirected his research: if denoising is nearly solved, what else can denoisers do? That led to Regularization by Denoising (RED), which, if you unroll it, looks a lot like a diffusion process, years before diffusion models existed. We also cover how his team shipped a one-step diffusion model on the Pixel phone for 100x ProRes Zoom, the perception-distortion-authenticity tradeoff in generative imaging, and a new paper on why diffusion models don't actually need noise conditioning. The conversation wraps with a debate on why language has dominated the AI spotlight while vision lags, and Peyman's argument that visual intelligence, grounded in physics and robotics, is coming next.
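
To make the RED idea concrete, here is a minimal sketch of its plain gradient-descent form. This is illustrative only: the operator names and the toy box-filter denoiser are stand-ins, not code from the RED paper or from Google.

```python
import numpy as np

def box_denoise(x, k=5):
    """Toy stand-in denoiser (a moving average); any denoiser D(.) plugs in here."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    return np.stack([xp[i:i + x.size] for i in range(k)]).mean(axis=0)

def red_restore(y, A, At, denoise, lam=0.2, step=0.5, iters=200):
    """Regularization by Denoising (RED), plain gradient-descent variant.

    Minimizes 0.5 * ||A x - y||^2 + (lam / 2) * x.T @ (x - D(x)).
    Under RED's assumptions on D, the gradient of the prior term is
    simply lam * (x - D(x)), so each iteration nudges x toward its own
    denoised version. Unrolled, the trajectory resembles a deterministic
    diffusion process.
    """
    x = At(y)  # back-projected initialization
    for _ in range(iters):
        grad = At(A(x) - y) + lam * (x - denoise(x))
        x = x - step * grad
    return x

# Toy usage: A is the identity, so this reduces to denoising a 1-D signal.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 4 * np.pi, 256))
noisy = clean + 0.3 * rng.standard_normal(256)
restored = red_restore(noisy, lambda v: v, lambda v: v, box_denoise)
```

Swapping the box filter for a learned denoiser is the whole point of the plug-and-play/RED family: the denoiser carries the prior, and the loop above is the engine.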

Timeline

0:00 Intro and Peyman's background

1:22 Why denoising matters more than you think; sensor diversity and Tesla's vision-only bet

15:04 BM3D and why it was secretly an MMSE estimator

17:02 "Is Denoising Dead?" then what else can denoisers do?

18:07 Plug-and-play methods and Regularization by Denoising (RED)

26:18 Denoising, manifolds, and the compression connection

28:12 Energy-based models vs. diffusion: "The Geometry of Noise"

31:40 Natural gradient descent and why flow models work

34:48 Gradient-free optimization and high-dimensional noise

45:13 Image quality and the perception-distortion tradeoff

48:39 Information theory, rate-distortion, and generative models

52:57 Denoising vs. editing

54:25 The changing role of theory

57:07 Hobbyist tools vs. shipping consumer products

59:40 Coding agents, vibe coding, and domain expertise

1:05:00 Vision and more complex, higher-dimensional signals

1:09:31 Do models need to interact with the physical world?

1:11:28 Continual learning and novelty-driven updates

1:13:00 On-device learning and privacy

1:15:01 Why has language dominated AI? Is vision next?

1:17:14 How kids learn: vision first, language later

1:19:36 Academia vs. industry

1:22:28 10,000 citations vs. shipping to millions: why choose?

Music:

  • "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
  • "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
  • Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
