In this episode, we hosted Zhuang Liu, Assistant Professor at Princeton and former researcher at Meta, for a conversation about what actually matters in modern AI and what turns out to be a historical accident.
Zhuang is behind some of the most influential papers of recent years (more than 100k citations): ConvNeXt (showing ConvNets can match Transformers if you get the details right), Transformers Without Normalization (replacing LayerNorm with dynamic tanh), ImageBind, Eyes Wide Shut on CLIP's blind spots, the dataset bias work showing that even our biggest "diverse" datasets are still distinguishable from each other, and more.
We got into whether architecture research is even worth doing anymore, what "good data" actually means, why vision is the natural bridge across modalities but language drove the adoption wave, whether we need per-lab RL environments or better continual learning, whether LLMs have world models (and for which tasks you'd need one), why LLM outputs carry fingerprints that survive paraphrasing, and where coding agents like Claude Code fit into research workflows today and where they still fall short.
Timeline
00:13 — Intro
01:15 — ConvNeXt and whether architecture still matters
06:35 — What actually drove the jump from GPT-1 to GPT-3
08:24 — Setting the bar for architecture papers today
11:14 — Dataset bias: why "diverse" datasets still aren't
22:52 — What good data actually looks like
26:49 — ImageBind and vision as the bridge across modalities
29:09 — Why language drove the adoption wave, not vision
32:24 — Eyes Wide Shut: CLIP's blind spots
34:57 — RL environments, continual learning, and memory as the real bottleneck
43:06 — Are inductive biases just historical accidents?
44:30 — Do LLMs have world models?
48:15 — Which tasks actually need a vision world model
50:14 — Idiosyncrasy in LLMs: pre-training vs post-training fingerprints
53:39 — The future of pre-training, mid-training, and post-training
57:57 — Claude Code, Codex, and coding agents in research
59:11 — Do we still need students in the age of autonomous research?
1:04:19 — Transformers Without Normalization and the four pillars that survived
1:06:53 — MetaMorph: Does generation help understanding, or the other way around?
1:09:17 — Wrap
Music:
- "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
- "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
- Changes: trimmed
About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
