In this CS 153 "Office Hours" episode, Anthropic's Amanda Askell discusses her work making Claude "good", a journey that took her from a philosophy PhD in formal ethics and decision theory to leading character and alignment work at Anthropic. She explains why Aristotelian virtue ethics has proven more practically useful than abstract theoretical frameworks, and unpacks Anthropic's constitutional approach to AI. Rather than imposing strict rules, the constitution describes situations, values, and good judgment, aiming for coherence across domains so that models generalize well into new contexts. Askell argues this approach is safer than the "purely corrigible tool" model, which she worries could generalize into an entity willing to do anything it is told, and she stresses the importance of flexibility paired with backbone: models that adapt to users but push back when a request would genuinely harm them. Looking ahead, she sees the next one to two years as critical as models become more autonomous, and she hopes for a "rocky but ultimately good" transition marked by responsible deployment, strong alignment work, and a society that adapts to the disruption. She closes on an optimistic note about meaning beyond work, drawing on Star Trek-style abundance and the idea that purpose comes from relationships and contribution.
