Intellectually Curious

TD-Gammon: Self-Taught Reinforcement Learning and the Backgammon Breakthrough

6 min · 13 October 2025
Gerald Tesauro's TD-Gammon (early 1990s, IBM) showed that reinforcement learning could reach world-class backgammon by learning from self-play alone. A small neural network used temporal-difference (TD) learning to bootstrap its way toward better play: after each move it nudged its prediction for the current position toward its own prediction for the next position, and at the end of the game toward the actual result. It trained on roughly 1.5 million self-played games with a 3-layer architecture (198 inputs, ~80–160 hidden units, and 4 outputs predicting a White or Black win, with or without a gammon). It lost only narrowly to top players and, in doing so, shifted human strategy (notably the 2-1 opening) and helped pave the way for later RL breakthroughs such as Deep Q-Networks and AlphaGo/AlphaZero. The TD error signal also draws a provocative parallel to dopamine-based learning in the brain, hinting that similar learning principles may operate in both artificial and biological systems.
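
To make the bootstrapping idea concrete, here is a minimal sketch of tabular TD(0) on a toy 5-state random walk. It is not Tesauro's code and is far simpler than TD-Gammon, which used a neural network rather than a lookup table. The function name td0_random_walk and its parameters are hypothetical, chosen only for illustration. Each state's value estimates the probability of finishing on the right-hand side, loosely analogous to a win probability.

```python
import random

def td0_random_walk(episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) on a 5-state random walk (toy stand-in, not backgammon)."""
    rng = random.Random(seed)
    n_states = 5
    values = [0.5] * n_states  # initial guess: 50% chance of ending on the right
    for _ in range(episodes):
        state = n_states // 2  # every episode starts in the middle state
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:            # fell off the left edge: "loss"
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:  # fell off the right edge: "win"
                reward, next_value, done = 1.0, 0.0, True
            else:                         # non-terminal: bootstrap from own estimate
                reward, next_value, done = 0.0, values[next_state], False
            # TD error: how much better or worse things look one step later
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            if done:
                break
            state = next_state
    return values

values = td0_random_walk()
print([round(v, 2) for v in values])
```

For this toy problem the true values are 1/6, 2/6, ..., 5/6, so the printed estimates should land close to those fractions. The td_error term is the quantity the description compares to dopamine signalling.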


Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC


Intellectually Curious with Mike Breault is available on multiple platforms. The information on this page comes from public podcast feeds.