We dive into ByteDance Seed's AAPT (autoregressive adversarial post-training), which promises fast, frame-by-frame AI video for interactive experiences. Learn how a pre-trained diffusion model is converted into a causal, one-pass-per-frame generator; how KV caching and a sliding 5-second window keep latency in check; and why a three-stage training pipeline (diffusion adaptation, consistency distillation, and adversarial training with a frame-level discriminator) matters. We'll unpack student forcing versus teacher forcing, what the results say about latency, throughput, and long-horizon coherence, and what this could mean for real-time virtual worlds.
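The causal, frame-by-frame loop with a bounded KV cache can be sketched as below. This is a minimal illustration, not ByteDance's implementation: `generate_frame` is a hypothetical stand-in for the one-pass-per-frame generator, and the 24 fps figure is an assumption used only to size the 5-second window.

```python
from collections import deque

FPS = 24  # assumption for illustration; the episode only specifies a 5-second window
WINDOW_SECONDS = 5
WINDOW_FRAMES = FPS * WINDOW_SECONDS

def generate_frame(cached_kv):
    # Hypothetical placeholder: a real model would attend over the cached
    # keys/values from prior frames and return (frame, new_kv_entry) in
    # a single forward pass.
    return object(), object()

def stream_frames(num_frames):
    """Causal generation: each frame attends only to a sliding window of past frames."""
    kv_cache = deque(maxlen=WINDOW_FRAMES)  # entries beyond the window are evicted
    for _ in range(num_frames):
        frame, kv_entry = generate_frame(list(kv_cache))
        kv_cache.append(kv_entry)  # cache this frame's keys/values for later frames
        yield frame

frames = list(stream_frames(10))
print(len(frames))  # 10
```

The `deque(maxlen=...)` captures the key idea: memory and per-frame attention cost stay constant no matter how long the video runs, which is what makes real-time, long-horizon streaming feasible.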
Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.
Sponsored by Embersilk LLC
Intellectually Curious with Mike Breault is available on multiple platforms.
