In this tech talk, we dive deep into the technical specifics of LLM inference.
The big questions: Why are LLMs slow? How can they be made faster? And how might slow inference shape UX in the next generation of AI-powered software?
We jump into:
- Is fast model inference the real moat for LLM companies?
- What are the implications of slow model inference on the future of decentralized and edge model inference?
- As demand rises, what will the latency/throughput tradeoff look like?
- What innovations on the horizon might massively speed up model inference?
