
Module 4: Quantization - Shrinking Models Without Breaking Them

12 min•25 February 2026

This episode tackles the lever that turns powerful LLMs into something you can actually run: quantization. We explore what it means to store model weights with fewer bits, why that can cut memory in half at 8-bit and to roughly a quarter at 4-bit (relative to 16-bit weights), and the real tradeoff between compression and capability as rounding error accumulates across billions of parameters. We break down why large models survive this better than small ones, why 8-bit is often nearly lossless, why 4-bit can still be shockingly strong, and why going below that can make models fall apart. We compare the three practical paths you will see in the wild: GPTQ (layer-wise compression with error compensation), AWQ (protecting the most important weights), and GGUF (the local-friendly file format that makes splitting a model between CPU and GPU possible).
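The memory arithmetic and the rounding tradeoff described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by GPTQ, AWQ, or GGUF: it assumes a hypothetical 7-billion-parameter model, counts weight storage only (no scales, zero points, or activations, which is one reason real 4-bit files land at "roughly" a quarter), and uses plain symmetric round-to-nearest quantization with one scale for the whole tensor. The function names and the synthetic Gaussian weights are made up for this example.

```python
import random


def weight_memory_gb(n_params: int, bits: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits / 8 / 1e9


def quantize_dequantize(weights, bits):
    """Symmetric round-to-nearest quantization with a single per-tensor scale.

    Returns the weights as they would look after being stored on a
    signed integer grid with `bits` bits and converted back to floats.
    """
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    restored = []
    for w in weights:
        q = round(w / scale)                # snap to the nearest grid point
        q = max(-qmax, min(qmax, q))        # clamp to the representable range
        restored.append(q * scale)
    return restored


def mean_abs_error(original, restored):
    """Average absolute rounding error per weight."""
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Assumption: a hypothetical 7B-parameter model.
n_params = 7_000_000_000
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(n_params, bits):.1f} GB")

# Assumption: synthetic weights drawn from a narrow Gaussian, standing in
# for one layer of a real model.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

errors = {}
for bits in (8, 4, 2):
    errors[bits] = mean_abs_error(weights, quantize_dequantize(weights, bits))
    print(f"{bits}-bit mean absolute rounding error: {errors[bits]:.6f}")
```

The per-weight error grows as the bit width shrinks, which is the tradeoff the episode describes. Methods such as GPTQ and AWQ exist to reduce the damage that this kind of naive rounding would otherwise cause at low bit widths.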

More episodes of The AI Concepts Podcast

Module 6: RAG | Long Context vs RAG - Do You Still Need Retrieval at All

12 June•9 min

Module 6: RAG | GraphRAG - When Relationships Matter More Than Text

10 June•8 min

Module 6: RAG | Query Transformation - When the Question Is the Bottleneck

10 June•7 min

Module 6: RAG | Parent-Child Indexing - Search Small, Retrieve Big

10 June•7 min

Module 6: RAG | Reranking - The Second Stage That Gets Retrieval Right

10 June•10 min

Module 6: RAG | Dense and Sparse Search - Why Vector Search Alone Is Not Enough

10 June•11 min

Module 6: RAG | Chunking - Where You Cut Decides What Gets Found

29 Apr.•11 min

Module 6: RAG | Data Ingestion - Before Your Documents Can Be Found

27 Apr.•12 min

Module 6: RAG | Vector Databases - Where That Meaning Gets Stored

27 Apr.•10 min

Module 6: RAG | Embeddings - Teaching Machines to Understand Meaning

27 Apr.•8 min

The AI Concepts Podcast with Sheetal "Shay" Dhar is available on several platforms. The information on this page comes from public podcast feeds.