
Rapid Synthesis: Delivered under 30 mins..ish, or it's on me!

Scaling Multi-Tenant ML Inference on Kubernetes: Workday's Strategy

20 min•April 9, 2025

Workday's engineering team tackled the challenge of scaling machine learning inference across many customers by devising a "bin packed shards" strategy on Kubernetes. The approach, detailed in their Medium article from January 2022, groups multiple tenants' ML models into shared units called shards so that resources, memory in particular, are used efficiently. Kubernetes handles the deployment and scaling of these shards, while Istio's Virtual Services route each tenant's requests to the right shard. The strategy offers benefits such as cost reduction and independent model management, but it also adds complexity to both the initial design and ongoing operation, so the goal is a balance between efficiency and manageability.
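To make the "bin packed shards" idea concrete, here is a minimal sketch of packing tenants into shards by memory footprint. It is an illustration only: the description does not say which packing algorithm Workday uses, so first-fit decreasing is assumed here, and the names `Shard`, `pack_tenants`, and `routing_table`, the shard naming scheme, and the memory figures are all hypothetical. The resulting tenant-to-shard map stands in for the kind of mapping a routing layer such as Istio Virtual Services would need.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Shard:
    """A shared serving unit with a fixed memory budget (hypothetical model)."""
    capacity_mb: int
    tenants: dict[str, int] = field(default_factory=dict)  # tenant -> model size in MB

    @property
    def used_mb(self) -> int:
        return sum(self.tenants.values())

    def fits(self, size_mb: int) -> bool:
        return self.used_mb + size_mb <= self.capacity_mb


def pack_tenants(model_sizes_mb: dict[str, int], capacity_mb: int) -> list[Shard]:
    """Assign tenants to shards with first-fit decreasing (an assumed heuristic)."""
    shards: list[Shard] = []
    # Place the largest models first; break ties by tenant name for determinism.
    ordered = sorted(model_sizes_mb.items(), key=lambda kv: (-kv[1], kv[0]))
    for tenant, size in ordered:
        if size > capacity_mb:
            raise ValueError(f"model for {tenant!r} exceeds shard capacity")
        # Reuse the first shard with enough free memory, else open a new one.
        target = next((s for s in shards if s.fits(size)), None)
        if target is None:
            target = Shard(capacity_mb)
            shards.append(target)
        target.tenants[tenant] = size
    return shards


def routing_table(shards: list[Shard]) -> dict[str, str]:
    """Map each tenant to a shard name; the 'shard-N' naming is an assumption."""
    return {
        tenant: f"shard-{index}"
        for index, shard in enumerate(shards)
        for tenant in shard.tenants
    }
```

With a 1000 MB budget and five tenants sized 600, 500, 400, 300, and 200 MB, this sketch fills two shards completely, where one shard per tenant would need five. A real system would also have to account for traffic, replicas, and rebalancing, which this sketch ignores.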

More episodes of Rapid Synthesis: Delivered under 30 mins..ish, or it's on me!

The Industrialization of Autonomy: Anthropic’s Managed Agents Infrastructure

Apr 9•59 min

Qwen3.6-Plus: The Architecture of Agentic Enterprise Intelligence

Apr 9•41 min

The Open Agent Data Revolution

Apr 9•48 min

GLM-5.1: The Dawn of Eight-Hour Agentic Engineering

Apr 9•58 min

TurboQuant: Engineering Extreme AI Vector Compression and Efficiency

Apr 9•39 min

Terminal Velocity: A Beginner’s Guide to Claude Code

Apr 9•1 hr 5 min

Gemma 4 and Local-First AI Architectural

Apr 9•52 min

AI Orchestration: The CLI and MCP Architectural Debate

Mar 29•1 hr 13 min

The Maturation of AI Agent Infrastructure

Mar 29•41 min

GPU Value and Data Center Investment Dynamics

Mar 29•58 min

Rapid Synthesis: Delivered under 30 mins..ish, or it's on me! with Benjamin Alloul 🗪 🅽🅾🆃🅴🅱🅾🅾🅺🅻🅼 is available on multiple platforms. The information on this page comes from public podcast feeds.