
GenAI Traffic: Why API Infrastructure Must Evolve... Again // Erica Hughberg // #296

66 min • 14 March 2025

GenAI Traffic: Why API Infrastructure Must Evolve... Again // MLOps Podcast #296 with Erica Hughberg, Community Advocate at Tetrate.

Join the Community: https://go.mlops.community/YTJoinIn

Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract

The way we handle API traffic is broken for GenAI. We've spent years optimizing for microservices—fast, stateless, and lightweight API calls. But GenAI changes everything. Requests are slower, heavier, and more complex, requiring long-lived connections, massive payloads, and streaming responses. Suddenly, traditional API gateways are struggling—timeout limits are too short, rate limiting models don’t fit, and payload constraints are blocking innovation.

In this episode, we unpack the new challenges of GenAI traffic and why infrastructure must evolve—again. We look back at previous API shifts, from the C10K problem to the monolith-to-microservices revolution, and how they reshaped networking. Now, AI-driven workloads demand a new kind of API gateway—one that handles token-based rate limiting, cost-aware request shaping, and scalable AI inference traffic.
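To make "token-based rate limiting" concrete, here is a minimal sketch that meters LLM tokens per client over a sliding time window instead of counting requests. It is illustrative only and is not how Envoy AI Gateway or any specific gateway implements the idea. The class name `TokenBudgetLimiter`, its methods, and the budget values are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TokenBudgetLimiter:
    """Caps LLM tokens consumed per client within a sliding time window.

    Hypothetical sketch: a request-count limiter treats a 10-token call and a
    10,000-token call the same, while this one charges each by its token usage.
    """
    max_tokens: int        # token budget per window (assumed value)
    window_seconds: float  # sliding window length in seconds
    _usage: dict = field(default_factory=dict)  # client -> deque of (time, tokens)

    def _used(self, client: str, now: float) -> int:
        events = self._usage.setdefault(client, deque())
        # Drop usage records that have aged out of the window.
        while events and now - events[0][0] >= self.window_seconds:
            events.popleft()
        return sum(tokens for _, tokens in events)

    def allow(self, client: str, now: float) -> bool:
        """Admit a request only while the client has budget left."""
        return self._used(client, now) < self.max_tokens

    def record(self, client: str, tokens: int, now: float) -> None:
        """Charge the tokens a response actually consumed (known only afterwards)."""
        self._usage.setdefault(client, deque()).append((now, tokens))


# Example: one heavy response exhausts the budget until the window rolls over.
limiter = TokenBudgetLimiter(max_tokens=1000, window_seconds=60)
first = limiter.allow("team-a", now=0.0)      # budget untouched
limiter.record("team-a", tokens=1200, now=1.0)
second = limiter.allow("team-a", now=2.0)     # over budget inside the window
third = limiter.allow("team-a", now=61.0)     # the 1200-token record has expired
```

The sketch charges usage after the response because the token count of a generated answer is not known when the request arrives. This is one reason request-count limits fit GenAI traffic poorly.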

// Bio

Erica Hughberg is a technical leader and community advocate passionate about helping engineering teams build scalable, secure, and human-centric application platforms. With a background in software engineering and a deep understanding of cloud-native technologies, she specializes in driving the adoption of open-source projects like Envoy Gateway, Istio, and Kubernetes Gateway API, which enable organizations to simplify traffic management, security, and API distribution.

As a maintainer of Envoy AI Gateway, she plays a key role in shaping the future of API infrastructure. She focuses on features to ensure organizations can securely and efficiently integrate AI-powered services while simplifying traffic management, security, and API distribution. In the Envoy community, she drives collaboration, mentorship, and contributions that advance the project and its adoption.

Lastly, as a believer in the power of storytelling, Erica enjoys translating complex technical concepts into engaging, accessible narratives in the form of social media posts, conference talks, podcasts, and educational content.

// Related Links

Efficient Deployment of Models at the Edge // Krishna Sridhar // MLOps Podcast #284 - https://youtu.be/sFqm7GTeulg

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community [https://go.mlops.community/slack]

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

MLOps Swag/Merch: [https://shop.mlops.community/]

Connect with Demetrios on LinkedIn: /dpbrinkm

Connect with Erica on LinkedIn: /ericahughberg

Timestamps:

[00:00] Erica's preferred coffee

[00:30] Takeaways

[01:50] Evolving Web Gateways

[14:35] Microservices to LLM Shift

[17:42] Intelligence Privacy Model

[22:26] Infrastructure for AI Creativity

[25:25] AI Gateway Networking Challenges

[30:37] Streamlit MVP to Production

[43:03] AI Model Scaling Challenges

[47:48] Tech Advocacy and Skills

[53:17] Optimizing Edge AI Performance

[56:43] Product Management Insights

[1:00:02] Navigating Evolving Tech Challenges

[1:04:35] Wrap up
