GenAI Traffic: Why API Infrastructure Must Evolve... Again // MLOps Podcast #296 with Erica Hughberg, Community Advocate at Tetrate.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
// Abstract
The way we handle API traffic is broken for GenAI. We've spent years optimizing for microservices—fast, stateless, and lightweight API calls. But GenAI changes everything. Requests are slower, heavier, and more complex, requiring long-lived connections, massive payloads, and streaming responses. Suddenly, traditional API gateways are struggling—timeout limits are too short, rate limiting models don’t fit, and payload constraints are blocking innovation.
In this episode, we unpack the new challenges of GenAI traffic and why infrastructure must evolve—again. We look back at previous API shifts, from the C10K problem to the monolith-to-microservices revolution, and how they reshaped networking. Now, AI-driven workloads demand a new kind of API gateway—one that handles token-based rate limiting, cost-aware request shaping, and scalable AI inference traffic.
// Bio
Erica Hughberg is a technical leader and community advocate passionate about helping engineering teams build scalable, secure, and human-centric application platforms. With a background in software engineering and a deep understanding of cloud-native technologies, she specializes in driving the adoption of open-source projects like Envoy Gateway, Istio, and Kubernetes Gateway API, which enable organizations to simplify traffic management, security, and API distribution.
As a maintainer of Envoy AI Gateway, she plays a key role in shaping the future of API infrastructure. She focuses on features to ensure organizations can securely and efficiently integrate AI-powered services while simplifying traffic management, security, and API distribution. In the Envoy community, she drives collaboration, mentorship, and contributions that advance the project and its adoption.
Lastly, as a believer in the power of storytelling, Erica enjoys translating complex technical concepts into engaging, accessible narratives in the form of social media posts, conference talks, podcasts, and educational content.
// Related Links
Efficient Deployment of Models at the Edge // Krishna Sridhar // MLOps Podcast #284 - https://youtu.be/sFqm7GTeulg
~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~
Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore
Join our Slack community [https://go.mlops.community/slack]
Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)
Sign up for the next meetup: [https://go.mlops.community/register]
MLOps Swag/Merch: [https://shop.mlops.community/]
Connect with Demetrios on LinkedIn: /dpbrinkm
Connect with Erica on LinkedIn: /ericahughberg
Timestamps:
[00:00] Erica's preferred coffee
[00:30] Takeaways
[01:50] Evolving Web Gateways
[14:35] Microservices to LLM Shift
[17:42] Intelligence Privacy Model
[22:26] Infrastructure for AI Creativity
[25:25] AI Gateway Networking Challenges
[30:37] Streamlit MVP to Production
[43:03] AI Model Scaling Challenges
[47:48] Tech Advocacy and Skills
[53:17] Optimizing Edge AI Performance
[56:43] Product Management Insights
[1:00:02] Navigating Evolving Tech Challenges
[1:04:35] Wrap up