Alexa's Input (AI)

Building Reliable Systems at Bloomberg with Sal Furino

54 min · 17 May 2026

In this episode of Alexa’s Input (AI), I sit down with Sal Furino to explore the hidden engineering work that keeps modern systems reliable.

We break down what Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets actually mean in practice, why reliability is as much a cultural problem as a technical one, and how teams can better measure real user experience instead of just infrastructure health.

Sal also discusses reliability engineering and the challenges of reliability at scale, including:

  • Why latency and correctness become harder to measure with GenAI
  • The difference between a bad incident and a fundamentally bad system
  • How observability and telemetry shape modern engineering organizations
  • Why most teams focus too much on infrastructure metrics and not enough on user happiness
  • Why “the best systems are the ones nobody notices.”

If you work in AI infrastructure, distributed systems, platform engineering, observability, or SRE, this episode is a must-listen!


SREcon talk "Dashboards & Dragons: Reliability Magic for AI Platforms" by Alexa Griffith and Sal Furino: https://youtu.be/aWMB_7ksbkc?si=S49nPyAl_hCUIH7y


General Podcast Links

Watch: https://www.youtube.com/@alexa_griffith

Read: https://alexasinput.substack.com/

Listen: https://creators.spotify.com/pod/profile/alexagriffith/

More: https://linktr.ee/alexagriffith


Learn more about the host at:

Website: https://alexagriffith.com/

LinkedIn: https://www.linkedin.com/in/alexa-griffith/


Find out more about the guest at:

LinkedIn: https://www.linkedin.com/in/salvatore-furino/

Rootly Interview: https://rootly.com/humans-of-reliability/salvatore-furino

Reliability at Scale Talk: https://youtu.be/J-VrU5JHPlk?si=8aV8acy57NWX30KA

Bloomberg Careers: https://bloomberg.avature.net/careers/SearchJobs


Chapters


00:00 - Introduction: Reliability in a world reshaped by generative AI
02:22 - The importance of seamless, background system design
04:41 - Becoming a Customer Reliability Engineer at Bloomberg
05:17 - Clarifying the CRE role and its customer focus
08:02 - The importance of observability and high-scale performance in finance
09:00 - Balancing technical and cultural aspects of reliability
10:19 - Coaching teams to be proactive using error budgets and SLIs
12:21 - The sociotechnical system: People, processes, and tools
13:06 - Mediation of differing opinions on reliability practices
15:06 - The nuanced approach to alerting and incident response
17:08 - The significance of tiered SLOs and the concept of error budgets
21:08 - Using signals like latency, correctness, availability, and saturation in system measurement
22:53 - The impact of service level "nines" on system design and resilience
28:00 - Handling non-determinism and trust in AI responses
33:01 - Error budgets and their role in managing deployments
34:10 - The challenge of achieving five nines and data durability considerations
40:03 - Adapting SLOs for GenAI systems: core principles remain intact
42:23 - Measuring non-deterministic AI responses and quality proxies
44:41 - The ongoing importance of reliability even in AI/ML contexts
47:25 - Reacting to error budget exhaustion and proactive mitigation
50:42 - The significance of involving cross-functional teams during outages
55:36 - Advocating reliability investment to leadership
56:24 - The customer perspective: reliability as a fundamental feature
58:42 - Connecting with Sal Furino: where to follow his work and learn more about Bloomberg's engineering culture
59:20 - Final advice: Focus on user happiness to avoid common pitfalls in adopting SLOs
