The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI

Scaling On-Prem Airflow With 2,000 DAGs at Numberly with Sébastien Crocquevieille

24 min · 21 August 2025

Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable.


In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access and Airflow’s evolving features.


Key Takeaways:


00:00 Introduction.

02:13 Overview of the company’s operations and global presence.

04:00 The tech stack and structure of the data engineering team.

04:24 Running nearly 2,000 DAGs in production using Airflow.

05:42 How Airflow’s UI empowers stakeholders to self-serve and troubleshoot.

07:05 Details on the Kubernetes-based Airflow setup using Helm charts.

09:31 Transition from GitSync to NFS for DAG syncing due to performance issues.

14:11 Making every team member Airflow-literate through local installation.

17:56 Using custom libraries and plugins to extend Airflow functionality.


Resources Mentioned:


Sébastien Crocquevieille

https://www.linkedin.com/in/scroc/


Numberly | LinkedIn

https://www.linkedin.com/company/numberly/


Numberly | Website

https://numberly.com/


Apache Airflow

https://airflow.apache.org/


Grafana

https://grafana.com/


Apache Kafka

https://kafka.apache.org/


Helm Chart for Apache Airflow

https://airflow.apache.org/docs/helm-chart/stable/index.html


Kubernetes

https://kubernetes.io/


GitLab

https://about.gitlab.com/


KubernetesPodOperator – Airflow

https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html


Beyond Analytics Conference

https://astronomer.io/beyond/dataflowcast




Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.



#AI #Automation #Airflow #MachineLearning


The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, by Astronomer, is available on multiple platforms.