Data quality is not optional when you manage credit data at scale.
In this episode, Ashir Alam, Senior Data Engineer at Credit Karma, joins us to share how his team acts as the gatekeeper for credit data ingestion, how they standardize data quality with Airflow and DAG Factory and how they scale safely across thousands of DAGs. We explore how governance, PII protection and orchestration come together inside a modern data platform.
Key Takeaways:
00:00 Introduction.
01:00 Overview of Credit Karma’s products and financial data ecosystem.
02:00 The team acts as gatekeepers for ingesting data from TransUnion and Equifax.
03:00 Why PII handling and controlled downstream access led to adopting Airflow.
04:00 BigQuery as the warehouse and Airflow as the primary orchestrator.
05:00 Why data quality and governance are critical in financial systems.
07:00 Why Airflow was selected: ease of use and unified ETL plus data quality.
09:00 Introduction to DAG Factory and YAML-based DAG generation.
10:00 GitHub executor creates PR-driven DAG workflows with CI checks.
12:00 BigQuery operators, structured checks and custom Slack and PagerDuty alerts.
13:00 Failed checks stop ETL pipelines and trigger notifications.
17:00 Scaling DAG Factory across thousands of DAGs and runtime vs compile-time concerns.
19:00 Future improvements: better defaults, retries and GenAI workflows in Airflow.
Resources Mentioned:
https://www.linkedin.com/in/ashir-alam/
https://www.linkedin.com/company/intuit-credit-karma/
https://airflow.apache.org/
https://github.com/astronomer/dag-factory
https://cloud.google.com/bigquery
https://github.com/
https://slack.com/
https://www.pagerduty.com/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow
The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, by Astronomer, is available on multiple platforms. The information on this page comes from public podcast feeds.
