Technology Explorations in Data & AI

Data Ingestion using PyAirbyte

18 min · 18 November 2025

Move your Google Drive documents straight into Postgres using Python and PyAirbyte. In this Technology Explorations episode, Jonny and Tarik from Dataminded show how they ingest internal meeting transcripts (Facts at Breakfast, Learning Over Lunch) from Google Drive into a relational table, ready for querying and AI use cases.

You'll see how to:

  • Configure PyAirbyte to read from a Google Drive folder
  • Authenticate with a Google service account (JSON key)
  • Convert Airbyte output into a clean pandas DataFrame
  • Load the processed data into a Postgres table
  • Discuss performance limits, API rate limits, and batching
  • Reflect on when PyAirbyte is great for PoCs vs. production setups
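
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code shown in the episode. The config field names (`folder_url`, `credentials`, `service_account_info`, `streams`, `globs`, `format`) are assumptions about the `source-google-drive` connector spec and should be checked against the connector documentation. The stream name `transcripts`, the table name, and the connection URL are hypothetical.

```python
import json


def build_drive_config(folder_url: str, service_account_json: str,
                       stream_name: str = "transcripts") -> dict:
    """Build a config dict for the Google Drive source.

    Field names are assumptions about the connector spec; verify them
    against the source-google-drive documentation before use.
    """
    json.loads(service_account_json)  # fail early if the key is not valid JSON
    return {
        "folder_url": folder_url,
        "credentials": {
            "auth_type": "Service",
            "service_account_info": service_account_json,
        },
        "streams": [
            {
                "name": stream_name,
                "globs": ["**"],  # match every file in the folder
                "format": {"filetype": "unstructured"},
            }
        ],
    }


def drive_to_postgres(config: dict, postgres_url: str, table: str,
                      stream_name: str = "transcripts") -> int:
    """Read one stream with PyAirbyte and append it to a Postgres table."""
    # Third-party imports stay local so the config helper above can be
    # used and tested without PyAirbyte or SQLAlchemy installed.
    import airbyte as ab
    from sqlalchemy import create_engine

    source = ab.get_source("source-google-drive", config=config,
                           install_if_missing=True)
    source.check()  # validates the config and credentials
    source.select_streams([stream_name])
    result = source.read()  # records land in PyAirbyte's default local cache

    df = result[stream_name].to_pandas()
    # Drop Airbyte bookkeeping columns (assumed to be prefixed "_airbyte_").
    df = df[[c for c in df.columns if not c.startswith("_airbyte_")]]

    engine = create_engine(postgres_url)
    # chunksize writes rows in batches instead of one large insert.
    df.to_sql(table, engine, if_exists="append", index=False, chunksize=500)
    return len(df)
```

A call might look like `drive_to_postgres(build_drive_config(url, open("key.json").read()), "postgresql+psycopg2://user:pass@localhost/db", "meeting_transcripts")`, which assumes a Postgres driver such as `psycopg2` is installed and that the Drive folder has been shared with the service account.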

We also touch on:

  • How many connectors Airbyte offers and what PyAirbyte can reuse
  • Trade-offs of code-first ingestion vs. point-and-click UI
  • Ideas for the next step: using MindsDB and LLMs to query this knowledge base

Chapters:
  • (00:00) - Intro
  • (01:18) - What is Airbyte? (and 600+ connectors)
  • (04:11) - Demo: Google Drive → Postgres
  • (09:22) - Q: How do you get the table structure?
  • (10:43) - Scale & format limits (many files, PDFs, images)
  • (12:45) - Setting up Google Drive: auth & permissions
  • (14:44) - Running it in production: Airflow + Docker
  • (15:15) - Next up: MindsDB + verdict

Data & AI: Technology Explorations is a biweekly show from Dataminded. Each episode, a Dataminded engineer demos a tool or technique worth knowing about: working code, honest takes, no hype.

Music by Aleksandr Karabanov from Pixabay
