Move your Google Drive documents straight into Postgres using Python and PyAirbyte. In this Technology Explorations episode, Jonny and Tarik from Dataminded show how they ingest internal meeting transcripts (Facts at Breakfast, Learning Over Lunch) from Google Drive into a relational table, ready for querying and AI use cases.
You'll see how to:
- Configure PyAirbyte to read from a Google Drive folder
- Authenticate with a Google service account (JSON key)
- Convert Airbyte output into a clean pandas DataFrame
- Load the processed data into a Postgres table
- Discuss performance limits, API rate limits, and batching
- Reflect on when PyAirbyte is great for PoCs vs. production setups
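The steps above can be sketched roughly as follows. This is not the code from the episode (that lives in the demo repository under Resources), only a minimal illustration under stated assumptions: the stream name `transcripts`, the table name, the config keys (`folder_url`, `credentials`, `streams`) and the output columns (`document_key`, `content`, `_ab_source_file_last_modified`) reflect our understanding of the Airbyte Google Drive connector and should be checked against its current spec. The helper names are hypothetical.

```python
from pathlib import Path

import pandas as pd

STREAM = "transcripts"  # hypothetical stream name, not taken from the episode


def build_config(folder_url: str, key_path: str) -> dict:
    """Build a connector config for a service account (key names are assumptions)."""
    return {
        "folder_url": folder_url,
        "credentials": {
            "auth_type": "Service",
            # The connector expects the JSON key as a string, not a file path.
            "service_account_info": Path(key_path).read_text(),
        },
        "streams": [
            {"name": STREAM, "globs": ["**"], "format": {"filetype": "unstructured"}}
        ],
    }


def read_raw(config: dict) -> pd.DataFrame:
    """Read the Drive folder with PyAirbyte and return the raw stream as a DataFrame."""
    import airbyte as ab  # imported lazily so the helpers below work without it

    source = ab.get_source("source-google-drive", config=config, install_if_missing=True)
    source.check()  # fails early on bad credentials or missing folder permissions
    source.select_streams([STREAM])
    result = source.read()
    return result[STREAM].to_pandas()


def clean_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep only the useful columns, rename them, and drop empty documents."""
    keep = {
        "document_key": "file_name",
        "content": "transcript",
        "_ab_source_file_last_modified": "modified_at",
    }
    present = [col for col in keep if col in raw.columns]
    df = raw[present].rename(columns=keep)
    if "transcript" in df.columns:
        df = df.dropna(subset=["transcript"])
        df = df[df["transcript"].str.strip() != ""]
    return df.reset_index(drop=True)


def load_frame(df: pd.DataFrame, con, table: str, chunksize: int = 500) -> int:
    """Write the frame to a SQL table in batches; `con` is e.g. a SQLAlchemy engine."""
    df.to_sql(table, con=con, if_exists="replace", index=False, chunksize=chunksize)
    return len(df)


def run(folder_url: str, key_path: str, postgres_url: str) -> int:
    """End-to-end sketch: Google Drive -> pandas -> Postgres."""
    from sqlalchemy import create_engine  # needs a Postgres driver such as psycopg2

    engine = create_engine(postgres_url)
    frame = clean_frame(read_raw(build_config(folder_url, key_path)))
    return load_frame(frame, engine, "meeting_transcripts")
```

Calling `run(...)` with a folder URL, the path to the service account key, and a connection string such as `postgresql+psycopg2://user:password@host:5432/dbname` would perform the full load. The `chunksize` argument only batches the inserts on the Postgres side; it does nothing about Google API rate limits, which apply while the connector reads the files.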
We also touch on:
- How many connectors Airbyte offers and what PyAirbyte can reuse
- Trade-offs of code-first ingestion vs. point-and-click UI
- Ideas for the next step: using MindsDB and LLMs to query this knowledge base
Resources:
- Demo code: https://github.com/datamindedbe/demo-technology-exploration/
- Full playlist: https://www.youtube.com/playlist?list=PLJ_da7qdfL80rA7byzC_CmyrfJWjcCTnb
Creators & Guests
- Jonny Daenen - Host
- Tarik Jamoulle - Guest
Chapters:
- (00:00) - Intro
- (01:18) - What is Airbyte? (and 600+ connectors)
- (04:11) - Demo: Google Drive → Postgres
- (09:22) - Q: How do you get the table structure?
- (10:43) - Scale & format limits (many files, PDFs, images)
- (12:45) - Setting up Google Drive: auth & permissions
- (14:44) - Running it in production: Airflow + Docker
- (15:15) - Next up: MindsDB + verdict
Data & AI: Technology Explorations is a biweekly show from Dataminded. In each episode, a Dataminded engineer demos a tool or technique worth knowing about: working code, honest takes, no hype.
Music by Aleksandr Karabanov from Pixabay
