We explore counting words across 5 terabytes of text using distributed systems. From chunking data into 128 MB blocks and performing map and reduce steps, to Hadoop's disk-based I/O and Spark's in-memory approach, we discuss when data fits in memory, when it spills to disk, and why I/O is the real bottleneck. We also cover tokenization pitfalls at block boundaries, failure resilience, data skew, and practical timelines on real clusters for building resilient, scalable text-analytics pipelines.
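As a rough illustration of the map-and-reduce word count the episode describes, here is a minimal single-process sketch in Python: each "chunk" stands in for a 128 MB block, the map step produces partial counts per chunk, and the reduce step merges them. This is an assumption-laden toy, not a distributed Hadoop or Spark implementation, and it deliberately ignores the cross-block tokenization pitfalls discussed in the episode.

```python
from collections import Counter
from functools import reduce
import re

def map_chunk(text):
    # Map step: tokenize one chunk and emit partial word counts.
    # A real pipeline must also handle words split across block boundaries.
    return Counter(re.findall(r"\w+", text.lower()))

def reduce_counts(acc, partial):
    # Reduce step: merge partial counts from another chunk into the accumulator.
    acc.update(partial)
    return acc

# Toy "blocks" standing in for 128 MB chunks of the corpus.
chunks = ["Spark keeps data in memory", "Hadoop spills to disk when memory is tight"]
total = reduce(reduce_counts, (map_chunk(c) for c in chunks), Counter())
print(total["memory"])  # 2
```

In a real cluster the map step runs in parallel across nodes holding different blocks, and a shuffle groups each word's partial counts onto one reducer; the merge logic, though, is exactly this `Counter.update`.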
Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.
Sponsored by Embersilk LLC
Intellectually Curious with Mike Breault is available on multiple platforms. The information on this page comes from public podcast feeds.
