Designing on-device AI models

Two new on-device AI audio models are shipping across Detail and Subwave this week, and both were built in-house. Until recently, the benchmark for audio enhancement was Dolby: server-side processing, per-minute pricing, and a generic output that sounded identical regardless of the app it came from. Building the alternative became possible because four things arrived at the same time: open-source base models, years of Detail recordings to use as training data, affordable hardware fast enough to train overnight, and tools like Claude Code to handle the engineering scaffolding around the model itself.

The Clear model cleans up recordings, and the Uhm model detects filler words. Both run entirely on-device, processing a 10-minute recording in 10 seconds without the audio ever leaving your phone. The filler word detection model processes a 57-minute interview in under 30 seconds.

The audio enhancement model, Clear, was shaped around a specific sound target: warm and present, closer to a podcast studio than a phone call. The clean reference audio it learns from is itself lightly enhanced and normalized toward that target, so the model learns the sound of Detail recordings rather than clean speech in general. Iteration looked more like product work than research: train overnight, listen back the next morning to dozens of real recordings, run blind A/B comparisons against Dolby and against previous versions, pick the winner, and repeat.

Because both models carry zero variable cost, they change what is possible as product defaults. In Detail, Auto Edit runs both models on every recording without being asked. In Subwave, audio enhancement is applied to every post by default.

The result is not a generic enhancement layer dropped into the app; it is a trained opinion about what a Detail recording should sound like.

Published on Subwave: https://subwave.app/@paul/post/designing-on-device-ai-models

By Paul Veugen