The newsletter for AI builders of all levels. Mini-tutorials, tool reviews, and lay of the land from an exited founder turned investor and forever tinkerer.
Hey folks,
Let’s talk about voice.
Talking to your computer still feels like more of a barrier than sending voice notes to friends. Voice is quicker, but thoughts are clearer when they’re written down. So being able to see your own speech as you talk would help you clarify your thoughts.
But the problem with voice at the moment is that it adds cost, adds latency, and may not be accurate enough. Whenever I use ChatGPT Voice Mode, it transcribes what I say into Welsh because I am Welsh and I've got a faint Welsh accent, but I can't speak a word of it. Not particularly helpful.
AssemblyAI have been partners of ours for a while now, and their latest Streaming Speech-To-Text API is here. Think of streaming like when you ask ChatGPT or Claude a question and the output starts appearing a few lines at a time. That’s a much nicer experience than waiting for the full answer. For voice tools, streaming is hard because your next few words can change the interpretation of what you’ve already said.
AssemblyAI's API now balances that with insane speed, cost, and accuracy. So you can build an experience that combines what your users are saying and the transcription that they see, without it feeling broken. I tested it, and did a little review, and I think you should check it out if you're building anything with voice or you want to add voice capability to your applications or workflows. I'm going to be adding it to a bunch of mine where I actually do like to see what I'm saying, which my current voice tools don’t do!
Or, watch on YouTube.
ChatGPT now has a study mode. Instead of just doing the work, it guides students towards learning and finding answers on their own. Currently, it’s just a special prompt to nudge model behaviour, but OpenAI plans to bake that behaviour into its models over time.
NotebookLM can now make Video Overviews. It pulls in images and diagrams from your documents and creates new infographics and visuals as slides. I’ve already seen a few startups on the beat of “make a 3blue1brown-type animated video on a topic”, so expect more such tools in the coming weeks.
And AI mode (in Google Search) is getting some new features like file uploads, search with live video and images, and a writing canvas. Google’s promoting both of these from a studying POV; you can guess why.
Attio is the AI-native CRM for the next generation of teams. Sync your email and calendar, and Attio instantly builds your CRM—enriching every company, contact, and interaction with actionable insights in seconds. Join fast growing teams like Granola, Flatfile, Modal, and more. Start for free today.*
Mark Zuckerberg released a new memo called “Personal Superintelligence”. I loved reading Om’s breakdown of it. Also, a scoop from Wired claims Zuck made offers worth a billion dollars (spread over a few years) to many people at Thinking Machines Lab (Mira Murati’s company), and everyone declined.
*sponsored
We’ve got a few open ad slots over the summer. Wanna partner with us?
🌐 What I’m consuming
Behold the first AI-native investment bank.
Cursor’s lead designer built an operating system with Cursor.
Logan Kilpatrick’s latest podcast with Matan Grinberg, CEO of Factory AI (I’m an investor).
⚙️ Tools I’m looking into
Airia - The enterprise AI platform with built-in governance and security. Deploy agents across teams while maintaining enterprise compliance.*
Snaptrude - Instantly turn messy docs into customizable requirements with an end-to-end platform for concept design.
Gumboard - A free, real-time sticky note board to keep track of your tasks. (it’s open-source)
Magic Patterns - The design tool to create and prototype new features for your product.
Ollama has a desktop app to use local models with a ChatGPT-like interface.
Eigent - A team of AI agents collaborating to complete complex tasks in parallel.
Chilled Sites - Super simple text-to-app builder with a TON of features (great for non-technical folks) - Paul’s shipping like crazy.
Parse.bot - Turn any website into an API.
Terragon - Background agents for Claude Code. I’m using it a lot for firing off lots of agents, especially on mobile.
*sponsored
Want customised help from AI devs for your company’s project? We’re trialling a matching service connecting companies with AI experts. Register your interest here.
🥣 Dev dish
Exa Fast - A search API from Exa with <500ms latency (40% faster than Google Search).
Claude Code can now work in multiple directories in a single session.
Crush by Charm - A glamorous coding agent in your terminal (works with all major model providers)
AutoRL - Write a single sentence to train a task-specific model.
LangExtract - Open-source Python library from Google to reliably extract information from documents.
🍦 Afters
Applications for the latest batch of HF0 (residency for ex-founders) are now open.
First-ever Claude Code Office Hours - today, 3 pm PDT.
Anthropic is looking to raise at a $170B valuation, while Meta just increased its market cap by a similar number in after-hours trading alone yesterday (puts all those offers from Zuck into perspective).
Also, OpenAI announced Stargate Norway, its first AI data centre initiative in Europe.
That’s it for today. Feel free to comment and share your thoughts. 👋
📷 thumbnail creds: @keshavatearth - I remembered Asimov’s “The Fun They Had” when coming up with the idea for today’s thumbnail.