The newsletter for the technically curious. Updates, tool reviews, and lay of the land from an exited founder turned investor and forever tinkerer.
Hey folks,
Claude Sonnet 4.5 is out - the best model on programming benchmarks. It’s better than Opus 4.1 across the board and better than GPT-5-codex in most cases. But comparisons with other models don’t give you the full picture. Sonnet 4.5 is in its own league for agentic coding, computer use, and long-running tasks. It’s got much better vision, and you’ll feel it’s much smarter across the board. And it’s much more aligned (with respect to AI safety). Although I’ve watched livestreams where 4.5 didn’t do so well…
I’m loyal to no model, but if you want some stability with “the best AI you can get for $20/mo”, Sonnet 4.5 could hold that for the next few months. (I’m sure Gemini 3 or OpenAI dev day will soon make me eat my words, haha - if you’re going to dev day, tweet me - I’ll be there!)
This update comes with Claude Code v2, a new VS Code plugin for CC, a rebrand of the Claude Code SDK to Claude Agent SDK, two new tools in the Claude API, and a research preview called “Imagine with Claude”.
Sonnet 4.5 is available in every tool out there, including Factory’s Droid, Notion AI and Figma Make.
I’d recommend the vibe checks from Every and Simon Willison if you want to read more. Or watch this quick video of each version of Claude trying to make a clone of Claude.ai.
Mintlify has a new Agent to help you keep your docs up to date with AI. You can share any context (code changes via PRs, Slack threads, links and writing guidelines) and it’ll draft a docs PR with changes.
Before the weekend hit, Meta and OpenAI both released a new content feed in their AI apps. Vibes in Meta AI is a feed of AI-generated videos in partnership with Midjourney and Flux. The launch videos and the feed mostly have cute (sometimes absurd) animals dancing, but it’s not hard to imagine where it goes from here.
ChatGPT Pulse, otoh, is a daily personalised feed of new things that might be interesting to you. It’s proactive (ChatGPT messages you first), curatable (you can tell it what to search for) and limited (ends after a few recommendations every day). It works overnight to search for things based on your memories/recent activity in ChatGPT = compute-intensive = only available in ChatGPT’s Pro plan ($200/mo).
You can now buy things on ChatGPT. Instant Checkout, a new feature, lets Etsy and Shopify sellers sell their stuff via ChatGPT in exchange for a cut they’ll pay to OpenAI. OpenAI claims it doesn’t affect ChatGPT’s recommendations. OpenAI’s recent “how people use ChatGPT” report classifies only 2.1% of queries as related to purchases (half as many as programming, which is 4.2%).
Gemini updated the 2.5 Flash and 2.5 Flash Lite models, primarily making them a lot faster and less hungry for tokens. Browser Use found the new Flash model performs on par with o3 on their internal benchmarks (but much faster/cheaper). ps: Gemini also released a new “Gemini Live” model and a “robotics” model.
Outresearch the competition in minutes. Catch every critical detail, move faster on deals, and delegate complex tasks with Brightwave’s AI research agents. Get unlimited access free for 14 days — faster insights, instant memos, and the edge you need to win. Start your free trial today.*
*sponsored
🌐 What I’m consuming
Code Mode by Cloudflare - LLMs are better at writing TypeScript code to call MCP than at calling MCP directly, which makes code gen a better way to use MCP.
AI is already writing 90% of my code - by the maker of Flask.
Real AI agents and real work.
LoRA without regret - new blog from Thinking Machines Lab comparing LoRA with full fine-tuning and RL.
Abundant Intelligence by Sam Altman.
What I look for in an AI PM at Google Labs - part 1, part 2, part 3.
First course on Cursor Learn - A six-part video series on AI foundations.
⚙️ Tools and demos
Scout Monitoring’s MCP - Plain-language monitoring. Ask questions like “why is latency spiking?” and get answers right in your coding agent.*
Unified Copilot in Zapier - A single agent to create any workflow with access to the full toolkit of Zapier.
Lovable Cloud & AI - Lovable now comes with backend support (powered by Supabase) and special attention to adding AI features inside your app.
Excel’s Agent Mode - Microsoft now lets Copilot work autonomously in Excel, and it’s better than you’d expect. (how they built it).
Tembo - Background agents that plan, code, and review.
Integrity - Bring notes, canvases and AI chats into one connected workspace.
Cell - The fastest way for software teams to go AI‑native. (read more)
*sponsored
🥣 Dev dish
exa-code - hybrid search over 1B+ docs pages, repos, and Stack Overflow posts indexed to reduce hallucination for coding tasks.
GitHub Copilot CLI - GitHub also has a coding agent now, living in your terminal.
Agentic Commerce Protocol - An open standard to let agents make purchases.
Shared Payment Tokens by Stripe - An API for agentic payments.
How to make a Gemini CLI plugin for any IDE of your choice.
🌌 On the frontier
Cloudflare is launching a stablecoin for agentic commerce, calling it “Net Dollar”.
DeepSeek released an experimental version of their base model. It’s 50% cheaper for users and 3x to 10x cheaper to serve.
GDPval - measuring AI on real-world, economically valuable tasks. Opus 4.1 is the best model, just a few points away from human-like performance.
🍦 Afters
OpenAI is hiring its first research scientists for OpenAI for Science - a new program to build an AI-powered platform that accelerates scientific discovery.
Modal (infra for AI developers) raised an $87M Series B at a $1.1B valuation.
Paid AI raised a $21M seed round to build monetisation and cost tracking for AI agents.
Ben’s Bites x Factory meetup in SF on 8th October - come meet me IRL and talk about Droids, ask the team questions, plus more free tokens :).
That’s it for today. Feel free to comment and share your thoughts. 👋
📷 thumbnail creds: @keshavatearth