Big upgrade for Sonnet
and massive downgrade in developer comms
Hey I’m Ben. I build stuff with agents, even though I’m not technical. Here’s all the stuff I’m reading and tinkering with. If you want to start building or level up your ‘vibe-coding’ skills, join our community.
Hey folks,
Claude Sonnet 4.6 is out. It’s better than Opus 4.5 across most of the benchmarks and even surpasses Opus 4.6 in two categories: office tasks and financial analysis. Plus, it’s really good at browser/computer-use-based tasks. If you have simple agents, switch to Sonnet 4.6 and make your limits go further.
Sonnet 4.6 is also now the default model for free Claude users. My recommendation to people outside the AI circle has always been ChatGPT because Claude’s free tier was worthless (in terms of compute and features). This upgrade also brings a lot of features like file creation, connectors, etc., to the free tier. Claude also got better at using web search and not filling up the context window.
But it wouldn’t be AI without a little *drama*…
First, some context: Anthropic and OpenAI both offer heavily subsidised $200/mo plans (roughly 15x cheaper than the same usage through the API), but the two companies are handling third-party developer access very differently. Anthropic told developers in January that building on the Agent SDK (formerly Claude Code SDK) with a Claude subscription was fine, then recently updated its docs to say the opposite. It has since refused to give a clear answer on whether open-source apps can let users bring their own subscription. Theo has a really good breakdown of it here. OpenAI, by contrast, has explicitly blessed third-party use of Codex via ChatGPT OAuth. The general concern is that Anthropic’s opaque policy reversals and insular culture are creating real uncertainty for developers trying to build on its platform.
Gemini can create music now. Google’s new music generation model, Lyria 3, is now integrated into Gemini, and it can create music with lyrics based on your prompts, images or even videos. It creates 30-second clips with cover art generated by Nano Banana.
I played with it for a bit.
a) it’s fast
b) output is a little cringe. But I’ve been tracking the rise of AI music on YouTube, and the output is kinda similar to popular videos from 6 months ago, which is when it started getting good. Study- or sleep-related AI music is now a big category on YT.
Two chats from my experiments on how Gemini treats lyrics, vibes, copyright, etc.: a techno chant and a techbro lullaby
— Keshav
Claude Code to Figma - Figma MCP now lets you code (design) something using Claude Code and then send it to Figma, where you can work on it with your familiar tools.
Attio is the AI CRM for modern go-to-market teams. Connect to your email/calendar to instantly build your enriched CRM with complete context. Then ask anything: meeting prep, call insights, answers about your business. Join fast-growing teams like Granola, Flatfile and Modal. Start for free today.*
🌐 What I’m consuming
Decisions that shaped Claude Code and where coding agents are going next.
Browse code by meaning - A different way to explore a codebase instead of the file tree.
What is the future of design in a post-AI world?
5 fixes that could explode consumer LLM adoption.
WEB 4.0 - AI that earns its existence, self-improves, and replicates without a human.
Anthropic is building an Auto Memory system for Claude Code. I found this bit interesting - “The system is nudging toward a pattern where MEMORY.md is a short index, and details live in separate topic files.”
Giving an agent access to email is extremely risky. It’s not a new claim or a detailed post, but still worth a reminder every now and then :) ps: here’s a detailed feature list for an email agent that doesn’t suck.
Measuring agent autonomy in practice - new data from Anthropic. The top 0.1% of Claude Code sessions now run for more than 45 minutes (up from 25 minutes in October 2025, i.e. less than 5 months ago). As users get more familiar with it, they give Claude more permissions and also interrupt it more often.
⚙️ Tools and demos
Speechmatics – STT for voice agents. <300ms latency, high accuracy at conversational speed, 55+ languages. BB readers get $200 free credits.*
Traces - CLI and web tool to share and discover your sessions with multiple coding agents.
Intent by Augment Code - Orchestrate agents and manage all your dev work in a single place.
Aperture by Tailscale - LLM gateway to centralise model access and track team usage without managing individual API keys.
Lemon - Talk to your computer and let it do tasks for you. It’s kinda like Wispr Flow if it did more than just transcription: sending emails, managing your calendar, researching across tabs and more.
Monologue, the transcription app I use on Mac, is now on iOS too. I’ve been testing it for a while and it’s great!
Cursor Marketplace - Discover and install plugins (a bundle of skills, MCPs, subagents, hooks, etc.) for the full development lifecycle.
Wiretext and Mockdown - Both let you create quick wireframes and export them as Markdown or ASCII to share with your coding agents. (wiretext demo, mockdown demo)
🥣 Dev Dish
Liveline - real-time animated line chart component for React. (see live example)
React Doctor - Scan your React codebase for anti-patterns.
Chartroom - a CLI charting tool from Simon Willison. (read more)
Build a custom agent framework with Pi.
🍦 Afters
Polsia is an agent that autonomously builds clones of popular tools (which are just landing pages with a “get early access” button). It has created over 500 of them so far.
EVMbench - new eval from OpenAI to test if models can exploit and patch smart contracts on blockchains. Most models can detect a decent number of vulnerabilities, patch only a few of them, but exploit a lot.
That’s it for today. Feel free to comment and share your thoughts. 👋
📷 thumbnail by @keshavatearth
* sponsors who made this newsletter possible :)

