The newsletter for the technically curious. Updates, tool reviews, and lay of the land from an exited founder turned investor and forever tinkerer.
Hey folks,
I’m hosting a workshop tomorrow with Dan Shipper and Every. We’ll talk about what CLIs are and dive into the Factory CLI, Droid. It’s aptly called Droid Camp. Come and join us for an hour of informal demos, Q&A and building!
Cursor and Windsurf both shipped their in-house models. Cursor 2.0 came with Composer-1 (and a new agent-first UI). Windsurf also released SWE-1.5. Both are better than Sonnet 4 but worse than Sonnet 4.5/GPT-5 Codex, are insanely fast, and are built to perform best inside their own platforms.
The Information reported that Anthropic and OpenAI are using Cognition’s coding tests, and Cursor is also using their internal benchmark (which they say they can’t make public) to show Composer’s performance. Simon made them both draw a pelican (Composer — SWE-1.5), and Every feels the IDE in Cursor distracts you from Composer.
I agree generally. IDEs are the old way, CLIs and these new agent UIs are the new way for software engineering (although CLIs are technically old). I never understood ‘tab’ (autocomplete) as a ‘thing’ - it reminded me of the early ChatGPT days, when people rushed to build autocomplete for text everywhere. It's fine, but it’s no leap in function or form.
OpenAI finished its profit/non-profit reorg, and Sam Altman posted a tldr of a livestream he did with OpenAI’s chief scientist. It’s still pretty long, so here’s a tldr of a tldr:
Already secured 30GW of compute commitments costing $1.4T.
Non-profit owns 26% of the for-profit PBC (valued at $500B total), Microsoft owns 27% (rumour - OpenAI has plans for a trillion-dollar IPO)
Goal: build an automated “AI researcher intern” by Sept 2026, and drop the “intern” by 2028.
That’s not all: they released two new open-weights models under the name gpt-oss-safeguard, fine-tunes of the gpt-oss models in both 20B and 120B variants. Also, Sora has a new feature, Character Cameos (put your dog in your videos). It’s now available in three more countries, and a different set of countries gets open access (i.e. no invite code needed).
The CEO of Warp ripped out Salesforce internally and moved entirely to Attio. His logic? “We need something powerful, easy to use that makes us want to log in every day as opposed to feeling like a chore.” No surprise, his sales team was down for the switch. Are you?
🌐 What I’m consuming
Why AI voice agents fail at multi-speaker conversations – and how to fix it.
New diligence challenge with AI - When a prototype performs better than the company you wanted to acquire.
How Rakuten replaced LLM-as-a-Judge with SAE probes for PII detection (while saving money).
Signs of introspection in LLMs - Can Claude actually recognise its thoughts, or does it just make up explanations when asked about them?
Cursor 2.0’s system prompt in an easy-to-explore artifact (hunted by elder_plinius).
⚙️ Tools and demos
LLM Gateway - Combine AssemblyAI’s fast & accurate speech models with GPT-5, Claude, and more.
Superhuman Go - Grammarly is renaming itself to Superhuman (they acquired the email company a while ago) and launching an assistant that can connect to your email, docs, company data and of course fix your typos.
Odyssey-2 - Generate an instant AI video that you can interact with.
Kaizen - Build browser automations and never do a data entry task twice.
Ariana - not Grande, it’s another parallel coding agents app. If you’re planning to make a text-to-app tool, pivot to “UI for Claude Code” hah.
Anime Leak - Put funky art in your real pictures.
🥣 Dev dish
Next.js has its own evals now. They test how many failing tests in a Next.js project an AI model can fix. GPT-5 Codex performs best, fixing 42% of the tests, but Codex, the agent, solves only 30% (vs Claude Code, which also gets 42%). So, I'm not sure how reliable these results are.
Might wanna run this command to review your PRs before you submit them.
OlmoOCR - Convert PDFs/image-based documents into clean plain text on your device.
New Search() API by Chroma - Get the best of both worlds with a hybrid search that combines vector search with metadata filtering and custom ranking.
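If "hybrid search" sounds fuzzy, here's a rough sketch of the idea in plain Python: filter by metadata first, then rank what's left with a blend of vector similarity and keyword overlap. This is not Chroma's API. The `Doc` class, `hybrid_search`, the `where` filter and the `alpha` weight are all made up for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical record type, not a Chroma class.
    id: str
    text: str
    embedding: list
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear in the text (a crude keyword signal)."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(docs, query_text, query_embedding, where=None, alpha=0.7, k=3):
    """Metadata filter first, then rank by a weighted mix of the two scores."""
    where = where or {}
    # 1. Metadata filtering: keep docs whose metadata matches every key in `where`.
    candidates = [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in where.items())
    ]
    # 2. Custom ranking: alpha weights vector similarity, (1 - alpha) weights keywords.
    scored = [
        (alpha * cosine(query_embedding, d.embedding)
         + (1 - alpha) * keyword_score(query_text, d.text), d.id)
        for d in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


docs = [
    Doc("a", "vector search basics", [1.0, 0.0], {"lang": "en"}),
    Doc("b", "metadata filtering guide", [0.0, 1.0], {"lang": "en"}),
    Doc("c", "vector search in German", [1.0, 0.0], {"lang": "de"}),
]
results = hybrid_search(docs, "vector search", [1.0, 0.0], where={"lang": "en"})
print(results)
```

A real system would use an index instead of scanning every document, but the filter-then-rank shape is the same.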
💰 Who got that bag?
Cartesia AI released a new speech generation model, Sonic-3, and raised $100M.
Fireworks AI (model hosting platform) has raised $250M Series C at a $4B valuation.
Mem0 (memory for AI) raised $24M in Seed and Series A.
🙋 How do I…
ask data questions and get answers, right in my Slack?
Three easy steps, takes less than 5 minutes:
Securely connect your DB to Julius (Postgres, BigQuery, Snowflake, Supabase, MySQL + more supported)
Enable Slack Agent (http://julius.ai/data-connectors)
Create a #ask-data Slack channel and invite `@juliusai` to the channel
More docs here: Julius’ Slack Agent overview.
create marketing posters just with my website URL?
like this one? You got it. Go to this new tool from Google Labs called Pomelli.
It picks up business DNA - Your brand colours, fonts and assets from your website.
It ideates on campaigns that you can run (and you can guide it via prompts).
It generates the ads, and you can edit them to your liking (with some limitations).
See the full tutorial by Justine.
make a “good” chat with X feature without complex code?
hmm, this one’s not easy. You’ll need to make multiple tools, manage context between them, allow dozens of API calls… I’m just kidding,
send your prompt to droid exec.
That’s it.
More here: Building interactive apps with Droid Exec
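For the curious, here's a minimal sketch of what "send your prompt to droid exec" could look like from a Python backend. It assumes `droid exec` takes the prompt as a single positional argument and prints its answer to stdout, so check the Droid Exec docs for the real flags. `build_droid_command` and `chat_with` are hypothetical helper names, not part of Factory's tooling.

```python
import subprocess


def build_droid_command(prompt: str) -> list[str]:
    # Assumption: the prompt is passed as one positional argument to `droid exec`.
    return ["droid", "exec", prompt]


def chat_with(context: str, question: str, run=subprocess.run) -> str:
    """Ask a question about `context` by shelling out to the CLI.

    `run` is injectable so the function can be exercised without the CLI installed.
    """
    prompt = (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
    # Assumption: the answer arrives on stdout; check=True raises on a non-zero exit.
    result = run(build_droid_command(prompt), capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Your "chat with X" endpoint then just calls `chat_with(page_text, user_question)` and returns the string.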
🍦 Afters
Find out what integrating AI looks like for healthcare, education or urban policy on Nov 7th-8th. Sign up for TechEquity’s Ai Summit (visit the link and use the code BENSBITES20 for a 20% discount).
Two new measures of model capabilities:
ECI by Epoch AI, meant to normalise multiple benchmarks to track progress across a longer time period.
RLI - Remote Labor Index from Scale AI and Center for AI Safety. An attempt to measure how much real-world, economically valuable remote work the models/agents can do.
New AI wearable on the market - Stream Ring by Sandbar.
YouTube plans to upscale old videos to HD using AI.
That’s it for today. Feel free to comment and share your thoughts. 👋
Read about me and Ben’s Bites
📷 thumbnail creds: @keshavatearth
Thanks to today’s sponsors who made this newsletter possible :)
Attio, Speechmatics, AssemblyAI and Ai Summit.
Wanna partner with us? Last few slots left for the rest of the year.



