New Opus is Bananas
ChatGPT will go shopping for you
The newsletter for the technically curious. Updates, tool reviews, and lay of the land from an exited founder turned investor and forever tinkerer.
Hey folks, today’s post is packed.
Everyone’s rushing to launch their thing before American Turkey day.
I built a social tracker over the weekend, with a design inspired by Linear, so I can rip through all social mentions from X/GitHub/Reddit/any other feed you like. It’s open-source so you can copy it and use it for yourself. Here’s a lil video I recorded of it in action. Built with Factory’s Droid: Gemini did the initial design pass (which was ok-ish), but trusty Sonnet polished the F out of it. (My first time open-sourcing something I’m proud of!)
Google released Nano Banana Pro, i.e. image generation powered by Gemini 3 Pro. It’s a massive upgrade over the already amazing Nano Banana (built on Gemini 2.5 Flash). It’s great at text rendering and uses reasoning to think through the style, visual layout and content of an image.
Gemini 3 Pro had its charm for a week, but Claude took the spotlight back with Opus 4.5. It’s the best coding model in the world (crossing 80% on SWE-Bench Verified) and 3x cheaper than previous Opus models. At medium reasoning effort, Opus 4.5 matches Sonnet 4.5’s best results while using 76% fewer tokens. So, in real use, Opus 4.5 can sometimes end up being the cheaper option.
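If you want to sanity-check the "might end up being cheaper" claim for your own workload, here is a rough back-of-the-envelope sketch. It assumes cost scales linearly with token count and that both models are run on the same task; the price ratios in the loop are hypothetical placeholders, not real pricing, so plug in the current numbers yourself.

```python
# Back-of-the-envelope check: when does a pricier-per-token model that uses
# fewer tokens still come out cheaper per task?
# Assumptions (illustrative only): cost scales linearly with tokens, and both
# models are compared on the same task. No real prices are used here.


def relative_cost(price_ratio: float, token_reduction: float) -> float:
    """Cost of the pricier model relative to the cheaper one (1.0 = same cost).

    price_ratio: per-token price of the pricier model divided by the cheaper one.
    token_reduction: fraction of tokens the pricier model saves (0.76 = 76% fewer).
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    if not 0 <= token_reduction < 1:
        raise ValueError("token_reduction must be in [0, 1)")
    return price_ratio * (1 - token_reduction)


def break_even_price_ratio(token_reduction: float) -> float:
    """Price ratio at which both models cost the same per task."""
    if not 0 <= token_reduction < 1:
        raise ValueError("token_reduction must be in [0, 1)")
    return 1 / (1 - token_reduction)


if __name__ == "__main__":
    saving = 0.76  # the 76% token reduction quoted above
    print(f"break-even price ratio: {break_even_price_ratio(saving):.2f}x")
    for ratio in (1.5, 2.0, 5.0):  # hypothetical price ratios, for illustration
        print(f"{ratio}x price -> {relative_cost(ratio, saving):.2f}x cost")
```

With a 76% token saving, the break-even works out to about 4.2x: if the per-token price gap is smaller than that, the token-light model is cheaper per task; if it is larger, the cheaper-per-token model wins.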
I’m working on a web app for a board game. I tested both Gemini 3 Pro and Opus 4.5 on adding multiplayer room creation to it. First, both of these models gave an almost working solution, whereas older models just didn’t work. Opus was definitely better (and faster) than Gemini, but it still has a few familiar issues. It creates too much duplicate code and very easily falls into the “you’re absolutely right” trap when fixing bugs.
– Keshav
Anthropic also released many product updates with this: “no cutoff” in long conversations on the Claude app, Claude Code on Desktop, Claude in Chrome and Excel, new tools for avoiding the pitfalls of MCP and more.
OpenAI also released two major features in ChatGPT: Group Chats and Shopping Research. Group Chats have a limit of 20 people and still have that beta feel. They launched earlier in a few countries but are now available globally. Shopping Research is a new mode like Deep Research, powered by a special version of GPT-5 Mini. It promises to find the right product on the web for your query, but since it’s a small model, be wary of inaccuracies in details like pricing, sizes, etc. Both features are available to free and paid users (except teams/enterprises).
Hardcoded permissions slowing you down? As systems grow, hardcoded authorization creates security gaps and dev bottlenecks. This 80-page ebook walks through the move to externalized authorization - with frameworks, code samples, and lessons from teams who’ve made the transition. Download free ebook.*
Factory skills are now live (links to my video)! Skills are essentially instructions + tools to run workflows super easily. I built this browser tool that opens sites, takes actions (clicks, fills in forms, etc.) and pulls all that back into my chat. I also recently added a DOM picker so I can select elements on a webpage and have that code sent to the chat, which lets me point at the specific elements to change (instead of saying “no, not that button, the other blue button”).
🌐 What I’m consuming
Why your Voice AI keeps interrupting users (hint: it’s too fast).*
I tested Gemini 3.0 Pro inside Droid and Google Antigravity.
How a global company lets its employees build with 30+ LLMs.
Benedict Evans’ Nov 2025 edition of AI eats the world.
Why we forked Chromium for AI automation.
How to prompt with Gemini 3 to get the best UIs.
Building an AI native engineering team.
A first principles deep dive into Claude Agent Skills.
My read for the weekend is this 35k+ word monster from
Research covering all things OpenAI.
⚙️ Tools and demos
Navigator by Yutori - SOTA web-navigating agent with its own cloud browser.
TubeDummies - Turn any YouTube tutorial into step-by-step interactive learning.
paperreview - Get detailed AI feedback on your research paper.
Parallel Extract - Get all content from a URL in markdown, either in full detail or in a compressed form for better token efficiency.
Replit Agent now integrates with Stripe to help you add payments easily.
Silvia – Your personal AI CFO that tracks every account, models “what if” scenarios, and surfaces insights automatically. One place to see and grow your entire net worth.*
🥣 Dev dish
claude-agent-server - Run Claude Agent (the harness behind Claude Code) in a cloud sandbox and control it via WebSocket.
synapse - Fork chats and go down rabbitholes on things without clogging context (demo).
llm-council - Send your response to multiple models and let the council choose the best for you.
MCP servers might soon power interactive user interfaces for hosts with the new proposal of MCP Apps.
🍦 Afters
Valon is hiring Forward Deployed Engineers. $130K–$230K & equity, turning enterprise clients’ needs into code onsite. NYC/SF/Seattle + travel.*
OpenAI for Science released their first paper - 13 examples of GPT-5 accelerating scientific research.
New Anthropic research: if you teach a model to cheat on coding tasks, it learns other bad behaviours too. But if you frame reward hacking as finding loopholes to patch (vs cheating), this generalisation does not happen.
Cline is building cline-bench - a real-world open source benchmark for agentic coding. Plus, there’s a new hard physics benchmark on the market - CritPt.
Grokipedia accepts proposed edits now, Elon’s replacing X’s content moderation with Grok, Replit and Campus Edu have a new 8-week vibe-coding course, and a small model from Microsoft tops computer use charts.
That’s it for today. Feel free to comment and share your thoughts. 👋
Read about me and Ben’s Bites
📷 thumbnail creds: @keshavatearth,
Thanks to today’s sponsors who made this newsletter possible :)
Cerbos, Speechmatics, Silvia and Valon.
Wanna partner with us? Last few slots left for the rest of the year.



