I signed up for another SaaS
new software benchmark
Hey folks,
“SaaS may be dead” - me, on Tuesday
Just signed up for another SaaS tool - me, yesterday
I’m trying really hard to make interactive components for this course/reference manual I’m making, so you as a user can feel the concepts and understand them better.
I’ve tried so many models, tools and ways to try and develop my own component styles that look good and feel right. And I think I finally found it…
I tweeted my frustrations, and Pietro, who I met at OpenAI’s Dev Day last year, reminded me to try Magic Path. You can have multiple agents generating design assets, components, animations, whatever on a big shared canvas.
I gave it a go on a fun experiment first, and it generated some pretty awesome mechanical-style components.
So now I have an actual workflow and tools to generate all the components I’m after. I can play with different styles and tweak the smaller parts of the components - the buttons, prompt input box, etc.
So I blew through the Magic Path free plan pretty quickly and then promptly signed up for a pro plan 😬.
Ben’s Bites is brought to you by Palabra.ai — Real-Time Voice AI Translator
9.3× cheaper than a human interpreter. Palabra.ai delivers real-time voice translation in 60+ languages for calls, events and streams – or embed into any app via API. Trusted by DHL, UNICEF, Paramount, BCG and Deloitte. Try it free.
Headlines
Claude Code now has a security plugin that checks code as Claude writes it and warns when it spots common risky patterns, like unsafe command execution, insecure HTML handling, or dangerous Python code.
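To make “warns when it spots common risky patterns” a bit more concrete, here’s a rough sketch of what pattern-based checking can look like. This is not how Anthropic’s plugin works under the hood. The `RISKY_PATTERNS` rule set and the `scan` function are hypothetical, just a few well-known risky patterns matched line by line with regexes.

```python
import re

# Hypothetical rules for illustration only, NOT the plugin's actual checks.
RISKY_PATTERNS = {
    "unsafe command execution": re.compile(r"os\.system\(|shell\s*=\s*True"),
    "insecure HTML handling": re.compile(r"\.innerHTML\s*=|dangerouslySetInnerHTML"),
    "dangerous Python code": re.compile(r"\beval\(|\bexec\(|pickle\.loads\("),
}


def scan(code: str) -> list[tuple[int, str]]:
    """Return (line_number, category) for every line that matches a risky pattern."""
    warnings = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for category, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                warnings.append((lineno, category))
    return warnings


if __name__ == "__main__":
    snippet = 'import os\nos.system("rm -rf " + user_input)\nresult = eval(expr)\n'
    for lineno, category in scan(snippet):
        print(f"line {lineno}: {category}")
```

Regexes like these are cheap and fast, which is why they suit a check that runs while code is being written, but they only catch surface patterns and will miss anything written less obviously.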
DeepSWE tests agents on 113 original long-horizon tasks across 91 active repos and five languages. Prompts are shorter than SWE-bench Pro’s, but the fixes are much bigger: 668 lines and seven files on average. Current leaderboard: GPT-5.5 70%, GPT-5.4 56%, Claude Opus 4.7 54%, Claude Sonnet 4.6 32%.
From the board to building the Software Factory. It doesn’t happen that often, but I’ve seen it a few times recently - investors in a company leave to join the company they backed. Madison joining Factory is a big signal and a great addition to the team. If you remember, I’m an investor in Factory and joined the team last year, but I left earlier this year because our army of young children runs my life, leaving less and less time for work work. 3 under 3 is still A LOT of work 😅.
My feed
Software after software - this was a great read, highly recommended. Thorsten runs the coding harness Amp and always has great takes on the space.
Clanker - A word for the machine.
Mainframe - turn the work done by your agents into short recap videos for your team.
Granite - long-term document storage for all your files. Drop them in without any tagging/folders, and later search for them in plain English.
Surya OCR 2 - 650M parameter OCR/document model.
Claude Code trick for non-technical tasks - put a bunch of files in a folder, then tell Claude Code it can write scripts and make HTML.
Ramp used 10,000 home-grown security agents to find, validate and patch nearly 100 security issues in six days, with humans reviewing PRs before merge.
Slippery Slope - It’s easy to let agents get in between a person and their craft.
Supermemory - Building blocks for adding context to your agents.
Cursor trained Composer 2.5 by doing RL inside the actual Cursor harness.
Polar - fine-tune a model with your agent harness as the training environment with no code changes.
Parse 2.0 - billed as the most accurate document parsing API in the world.
OpenAI are slurping up a ton of talented builders, most recently Eric, who built RepoPrompt. Great get for OAI, and congrats to Eric 😊
Auto-review skill for your agents, from the powerhouse shipper Peter Steinberger.
howtoeval - the no-bullshit guide to eval’ing AI agents.
Afters
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth
* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?
Email us at shanice@bensbites.com or k@bensbites.com