Anthropic built a model too risky to release
and Meta makes an unexpected entry
Hey folks, Keshav here. Ben is at AI Engineer this week, so I’m covering the intro.
A mistimed blog post last week leaked Anthropic’s next model - Claude Mythos. Well, it is real and shows massive benchmark improvements over Opus 4.6:
53.4% → 77.8% on SWE-bench Pro
65.4% → 82% on Terminal-Bench 2.0
But we are not getting access to it anytime soon. Why? Because it is really good at finding and exploiting software vulnerabilities. On Firefox exploit generation, Opus managed 2 working exploits out of hundreds of attempts. Mythos hit 181.
It found decades-old bugs in critical software projects like OpenBSD (a 27-year-old bug), FFmpeg (a 16-year-old bug) and more.
Instead of releasing it publicly, Anthropic is giving 12 companies access to a preview version of Mythos under “Project Glasswing” to find vulnerabilities in critical software. Anthropic is committing $100M in model usage credits and $4M in donations to open-source security orgs under this project.
Theo made a video on this, and I like his point: “Mythos is to Opus what Opus is to Sonnet.”
I tweeted a list of companies that Meta has acquired in the past year without anything to show for it, and soon after, Meta released details about their latest model - Muse Spark. At a glance, it sits somewhere between Sonnet 4.6 and Opus 4.6. Not usable yet: API access is coming, and there are promises about open-source too (rip llama).
Many people are dunking on Meta for its not-so-frontier model release after spending billions and a year of silence, but I think it’s a good step forward. Plus, have you used Instagram search over the past couple of months? It’s gotten really good, courtesy of AI.
As always, good recap from Ethan Mollick on the state of frontier models: Google, OpenAI and Anthropic lead, Meta joins the pack for now while xAI has fallen off, and the best Chinese models are still 7-9 months behind.
PS: Factory’s desktop app is now out of beta. It comes with a cloud computer, the ability to use other apps on your device, and, of course, the ability to run and manage multiple Droid sessions easily.
Ben’s Bites is brought to you by Attio, the AI CRM
Honestly, no one gets excited about a CRM. But then they try Attio. It connects to Claude Code and n8n through its MCP server, completely bridging the gap between my customer data and apps. Wait, there's more, like flagging churn risk and turning customer feedback into Linear projects. Try it now.
Headlines
Claude Managed Agents - You can use Claude’s developer console to build and deploy agents and let Anthropic handle the infra, instead of building it yourself. For example, Notion is using managed agents to build a “delegate tasks to Claude” feature. (Anthropic’s engineering blog on building this).
Cursor has a new design mode to annotate and target UI elements in the browser. Plus, run Cursor on any machine and control it from anywhere, including your phone.
Gemini app finally has projects - they call them notebooks. Similar features to Claude/ChatGPT projects - move chats in/out of notebooks, notebook-specific files and memories, plus the ability to sync these notebooks between the Gemini app and NotebookLM.
Clicky is an ambient AI buddy on your Mac. It sees your screen, talks to you and points at things to guide you (demo). Farza built (and open-sourced) it as a learning tool, but people are using it for everything.
Choosing an accurate speech-to-text model is harder than it looks. Benchmarking one is even harder. See why standard word error rate falls short, and what better STT evaluation actually looks like.*
My feed
Chronicle: Cursor for slides. Never build a deck from scratch again. Turn ideas into stunning presentations in minutes.*
OpenRouter Spawn - Deploy OpenClaw and other agents to the cloud of your choice. Works with all models on OpenRouter.
Zapier’s SDK is now open to everyone. Programmatic access to all of Zapier’s capabilities. Free to use in beta. (docs)
Kiro.dev (spec-driven IDE from Amazon) is bringing its startup credits program back for startups with up to 30 people.
Cogito - Markdown editor for Mac. I’ve been using Clearly (recently updated) for the last few weeks to simply view and edit md files.
Graphify - Turn any codebase or folder into a queryable knowledge graph.
Pi and Mario (the maker of Pi) are joining Earendil, the company by the creator of Flask. The core harness stays open-source. New features will be a mix of enterprise & fair source (proprietary now, open-source later).
Impeccable - Free design skills for coding agents with 21 commands to audit and fix common mistakes.
Superset and Builder 2.0 - two new UIs for running parallel agents. Superset is more like Codex (terminal-first, worktrees), Builder is more kanban-style with Slack/Jira integration.
CSS Studio by Motion - Make design changes by hand on your website in the browser, then pass them over to your agent for implementation.
S3 Files from AWS allows storing data as a file system, making it easier for agents to use.
Every is running two parallel org charts - one for humans and another for each employee’s OpenClaw agents.
Afters
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth
* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?
Email us at shanice@bensbites.com or k@bensbites.com







