Can I get my agents on the phone?
I haven’t used OpenClaw in weeks
Hey folks
Google I/O starts today, and Logan tweeted: “The model is the product”. There are rumours that the latest Gemini model scores similarly to GPT-5.5 on benchmarks - but we’ll see how it feels in actual use. Previous models also scored well but didn’t feel great to work with.
Once models are good enough, harnesses will be much less important. I just don’t think today is the day that happens. And even then, the role of a harness will probably just shift: instead of handling tool selection, the system prompt, context management, etc., it could handle managed agents, sandboxing, and cloud/local management.
I started using Codex on my phone… but not all that much, to be honest. A lot of agent harnesses these days have ways to control your sessions from your phone - Claude Code has /remote-control, Pi can build one for itself (I use a Telegram one) and Droid has mobile web + Droid computers.
Most of my mobile-first work at the moment is more brainstorming than building, and I find myself flitting between all these options.
I used to use my OpenClaw bot like an addict, but haven’t spoken to the poor bastard for weeks now.
It may help that I’m currently focused on just one (ish) main thing - this ‘course’, which is really more of a library or reference manual on how I think about agents, and how I steer and build with them.
Ben’s Bites is brought to you by Hyperagent from Airtable
Hyperagent, the cloud agent system with full computing environments, is giving $10M in inference credits to help founders build and run agent-first companies. The first 500 qualifying applicants gain access to this limited founder offer. Applications close May 31st.
Headlines
Codex now connects your Mac to your phone. You can start tasks in Codex from your phone while the actual work still runs on your Mac, devbox or remote machine. Files, setup and credentials stay where they are, and you approve commands, answer questions, and review diffs from your phone. This update also brings Hooks to Codex.
Anthropic is acquiring Stainless, a platform for building SDKs (also used by OpenAI), and is shutting the service down. Also, at its London conference, Anthropic added self-hosted sandboxes and MCP tunnels to Claude Managed Agents - its “running agents made easy” product for companies.
Cloudflare tested Anthropic’s Mythos against 50 of its repos. Quick takeaways:
Mythos is great at spotting real attacks, which are often many small vulnerabilities connected in a chain.
A single model, however smart, without a good harness leaves a lot to be found.
“Find bugs fast and patch them faster” is not a good idea. Teams need to focus on making bugs harder to chain and exploit, even when they exist.
Cursor’s Composer 2.5 (partly trained on SpaceX’s GPUs) is out. On the selected benchmarks Cursor reports, the model lands roughly level with Opus 4.7-xhigh and GPT-5.5-high, while being much cheaper than both.
My feed
Two AI startups worth watching: Magicpath (design canvas) and Raindrop AI (monitoring agents in production), both of which are making their products usable by external coding agents like Claude Code or Codex.
Even Grok/xAI has a coding CLI now. Let’s see what Google does with Gemini CLI at I/O today.
Linear Agent can now read the codebase directly to build a hypothesis, investigate support questions, find people who worked on a feature, and more.
Best practices for running Claude Code at scale.
Citadel’s founder, Ken Griffin, one of the AI-hype skeptics, now says they are seeing high-skilled jobs being “automated” by AI.
Browse.sh from Browserbase - open-source catalogue of skills/playbooks for agents to perform tasks on the internet.
Watchmen - skill files your coding agents should already have from your past sessions. Local and open-source.
Devin Auto-Triage monitors bugs, alerts and incidents, investigates them and comes back with context, next steps or a PR.
Motus Tracing - open-source observability for AI agents.
designmd.sh - a public registry for DESIGN.md files, so agents can understand design systems from repos.
Jason Liu on Codex maxxing - daily primitives for durable threads, shared memory, and keeping Codex useful across a real workflow.
Taste MCP beta - portable design preferences for Codex, Cursor, Claude Code, etc.
Claire Vo and Thariq on “HTML is the new markdown” - using HTML artifacts as specs, micro-UIs, and human-readable agent context.
Brian Lovin’s Notion Worker - syncs the people you follow on X into a Notion DB with optional AI enrichment.
Benedict Evans’ new “AI Is Eating The World” deck.
Coatue says its AI framework moved from “follow the GPU” to “follow the gigawatt”.
Afters
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth
* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?
Email us at shanice@bensbites.com or k@bensbites.com

I enjoyed reading this piece. The harness observation is the part most people will scroll past, but it is the most important line.
We are building an AI operations layer and seeing the same thing from the product side. The model is becoming commodity infrastructure. What actually determines whether an agent is useful over days and weeks - not just in a single session - is the persistence layer, the context management, and the orchestration underneath. The teams that get this right build systems their users cannot walk away from. The teams that chase model performance build demos.
The mobile control trend is interesting for a different reason. It is not really about phones. It is about the shift from "I sit down and use my AI tool" to "my AI is running and I check in when I need to." That is an operational relationship, not a tool relationship. Most products are not architected for that.
Curious what specifically made you drop OpenClaw. Was it a single thing or gradual drift?