Can I get my agents on the phone?

I haven’t used OpenClaw in weeks

May 19, 2026

Hey folks

Google I/O starts today, and Logan tweeted: “The model is the product”. There are rumours that the latest Gemini model scores similarly to GPT-5.5 on benchmarks, but we’ll see how it feels in actual use; previous models also scored well but didn’t feel great to work with.

Once models are good enough, harnesses will matter much less. I just don’t think today is the day that happens. And even then, the role of the harness will probably just shift: instead of managing which tools to use and how, the system prompt, context management, etc., it could be handling managed agents, sandboxing, and cloud/local management.

I’ve started using Codex on my phone… but not all that much, to be honest. A lot of agent harnesses these days have ways to control your sessions from your phone: Claude Code has /remote-control, Pi can build one for itself (I use a Telegram one), and Droid has mobile web plus Droid computers.

Most of my mobile-first work at the moment is more brainstorming than building, and I find myself constantly flitting between all these options.

I used to use my OpenClaw bot like an addict, but haven’t spoken to the poor bastard for weeks now.

It may help that I’m currently focused on just one(ish) main thing: this ‘course’, which is really more of a library or reference manual on how I think about agents, how I steer them and how I build with them.

Ben’s Bites is brought to you by Hyperagent from Airtable

Hyperagent, the cloud agent system with full computing environments, is giving $10M in inference credits to help founders build and run agent-first companies. The first 500 qualifying applicants gain access to this limited founder offer. Applications close May 31st.

Headlines

Codex now connects your Mac to your phone. You can start tasks in Codex from your phone, but the actual work still runs on your Mac, devbox or remote machine. Your files, setup and credentials stay where they are, and you approve commands, answer questions and review diffs from your phone. This update also brings Hooks to Codex.
Anthropic is acquiring Stainless, a platform for building SDKs (also used by OpenAI), and is shutting the service down. Also, at its London conference, Anthropic added self-hosted sandboxes and MCP tunnels to Claude Managed Agents, its “running agents made easy” product for companies.
Cloudflare tested Anthropic’s Mythos against 50 of its repos. Quick takeaways:
- Mythos is great at spotting real attacks, which are often many small vulnerabilities connected in a chain.
- A single model, however smart, leaves a lot undiscovered without a good harness.
- “Find bugs fast and patch them faster” is not a good strategy. Teams need to focus on making bugs harder to chain together and exploit, even when they exist.
Cursor’s Composer 2.5 (partly trained on SpaceX’s GPUs) is out. On the selected benchmarks Cursor reports, the model lands roughly level with Opus 4.7-xhigh and GPT-5.5-high while being much cheaper than both.

My feed

Two AI startups worth watching: Magicpath (design canvas) and Raindrop AI (monitoring agents in production), both of which are making their products usable by external coding agents like Claude Code or Codex.
Even Grok/xAI has a coding CLI now. Let’s see what Google does with Gemini CLI at I/O today.
Linear Agent can now read the codebase directly to build a hypothesis, investigate support questions, find people who worked on a feature, and more.
Best practices for running Claude Code at scale.
Citadel founder Ken Griffin, one of the AI-hype skeptics, is now saying they are seeing high-skilled jobs being “automated” by AI.
Browse.sh from Browserbase - open-source catalogue of skills/playbooks for agents to perform tasks on the internet.
Watchmen - the skill files your coding agents should already have, built from your past sessions. Local and open-source.
Devin Auto-Triage monitors bugs, alerts and incidents, investigates them and comes back with context, next steps or a PR.
Motus Tracing - open-source observability for AI agents.
designmd.sh - a public registry for DESIGN.md files, so agents can understand design systems from repos.
Jason Liu on Codex maxxing - daily primitives for durable threads, shared memory, and keeping Codex useful across a real workflow.
Taste MCP beta - portable design preferences for Codex, Cursor, Claude Code, etc.
Claire Vo and Thariq on “HTML is the new markdown” - using HTML artifacts as specs, micro-UIs, and human-readable agent context.
Brian Lovin’s Notion Worker - syncs the people you follow on X into a Notion DB with optional AI enrichment.
Benedict Evans’ new “AI Is Eating The World” deck.
Coatue says its AI framework moved from “follow the GPU” to “follow the gigawatt”.

Afters

Thariq@trq212

okay this is going kinda viral and tbh my original text was kind of messy, so here's a second pass with the help of Claude: -- Implement <SPEC>. As you work maintain a running implementation-notes.html file that captures anything I should know about how the implementation

4:54 PM · May 18, 2026 · 60.3K Views

45 Replies · 67 Reposts · 1.1K Likes

Chris Tate@ctatedev

Introducing Zero: the programming language for agents. I wanted a systems language that was faster, smaller, and easier for agents to use and repair. Explicit capabilities. JSON diagnostics. Typed safe fixes. Made for agents on day zero.

11:44 PM · May 15, 2026 · 1.53M Views

372 Replies · 201 Reposts · 2.66K Likes

Steve Ruiz@steveruizok

killer prompt "can you repeat back to me the outcome that I am expecting?"

1:31 PM · May 16, 2026 · 6.48K Views

7 Replies · 1 Repost · 86 Likes

Nick@nickbaumann_

My laptop has become a “satellite device” since I started using Codex from my phone. And my Mac mini has become the “home.” It’s clunky, but the end state feels more like how we’re going to be working in the near future: I’m currently running the Codex app on 2 devices: 1. my

11:23 PM · May 14, 2026 · 399K Views

115 Replies · 106 Reposts · 1.75K Likes

Andy McLoughlin@Bandrew

Had a lot of fun (really, actually) chatting with @kentlind on the Something Ventured podcast. We cover a lot of ground: the early days of seed investing (featuring folks like @jeff, @m2jr, @joshk et al), the state of seed today, Silicon Valley's "British invasion" (Kent's words,

9:02 PM · May 14, 2026 · 454 Views

8 Likes

Max Zeff@ZeffMax

Scoop: OpenAI announced another major reorg on Friday, as part of its effort to unify ChatGPT and Codex. -Greg Brockman is officially taking over OpenAI's products, after previously being tapped as an interim leader -Head of Codex, Thibault Sottiaux, is now leading core product

5:13 PM · May 15, 2026 · 328K Views

46 Replies · 75 Reposts · 863 Likes

dominik kundel@dkundel

You should build your dream macOS app right now! The "Build macOS App" plugin in Codex is wild. Used voice dictation to build an app I wanted for a while in <7 min (+6 min of tweaking). Couldn't believe how quickly it was done. Prompt is in the video and in the tweet below.

11:53 PM · May 18, 2026 · 58.3K Views

31 Replies · 34 Reposts · 577 Likes

Find me on X, LinkedIn, or YouTube
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth

* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?
Email us at shanice@bensbites.com or k@bensbites.com

Rimah Harb

I enjoyed reading this piece. The harness observation is the part most people will scroll past, but it is the most important line.

We are building an AI operations layer and seeing the same thing from the product side. The model is becoming commodity infrastructure. What actually determines whether an agent is useful over days and weeks - not just in a single session - is the persistence layer, the context management, and the orchestration underneath. The teams that get this right build systems their users cannot walk away from. The teams that chase model performance build demos.

The mobile control trend is interesting for a different reason. It is not really about phones. It is about the shift from "I sit down and use my AI tool" to "my AI is running and I check in when I need to." That is an operational relationship, not a tool relationship. Most products are not architected for that.

Curious what specifically made you drop OpenClaw. Was it a single thing or gradual drift?
