Just use GPT-5.4 xhigh

workshop recording inside

Mar 10, 2026

Hey I’m Ben. I build stuff with agents, even though I’m not technical. Here’s all the stuff I’m reading and tinkering with. If you want to start building or level up your ‘vibe-coding’ skills, join our community.

Hey folks,

The ‘become a builder’ workshop last week went well-ish 😊 (Codex crapped out on us). The recording is available, but I’m working on a thorough guide to cover everything properly (plus the bits we didn’t get to cover). I’m ~50% through it so hope to have it out this week.

Also, Factory is hosting a hackathon this Thursday: everyone gets 200M tokens, and a Mac mini is on the line.

OpenAI released GPT-5.4 in “thinking” and “pro” variants. It brings the coding power of GPT-5.3-Codex to the main model series, with better vision, more efficient tool use, and a 1M-token context window. It’s now much better at computer use (see demo) and financial tasks. It’s also a bit more expensive than GPT-5.2 ($1.75/$14 → $2.50/$15 per million input/output tokens). OpenAI expects to keep this naming and capacity difference between instant models (GPT-5.3 Instant) and reasoning models moving forward.
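
If you want a feel for what that price bump means on a real job, here's a rough sketch. The per-million prices are the ones quoted above; the model keys are just labels (not confirmed API model IDs), and the workload of 2M input and 200k output tokens is a made-up example, so swap in your own numbers.

```python
# Prices in USD per million tokens, as quoted in the paragraph above.
# The dictionary keys are plain labels for this sketch, not official model IDs.
PRICES = {
    "gpt-5.2": {"input": 1.75, "output": 14.00},
    "gpt-5.4": {"input": 2.50, "output": 15.00},
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one job at the listed per-million prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: an agent session that reads a lot and writes a little.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 200_000

old_cost = job_cost("gpt-5.2", INPUT_TOKENS, OUTPUT_TOKENS)
new_cost = job_cost("gpt-5.4", INPUT_TOKENS, OUTPUT_TOKENS)
increase_pct = (new_cost - old_cost) / old_cost * 100

print(f"GPT-5.2: ${old_cost:.2f}")
print(f"GPT-5.4: ${new_cost:.2f}")
print(f"Increase: {increase_pct:.0f}%")
```

For this made-up workload the bill goes from $6.30 to $8.00, roughly 27% more. Most of the jump comes from the input price (up about 43%, versus about 7% for output), so input-heavy agent sessions feel it most.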

🌐 What I’m consuming

Cursor’s third era - Cloud agents have overtaken tab autocomplete in the IDE.
a16z’s sixth edition of Top 100 consumer AI apps.
Why is everyone in AI talking about filesystems?
I was a 10x engineer. Now I’m useless.
Building for trillions of agents - They will need their own infra, access to files, identities, while maintaining security, compliance, and governance.
How OpenAI uses skills to maintain open-source repos for Agents SDK.
The next $1T company will be a software company masquerading as a services firm.
Using Claude Code as the chief of staff for a boutique consultancy.

⚙️ Tools and demos

Cursor Automations - Build always-on agents. Run them on a schedule or use events (like Slack messages) as a trigger.
T3 Code - Desktop app to use Codex CLI (alternative to Codex app). Nice and smooth to use, still feels alpha though (because it is).
Handles by here.now - Personalised sub-domains for everything you publish with your agent.
Copilot Cowork - Hand off tasks to agents with the ability to work across your Microsoft 365 apps.
Air by JetBrains - Agentic dev environment built for working with agents from different vendors.
Clawcard - A real inbox, a phone number, and a credit card your agents can’t abuse.
21st Agents - Infra for adding agents to your app—runtime, sandboxing, billing, UI, streaming and more. Also see: Terminal Use (very similar, YC W26).
Code review tools:
- Warden by Sentry - Set of skills to review every PR on your codebase.
- Vet by Imbue - Fast and local code review tool to make sure the agent followed your instructions.
- OpenReview - Open-source, self-hosted AI code review bot powered by the Vercel AI Cloud.

🥣 Dev Dish

Notchi - Cute little Tamagotchi that lives in your notch. It cries when you yell at Claude and gets happy when you praise it.
Context Hub - An open tool that gives your coding agent the up-to-date API documentation it needs. (read more)
Agent Safehouse - macOS-native sandboxing for local agents.
Flue by Astro - A framework to build sandboxed AI agents and CI workflows.
slacrawl - Get your Slack data locally with or without API keys.
claude-replay - Turn Claude Code session transcripts into self-contained, embeddable HTML replays.
executor - Local-first execution environment for AI agents. (read more)
agent-coworker - Agent backend that you can use from a terminal or a desktop app.
agent-kanban - VS Code extension that provides an integrated kanban board to manage coding agent tasks.
Fractals - A tool to break down tasks into subtasks on repeat, let agents complete them and manage the entire process.
Uithub is now open-source. Turn GitHub repos into LLM-ready context.
shadcn/cli v4 - Comes with skills, presets, dry-run, monorepo support and more.
Experimental UI to fork convos and explore side tangents without interrupting the main thread. (read more)
An agent skill to help you write smarter, simpler, and more modern SwiftUI.
Making OpenClaw and Codex app talk to each other using ACP.

🍦 Afters

MultiGen - New research from Google and Stanford to make level design possible for “generated” multiplayer games.
Opus helped the Mozilla team find 22 vulnerabilities in Firefox in just two weeks.
PinchBench - Ranks models by tasks completed successfully on an OpenClaw setup.
Databricks’s research team trained KARL (Knowledge Agents via Reinforcement Learning) to create faster, lower-cost alternatives to frontier models for document-centric tasks. (tech report)
Anthropic is suing the DoD to block its supply chain risk designation, calling it unlawful. Meanwhile, the White House is preparing an executive order to formally ban federal agencies from using Anthropic’s tools.
OpenAI’s head of robotics, Caitlin Kalinowski, resigned, citing surveillance and weapons concerns after the DoD contract.

Enjoy this newsletter? Forward it to a friend.

That’s it for today. Feel free to comment and share your thoughts. 👋

Find me on X, LinkedIn, or Instagram
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth

