GPT-5.6 is here but...
Codex is still underrated
Hey folks, I’m back from a long weekend in Greece and have to tell you about a company I backed a couple years ago that’s coming out of stealth.
Training got all the attention for the last few years. But in 2026, the real fight is serving intelligence: lower latency, lower cost, less power, more tokens. The market is screaming for better inference systems, and we’ve never seen a market like inference. So much so that people can’t sleep thinking about it (see the tweet in Afters 😂).
AI inference could be the biggest market in history and the bottleneck is inference hardware. Etched built the perfect product. Pulling it off will change how models run.
Etched are building frontier inference clusters with extreme vertical integration: chips, racks, software, manufacturing, and production co-designed end to end. First product in under three years, while most hardware peers took 7+.
Their traction is already insane: $800M raised, $1B+ in backlog orders, production underway with TSMC, and first chips working on the very first try (A0) on TSMC 4nm. A0 means the first physical version of a chip/system design that comes back from the fab. So yeah, A0 success is hard. Doing it this fast is wild.
I backed Gavin in 2023, when he was a 21-year-old Harvard dropout, because he had first-principles technical conviction and the ability to recruit an insane team of people who had actually built the last generation of compute infrastructure. Now Etched is 400+ people from NVIDIA, Google TPU, Broadcom, SK Hynix, TSMC and basically every serious AI chip program.
They’re backed by VentureTech Alliance (which has a strong partnership w/ TSMC), Peter Thiel, Jane Street, Two Sigma, Jump, HRT, Stripes, Ribbit and more.
As we’re seeing, who wins in AI won’t just be decided by who has the best models, but by who can serve them.
Ben’s Bites is brought to you by Render
Chainable compute. Right on queue.
Define tasks with Render’s lightweight SDK and chain them into long-running, distributed workflows. Launch your agents and batch jobs on demand. Render Workflows handles queuing, orchestration, and retries.
Try it: Use code RENDER-BENSBITES for $50 in credits.
Headlines
OpenAI has GPT-5.6, but its launch is also blocked by the US govt. Only “select partners” are getting access to GPT-5.6 Sol, Terra and Luna, the three new models in the family. Sol is the biggest and smartest of the bunch. It beats Mythos on certain benchmarks but is conveniently a little shy of Mythos-level at exploiting cybersecurity bugs. Sam Altman claims that GPT-5.6 will be available to regular users soon, though it might be US-only to begin with, even as he works toward a worldwide release.
OpenAI also released an economics paper based on the adoption of Codex inside & outside OpenAI. Non-technical adoption is quickly matching engineering departments. The lead of the Codex app was on Lenny’s podcast, where he talked about designing the app and the features that matter.
Cursor for iOS lets you launch always-on cloud agents from your phone & remote-control agents running on your computer. Composer 2.5 is 75% off in the app through July 5.
X launched a hosted MCP so Grok, Cursor and other MCP-compatible tools can connect to the X API and X developer docs without building the server yourself.
How do you build AI that actually understands you? Working Smarter, a podcast from Dropbox about AI and modern work, is back for season three. From context engineering to multimodal search, hear how engineers are building AI that works wherever you do. Listen to the first episode now.*
My feed
Replit has a desktop app now for both Mac and Windows.
US national design studio created Rampart - a 14.7MB in-browser ML model to redact PII before it’s sent to a server.
In-app custom agents work for regular users; power users want skills they can plug into their own agents to use your product.
Human in the /loop - how do you reliably get in and out of the way of agents?
Zaro - build live apps, agents and workflows from Slack, email, docs and calendars.
Unpeel - native Mac terminal for agents with persistent sessions and git worktrees. (demo)
Tau - educational agent harness for building TUIs, extensions and harnesses.
Inference.net lets you test GLM 5.2 on mirrored prod traffic before switching models for your live users.
Odessia Travel - opinionated AI trip planner that searches and books flights, stays and activities.
smolmachines - spin up hardware-isolated Linux microVMs.
Animation vocabulary - skill for asking AI for motion with words like morph and rubber-band.
Every time you think you need a dashboard, ask your agent to make a throwaway HTML.
Teaching agents product design at Vercel.
MCPs, APIs and CLIs - These are not three separate concepts.
New set of shadcn components for building chat interfaces - streaming chat, scrolling, messages and attachments.
Afters
Read about me and Ben’s Bites
📷 thumbnail via @keshavatearth
* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?
Email us at shanice@bensbites.com or k@bensbites.com





