Build tools, to build more
Codex Sites and open models
Hey folks,
I’m making progress on my agents manual! I think I finally figured out how I want the thing to look and feel.
I’ve built and rebuilt this damn thing so many times in this process, which is actually part of the process. I am a lazy workaholic (h/t Rick Rubin) - I have to spend time in the work, even if it feels like it’s not going anywhere, until ‘suddenly’ things click.
Whilst in the process, you find yourself wanting tools to exist to make things easier for yourself—that’s a huge part of why learning agents and how to steer them is so good.
You can build tools to enable you to build things.
I spun up this tool before bed last night: it lets me comment on or delete copy whilst I’m building, then copy all of those notes as one big block of agent feedback.
Ben’s Bites is brought to you by Attio
Attio is the CRM for the new way of GTM. Get agents working on every account, surfacing opportunities, and handling the work that used to take your team days. Open your inbox, the follow-ups are drafted. Walk into a meeting, you're already briefed. Got a question, just Ask Attio. Start for free today.
Headlines
Codex has two new additions: Plugins and Sites. Plugins are pre-built collections of skills, connectors to relevant apps (like Figma for designers) and instructions tuned for specific roles like data analysis and product design. Sites lets users create a shareable website/app with a database, file storage, env vars, access controls, and more. Initially only available to business and enterprise users.
A bunch of new open models released recently -
Gemma 4 12B - Multimodal (i.e. accepts images and audio as input) and performs nearly as well as the two-month-old 26B variant.
Ideogram 4.0 - 9.3B model for image generation. Trained on JSON prompts for control over the layout, colours and text of each element in the image. Also check out Reve 2.0 for its focus on the layout of elements in an image (but it’s closed-source).
Miso One - 8B text-to-speech model claiming expressive speech with 110ms latency.
Also, just like Cursor’s Composer, more companies are trying out fine-tuning big open-weights models for their domain-specific work. Latest entry → Harvey got a Kimi 2.6 agent to beat Opus 4.7 on its legal benchmark at ~11x lower cost.
Microsoft Scout is an always-on Microsoft 365 agent built on OpenClaw (reminder: OpenClaw is open-source). It takes a different approach from what Google is doing with Gemini Spark.
Ramp Stack - An accounting assistant that helps with month-end close work: reconciling accounts, preparing schedules/accruals and more with reviewable sources. They also published a nice blog post explaining their efforts to benchmark Stack against other frontier models.
Financial fraud is evolving fast. It’s time to fight back—with AI. Read MIT Technology Review and Plaid’s report to see how technology is reshaping financial defenses. Learn more and see how smarter tools and industry collaboration can help fight against the rise of fraud. Read the report.*
My feed
Smallest AI lets you deploy voice agents at scale, powered by realtime STT & TTS and production-ready telephony infrastructure.*
Bloom turns your brand assets, site, decks, Figma and socials into a callable system that agents can use via API/MCP to generate on-brand assets.
Windsurf is now Devin Desktop. It manages fleets of local and cloud agents from the editor. Nous also released a desktop app for its CLI agent Hermes.
Hallmark v1.1 - open-source design skill for coding agents.
40% of Cursor's internal PRs now come from cloud agents. More deets in their post about lessons from building cloud agents.
Factory Router routes each agent session to the right model and keeps near-frontier performance at 20-25% lower cost in its benchmarks.
The next frontier of visual AI is code.
ViBench - benchmark from Replit with tasks focused on end-to-end app creation; Opus 4.8 beats GPT-5.5 on price/performance for vibe coding.
Skills for macOS - app for browsing and editing local skills, MCP configs and plugins.
Ollie - AI assistant that helps parents manage chores and free up time for family.
Television - visual workspace for personal agents. Notion-like kanban board vibes but with each tile attached to an agent.
Building software is learning - it’s an iterative process that will run into questions and obstacles. You should want that to happen as fast as possible.
A functional taxonomy of world models.
Collection of agentic engineering hacks for June 2026.
Modern Engineering Values - a workflow and set of engineering values developed after shipping several mostly or fully AI-written projects.
SDKs I’ve come across:
Email SDK - unified API for sending emails. Works across multiple providers.
storagesdk - object storage with snapshots and forks.
Afters
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth
* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?
Email us at shanice@bensbites.com or k@bensbites.com












