My cheatsheet for a clean context
fast intelligence, managed infra and desktop apps
Hey folks
Boarding my flight to SF very shortly, and I got an email to let me know - no WiFi today. Uh oh. I was kinda hoping my 11 uninterrupted hours without the kids would be productive for once (I’m usually a very OOO long-hauler, no internet). But I still have some work to do polishing this talk I’m giving on Tuesday.
I’m also in town looking to deploy $100k cheques to dev tools and infra founders, plus see some of my wonderful LPs and meet new ones. Ben’s Bites Fund II has already started investing.
So, my flight… I’ve had to hurriedly download a few local models so I can use my agents offline, and I think, so far, Gemma 4 26B is going to be my choice.
We’re so spoiled today with fast intelligence at our fingertips, and it’s funny how quickly we get used to new intelligence levels.
Local models are slow to boot up (you’ve got to be more mindful of what context is being loaded on startup, so I’m running with no skills to make it start faster; I can call the skills when I want. Maybe I’d actually prefer to do that generally 🤔). And they feel pretty slow to do work, but only because of said spoils.
I’ve been in the weeds of context management recently because of the course I’m working on. And it’s been useful to just remind myself about how prickly it can be:
If an agent runs web searches (presumably you didn’t read the results), it’s gobbling up context from content you don’t know is 1. right, 2. not AI slop, and 3. from a source you’d recommend.
Little (or big) lines of slop, misdirection and misinformation slip into the context and compound over time.
Reaching ~60% of the context window is probably as far as you want to go.
Use other sessions as context-gathering sessions. If there are lots of documents, create one summary file with the information (and try to read, or at least skim, it! I am trying, promise).
I don’t trust 1M context windows; there’s a great post by Thariq from Anthropic below about this. My tasks shouldn’t need perfect recall beyond ~150k tokens, and that’s a lot of words. I won’t trust them until 1M context windows are the norm, the models don’t forget anything, and they help clean polluted context along the way!
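The ~60% rule above can be sketched in a few lines. This is purely illustrative (the names and the 4-characters-per-token estimate are my own assumptions, not from any real agent framework or tokenizer): a tiny tracker that roughly counts tokens as content lands in a session and flags when it crosses the budget.

```python
def rough_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

class ContextBudget:
    """Hypothetical helper: warn once a session passes ~60% of the window."""

    def __init__(self, window: int = 200_000, limit: float = 0.6):
        self.window = window  # model's context window, in tokens
        self.limit = limit    # fraction of the window to stay under
        self.used = 0

    def add(self, text: str) -> None:
        # Count everything that enters the context: prompts, tool
        # results, web-search dumps you never actually read, etc.
        self.used += rough_tokens(text)

    @property
    def over_budget(self) -> bool:
        return self.used > self.window * self.limit

budget = ContextBudget(window=200_000)
budget.add("x" * 500_000)  # simulate a pile of pasted search results
print(budget.used, budget.over_budget)  # ~125k tokens used → True
```

In practice you’d use a real tokenizer rather than a character count, but even a crude counter makes the point: a few unread web pages eat the budget fast.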
Anyway, got to head to the gate! This was a little different of an intro, let me know if you liked it. I need to share more as I’m learning (or diving deeper).
Ben’s Bites is brought to you by Attio, the AI CRM
Honestly, no one gets excited about a CRM. But then they try Attio. It connects to Claude Code and n8n through its MCP server, completely bridging the gap between my customer data and apps. Wait, there's more, like flagging churn risk and turning customer feedback into Linear projects. Try it now.
Headlines
Claude Code’s desktop got a redesign. It brings many CLI-only features and more (like split windows for multiple sessions) to the desktop app. Big improvement, but a lot is still missing. It picks up some CLI sessions but not all, opening/editing files isn’t obvious, and it keeps asking for permission even with “bypass” settings on.
Gemini also has a native Mac app now. But it’s light on features - no Gems, no notebooks - and the design feels rough to say the least.
New models - GPT-5.4-Cyber from OpenAI, fine-tuned for cybersecurity, with limited access to trusted partners. And Gemini 3.1 Flash TTS from Google - better voices, audio tags for controlling tone and pacing, and 70 languages.
Routines in Claude Code are now in research preview - set up a prompt, a repo, and your connectors once, then run it on a schedule (or via API/GitHub trigger). Runs on Anthropic’s infra, so you don’t need your laptop open. Basically, extended cron jobs. OpenClaw calls these heartbeats.
With the latest update to OpenAI’s Agents SDK, you can run Codex-style agents in production without building the whole harness yourself. You get sandboxed execution, computer-use, skills, memory, and compaction built in.
Most RAG systems return wrong answers with complete confidence. Gauntlet's free Night School covers how production AI engineers actually fix that — setup, evaluation, the full loop. Wednesday, April 22. Register free*
My feed
Skills in Chrome let you save prompts as reusable one-click workflows that run on whatever page you’re viewing.
Cursor can now respond with interactive canvases - dashboards and custom interfaces instead of just text.
Resend shipped a new email editor with BYOA (bring your own agent). There’s a built-in LLM, but you can also MCP into the editor with your own setup.
Sparkle v4 from Every - let AI organise your filesystem like you would.
Daniel pointed an agent at 5 years of home-building emails (511 events, 690 documents, 170 finance records) and got back a full project timeline in ~$500 of Opus tokens.
Impeccable v2 - the design skill for coding agents. v2 adds a CLI scanner (works without an LLM), a Chrome extension, and a /shape command that runs a design interview before writing any code.
Using Claude Code - guide on session management, compaction, and the 1M context window.
30 min tutorial on building software with agents in Cursor.
Lindy AI’s founder says GLM 5.1 will likely become their default over closed-source models for most use cases, saving them a bunch on inference (their biggest cost, more than payroll).
OpenRouter now offers video generation models behind one universal API.
Copilot in Word now tracks changes and leaves comments.
Windsurf 2.0 - Manage all your agents from one place and delegate work to the cloud with Devin.
Gradient Bang - a fun multiplayer game with subagents in space. Built with Pipecat, Supabase, and open-source.
Afters
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth
* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?
Email us at shanice@bensbites.com or k@bensbites.com