Hey Siri, meet AI
what's the deal with loops
Hey folks,
There’s been a lot of chatter about loops on X recently, and it’s a topic I’ve been toying with. My interpretation of what Peter posted is:
Agents are loops: you give an agent a task, it looks at the task with the context it has, uses tools to gather more, and gives you an output when it thinks the task is done.
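The loop described above can be sketched in a few lines of Python. This is a rough illustration, not how any particular agent is built: `run_agent`, `decide`, the action tuples and the toy `lookup` tool are all made-up names, and `decide` stands in for the model call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical action format: ("tool", tool_name, argument) or ("done", output).
Action = Tuple[str, ...]
Context = List[Tuple[str, str]]


def run_agent(task: str,
              decide: Callable[[str, Context], Action],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 10) -> str:
    """Loop until `decide` says the task is done or the step budget runs out."""
    context: Context = []  # what the agent has gathered so far
    for _ in range(max_steps):
        action = decide(task, context)  # stand-in for a model call
        if action[0] == "done":
            return action[1]
        _, name, arg = action
        # Run the requested tool and add its result to the context.
        context.append((name, tools[name](arg)))
    raise RuntimeError("agent did not finish within max_steps")


# Toy stand-ins so the sketch runs without a real model or real tools.
def toy_decide(task: str, context: Context) -> Action:
    if not context:
        return ("tool", "lookup", task)  # gather more context first
    return ("done", f"answer based on {context[-1][1]}")


toy_tools = {"lookup": lambda query: query.upper()}
result = run_agent("what is a loop", toy_decide, toy_tools)
print(result)
```

The `max_steps` cap is a design choice of this sketch: a loop that decides for itself when it is done needs some outer limit in case it never does.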
But replicating those parts in a bigger system, so your agents can run more autonomously, for longer and on harder chunks of work, is what I think ‘designing the loop’ is talking about.
So you want to design a bigger task up front, like a plan.md file with a bunch of tasks, new features to implement, etc. Then you need a way for those tasks to be deemed ‘complete and verified’, i.e. does the report contain all 10 points from the plan, does the UI have all the features working correctly, do all the tests pass? And finally, the agent prompts itself to go back to the plan.md file and pick up the next one.
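Here’s a minimal sketch of that outer loop in Python, under some assumptions of mine: the plan is a markdown checklist (`- [ ]` for open, `- [x]` for done), it’s passed in as a string rather than read from plan.md, and `do_task` and `verify` are hypothetical stand-ins for the agent run and the ‘complete and verified’ check.

```python
import re
from typing import Callable, List, Optional

# Assumed plan format: one "- [ ] task" line per open item.
OPEN_ITEM = re.compile(r"^- \[ \] (.+)$", re.MULTILINE)


def next_task(plan: str) -> Optional[str]:
    """Return the first unchecked task, or None if the plan is finished."""
    match = OPEN_ITEM.search(plan)
    return match.group(1) if match else None


def mark_first_done(plan: str) -> str:
    """Tick off the first unchecked task (the one next_task returned)."""
    return OPEN_ITEM.sub(r"- [x] \1", plan, count=1)


def run_plan(plan: str,
             do_task: Callable[[str], str],
             verify: Callable[[str, str], bool],
             max_attempts: int = 3) -> str:
    """Work through the plan, ticking a task off only once it verifies."""
    while (task := next_task(plan)) is not None:
        for _ in range(max_attempts):
            result = do_task(task)      # stand-in for an agent run
            if verify(task, result):    # e.g. tests pass, report is complete
                plan = mark_first_done(plan)
                break
        else:
            raise RuntimeError(f"could not verify: {task}")
    return plan


# Toy stand-ins so the sketch runs on its own.
plan = "# Plan\n- [x] set up repo\n- [ ] build chart component\n- [ ] write tests\n"
attempts: List[str] = []


def toy_do(task: str) -> str:
    attempts.append(task)
    return f"did {task}"


def toy_verify(task: str, result: str) -> bool:
    return result == f"did {task}"


final_plan = run_plan(plan, toy_do, toy_verify)
print(final_plan)
```

The point of keeping `verify` separate from `do_task` is that the agent doesn’t get to mark its own homework: a task only gets ticked off when the check passes.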
I’ve been toying with it for this reference manual - I’m making lots of interactive components, so I’ve tried designing all the component pieces first and then building a workflow to compose those together into the interactive.
But also this is why a ton of people bring skills into their workflows. Do the planning skill, then split tasks with a PRD skill, then research skill for each feature, then building skill, then review skill, then testing skill.
It’s all designing text instructions for an agent to follow, making sure it can access any tools it needs to do the tasks.
Further reading:
Loop Engineering by Addy Osmani
WTF is a loop by Matt Van Horn (although AI-sloppy)
Ben’s Bites is brought to you by Smallest AI
Smallest AI Voice Agents gives you production-ready infrastructure to run inbound & outbound campaigns at scale, powered by its in-house realtime latency STT & TTS and enterprise-grade telephony. Enterprises trust it to handle millions of minutes. See what it can do for yours - book a demo!
Headlines
Apple finally has a dedicated AI product, Siri AI. Imagine ChatGPT from about a year ago, with great dictation, image analysis and some interaction with external apps like Messages and Maps. Not bad *if* it works. The new Siri AI uses a mix of local and cloud models (some based on Gemini), all under the AFM 3 model family. These models also power other “AI features” embedded inside apps. I’m keeping an eye out for the one that vibe-codes Safari extensions and Apple Shortcuts using plain English.
ChatGPT’s memory system runs a background process to save memories that you can see and edit. They are calling the latest iteration Dreaming v3, which has better recall, follows your long-term preferences more closely and corrects itself as time passes.
A new blog post from Anthropic claims that developers are writing 8x more code (with Claude’s help) than they were in 2025, and that Claude is now helping train the next versions of Claude. Hence, they advocate for an “option” to pause AI development if the need arises.
OpenAI shared three goals for its next phase: build an automated AI researcher, accelerate the economy and give everyone on Earth a personal AGI. They’ve also filed a confidential S-1 while claiming no urgency for an IPO.
NotebookLM’s core chat is getting upgraded from the old RAG system to an agent-like system (Antigravity harness). Each notebook gets a cloud computer to run code for analysing the files that you’ve uploaded with the latest Gemini 3.5 models.
Your Oura ring scores sleep. Your Apple Watch tracks your heart. Workera Ambient does the same for your career: always-on, capability capture from the work that's already happening. Your data, your choice. Learn more from Workera's CEO and join the waitlist.*
My feed
Cursor’s Canvas lets users spin up internal apps, dashboards and reports that are shareable with others. It’s another entry in the “Claude Artifacts but 2026” category of features that all the coding agents are shipping.
Can Claude become a chemist? What about powering agents for biology?
Firecrawl Workflows - installable skills for repeatable web tasks (like deep research, SEO audits and more).
Eloquent - Local transcription app from Google (uses Gemma).
FrontierCode - coding eval to test whether code is actually maintainable rather than just passing tests.
A guide to using /goal in Codex.
Fin Voice 2 - natural, fast and intelligent customer support over the phone.
Raindrop 2.0 - catches a production failure, hands it to your coding agent to fix, and turns it into an eval so it can't recur.
Cognition is guaranteeing up to $10M in credits if Devin underdelivers on an annual enterprise contract.
skills.sh by Vercel now has an API for querying its collection of 600k+ skills.
Upstash Agent Analytics - 3 lines of code to track AI/agent traffic to your website.
Spiral by Every - writing partner for humans and agents with stylometry, CLI, MCP/API, team styles.
Google is making its budget AI plan even cheaper ($7.99/mo to $4.99/mo) while offering 2x the storage space.
Afters
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth
* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?
Email us at shanice@bensbites.com or k@bensbites.com








