A peek inside CLI tools

No more funny videos at OpenAI

Mar 26, 2026

Agents are LLMs with tool-use. They don’t just respond to you, they can go and do things for you. But what does ‘tool-use’ actually mean? What tools?

The most common tools are CLIs (command-line interfaces). A CLI is a text-based way to control software: you type a command, something happens. Agents communicate in text and CLIs are text in, text out, so it’s a natural fit.

Here’s a simple example - organising files, using the bash tool.

"Rename all 400 product photos to match our SKU format, resize them to 1200x1200, and sort them into folders by category."

First the agent lists files to understand what it’s working with.

COMMAND:
ls ./product-photos/

OUTPUT:
IMG_0291.jpg
IMG_0292.jpg
IMG_0293.jpg ... (400 files)

‘ls’ is the command for ‘list’
./product-photos/ is where the files are located

Then creates the folder structure

COMMAND:
mkdir -p ./output/{shoes,bags,jackets,hats}

OUTPUT (mkdir prints nothing on success; this is the folder structure it creates):
output/
├── shoes/
├── bags/
├── jackets/
└── hats/

‘mkdir’ is the command for ‘make directory’ (a directory is a folder). Here it’s creating 5: output, output/shoes, output/bags, output/jackets, output/hats
The {shoes,bags,jackets,hats} part is brace expansion: bash turns it into four separate paths before mkdir runs
Flags modify what a command does: -p here means ‘create any missing parent folders too.’ So if ./output/ doesn’t exist yet, it’ll make that too
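
If the curly braces look odd, here’s a minimal sketch you could paste into a terminal to watch them expand. It assumes bash and works in a throwaway temp folder (demo_dir is a made-up name), so it won’t touch any real files.

```shell
#!/usr/bin/env bash
# Brace expansion happens before the command runs; echo shows what bash turns it into.
echo ./output/{shoes,bags,jackets,hats}
# prints: ./output/shoes ./output/bags ./output/jackets ./output/hats

# Work in a throwaway directory so nothing real is touched (demo_dir is a made-up name).
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir"/output/{shoes,bags,jackets,hats}

# Running it again is harmless: -p also means "don't complain if the folder already exists".
mkdir -p "$demo_dir"/output/{shoes,bags,jackets,hats}

# List the four category folders that were created.
ls "$demo_dir"/output
```

Swapping mkdir for echo like this is a cheap way to preview what a command will receive before it runs for real.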

Then resizes the images

COMMAND:
mogrify -resize 1200x1200 ./product-photos/*.jpg

OUTPUT:
400 images resized ✓

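mogrify is an image-editing tool (part of ImageMagick) that edits files in place (overwrites the originals)
-resize 1200x1200 fits each image within 1200x1200 while keeping its proportions, so a non-square photo won’t come out exactly square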
*.jpg is a wildcard pattern meaning “all files ending in .jpg”.
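
Because mogrify overwrites the originals, a cautious version of this step might preview what the wildcard matches and take a backup first. This is only a sketch, assuming bash: the temp folder stands in for ./product-photos/, the file names are made up, and the mogrify line is left commented out because it needs ImageMagick installed.

```shell
#!/usr/bin/env bash
# Throwaway folder standing in for ./product-photos/ (file names are made up).
photos="$(mktemp -d)"
touch "$photos"/IMG_0291.jpg "$photos"/IMG_0292.jpg "$photos"/notes.txt

# Preview what *.jpg expands to: the two .jpg files, but not notes.txt.
matches=("$photos"/*.jpg)
printf '%s\n' "${matches[@]}"
echo "${#matches[@]} files match"

# Copy the whole folder before any in-place edit, so the originals survive.
backup="${photos}-backup"
cp -R "$photos" "$backup"

# Only now would the in-place resize run (left commented out: it needs ImageMagick).
# mogrify -resize 1200x1200 "$photos"/*.jpg
```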

Then renames and sorts each file

COMMAND:
mv ./product-photos/IMG_0291.jpg ./output/shoes/SKU-1042-BLK.jpg
mv ./product-photos/IMG_0292.jpg ./output/bags/SKU-2187-TAN.jpg
mv ./product-photos/IMG_0293.jpg ./output/jackets/SKU-3301-NVY.jpg
... repeats for all 400 files

OUTPUT:
400 files renamed and sorted ✓

'mv' is the command for 'move' (which also renames the file when you move it to a new name)
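
An agent doesn’t have to spell out 400 separate mv lines. One way it might handle the repetition is a loop over a lookup table. The sketch below assumes bash and a hypothetical mapping.csv (original name, category, SKU); the file names, SKUs, and temp workspace are all invented for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway workspace; the file names, categories, and SKUs below are made up.
work="$(mktemp -d)"
mkdir -p "$work"/product-photos "$work"/output/{shoes,bags,jackets}
touch "$work"/product-photos/IMG_0291.jpg \
      "$work"/product-photos/IMG_0292.jpg \
      "$work"/product-photos/IMG_0293.jpg

# Hypothetical mapping file: original name, category, new SKU name.
cat > "$work/mapping.csv" <<'EOF'
IMG_0291.jpg,shoes,SKU-1042-BLK
IMG_0292.jpg,bags,SKU-2187-TAN
IMG_0293.jpg,jackets,SKU-3301-NVY
EOF

moved=0
# Read one comma-separated row at a time and issue the matching mv.
while IFS=, read -r old category sku; do
  # -n refuses to overwrite an existing file with the same SKU name.
  mv -n "$work/product-photos/$old" "$work/output/$category/$sku.jpg"
  moved=$((moved + 1))
done < "$work/mapping.csv"

echo "$moved files renamed and sorted"
# prints: 3 files renamed and sorted
```

Keeping the mapping in a file also leaves a record of which original became which SKU, which is handy if something needs undoing.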

Then it verifies the result before sending back to you

COMMAND:
ls -R ./output/ | head -20

OUTPUT:
output/shoes/
SKU-1042-BLK.jpg
SKU-1043-WHT.jpg
SKU-1044-RED.jpg
...(112 files)
output/bags/
SKU-2187-TAN.jpg
SKU-2188-BLK.jpg
...(89 files)

In ‘ls -R ./output/ | head -20’, -R tells ls to list the contents of every subfolder too. The | sends the output of one command into another, and ‘head -20’ just means ‘show me the first 20 lines’.
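
To get a feel for the pipe, here’s a small sketch (assuming bash; the temp folder and file names are made up) that feeds one command’s output into another in two ways.

```shell
#!/usr/bin/env bash
# seq prints the numbers 1 to 400, one per line; head keeps only the first 3.
seq 1 400 | head -3

# The same idea answers "how many files are there?": wc -l counts the lines ls sends it.
demo_dir="$(mktemp -d)"
touch "$demo_dir"/SKU-1042-BLK.jpg "$demo_dir"/SKU-1043-WHT.jpg "$demo_dir"/SKU-1044-RED.jpg
count="$(ls "$demo_dir" | wc -l)"

# $((count)) strips the padding some versions of wc add around the number.
echo "files: $((count))"
# prints: files: 3
```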

It does all this in seconds. It would take you a couple of hours manually.

This is one CLI, called bash, the general-purpose command line that comes with your computer. But there are purpose-built CLIs for specific jobs too:

Stripe CLI — pull revenue data, manage subscriptions, test payments
Playwright — control a web browser: navigate, click, fill forms, take screenshots
AWS CLI — spin up servers, manage databases, scale infrastructure
Vercel CLI — deploy a website live in one command

Each of these is a separate tool an agent can use. The file organising example used one tool (bash). But give an agent the Stripe CLI too and now it can pull your revenue numbers. Add Playwright and it can browse the web. Add Vercel and it can deploy what it builds.

That’s what “tool use” means. The more CLIs you give an agent access to, the more it can do. Your job is to make sure it has the right ones for the task.
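
A quick way to see which of these an agent could actually reach on your machine is to check what’s installed. This is a rough sketch, assuming bash: has_cli is a made-up helper, and the binary names (stripe, aws, vercel, and npx for running Playwright) are assumptions about a typical install.

```shell
#!/usr/bin/env bash
# Report whether a command-line tool is installed and on the PATH.
# (has_cli is a made-up helper name for this sketch.)
has_cli() {
  command -v "$1" >/dev/null 2>&1
}

# Binary names are assumptions about a typical install; Playwright is usually run via npx.
for tool in bash stripe aws vercel npx; do
  if has_cli "$tool"; then
    echo "$tool: available"
  else
    echo "$tool: missing"
  fi
done
```

Anything reported as missing is a tool the agent simply can’t use until it’s installed and, for most of these, signed in.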

It all sounds a bit technical, and it is, but you’d only see those raw commands if you’re using a terminal or watching them fly by in tools like Claude Code. They’re present even when you don’t see them.

If an agent like Cowork is doing a task, you can click to expand what it ran and see the detail — like this example listing files to find recent fund updates.

Every agent is running commands like this under the hood. The interface just hides and abstracts them away.

Headlines

Claude Code launched auto mode, a middle ground between manually approving every action and skipping all permissions dangerously (how they designed it). Claude connectors for work tools are now available on mobile too. They are also cooking something called auto-dream for compacting memory overnight. Claude Code can now use iMessage to text you and others. (see docs)
Sora is shutting down. OpenAI is killing its standalone video generation app along with the API. Its $1B deal with Disney is also cancelled as a result. The Information reports that OpenAI is culling its side projects and focusing on a few key bets, with a new model codenamed Spud.
ARC-AGI-3 launched with 135 mini games, nearly 1K levels, all human-solvable. But all models, when given basic prompts, score less than 1%. They have 25 games publicly available to play (as humans) and don’t tell anyone that I spent 4 hours on them yesterday.
Google released the Pro version of Lyria 3, extending the music generation from 30 seconds to 3 minutes. It’s available in both the Gemini App and AI Studio for developers.
The Figma canvas is now open to agents. You can now use AI agents to design directly on the canvas using the new use_figma MCP tool.
Why Portkey is making its latest Gateway launch completely open source.*

My feed

Chronicle – Cursor for slides. Turn ideas and notes into stunning, professional decks in minutes.*
Paper Snapshot - Snapshot your live website and paste it into Paper as editable HTML/CSS layers.
Ghostwriter by Sierra - Chat with an agent to build more agents.
Mario, founder of the popular open source agent Pi, wrote a post yesterday, “Thoughts on slowing the fuck down”, that says software quality appears to be declining as more companies rely on agents.
Building CLIs for agents - Eric from Cursor wrote a thread on making CLIs that actually work for agents. ElevenLabs has already made their CLI agent-friendly using these tips.
Building deep research that works from your CLI with BrowserBase. (resulting code)
Hark – New AI lab from Brett Adcock (yes, the Figure robotics guy). 8 months in stealth, focused on "the most advanced personal intelligence" paired with next-gen hardware.
GitHub has been going down wayyy too often these days. Plans to fix it and alternatives are starting to show up.
How USV built a team of internal agents that live in their group email threads and learn from team feedback.
Feynman - Read papers, research and get cited meta-analysis for your question from your CLI.
Brave registered the .agent TLD and is making it a community effort. I tried to reserve 10 domains 😬
Lil Agents – Tiny AI companions that live above your dock. Each one has its own Claude session and mini window. Now open source. Adorable.

Afters

Ben Tossell@bentossell

merch gifts have gone up a level ty @OpenAI

1:12 PM · Mar 26, 2026

maria@maria_rcks

Since we all know that terminals are made for complex UIs... I decided to make T1Code (1T, because a terminal is all you need). I know @theo really likes this kind of complex UI right on the terminal... so lets hope he likes it!

9:53 PM · Mar 25, 2026

Cursor@cursor_ai

Cursor cloud agents can now run on your infrastructure. Get the same cloud agent harness and experience, but keep your code and tool execution entirely in your own network.

cursor.com

Run cloud agents in your own infrastructure · Cursor

6:32 PM · Mar 25, 2026

Sam Altman@sama

AI will help discover new science, such as cures for diseases, which is perhaps the most important way to increase quality of life long-term. AI will also present new threats to society that we have to address. No company can sufficiently mitigate these on their own; we will

5:01 PM · Mar 24, 2026

Aiden Bai@aidenybai

Introducing Expect Let agents test your code in a real browser 1. Run Claude Code / Codex to QA your app 2. Watch a video of every bug found 3. Fix and repeat until passing Run as a CLI or agent skill. Fully open source

4:06 PM · Mar 25, 2026

Sawyer Hood@sawyerhood

Introducing the new dev-browser cli. The fastest way for an agent to use a browser is to let it write code. Just `npm i -g dev-browser` and tell your agent to "use dev-browser"

4:27 PM · Mar 25, 2026

dotta@dotta

Announcing companies.sh - the open standard for Agent Companies Import and run entire companies with a single command Just run `npx companies.sh add <repo/company>` More 👇

4:12 PM · Mar 25, 2026

Daniel Griesser@DanielGri

I updated my interactive subagents to free up the main agent to be interactive as well (basically /btw but just a normal continuation) and the subagent asynchronously returns its result to the starting session github.com/hazat/pi-inter…

10:50 AM · Mar 24, 2026

Find me on X, Linkedin, or YouTube
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth

* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?
Email us at shanice@bensbites.com or k@bensbites.com
