One breach after another
separate and sandbox your agent's access
Security issues are popping up all over.
Railway accidentally let unauthenticated users access data that should’ve been behind an auth wall.
Mercor AI has allegedly been breached.
Claude Code’s source code has leaked, and the community are going crazy saving copies on GitHub. Here are docs from the codebase.
And Axios, with 100M weekly installs, was compromised on the npm package manager when one of its lead maintainers’ GitHub accounts was hijacked. npm has now removed the malicious versions. (Claude Code uses Axios.)
Like manufacturing, code has a supply chain.
Software relies on other software. Instead of writing all that code into your project yourself, you install a package through a package manager, and agents do this very frequently on your behalf.
One of those packages, Axios, was compromised. That means if an agent (or you) ran the install command while the malicious versions were live, a malicious package could now be on your computer.
This underlines the importance of sandboxes. Tools like Claude Cowork and Codex handle this for you by running commands in a sandbox: an isolated computer with a copy of your current folder, separate from your actual machine. So if any bad code sneaks in, it doesn’t mess up your actual stuff!
I sent this to my agents this morning:
there’s been a security breach https://markdown.new/socket.dev/blog/axios-npm-package-compromised
make sure this computer and my mac-mini have not been compromised
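If you also want to check a project by hand, here’s a minimal Python sketch (not from the Socket advisory, and not an official npm tool) that scans an npm package-lock.json for every installed copy of axios, including nested ones, and flags the versions you pass in. It assumes the lockfile uses npm’s v2/v3 "packages" layout, and it deliberately doesn’t hard-code the bad version numbers: take those from the advisory linked above. The script and function names are hypothetical.

```python
import json
import sys
from pathlib import Path

# Hypothetical helper, not an official npm or Socket tool.
# Assumes an npm lockfile v2/v3 ("packages" key); bad versions are supplied by you.


def installed_copies(lock: dict, name: str) -> list[tuple[str, str]]:
    """Return (install path, version) for every copy of `name` in the lockfile."""
    hits = []
    for path, meta in lock.get("packages", {}).items():
        # Top-level copies sit at "node_modules/<name>"; nested copies sit at
        # ".../node_modules/<other>/node_modules/<name>".
        if path == f"node_modules/{name}" or path.endswith(f"/node_modules/{name}"):
            hits.append((path, meta.get("version", "unknown")))
    return hits


def flagged_copies(lock: dict, name: str, bad_versions: set[str]) -> list[tuple[str, str]]:
    """Keep only the copies whose version appears on the user-supplied bad list."""
    return [(path, ver) for path, ver in installed_copies(lock, name) if ver in bad_versions]


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python check_lock.py package-lock.json <bad-version> [<bad-version> ...]
    lockfile = json.loads(Path(sys.argv[1]).read_text())
    matches = flagged_copies(lockfile, "axios", set(sys.argv[2:]))
    for path, ver in matches:
        print(f"FLAGGED {path} @ {ver}")
    print(f"{len(matches)} flagged axios copies")
```

It only reads one lockfile, so it won’t see global installs, other projects, or yarn/pnpm lockfiles. And a clean result doesn’t prove a machine is clean, since npm packages can run scripts at install time.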
What am I building this week?
I’m purposefully trying not to build too much (hence no Ben’s Builds email last Saturday) because I’m focusing on this course. It’s taking shape now, and I hope to send out some preview lessons ASAP. I’ll be presenting a version of this to Stanford students in SF next month.
I really do want to finally spin up my own email client, probably by cloning this, made by a YC partner.
Building security or sandbox-related developer tools or infra? I invest 👋
Ben’s Bites is brought to you by Attio, the AI CRM
Honestly, no one gets excited about a CRM. But then they try Attio. It connects to Claude Code and n8n through its MCP server, completely bridging the gap between my customer data and apps. Wait, there’s more, like flagging churn risk and turning customer feedback into Linear projects. Try it now.
Headlines
Computer use is now in Claude Code. Claude can interact with your computer through the UI (like we do) to test apps or do tasks. It’s available as a research preview on Pro and Max plans; expect it to be slow, clunky and expensive. Separately, Claude Code auto-fix now works in the cloud, via web and mobile sessions: it watches PRs, fixes CI failures and addresses comments remotely.
Projects.dev by Stripe lets agents use third-party services from the CLI. Run a command, and it creates an account, gets an API key, and sets up billing with partner apps like PostHog, Supabase, Clerk, PlanetScale and more. The developer preview is live and opens to everyone soon. I got access, and it’s pretty great: much simpler than signing up for multiple tools and connecting them yourself.
Gemini Live is now powered by a new model, Gemini 3.1 Flash Live. It takes in text, images, audio and video, and natively outputs text and audio. It beats GPT-Realtime 1.5 and others at following complex instructions given by voice. It’s available for developers too. Gemini now also supports importing your entire chat history from other AI chatbots (with no way to export your Gemini chats at all). Diabolical.
Codex now has plugins: bundles of skills, app integrations, and MCP servers for building reusable workflows. OpenAI also created a plugin for Claude Code that lets you use Codex inside CC (how to use it).
My feed
Remodex lets you control Codex (running on your Mac) from your iPhone. Pico lets you do the same for pi-coding-agent running on any machine, via any mobile.
Shopify released Tinker, a new mobile app with a suite of free tools for creating images and videos like social media posts, product staging, virtual try-ons and more.
Litter - Codex on your phone.
here.now sites can now connect to external services: Supabase, OpenRouter, Stripe, and Resend. No backend needed. One of my favourite tools (I’m an investor) just got even better! I’ll cover building with this soon.
Plus One by Every - A hosted OpenClaw that lives in your Slack, pre-loaded with skills, workflows, and connected to other Every tools like Cora (email), Spiral (writing), and Proof (docs).
Everyone is building a software factory; no one has figured it out fully yet.
Hyperbox - rent Mac minis as virtual computers (relevant to today’s security topic!)
Vercel published excerpts from an internal talk on how to agent responsibly.
How Claude Cowork’s design lead uses it to collect and summarise user feedback to decide what gets built next.
Chroma and Intercom have both trained custom models for their use cases. Chroma’s Context-1 is a better search agent, and Intercom’s Apex 1.0 helps their agent Fin achieve a higher resolution rate. Intercom’s CEO makes the case for vertical models.
Cohere Transcribe - a 2B-parameter speech-to-text model that’s faster and performs better than most similar-sized open-source models.
Warp stopped buying SaaS and moved everything to agents, skills and just-in-time apps. Saving $10k+/year on cancelled subscriptions.
Users who talk with Macy’s new AI chatbot spend about 4.5x more than users who don’t.
Daniel thinks online courses should position themselves as training for agents now.
Afters
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth
* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?
Email us at shanice@bensbites.com or k@bensbites.com