Make any media searchable
web access CLIs, sandboxes and another openclaw clone
Hey I’m Ben. I build stuff with agents, even though I’m not technical. Here’s all the stuff I’m reading and tinkering with. If you want to start building or level up your ‘vibe-coding’ skills, join our community.
Hey folks,
Google released Gemini Embedding 2, and it is multimodal, so you can embed text, audio, images, video and PDF documents using the same model. For text it’s a little expensive compared to other options, but low-fps video and audio are really cheap, and the ability to embed all of these with one model is unmatched. This should open up a lot of startup ideas that are basically “search over a large amount of non-textual data.”
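If you're wondering what “search over non-textual data” boils down to, here's a rough sketch in Python. It assumes every file has already been turned into a vector by one embedding model, so a video, a podcast and a PDF all live in the same space. The file names and the 3-number vectors are made up for illustration (real embeddings have hundreds or thousands of dimensions), and the `cosine` and `search` helpers are my own, not part of any Google SDK.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    """Return the names of the top_k items most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Hypothetical index: made-up vectors standing in for real embeddings
# of a video, an audio file and a PDF.
index = {
    "keynote.mp4": [0.9, 0.1, 0.0],
    "podcast.mp3": [0.1, 0.9, 0.1],
    "invoice.pdf": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a text query.
query = [0.8, 0.2, 0.1]

print(search(query, index))
```

In a real product you'd swap the hand-written vectors for the output of an embedding model and the dictionary for a vector database, but the ranking step stays the same idea.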
Replit released its Agent 4 with multiple parallel agents, live collaboration with teammates, and an interactive design canvas that both you and the agent can edit. Agent 4 can make more than just web apps; it can create animations, slides, mobile apps, data visualisations, and more, all in a single project. Plus, Replit raised $400M and is now valued at $9B.
Meta acqui-hired the team behind Moltbook - the Reddit-like social media platform for openclaw agents that went viral earlier this year.
Perplexity teased Personal Computer - They say it’s an always-on version of Perplexity Computer with access to your files, apps, and sessions through a continuously running Mac mini. That sounds kinda like openclaw, doesn’t it?
Async Voice API is a human-like, low-latency text-to-speech API for real-time apps and agents. 15 languages, streaming-ready, integrations with n8n, LiveKit, Twilio, and more. Top-ranked on the Hugging Face TTS Arena. From just $0.50/hour with a 24/7 SLA. Try it now.*
🌐 What I’m consuming
Annotated breakdown of Karpathy’s autoresearch prompt.
From developer to fleet commander.
AI should help us produce better code, not just more code.
We’re going to need an even bigger IDE - Karpathy.
Building a full programming language with Claude Code.
Anthropic vs DoW is a warning shot, and I’m glad this episode happened.
How Codewall hacked McKinsey’s AI platform. They gained read access to 46.5M messages, 57k user accounts and write access to its system prompts. Now patched.
A set of non-SWE benchmarks from Ramp evaluating models on real-world financial tasks.
⚙️ Tools and demos
Proof - Collaborative document editor where humans and AI agents work together.
Wondering - Turn any topic into a guided path with bite-sized visual lessons.
Blazing Transcribe - Get real-time speech to text on your Mac without sending any data to the cloud. (demo)
Ramp Agent Cards - Credit cards for AI agents with spend limits, merchant controls, and full visibility.
Upstash Box - The best way to give your AI agents a computer.
Expo Agent - Build truly native iOS and Android apps from a prompt. From React to SwiftUI to Jetpack Compose.
Gists.sh - Clean typography, syntax highlighting, dark mode for GitHub Gists.
ComfyUI now has an App mode to hide away the nodes for your users and Comfy Hub to discover & share community workflows.
Gemini inside Docs, Sheets and Slides can now do more, like formatting the docs, filling in missing data and editing collaboratively.
ChatGPT now lets students learn maths and science concepts with interactive visualisations. (but they’re all just sliders)
🥣 Dev Dish
slopmeter - create a nice, shareable graph to show off your Codex, Claude Code, or OpenCode usage. npx slopmeter@latest
/btw in Claude Code - Have side chain conversations while Claude is working.
Mastra remote sandboxes - Give your agent a secure, isolated environment to run untrusted user code.
OpenUI - 3x faster and 67% fewer tokens than json-render to let AI agents stream UI on demand. (repo)
twitter-cli - terminal-first CLI to read timelines, bookmarks, and user profiles without API keys.
Firecrawl CLI - Toolkit for agents to scrape, search, and browse the web.
Parallel CLI - Allow agents to search, access and extract high-quality data from the open web.
Fetch API by BrowserBase - Simple, cheap and reliable way to get the page content from a URL.
/crawl endpoint from Cloudflare - one API call and an entire site crawled while following robots.txt. (also see these other endpoints)
TADA - open-source TTS model from Hume. Comes in 1B (English) and 3B (multilingual) parameter sizes, i.e. small enough that it could run on a mobile phone.
on my watch: agent-browser-protocol — runanywhere cli
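A quick aside on what “following robots.txt” means for the crawl tools above. This is a minimal sketch using Python's built-in `urllib.robotparser`, not Cloudflare's API or any of the CLIs listed. The robots.txt text, the `my-agent` user agent, the example.com URLs and the `allowed_urls` helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone may crawl, except under /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed_urls(robots_txt, user_agent, urls):
    """Keep only the URLs that the robots.txt rules let this agent fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]

urls = [
    "https://example.com/blog/post-1",
    "https://example.com/private/notes",
]

print(allowed_urls(ROBOTS_TXT, "my-agent", urls))
```

A well-behaved crawler runs a check like this before every request and skips anything the site owner has disallowed.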
🍦 Afters
NVIDIA plans to spend $26B over the next 5 years to build the world’s best open-source models. They just released Nemotron 3 Super - a 120B-param (12B active) model with performance similar to GPT-oss 120B and Qwen 3.5 122B.
Two new interesting benchmarks:
PostTrainBench - Measuring how well AI agents can post-train language models
RuneBench - Long-horizon goal optimisation across 14 AI coding models inside RuneScape.
The Anthropic Institute - New team from Anthropic with a focus on communicating the impact of AI to the world.
Runway Labs - A generative AI incubator to explore use cases of AI video and general world models.
Netflix might pay up to $600M for Ben Affleck’s AI moviemaking company.
Cursor is reportedly raising at a $50-$60B valuation.
Enjoy this newsletter? Forward it to a friend.
That’s it for today. Feel free to comment and share your thoughts. 👋
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth
* sponsors who make this newsletter possible :)
Wanna partner with us for March? Last few slots available


