Agents that keep running

plus a promise of personalization and useful skills

Jan 15, 2026

The newsletter for the technically curious. Updates, tool reviews, and the lay of the land from an exited founder turned investor and forever tinkerer.

Hey folks,

I just posted a little video walkthrough of something I built! I gave Droid a tweet with a video in it and said "reverse engineer this and rebuild it" (and it did!). It gives you a little dashboard for agents looping on tasks (or you can use GitHub issues). Super fun little project to test how easy it is to point at a video and reverse engineer something in a morning. Enjoy!

Cursor let agents build a web browser from scratch by letting them run for about a week. They found that GPT-5.2 was the best model for longer, more autonomous work (vs. Opus 4.5) and that prompts mattered more than they expected. Also, OpenAI just made GPT-5.2-Codex available in the API, so it’s available in Droid, Cursor, Windsurf and other apps too.

Claude Code can now search for the right tool from your MCP server without clogging up your context window. Also, you can now add a comment by pressing tab when accepting/rejecting a permission prompt, i.e. “yes, and {do it this way}.”

Anthropic is expanding its experiments garage. Anthropic’s Labs team is hiring builders to create new products like Claude Code, MCP, and Claude in Chrome under Mike Krieger and Ben Mann.

Google is adding Personal Intelligence, i.e. deep data integration with your Gmail, Photos, YouTube history and more, to the Gemini app. I tried to test it, but it’s US only, and I’m very suspicious that it will work as well as the demos show.

I’ll be joining Every for their Vibe Code Camp, where a bunch of us ‘vibe-coders’ will be building stuff and sharing what we did, plus tips. Other folks you know will be there!

🌐 What I’m consuming

AI may unleash the most entrepreneurial generation we’ve ever seen.
How do you store and retrieve information from the web in a database?
2026: This is AGI. A case for long-running agents satisfying a functional definition of AGI.
Understanding Manus sandbox - your cloud computer.
Give your agent a laboratory, not a task.

⚙️ Tools and demos

Scroll.ai turns any knowledge base into an enterprise-grade chatbot. Tap into accuracy and depth that generic models can’t touch.*
1code by 21st.dev - Calm, visual client for real code work. Open source.
Slackbot is now AI-powered. It can create canvases, search and access messages and uploaded documents, and more.
Skillsync - Find elite (but overlooked!) engineers on GitHub based on what they have actually built.

🥣 Dev Dish

json-render - Let AI render UI on demand based on your defined catalogue.
agent-browser - Browser automation CLI for AI agents.
react-best-practices - Vercel released this skill with a collection of documents on how to code better with React.
npx add-skill - Run this command to install any skill for your coding agents.

🍦 Afters

Google released new open-source models for medical use cases - MedGemma 1.5 4B and MedASR.
Delphi is hiring a designer for its consumer product.
Three of Thinking Machines’ six original co-founders are back at OpenAI.

Enjoy this newsletter? Forward it to a friend.

That’s it for today. Feel free to comment and share your thoughts. 👋

Find me on X, LinkedIn, or Instagram
Read about me and Ben’s Bites
📷 thumbnail sourced from X

Thanks to today’s sponsors who made this newsletter possible :)
Wanna partner with us for Q1?
