Skills are taking over
A new model I'm excited about
The newsletter for the technically curious. Updates, tool reviews, and lay of the land from an exited founder turned investor and forever tinkerer.
Hey folks,
OpenAI hosted a town hall answering questions from builders. I used YouTube’s “Ask” button to get the gist: OpenAI says it can drive costs down by 100x in the next two years, is still very focused on general models, and will have a model with better writing in GPT-5.x.
Andrej Karpathy posted his reflections on where we’re at with coding agents/vibe coding. I have similar feelings. The TLDR: people are mostly moving to agent work with minimal human input; “no more IDEs” and agent swarms are too hypey right now; agents just power through tasks and never get tired (which is insane to think about if you consider one a super-intelligent teammate); you feel way faster with what you can produce, and instead of lounging on the beach we’re, shock, producing more; learning how to guide models is becoming an art (write failing tests and then have the agent pass them, or put it in a loop with a browser to verify); working with agents is genuinely so fun; and, given the above advancements, 2026 will be the year of slop. (I agree, but slopping our way to learn and produce things that aren’t slop is still a reasonable path.)
I jumped on Every’s Vibe Code Camp (full recording here) alongside other well-known builders, chatting about how I reverse engineer tools, build stuff and generally tinker a lot with agents and code (even though I’m not technical).
Signals - Droid learns from all its failures and suggests actionable work items the team can use to improve it. Currently, humans review and merge the changes, but these are early signs of self-improving agents.
Claude now has interactive interfaces for apps like Slack, Asana, Figma and more. Very similar to ChatGPT Apps and built on top of MCP.
Claude Code is replacing Todos with Tasks. Tasks are suited to longer projects (as models improve) and are saved on your device so that multiple agents can access and complete them.
ChatGPT can now pip/npm install packages, run bash and download files in Code Interpreter.
Vercel has built skills.sh - A directory for agent skills and a simple way to install them. Context7 has a similar attempt. Some skills I came across over the weekend:
Postgres best practices, browser use, React emails using Resend
Marketing skills, researching the last 30 days on a topic, and image gen + editing with Flux
Kimi K2.5 is a new open-weights model from China, and it scores better than Opus 4.5 or GPT-5.2 on benchmarks in all areas other than coding. It’s also great at vision, like Gemini 3 Pro, and it’s priced similarly to Gemini 3 Flash. This has actually got me excited to try it, as they are also going hard on tooling around the model with Kimi Code (CLI) and their web app for slide generation, general tasks and more. Also see Qwen 3 Max Thinking, which has similar performance but isn’t open weights.
Enterprises hold valuable data locked inside content, but manual processes make it difficult to unlock. Box Extract securely & accurately extracts valuable data at scale to drive faster decisions & automated workflows. Learn more from Box on how to transform enterprise content into actionable data.*
🌐 What I’m consuming
Decision-time guidance by Replit - Injecting short, situational instructions to keep the agent on track for much longer.
New essay on AI risks from Dario Amodei: The adolescence of technology.
Lessons from building AI agents for financial services.
Using GitHub Pages to preview simple outputs from Claude Code on your phone.
Claude Code has a hidden TeammateTool for running a multi‑agent team system.
I stopped reading code. My code reviews got better.
Can LLMs finally do social science? Tracking latent cultural concepts over time.
New iOS apps on the App Store grew by 60% in 2025 after being almost flat for the last three years.
Building new consumer experiences in speech AI demands low latency, sustainable economics, and mass personalization. The current #1 on the leaderboards, TTS-1.5 by Inworld, delivers on it: sub-250ms latency, 40% lower error rates with 25x lower cost for developers. Check out TTS-1.5 here.*
⚙️ Tools and demos
Scroll.ai turns any knowledge base into an enterprise-grade AI agent. Get 2 free months ($158 value) with code BENSBITES26.*
Feynman 3 by Opennote - A thinking partner for learning.
Text to diagram in Excalidraw now streams its output and is faster and smarter.
Company Search on Exa - Semantically search over 60M+ companies and get structured information on each (web traffic, headcount, financials, and more).
Pencil - Infinite design canvas for Claude Code.
Cavo - Excalidraw with a webcam and screen recording setup. (read more)
Swing by Cartwheel - Generate 3D movements for your characters.
gcombinator - Change ‘y’ to ‘g’ to get the full context of the article & comments from HN.
🥣 Dev Dish
Monitor by Parallel AI - Always-on web searches that notify you when new information becomes available on the web in the schema of your choice.
Sprites - Stateful sandbox environments with checkpoint & restore to run AI-generated or user-uploaded code safely.
Clawdbot is now called Molty. Deploy it on Railway or let Devin set it up for you.
Cursor now uses subagents to complete parts of a task, generate images, and ask clarifying questions while working in the background. Plus, they introduced “Cursor Blame.”
🍦 Afters
Applications for Embed’s 6th cohort are open till 9th Feb. Previous companies include Cognition, Listen Labs, Physical Intelligence, Pika Labs, and Yutori.
Logical Intelligence is piloting a new “energy-based model” for reasoning.
qwen3-TTS - A powerful text-to-speech model that you can run on your Mac. (try here)
Ricursive Intelligence raised $300M at a $4B valuation to design better chips with AI.
That’s it for today. Feel free to comment and share your thoughts. 👋
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth
* sponsors who made this newsletter possible :)
Wanna partner with us for Q1?


