GPT-5 doesn't suck anymore
Developer access is always a hint of what's coming next
The newsletter for the technically curious. Updates, tool reviews, and lay of the land from an exited founder turned investor and forever tinkerer.
Hey folks,
OpenAI released GPT-5.2 in three variants: Instant, Thinking and Pro. The small version bump understates how big an improvement it is over GPT-5.1: a significant jump on SWE benchmarks, extremely good at spreadsheets, better vision and lower hallucination rates. It’s now on par with Gemini 3 Pro and Opus 4.5 (though I’d say not definitively better than either of them).
ChatGPT and Codex are secretly adopting Skills. First released by Anthropic, Skills borrow many principles from MCP but are a simpler alternative for teaching the model to do something repeatedly. A “skill” is just a folder with markdown files and some code scripts. The code interpreter in ChatGPT has a new folder, “/home/oai/skills”, that covers working with spreadsheets, docx and PDFs, presumably with options to add our own coming soon.
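If "a folder with markdown files and some code scripts" sounds abstract, here is a rough sketch of what discovering skills on disk could look like. It is not how ChatGPT or Claude actually load skills; the skill name "spreadsheets", the file names and the rule "a folder counts as a skill if it holds at least one markdown file" are all my assumptions for illustration.

```python
from pathlib import Path
import tempfile


def list_skills(root: Path) -> dict[str, list[str]]:
    """Map each skill folder under `root` to the sorted file names inside it."""
    skills: dict[str, list[str]] = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(f.name for f in folder.iterdir() if f.is_file())
        # Assumption: a folder is a skill only if it has at least one markdown file.
        if any(name.endswith(".md") for name in files):
            skills[folder.name] = files
    return skills


def make_demo_skill(root: Path) -> None:
    """Create one hypothetical skill: markdown instructions plus a helper script."""
    skill = root / "spreadsheets"  # hypothetical skill name
    skill.mkdir(parents=True)
    (skill / "instructions.md").write_text("# Spreadsheets\nSteps the model should follow.\n")
    (skill / "helper.py").write_text("print('hypothetical helper script')\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_skill(Path(tmp))
        print(list_skills(Path(tmp)))
```

The point is how little machinery is involved: no server and no protocol, just files the model can read and scripts it can run.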
Tinker by Thinking Machines is now open to everyone. Tinker lets researchers fine-tune open models and run “frontier-grade” experiments on them easily. The public launch comes with support for Kimi K2 Thinking (probably the best open model out there), image-based fine-tuning, and inference that’s compatible with the OpenAI API.
Gemini has a new Interactions API that combines access to models and agents for developers. The first agent exposed to developers is Deep Research. A few weeks ago, someone asked me “who uses deep research anymore?” because thinking models give a good enough answer without the 5-10 minute wait. I think this answers that question: deep research will be built into third-party apps for specific tasks and work in the background, just like RAG is everywhere but you don’t actively think about it.
Pitch your AI idea to top leaders in the industry at MongoDB’s Agentic and Collaboration Hackathon on Jan. 10th. Register now to secure your team’s spot for a chance to win over $30,000 in cash prizes. Finalists will receive complimentary access to MongoDB’s .local San Francisco on Jan. 15th.*
🌐 What I’m consuming
Where’s my flying car? - Five things I thought we’d have by the end of 2025 in the LLM era.
Longer-running agents are starting to work; what about debugging and improving them?
What happens when the coding becomes the least interesting part of the work.
Claude Code’s DX is too good, and that’s a problem.
How Instacart built Pixel - a unified image generation platform.
Joining OpenAI at 10 - From Fidji Simo, the former CEO of Instacart, who is now the CEO of Applications at OpenAI. OpenAI now has three ex-CEOs in top positions: Simo, Sarah Friar from Nextdoor as CFO, and Denise Dresser from Slack as CRO.
This week, Adobe added three of its most popular apps, Photoshop, Adobe Express and Acrobat, into ChatGPT. So now you can edit photos, create designs and edit PDFs directly in your ChatGPT conversations. This handy tutorial shows you how to get started for free.*
⚙️ Tools and demos
Craft: Your best personal productivity app - connect it to any tool via MCP. Share your workflow in our Challenge and win up to $10,000.*
shadcn/create – Build your own shadcn/ui with full customisation. Icons, base colour, theme, fonts and the component library. (see how)
Code Wiki by Google - Gemini-generated interactive docs for open-source repos (with private repos coming soon).
btca.dev - the better context app. CLI for asking questions about libraries/frameworks by cloning their repos.
HyperBookLM - Open-source NotebookLM with web-agents.
OpenCode has a desktop app now.
An AI agent that watches user sessions + emails users when they get stuck (demo)
🥣 Dev Dish
OpenAI released three new audio models: mini ones for transcription and TTS, and a bigger one for realtime audio-to-audio.
Chorus, an AI chat app from the makers of Conductor, is now open source.
OpenCommit - AI commits done properly with git messages, changelog & documentation generation.
Deno and Warp are both (separately) building Sandboxes now to run untrusted code generated by LLMs safely.
Android Use - An open source library that gives AI agents hands to control native Android apps.
Publish your documentation as an npm package for LLMs.
🍦 Afters
Notion is making half of its revenue from AI now.
You can now gift Claude to others.
That’s it for today. Feel free to comment and share your thoughts. 👋
Read about me and Ben’s Bites
📷 thumbnail creds: @keshavatearth
Thanks to today’s sponsors who made this newsletter possible :)
Wanna partner with us for Q1?



Many top-tier companies are using long-running agents to generate or verify pull requests to their monorepos. Enterprise is really heating up!