I write a newsletter about startups and investing—for ai builders of all levels.
I record mini-tutorials, review tools I’m testing, share my insights from an exited founder turned investor.
mcp is all the rage at the moment and openai’s responses api now supports remote mcp servers, image gen, code interpreter and file search.
reminder:
mcp = universal plug that lets llm call tools without a bespoke integration e.g. “find my emails” → (uses the gmail tool)
remote server = it can hit api endpoints anywhere on the web and treat them like built in functions.
zapier supports it - which means you can connect zapier to an mcp client like claude, cursor, and now the openai api. e.g. set up zapier mcp, add all gmail and gcal actions (find email/event/draft/etc), then use your client for things like; ‘find the email from ben’, ‘who am i meeting with this week’ etc. zapier has 8k+ apps which are supported already.
(if you didn’t know, my last co was acquired by zapier. they’re doing a great job supporting this new integrations world imo)
and so does shopify, stripe, twilio and many others.
mcps are effectively tools that llms can use - .
for agents to be truly effective and proactive, tool use is extremely important
i spoke with des from intercom this morning about their upgrades to fin (their ai-support agent), and how they’re focused so much on tool use within their agents. which has meant they can give csat estimates from actual conversations (rather than users saying if they were happy), feature request categorisation (ie how many support messages mention features on our roadmap, whats not on there but should be), and suggested content improvements (what your customers say vs your docs - is it clear? could something be added to help?)
i tweeted yesterday ‘this is the difference of good vs bad ai software’ re: shopify, and intercom does it well here too. which is great to see from larger companies!
im not saying any of the experiences are incredible, but as everything matures and it all gets better, the workflow and custom agent infra is what will make a big difference.
🔎 News worth knowing
Google I/O was jam-packed. Google shared NotebookLM project with all announcements if you want to chat with it, and we made an interactive chart for a quick overview. Here are the key updates:
Veo 3 can now generate with sound. The videos are coherent, and short-duration ads might actually be “cooked”. It’s available in a new tool called Flow that also includes the new Imagen 4 (image generation) and Lyria 2 (music gen). US only for now.
For builders: Stitch, a design tool turning prompts into HTML/Figma for web and mobile apps (this came from the Galileo AI acquisition, which I invested in!). Jules, Google’s AI coding agent, like OpenAI’s Codex, is open and free for now. I haven’t tested it yet, but 80% of my timeline says it’s better than Codex.
AI mode in search is a dedicated UI for getting answers and not 10 blue link. It’s fast and better than Perplexity. Deep search, analysis, chart generation for sports and finance will come soon. Shopping on search has a new Try On feature that looks perfect.
Gemini App has two pricing plans now. The old Advanced plan is renamed to Pro (at $20/mo), and there’s a new plan, Gemini Ultra (at $250/mo). Deep Research in the app can reason over your own files now, with full drive integration coming soon. Creating in Canvas has a few templated options: webpage, quiz, audio overview and infographics.
Project Astra’s basic capabilities (camera and screen sharing) are already in Gemini Live now. Future testing ground has abilities like ignoring distractions, sending texts (or doing other tasks) in the background while talking to you. The demos for Astra and XR glasses are really impressive.
Project Mariner, Google’s browser using agent, is now available in the US for Ultra subscribers. Computer Use tools are coming to api this summer.
Models and API: Gemini 2.5 Flash got an upgrade (live now) and 2.5 Pro has a new variant—Deep Think (coming soon). Gemini Diffusion is a new model, an experiment with a new architecture. Gemma 3n, another new architecture model is a small, open-source beast giving Claude 3.7 Sonnet a run.
In the API, we are getting MCP support, a new tool called URLcontext (to get data from urls), text-to-speech in both 2.5 pro and flash, async function calling, and better structured outputs. All really good additions.
Sam Altman and Jony Ive’s collab is official after 2 years of leaks. OpenAI is acquiring Jony's company, called io (💀 name), for $6.5B. Is that the biggest acquihire in tech? Jony will take on a design role at OpenAI, focusing on physical AI products.
xAI’s Grok API now has Live Search, allowing it to pull real-time data from X (Twitter), the internet, trending news, and RSS feeds. Twitter search is great. It's free in beta until June 5, 2025, I expect it'll be pricey later. Here’s a Replit template to play with it.
Easiest way on Earth to build any app without writing a single line of code
Emergent, the world’s first agentic builder, transforms how apps are created by turning simple conversations into production-ready apps. From MVPs to full platforms, Emergent does it all. Use VIBEWITHBEN to get early access.*
*sponsored
want to partner with us? Click here
🌐 What I’m consuming
A formula for AI in companies.
I don’t have access to Google Flow, but these short films created with Flow are better than half the stuff on Netflix these days.
Functionality vs design - what comes first when building with AI?
Some interesting observations on Veo 3 generations and the weird nuances of creating dialogue-based videos with it.
This State of Talent Report from SignalFire - entry-level hiring is collapsing, elite AI labs are hunting and locking in top talent (Anthropic has 80% retention!), and Big Tech is slowing GTM hiring to prioritise technical roles.
⚙️ Tools I’m tinkering with
Prompt Kit offers ready-to-use, customizable React components for building AI interfaces. See Zola, an open-source chat app with Prompt Kit components in action.
Magic Animator lets you animate your existing designs in seconds using AI.
Linear now lets you assign tasks to AI agents, treating them like teammates in your project management flow.
Snapdeck - AI slide generation in Figma
Flowith Neo - new task completing agent.
Replit has a new element editor that lets you make UI changes directly in your app preview, with the code updating instantly. UI edits are free.
🥣 Dev dish
Notte is an open-source, full-stack framework for AI browser agents. It handles headless browser instances and credential management, though I wish these tools would integrate with my existing logged-in Chrome. Great for scraping tasks.
Langchain Sandbox provides a way to run untrusted Python code safely within your AI agents.
The ARC Prize has finalised the ARC-AGI 2 benchmark featuring 360 visual reasoning tasks. Current top AI models score less than 5%, where humans solve 100%, taking about 2.3 minutes per task.
Nvidia has a new reasoning model post-trained on Llama 3.1 - Nemotron Nano 4B.
Mistral has launched Devstral, a 27B parameter open-source model for agentic coding, better than Claude 3.5 Haiku on SWE-Bench Verified.
2 new vector indexing methods now live on Chroma
Display any CSV file as a searchable, filterable, pretty HTML table
Not Diamond’s Prompt Adaptation automatically refines your prompts to work best across different LLMs, claiming to outperform manual engineering.
Following Windsurf, V0 by Vercel now has its own AI model, specialized in web development knowledge and offering an OpenAI-compatible API. Plus Vercel’s AI gateway for quickly swapping between popular models.
🍦 Afters
LM Arena has raised $100 million in seed funding from a16z and the University of California.
Events:
The Game Changers Vibe Coding Hackathon on June 13th in SF.
The AI Founders, CEOs, and Product Leaders Summit on June 6th at AGI House.
That’s it for today. Feel free to hit reply and share your thoughts. 👋
Enjoy this newsletter? Please forward to a friend.