Grok 4 is here, but 4 who?

AI browsers are data-collecting machines

Jul 10, 2025

I write a newsletter about startups and investing, for AI builders of all levels.

I record mini-tutorials, review tools I’m testing, and share my insights as an exited founder turned investor.

Hey folks,

We’ve got a few open ad slots over the summer. Wanna partner with us?

💰 Fund Update: first money is in already, so fundraising season fully kicks off for me - navigating all those lovely European breaks other investors will be enjoying (me too ofc). BUT, we’ve had overwhelming interest for this fund already (thank you!). Given the fund’s size, we’re going to be limited to 99 LPs who can participate. I’ll be prioritising chats with the folks who filled in the form with higher $ buckets, and then go in descending order. There’s also all sorts of confidentiality I need to be careful about when sharing info widely. If you’re interested in writing $100k+ cheques, please let me know.

Grok 4 is out in the world. The first batch has two chickens - a normal one and a heavy one. The normal version of Grok 4 is better at benchmarks than all the other models. It scores 25.4% on Humanity’s Last Exam (scary name, ik, but basically PhD-like problems) without any tools, vs 21.6% for Gemini in second place.

The heavy one is a multi-agent system with an even bigger gap across all benchmarks (44% on HLE). It’s a good model, sir. There are updates to the voice version too, and more variants are coming soon, including a coding model by next month.

On other benchmarks, I feel like it’s the regular 1-2 month cycle of a new model scoring a few % points higher. But on ARC-AGI-2, a simple benchmark with visual tasks that most humans can solve easily, Grok 4 gets 16%, double the second best, Opus 4 at 8%.

The pace of improvement for xAI is impressive, especially being one of the later ones to join the battle.

Grok 4 steaming ahead on benchmarks is great, but outside of X, I don’t think many people are using Grok in their coding or day-to-day - I never do. However, as Ben Thompson from Stratechery says, it’s important to keep the other labs on their toes and to be an alternative option for people to use if they don’t keep up. That in and of itself is worth a lot.

The chat app only gives you Grok 4 if you pay $30/mo. API pricing is similar to Sonnet 4, but running this set of benchmarks costs 5x on Grok vs Sonnet.

In other news, Anthropic released four new courses taught by their team. Two on MCP - a basic one and an advanced one. There’s one on Claude Code and another on Claude’s API.

It’s browser season. After Dia and Comet, OpenAI might launch their own browser in a few weeks. I’d love to see a different take on browser + AI from them. I suspect there will be, but we shall see. I’ll try and get my hands on it early. My biggest feeling around the browser wars is that they are data-collecting machines - and what will they do with that data? Perplexity is obviously going down the ad route (plus shopping - do they have affiliates? p.s. that’s how Google first tested its ad product). Dia, I don’t know, but maybe that’s why a VC-backed co can just build delightful products and worry about it later. OpenAI - we can speculate on many other things, plus the big one: training new models.

Restive Ventures is now accepting applications from startups using AI to reinvent financial services. They will invest $500k+ at market terms and open up their extensive network of founders, regulators, and industry leaders. Apply now to partner with one of fintech's leading early-stage funds!*

LangChain is in talks to raise $100M with a potential $1B+ valuation. (link without paywall) It’s funny, lots of people I’ve spoken to since LangChain started said it’s good for prototyping but not for production (recent chats still sound the same). I’ve seen lots of data and spoken to others building agents who will just always build their own agent systems. People want the control and no extra fluff, I guess.

v0 has an SDK now - It looks like you can make API requests to v0 to do the code generation for you - and you can customise it within your app’s code. I’ve been waiting for something like this - too many weeks spent trying to build my own, forking others’ and just getting lost in the vibes (not in the good way here… 😅).

*sponsored


🌐 What I’m consuming

The architecture behind Lovable and Bolt. Despite the scary A word, it’s an easy read.
How to use Claude Code for notes & research.
Designing the AI future - control over memory, ads, openness, collaboration and more.
Crash course for improving your RAG implementation.
AI makes wishes real, be careful what you wish for.
How to spend your 20s in the AI era.

⚙️ Tools I’m looking into

Chronicle helps you make designer-grade presentations with AI for free. It is like Cursor for presentations!*
Infinite Chat by SuperMemory - Your memory, engineered to fill any AI model’s context to get the best responses.
Rendable3d - Make 3d models that you can work with in Blender.
Blok - Let AI agents that mimic your users decide if you should build that next feature.
Billy - A drag-and-drop-based bill splitter.
Perplexity’s Comet browser is now available to their Max subscribers. I wrote about my experience with Comet (and Dia) so far on Tuesday.
Dia has a concept called skills, i.e. repeatable prompts. I found this website collecting some common and uncommon ones.

*sponsored

🥣 dev dish

Better-T-Stack - A buffet with options to make your own starter template before you start building an app.
Fal has Veo 3 Fast on their API now. Go make apps for AI video creation.
zerank-1 - A better reranker than Cohere 3.5 or a prompted Gemini 2.5 Flash for embedding retrieval.

🍦 Afters

Replit is in cahoots with Microsoft, making vibe coding available for enterprise. Acquisition slash weird big money partnership+ownership coming? Let’s hope not…Amjad seems like he’d rather always be captain of the ship.
Anthropic is scared of models faking alignment, but only 5 out of 25 models did so in their testing, so everything is alright for now.
HuggingFace partnered with Pollen Robotics to make a cute little robot with open-source LLMs running on it. They call it “Reachy Mini”. I’ve hovered over buying this a few times - just wondering how much use I’ll get out of it with my kids (2yr olds) - but a fun thing to tinker with. ps. They did 250k+ revenue in the first 10 hours.
OpenAI is in the researcher-poaching news again, this time bringing a few new faces TO their team.

That’s it for today. Feel free to hit reply and share your thoughts. 👋

Enjoy this newsletter? Please forward to a friend.

Find me on X, LinkedIn, or Instagram
Read about me and Ben’s Bites
