From Pixels to Possibilities: AI Vision

GPT-4V (or more simply, GPT Vision) search volume is starting to take off, and don’t expect it to slow down as it becomes more widely known. Let’s explore some examples and opportunities from this trend.

Search volume for ‘ChatGPT vision’

Search volume for ‘GPT-4V’

What is it?

It’s AI that can see. Using GPT4-Vision API or uploading an image to ChatGPT, you can get the model to interpret what is in an image or video. “What’s this thing on my bike?” is great and all, but how about these examples:

Screenshot to code. It’s what you think it is. Take a screenshot of something, and it’ll turn it into actual code. Clone the YouTube, Instagram, Hacker News websites etc. (github repo here, no-code version here)

Cursor, the popular AI coding tool, lets you copy components with a screenshot and add it to your code, modify it etc.

Tldraw has been everywhere on my Twitter (X) feed recently. And for good reason. An unsuspecting whiteboard app that came alive with the new AI model. They added GPT4-V into its ‘Make Real’ feature - so you could draw boxes of a web application (let's say a calculator) and it would actually create a functional calculator.

So drawing code is now real. It’s only a matter of time before more applications are made this way. The true no-code (I’ve been harping on about this for years!!)

Also, its ‘drawing’ capabilities are insane too from such a basic starting point.

Be My Eyes is an app for the visually impaired to let volunteers essentially FaceTime the impaired to help them with daily tasks. Now, powered by OpenAI - AI can be the helper.

Taking control of a user’s computer. This guy asked it to find his youtube channel and you see the AI literally go to Google Chrome, type in the address bar and click on a search result. WITHOUT HIM TOUCHING ANYTHING. While this is a basic task, you can imagine what this kind of thing unlocks.

And, if you can’t, I did:

Opportunities

You can generate AI voiceovers for your product demos like this guy just built. So instead of going through the process of scriptwriting, GPT4-V will help do that for you.

Make Pokemon Go, but for real life, like this demo.

Use Vision to count cards at an online casino. Ok, this isn’t recommended, but technically possible?

Get a breakdown of how much you spend waste on social media each week.

‘Sketch your dream’ app.

Virtual time-travel experiences.

Personal stylist assistant.

Analyse my weightlifting technique, my work posture (like this demo), my tennis swing etc.

Set up a productised service to turn real estate listings into more enhanced virtual viewing experiences.

Create a ton of infographics and interesting reports on topics that you can sell access to.

A user feedback tool where I just upload a video (a loom?) of me using your site and it interprets where I’m getting stuck, where I’m spending too much time, where I’m clicking vs where I should be. Forget heatmaps.

How about an automated system that signs up for every AI tool, goes through onboarding and posts the recording on a site so others can check it out? Works for PageFlows (which I believe, has a human behind it).

Get full access

✔️ All 100+ courses & tutorials in our catalog
✔️ New content added weekly
✔️ Private community access
✔️ No subscription, $250 paid once
✔️ Expense it using this template. Or get a team account.
✔️ 30-day refund policy. No questions asked
Join 5,163 learners from companies like Microsoft, Coca Cola, NBA, Adobe & Google

If you scrolled this far, you must be a little interested...

Start learning ->

Join 5,163 professionals already learning