
EP 509: OpenAI o3 and o4 Unlocked - Inside the newest, most powerful AI models

2025/4/22

Everyday AI Podcast – An AI and ChatGPT Podcast


This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. There's a new, most powerful AI model in the world. Yeah. Sometimes I feel like DJ Khaled because each week it's like another one.

Another one, another most powerful AI model in the world. Y'all, the last couple of weeks, couple of months, it has been a back and forth, I think specifically between OpenAI and Google, for the ever-changing title of most powerful AI model in the world. And I think now, with OpenAI's new O3 specifically, it is the most powerful AI model in the world.

Is it the most flexible? Will it be the most used model? I don't know, but we're going to be going over that and a lot more today on Everyday AI as we talk about OpenAI's new O3 and O4 Mini models, unlocked: inside the world's newest, most powerful AI models.

All right. What's going on, y'all? My name is Jordan Wilson, and I'm the host of Everyday AI. And this thing, it's for you. It is your daily live stream podcast and free daily newsletter helping us all

Not just keep up with AI, but use it to get ahead, to grow our companies and our careers. If that's what you're trying to do, you are in the right place. So you need to go to youreverydayai.com, and there on our website, you can not just sign up for our free daily newsletter, where we will be recapping the most important aspects of this show and sharing a lot more.

But we are going to share with you everything else that's happening in the business world, in the AI world, so you can be the smartest person in AI at your company or in your department. All right. So make sure, if you haven't already, to go to youreverydayai.com to do that.

So I am very excited today to talk about the new O3 and O4 models from OpenAI. But before we do, let's start as we do most days by going over the AI news. And hey, live stream crew, it's technically a two-part show, so I need your help. Let me know, as I go over the AI news: what O3 use cases should we cover in tomorrow's show, in part two? All right.

Here's what's happening in the world of AI news. Couple of big things. So Chinese tech giant Huawei is preparing to begin mass shipments of its new 910C AI chip in May, aiming to fill the gap left by US restrictions on Nvidia's H20 chips, according to Reuters.

So the new chip from Huawei, the 910C, achieves performance comparable to NVIDIA's H100 by combining two existing 910B processors, representing a key shift for Chinese AI developers who need domestic alternatives.

So Washington's latest AI export controls have pushed Chinese AI companies to seek more homegrown solutions, making Huawei's 910C likely to become the main AI chip for China's tech sector. So yeah, it looks like Nvidia could potentially have a strong new competitor in Huawei. All right, next. A small thing, but I think that could have a big impact.

So OpenAI has quietly introduced memory with search, which is different from the memory feature they rolled out about two weeks ago. This allows ChatGPT to use personal details from prior chats specifically to tailor web search queries.

All right. So yes, OpenAI rolled out their expanded memory feature a couple of weeks ago that allows ChatGPT to use personal details, but that did not apply to web queries. So this new update means ChatGPT can now rewrite user prompts to reflect individual preferences while browsing the web, using whatever you share with it, dietary restrictions, location, et cetera, to bring you more accurate search results.

So this move follows recent upgrades that let ChatGPT reference users' entire chat history, further distinguishing it from competitors that don't have this feature enabled.
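To make that idea concrete, here's a minimal sketch of how stored memory details could be folded into a web search query. This is just an illustration of the concept, assuming a simple key-value memory store; the function and field names are hypothetical, not OpenAI's actual implementation.

```python
# Hypothetical sketch of "memory with search": stored user details are used
# to rewrite a query before it goes to a web search tool. Illustrative only,
# not OpenAI's internal implementation.

def rewrite_search_query(query: str, memory: dict) -> str:
    """Append remembered user preferences that seem relevant to the query."""
    extras = []
    if "location" in memory:
        extras.append(f"near {memory['location']}")
    if "dietary_restrictions" in memory and "restaurant" in query.lower():
        extras.append(memory["dietary_restrictions"])
    return " ".join([query] + extras)

memory = {"location": "Chicago", "dietary_restrictions": "vegan"}
print(rewrite_search_query("best restaurant for dinner", memory))
# -> "best restaurant for dinner near Chicago vegan"
```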

Users can turn off this feature in settings, but the rollout appears to be very limited so far, with only a few accounts reporting early access. So yeah, make sure to keep an eye out for that. All right. One last thing to keep an eye on involves bringing AI into the classroom. So the Trump administration is weighing an executive order that would require federal agencies to promote artificial intelligence training in K through 12 education. And this is according to a draft obtained by The Washington Post. This is technically super breaking news, only a couple minutes old.

So the draft policy directs agencies to train students in using AI and integrate the technology into teaching tasks, signaling a potential national shift in how schools approach technology education. So agencies would also partner with private companies to develop and implement AI-related programs for students, aiming to better prepare them for careers shaped by AI.

So the proposal is in draft form right now, is still under review, and could change or be abandoned. However, if it is enacted, it could significantly shape how the next generation learns and works with artificial intelligence.

I would love to see this happen, personally. Hey, little tidbit, y'all. I haven't shared this much, but, I just saw Jackie here in our comments holding it down, I'm teaching a course at DePaul here in Chicago, and I'm flipping the script on its head. I'm saying you have to use AI

at every single junction. Like, don't go old school, right? In all of these aspects, you should be using AI. So it should be pretty interesting to see how this new executive order unfolds, and if it actually is introduced.

All right. A lot more on those stories, and a ton more, on our website, youreverydayai.com. All right. Let's get into it. Let's talk about the newest, and I think the most powerful, AI models in the world, from OpenAI. But again, just because it's the most powerful, I don't think that necessarily means it's the best or the most flexible.

Right. Those are three very different things. I do think the new OpenAI O3, which is the full version, is by far the standout. And then we have the O4 Mini and O4 Mini High. Yeah, the naming is terrible; OpenAI has said that they're going to address this naming problem because it's extremely problematic, right? But the new O3 and O4 models are extremely

impressive, specifically the O3. All right. And if you're confused, like, oh, Jordan, why is the O3 better than the O4? Well, that's because the O4 is a mini. So we have O4 Mini and O4 Mini High. But now we have the O3 full model. Right.

Whereas previously we had O3 Mini and O3 Mini High. Confusing. But this is the first kind of full O model that we've had since O1. Yes, I know it's confusing that they skipped O2, because of some naming rights with, I believe, a British telecom. Very confusing with the model names. But here is what is not confusing: this new model is extremely impressive. All right.

So, livestream audience, good morning, good morning! Like what Will said here on LinkedIn: love, love to see it. Everyone, let me know what questions you have about these new O3 and O4 models.

I'll either tackle them today later on our live stream here, or I will make sure that we do this tomorrow in part two. So it's good to see everyone on LinkedIn and on YouTube. Thanks for tuning in everyone. Love to see us learning together live. All right, let's get into it, shall we? So here's the overview.

on the new O3 and O4 models. So these were just released about a week ago, and these are the newest successors in OpenAI's O series. So yeah, I just laid out a bunch of O's. Which, by the way, has anyone had Oh's, the cereal? I was talking about this with my wife.

They are so underrated, like maybe my top-five favorite cereal. That's beside the point. But so many different O's, right? And still, they got rid of O3 Mini High. But, you know, if you're on a Pro plan right now, as an example, I believe you have O1, you have O1 Pro, you have O3 full, and then you have O4 Mini and O4 Mini High. That's five different O series models across three different classes. Extremely confusing.

And obviously, OpenAI is in the future moving away from this and treating GPT-5 as a system. But essentially, if you're wondering what all these O models are: these are the thinking models. These are the models that can reason and plan ahead, step by step, under the hood, before they give you a response. Whereas the GPT models, so as an example, GPT-4 or GPT-4.5,

They are more instantaneous, right? They're not necessarily thinking like a human would, step by step, using this chain-of-thought reasoning under the hood before giving you a response. So I like to say there are two very different classes of models from OpenAI: you have your quote-unquote old-school transformers, and then you have your quote-unquote new-school O series models, which are your thinkers and your reasoners. All right. So this was just released,

less than a week ago. And here's the biggest part: it is capable of using all of OpenAI's tools, which is the biggest differentiator between the O1 and the O3 models; O1 could not use every single tool. Because when we talk about agentic AI, and yeah, that's what I think O3 is. It is an agentic model

at its core. And we're going to see that, I think tomorrow, when we go through some of these use cases live. But the biggest difference, or one of the biggest differentiators here, is that O3 can use all tools: web search, Python,

file uploads, computer vision with visual input reasoning, and also image generation. It can literally do everything. Whereas the previous O series models were a little limited, right? And some of them were different. You know, even now you can use Canvas, which is more of this interactive mode that can run and render code, inside the new O3 model. Whereas before,

It's like, okay, the O1 model was the only one that could use Canvas, but O1 wasn't very good at many things, because O1 Pro and O3 Mini, or sorry, O3 Mini High, were better. And then O3 Mini High could use the internet, but you couldn't upload files, and it couldn't use Canvas, right? And then you had O1 Pro, where you could upload files, but you couldn't use Canvas and you couldn't browse the web, right? So it was kind of hard with all these different O models.

And, you know, they all kind of had their own kind of unique features. But now, with O3, I do think this is an agentic model.

And I know that sounds crazy to say, but it is extremely powerful and it can use every single tool under its kind of tool belt. And it's trained to autonomously decide when and how to use these tools. That is what I think makes it probably the most powerful AI model in the world. And it responds with rich answers, typically in under a minute.
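To give a rough feel for what "autonomously deciding when and how to use tools" means mechanically, here's a minimal, hypothetical sketch of an agentic tool loop. The tools and the decision stub are stand-ins, not OpenAI's internals; in the real model, the "pick next action" step is the model's own reasoning.

```python
# Minimal sketch of an agentic tool loop: a model (stubbed out here)
# repeatedly picks a tool, sees the result, and decides what to do next.
# Tool names and pick_next_action are hypothetical stand-ins.

def web_search(q): return f"search results for {q!r}"
def run_python(code): return f"output of {code!r}"
def zoom_image(region): return f"cropped view of {region!r}"

TOOLS = {"web_search": web_search, "python": run_python, "zoom": zoom_image}

def pick_next_action(task, history):
    """Stand-in for the model's reasoning step: choose a tool or finish."""
    if not history:
        return ("zoom", "left side of photo")    # inspect the image first
    if len(history) == 1:
        return ("web_search", "ship schedules")  # then go browse
    return ("finish", f"final answer built from {len(history)} tool calls")

def agent(task):
    history = []
    while True:
        tool, arg = pick_next_action(task, history)
        if tool == "finish":
            return arg
        history.append((tool, TOOLS[tool](arg)))  # run tool, feed result back

print(agent("identify the biggest ship and where it docks next"))
```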

And right now, if you have a paid plan for ChatGPT, you have access to it. So whether that's ChatGPT Plus, Pro, Team, et cetera, you have access. It's also available in the API. There are limits, though.
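If you want to poke at it from the API side, a minimal call with the official openai Python SDK looks roughly like this. This is a sketch, assuming the "o3" model identifier is enabled for your account; availability and supported parameters can vary.

```python
# Rough sketch of calling an O-series model through the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set and the "o3" model id is enabled for your
# account; reasoning models may not accept classic params like temperature.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="o3",
    messages=[{"role": "user", "content": "Summarize chain-of-thought reasoning in two sentences."}],
)
print(response.choices[0].message.content)
```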

All right. So if you are on either a ChatGPT Plus account, that's your standard paid account at $20 a month, or if you're on a Team or Enterprise account, it's pretty limited: you only have 50 messages a week with the best one, which, again,

is O3, not O4, right? So O4 Mini is not the best one; O3 is, right? I'm just going to say O3 full. It's what a lot of people, including myself, are calling it, since we previously had the O3 Mini, and now we're having to deal with the O4 Mini, and people are confused. So O3 full

is the best model. But right now, if you're on a paid plan, you only have about seven messages a day, or about 50 messages a week. So, not a ton. With O4 Mini, you have 150 messages a day. With O4 Mini High, you have 50 messages a day. So if you are a power user on a paid plan, you might want to start with O4 Mini High.

Right. You have 50 messages a day and then maybe save those seven messages a day for the time that you really need a little more juice, a little bit kind of more compute, more smarts. Then you can hand those over to O3 full. If you are on the pro plan, which is two hundred dollars a month, you have, quote unquote, near unlimited access.

So, you know, OpenAI says, yeah, there are some fair-use things that you have to adhere to. But for the most part, it is unlimited. So I have free plans. I have

$20 a month plans. I have multiple Team plans. I have multiple Enterprise accounts for companies that hire us to train their employees. So yeah, if you're trying to do that, you can reach out to us; we can train your team. Right. So it is kind of weird, I'd say, that the Team accounts and the Enterprise accounts have the same limits as the Plus account. You would think, or hope, it would be 2x, 3x, especially for Enterprise. Y'all at OpenAI, you've got to get it together. I'm hearing a lot of grumblings

from companies that have invested heavily into Enterprise accounts, and they can't get kind of the same power that you can get with an individual account. I know it comes at a different price, right? Paying, I think, anywhere between $30 and $50 for an Enterprise seat versus $200 for a Pro seat. But so many of these companies are investing in hundreds or thousands of seats for their enterprise teams. OpenAI, you gotta give them more juice. Just saying, all right?

So what the heck is new? Let's go over it. So advanced tool use. So like I talked about, it has autonomous access to browsing, coding and visual tools. The image understanding, it is improved. The visual capabilities are much improved. And O3 does a great job at interpreting complex visual inputs, like as an example, research papers.

It has a much larger context window in the ChatGPT interface. Finally, right? So finally, within the ChatGPT interface, we have a 200K token context window.

Okay, so it can handle longer multi-step tasks seamlessly, and you can share a ton of information without it forgetting things. Whereas previously, you know, unless you were on an enterprise plan, we still, for the most part, had a 32,000 token context window on the

chat side of ChatGPT, right? It was different on the API side, but for a lot of users inside of ChatGPT, if they were especially copying and pasting a lot of information, ChatGPT was forgetting things, right? Because that 32,000-token context window is about 27, 28,000 words of input and output, which isn't a ton. So it's a welcome sight to see a 200K token context window in the new O models.
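If you want to sanity-check those numbers yourself, the tiktoken library can count tokens. A small sketch, assuming the o200k_base encoding that recent OpenAI models use; the words-per-token ratio is only a rule of thumb.

```python
# Count tokens to see whether a prompt fits a given context window.
# Uses tiktoken's o200k_base encoding (used by recent OpenAI models).
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
text = "your long pasted document " * 5000
tokens = len(enc.encode(text))
print(f"{tokens} tokens")
print("fits 32K window:", tokens <= 32_000)
print("fits 200K window:", tokens <= 200_000)
```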

Improved reasoning is another new thing, plus the ability to chain multiple tool calls together for layered analysis. And I think that is probably the standout feature. And there are some new safety features as well, right? OpenAI doesn't want to accidentally start

a biochemical war, which might have you kind of chuckling and rolling your eyes, but no, seriously. So good on OpenAI for addressing these things when they release new models, giving them essentially warning levels. They address that on their website as well. And there are new features that can reduce the risk and enhance trust. All right.

Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on Gen AI. Hey, this is Jordan Wilson, host of this very podcast.

Companies like Adobe, Microsoft, and NVIDIA have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use Gen AI. So whether you're looking for ChatGPT training for thousands,

or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com/partner to get in contact with our team, or you can just click on the partner section of our website. We'll help you stop running in those AI circles, get your team ahead, and build a straight path to ROI on Gen AI. And if you're a little confused and you're like, wait, this is the new feature? I thought it was a different feature. Yeah.

Let me quickly get you up to speed. If you've been sleeping under an AI rock for three weeks, here's what else is new at OpenAI and ChatGPT, because you might be confused. And I want to really tell you: no, no, no, no, this is separate, right? So yeah, we've been hearing a lot of buzz the last couple of weeks about this new GPT-4o image gen, okay? That is different. This is O3, a different beast altogether, but it can use the image gen.

Then in April, we had the memory rollout across all chats. So essentially, if you have this enabled, ChatGPT can pull in information from past chats, which is different than memories, which were essentially individual nuggets that were stored in kind of a memory bank. But now ChatGPT does this via kind of a search pull and semantic keyword matching, and then delivers you kind of personalized results. I personally

hate this, right? Because it's always trying to personalize things based on my past chats. But that's new. All right. And then we also had the Google Drive connector roll out for ChatGPT Team accounts about three weeks ago. And then also last week, we got,

Was that last week? Yeah, my weeks are starting to blur together, y'all. So yeah, it was last Monday that OpenAI released another set of new models. So don't get confused. These other models were GPT-4.1, GPT-4.1 Mini, and GPT-4.1 Nano. However, those are not available inside ChatGPT. Those are only available on the developer side.

All right. So here are kind of the highlights of those:

Context window to a million tokens: huge. Actually, the GPT-4.1 Mini was stealing a lot of the headlines, rightfully so, because it was really outpunching its mini moniker. But the 4.1 models, I think, were much better in coding and just a pretty big improvement, both on cost and performance, over the model they were following, GPT-4o.

All right. So these new O series models are not that, right? But I do think it was worth pointing out: yeah, there's been a lot of new things happening

inside ChatGPT that are not these O series models. So I figured I'd take two minutes here to get you caught up. Yeah, like Jackie's saying: need a cheat sheet. Yeah, maybe I should create one. Kevin from YouTube is saying it's annoying that in the paid education version, he still can't access it. So I'm guessing, Kevin, you're talking about O3. Yeah.

Yeah, it should be rolling out. You know, I know this sounds weird; it's kind of like, oh yeah, restart your computer, you know, take out the SNES cartridge and blow on it, right? So many times it is like a cookie issue or a caching issue. So if you log out of your ChatGPT account, maybe clear your

cache and log back in, it might be there. That's actually the way I always do it. Whenever there's new models announced, I do that like two or three times a day to try and get access a little earlier, even though OpenAI does kind of control those rollouts. All right. Let me answer the question. Is this the best model in the world? So yes and no.

I think it is the most powerful AI model in the world. I think best depends on your use case. Is it the most flexible right now? No. So let me say that again: yes, I a hundred percent believe it is the most powerful AI model in the world. It is not the most flexible. And whether it's the best depends on your use case.

Right now, it's kind of jabbing back and forth with the Gemini 2.5 Pro from Google. And we'll see as, you know, more user feedback starts to roll out. But when it comes to just pure upside, just the ceiling, strictly power, I think O3 is unmatched right now.

Does that mean that me, personally, I'm only going to be using O3? Absolutely not, right? I'm still going to be using Gemini 2.5 Pro all the time. The big difference is, y'all, and we're going to talk about this a little bit with benchmarks:

Gemini 2.5 Pro is a hybrid model, which makes it much more flexible because in certain instances, especially if you're having iterative conversations, back and forth conversations with a model, which is what you should be doing. Sometimes if you're using these O series models, you can ask a very simple query or a very simple follow-up query, and it might think for like minutes, right? So

in terms of flexibility and usability, it might not always be the best for some of those conversations that are a little more nuanced and don't just require, you know, big AI brains. But if you need big AI brains in an agentic type of large language model interface, O3 is it. And it is so, so impressive, right?

But let's look at some of the benchmarks. And here's one thing that I kind of wanted to call out, right? So on this show, we talk a lot about the LM Arena, right? And this thing called an Elo score.

And what that means is: you put in a prompt, okay? And then you get two blind outputs, and you decide which one is better, output A or output B. All right. And essentially, over time, when there are enough votes, a new model that gets released gets an Elo score. Essentially, you know, it comes from Elo ratings in chess, and it's like, hey, head to head, this is what humans prefer the most. So right now, the top on that list is Gemini 2.5 Pro. Okay.
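For reference, the Elo math behind those head-to-head votes is simple. Here's a sketch using the standard chess-style update; the K-factor and starting ratings are illustrative, not LM Arena's exact configuration.

```python
# Standard Elo update applied to one head-to-head vote between two models.
# K-factor and starting ratings are illustrative.
def elo_update(r_a, r_b, a_won, k=32):
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # predicted win chance
    score_a = 1.0 if a_won else 0.0
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - expected_a))
    return r_a_new, r_b_new

# One voter prefers model A's blind output over model B's:
print(elo_update(1300, 1350, a_won=True))  # A gains rating, B loses rating
```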

And here's why I'm bringing this up as a caveat right now: O3 full does not yet have enough votes to be on the

Chatbot Arena leaderboard. That could change in a couple of hours or in a couple of days; it could be up there pretty soon. However, I do not expect the O3 full model to do very well when it comes to head-to-head human comparisons. And here's the reason why. When you look at O3 Mini High, right, which was my workhorse model right before Gemini 2.5 Pro came out, I'd say O3 Mini High was getting about

60% of my usage. Humans, head to head, for the most part, don't prefer it, right? And here's one of the reasons why, I think. You have these

traditional large language models that focus on kind of quick, snappy responses. You have these thinking models, which just take longer and really only showcase their abilities when you're asking a very tough question, right? And then you have your hybrid models.

So I think, ultimately, the hybrid models are going to be the ones that do best on a head-to-head Elo score. I don't think these strictly-thinking models are ever going to do that great in human comparisons. The way I think about it is like, okay: think of someone you know that's super personable and has a ton of business savvy and is super smart, right? That's like Gemini 2.5 Pro.

Then you think of something like Einstein, right? And a lot of people, what they're putting into LM Arena, it's kind of quippy things, fun things, right? Like: write me a haiku explaining large language models using basketball terms, right?

That's not something that an Einstein-level model would necessarily excel at. So I'm just putting this out there: once the O3 full model hits the Chatbot Arena, I don't necessarily foresee it being a top-three model. I do think Gemini 2.5 Pro, because it is a hybrid model, will probably still retain its lead on that specific benchmark. However, however,

look at some of the other comprehensive sets of benchmarks that the new O3 full, or, as some people are calling it, O3 high, has already gone through.

And it's the best. So as an example, look at LiveBench. Okay, so LiveBench is a benchmark for large language models designed with test set contamination and objective evaluation in mind. So I'm reading off their website here. It has the following properties: LiveBench limits potential contamination by releasing new questions regularly, so that way the questions won't end up in models' training sets.

Each question has verifiable, objective ground-truth answers, right? So it eliminates kind of the need for a large language model judge. So it's fact or fiction, no gray area. And then LiveBench currently has a set of 18 diverse tasks across six categories, right?

So language, data analysis, math, coding, reasoning, et cetera. And then you have a global average. So on LiveBench, which I think is a good third-party benchmarking system, O3 is better than Gemini 2.5 with a global average of 81.5. And Gemini 2.5 is the next best model aside from OpenAI's O models, which actually take up the first three spots. So Gemini 2.5 comes in at a 77.4.

So O3 high: much better, at 81.5. Similarly, another one that we talk about a lot is the Artificial Analysis index.

So again, a very reputable and, I'd say, probably one of the most trustworthy third-party benchmarking services out there. So they haven't done O3 full yet, I believe, because not all of the capabilities are available in the API, whereas for O4 Mini High, they are. Okay. So, on O4 Mini High, which is a mini model.

On the intelligence index, it is the best model or the most powerful model in the world. All right.

So right now, it is ahead of Gemini 2.5 Pro by two points. All right. And this, I think, is pretty important, because, again, you are comparing a mini model. So I assume once the full model is put through some of these tests, it will be even further ahead. But the O4 Mini High is two points ahead

of Gemini 2.5 Pro. So when it comes to

unbiased third-party benchmarks that look at a lot, it has been decided: O3 and O4, right? This is the most powerful model in the world. Could Google clap back next week and release a brand new, you know, 2.6 Pro? Absolutely. I'm sure they have something ready to go. But today, if you are looking for the most powerful model in the world, O3 and O4 is where it's at.

So, the standout feature, which is something that we're going to be digging into in part two tomorrow. And let me know again: what use cases do you want to see in our part two? But the standout feature, by far, is agentic tool use. So if you're listening on the podcast, this will make a little bit more sense on the livestream, where I have a couple of graphics here. Okay. But as an example, and this is from OpenAI's

website going over O3. It says, "I took this pic earlier." So again: visual understanding, the ability to reason with photos and, kind of on its own terms, decide when

and how often to use these tools. So it says, "I took this pic earlier. Can you find the name of the biggest ship you see and where it will dock next?" All right, this is tricky because, in this photo that they upload, the ships are, number one, out of focus, a little blurry, but they're also at different perspectives, right? So one ship could just appear bigger because it's closer, and the other ship could be larger but further away.

So it reasoned for only a minute and a half.

And it even talks it through, right? So here's kind of the chain of thought, or the reasoning, that the model is going through. It says: "I think I missed the ships in the crop. They seem to be off to the left." Which my human eye did not even catch. It says: "I'll zoom in to better inspect." Then, after it literally crops in, zooms in, and gets a clearer

view of the photo, it says: "Okay, I see the ships now. The largest one appears to be the red-hulled tanker on the right, with another smaller ship in the center."

"The user wants to know the name of the largest ship and where it will dock next. At this distance, the name on the hull may be too blurry to read. Maybe zooming in further will help get a clearer view." So it essentially enhances the image, continues to zoom, and then it decides at a certain point: okay,

I've now understood the location, right? So then it goes on, and it uses things like location data, and it looks things up on the internet to correctly identify what that ship actually is.
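Mechanically, that crop-and-zoom inspection step is the kind of thing the model's Python tool can do. Here's a minimal sketch with Pillow; the file name and crop box are made up for illustration.

```python
# Sketch of a crop-and-zoom inspection step using Pillow.
# The file name and crop box are hypothetical, for illustration only.
from PIL import Image

img = Image.open("harbor_photo.jpg")          # hypothetical photo
left_region = img.crop((0, 400, 800, 900))    # crop box: (left, upper, right, lower)
zoomed = left_region.resize(                  # upscale 3x to "zoom in"
    (left_region.width * 3, left_region.height * 3),
    Image.LANCZOS,
)
zoomed.save("ship_closeup.jpg")               # inspect the enlarged region
```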

There is also BrowseComp, an agentic browsing benchmark from OpenAI, and I think this is worth pointing out. Because if you've ever used the GPT-4o model, and you've uploaded an image and then had it go browse, such as in this example, GPT-4o is not good, right? It only has a 1.9% accuracy rate.

Whereas now, right, when you look at O3 with Python, okay, so again, that means it can kind of create its own code and render code to help solve problems on the fly. So when you have this new reasoning model that has a better visual understanding, it can run code to help it solve problems, and it can browse the internet. That 1.9% accuracy from GPT-4o with browsing goes to nearly

50% with O3. An extremely impressive jump. All right. And also, FYI, I threw this in here; yeah, it should've been a couple slides back. Since we're going to be jumping into use cases tomorrow: there are some use cases I think a lot of people are sleeping on that we went over with the new 4o image gen, but this new model can also do image gen in O3.

So here's the overall features and takeaway as we wrap up today's show. So, O3 is a powerhouse of reasoning. It excels in coding, math, science, and visual tasks, and it provides deep insights and complex solutions. And it does this by tackling intricate coding, science, data, and creative tasks. It can quickly analyze complex data sets. Yeah, you can upload files, and it can create

new intelligence with those files that you upload for human level insights. It thrives where deep understanding and factual accuracy are essential and it's ideal for applications demanding high level expertise, right? So if you've used OpenAI's deep research, it actually...

That was the only, I guess, tool or mode previously that used O3, the full version, right? So for the last couple of months that we've had deep research, it was not using O3 Mini, right? And there's a huge jump between O3 Mini and this O3 full, or O3 high, whatever you want to call it, right? And it does a fantastic job of this agentic browsing on the web, and iterating,

and kind of changing course midway through, again, depending on what you start with. And it's ideal for applications demanding a high level of expertise. O4 Mini, if I'm being honest: unless you're using O4 Mini because you don't want to run out of prompts, right, of those like 50 messages a week, otherwise there's no reason to use it on the front end.

There's not, but I think O4 Mini will probably, in the long run, be more for developers, because right now it's faster and it's more efficient. So the big thing with O4 Mini here is speed, scalability, and efficiency. It's a smaller model, but it balances reasoning with computational efficiency, it excels where speed and costs are key, and it's ideal for

high-volume use. It's quicker, yet it is still insightful in interpreting data, and it streamlines workflows with adaptable processing and connectivity. So yeah, I don't think, if you're

on a paid plan, you know, using ChatGPT on the front end, that you should ever prefer to use O4 Mini. It should really only be if you've kind of hit your quota for the week with O3. But, you know, if you're a casual user and you're like, okay, 50 messages a week,

I can get by with that for O3, then you shouldn't be using O4 Mini. But if you're a power user, yeah, you might have to use O4 Mini for some of those tasks and then kind of pocket O3 for the more complex things, or things that require, you know, kind of juggling these tools. And that's ultimately where O3 excels: in its agentic use of multiple tools, and researching and changing course. It's extremely impressive.

So tool chaining, that's something you're probably going to start hearing a lot. And that's why it's important, and that's what I think makes it the most powerful model in the world: the ability to use multiple of these tools at the same time. For you to be able to upload files, for you to start with computer vision, right, or start by uploading a photo and have it be able to reason over that photo. The ability to

essentially do deep research, right? So it's not just blanket doing one search and pulling in all of that aggregate data and thinking over it at once.

It's going literally step by step and it's researching. And if it finds something in its research, I've seen this, it will change course. I've had it a couple of times start by using computer vision. Then it goes and starts on the web. Then it goes and starts using Python to create something. And then in the middle of that, it's like, oh, wait, I need to go back to

to the web. And then it's like, oh wait, I need to go zoom in on that photo, right? So that's where this really excels; this is kind of its special sauce, and why my jaw kind of dropped when I first started using this, which is hard for me to do as someone that spends so much time on AI tools. It's the agentic tool chaining: putting these different capabilities together, deciding on its own when it should use what tool, and then going back and reiterating

on its own. So it can think with images; it can crop, zoom, and rotate visuals during analysis. The 200K token context is great for deep, layered workflows. And then it seamlessly chains together tools, the web, Python, and image gen, for complex queries like forecasting things, right? And then there's this autonomous decision-making. So for complex queries, this is your model,

right? Because of that autonomous ability to chain together these different tools. So Google has a shorter, smaller version of this, but for the most part, when I'm using Gemini 2.5, I don't see Gemini 2.5's ability to go back and forth and reiterate on its tool use. So yes, it can create things in its canvas mode in Gemini 2.5 Pro. It can query on the web, but

for the most part, it is more of this unilateral approach, where O3 does these in parallel and iterates on its own tool use, right? I don't know if people remember when I used to talk about plugin packs and how they were so powerful back when ChatGPT had plugins. And I'm like, y'all are missing the big thing here, right? And it hasn't been until now that I've had that same feeling, because essentially, right, you look at these different agentic,

kind of like plugins or tasks, right? So part of it will analyze the image, and then it'll use that information to go find, you know, updated information on the web. Then it will pull that and maybe start using Python. Then it'll look at the image again. So I almost think of it as kind of like multiple specialists working together,

but they'll work one at a time. And then the researcher will come and find things and bring that back to the data analyst, which is Python, right? And it'll keep working iteratively, and then even use the Canvas mode, so it's almost like you have a UI/UX designer too, right? So it does all of these things iteratively, where I don't think we've really had that with any models, right? So even with Gemini 2.5

Pro, again, this model hasn't been out for very long, but it does seem and feel and, under the hood, look like a more unilateral approach. Where I think O3 shines is that it can adapt its own strategy on the fly. It reacts to information, it refines its tool use, and it can tackle those tasks requiring up-to-date data, expanded reasoning, and diverse outputs. All right, that's

a wrap, y'all. I'm going to scroll through and see if there are any questions. Joe just says: thanks for this report, very helpful; I wonder how OpenAI has resolved intermodal communications for chaining. Yeah, we'll see, right? So we have heard, and this has been pushed out, right, that in the future you're not going to be able to decide which model to use, right? And GPT-5 will actually be an architecture that houses some of these modes, or some of these models, under the hood, and you may not get to choose.

I don't want that to happen. I don't want GPT-5, right? I want to be able to choose my own models, right? So it should be interesting to see how that happens. All right, we have a LinkedIn comment here. Someone said, "In your newsletter, you mentioned you have been struggling to push past O3's limits and would love to hear more about that. What limits have you been pushing?" Yeah, great question. And yeah, sorry, for whatever reason, LinkedIn settings, I don't see your name.

It's been very easy for me to push models to the limit. And one of the reasons is, you give them complex tasks that would normally unfold over the course of, like, an hour-long conversation, right? You know, saying: hey, analyze this photo, then go create a chart where you forecast something based on information that you pull from this photo. So as an example,

you know, here's a photo with a bunch of AI tools. And this is probably an example I'll do tomorrow, right? Go look up pricing for all these tools. Go look up what's included on a free and paid tier. Then, using kind of your coding abilities, create a chart, but then also go out and create, I don't know, a website or an interactive graph on this. So, you know, it's been easy for me to kind of break

some of these models, because they don't essentially have complex tool use, and O3 does. And it seems, at least in my very initial testing, which hasn't been a lot, right? I've probably only been able to give O3, I don't know, maybe 10 or so hours so far. I've been very busy. I had

a keynote and a workshop, and I moderated a panel at 1871, and, you know, I've been planning all these episodes. So I haven't had my normal amount of time. You know, we had the Easter weekend, so I was trying to spend as much time with family as possible. So I haven't had as much time to break it, but I haven't been able to break O3 yet, because it's extremely, extremely capable.

So McDonald is asking: do you recommend using this for building games? It depends, right? I still would probably start that in Gemini 2.5 Pro. Again, just because O3 is the most powerful model in the world does not mean it's necessarily the best. I think the use cases are going to be when you need to string together all of these agentic use cases. At least for me, if I'm looking for a one-off, you know, building games as an example,

I'm not a coder, but I would probably still do that in Gemini 2.5 Pro. It's going to be faster and its coding capabilities are outstanding. All right, let me just real quick before we wrap this up, see if there's any more questions. I always try to get to questions at the end. Big Bogey Face from YouTube saying, why use a sledgehammer when a rock and hammer will do? Yeah, that's a great point.

Renee is asking, what about Manus? So Manus is a little different. You have to choose a model for Manus. Manus is not publicly available yet, right? You have to get on a wait list, get access. And it's different, right? That's why people sometimes are like, oh, what about Perplexity? Well, Perplexity at its core is not a large language model. Neither is Manus.

Manus, you have to use a model and then Manus is essentially a collection of tools. And right now it runs on Claude Sonnet. So it is completely different. That is a true kind of operating agent, whereas this is more interfacing inside of a chat like you would a traditional large language model.

All right, we have some proposed use cases for tomorrow. All right, we have one more question here from Kieran, saying: how might the advancements in the O3 and O4 Mini models influence the development of future AI systems, such as the anticipated GPT-5? That's a great question, Kieran. I don't have the answer, right?

I'm lucky enough. I have contacts over at OpenAI that I chat with. I don't know the answer to this. As I get the answer, I will get it to you. But again, OpenAI has delayed GPT-5. And they said that they've been struggling to essentially put all of these capabilities under this kind of umbrella and turning it into a system. So like I said, personally, yeah.

Personally, I'm not looking forward to GPT-5. I love, right, even though a lot of people look at it as this chaotic mess, I love going into my ChatGPT account and seeing, you know, seven to 10 different models to choose from, right? Because I'm a power user. I know what I'm doing. And generally, I have a better idea than a GPT-5 system probably would of knowing which model is best, because I've used them all for hundreds of hours for my own use cases, right?

Maria is saying, I'm still waiting for the OMG model. For me, you know, I think Gemini 2.5 wasn't an "oh my gosh"

model, but O3 is an "oh my gosh" model, right? Most of the time with new models, I'm just like, eh, okay, you know, cool, this is nice. Gemini 2.5 was "oh my," and O3 was "oh my gosh." All right. So we're going to continue this tomorrow. So

make sure you tune in for part two. We're going to be going over different use cases, but also let me know: what do you want to see? So if you're listening on the podcast, thanks for tuning in. Make sure to go to youreverydayai.com and sign up for the free daily newsletter. We're going to be recapping the most important takeaways from today's episode, but you can also just reply to today's email, which is going to come out in a couple of hours, and let me know what use case you want to see tomorrow, right? I really want to tackle things

that are on your mind. I call this Your Everyday AI because it's for you. So I want to hear from you: what do you want to see this new O3 model tackle? Yeah, maybe you have limited messages and you don't have kind of the message

budget, so to speak, to tackle this. I've got unlimited; put me to work. Let me know what you want me to see, or let me know what you want to see, in our part two. If this was helpful, please click that little repost button, y'all, and share this with your network.

I know you're trying to be the smartest person in AI at your company, in your department. That's what we try to help you with at Your Everyday AI. But this thing only works when you share it with others. So if you're listening on social, please share it with others. If you're listening on the podcast, please follow the show. Click that little button. If you could leave us a rating, I'd really appreciate it. So thank you for tuning in. We'll see you back tomorrow and every day for more Everyday AI. Thanks, y'all.

And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.