This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. The AI model releases don't stop. In the past 10 days,
We've gotten some groundbreaking large language model updates from Grok out of xAI, from Anthropic with their new Claude 3.7 Sonnet, and now OpenAI. So after a lot of waiting, what seems like years, we have OpenAI and ChatGPT's next big step forward with GPT-4.5.
But let me tell you something. This one's weird. Not saying that in a bad way. It's different. And probably for the first time in a long time, I've spent hours now, at least, playing with GPT-4.5, and I'm like, this isn't for me. Again, not in a bad way.
I just think that OpenAI's new model, GPT-4.5, is something much different than anything it's released before. It's not breaking any records. It's not climbing to the top of every single benchmark. It's a vibes model.
It's meant to feel more relatable and more relational to end users, all of us. All right. So I'm excited today to talk about OpenAI's new GPT-4.5, what's new, and who can benefit the most. All right.
If you're excited to learn about it, you're in the right place. Welcome. My name is Jordan Wilson and I'm the host of Everyday AI. This thing is for you. It's your daily livestream, podcast, and free daily newsletter, helping us all not just learn what's happening in the world of AI, but how we can all actually leverage it, what it means, and how we can use the information to be the smartest person in AI at your company. If that's already you, or if that's what you're looking to do, welcome. We just became best friends.
Also, your actual best friend is our website, youreverydayai.com. Go sign up for our free daily newsletter there. Also, people don't know this. You can go listen to every single podcast episode ever on our website. Go watch every single video. There's like close to 500 now, all sort of by category. So no matter what you want to learn, we have it for you. We've probably had a world's leading expert already come and share their secrets. So make sure you go check that out.
All right, so I am excited to talk today about OpenAI's new model. But before we get started, we're going to start off as we do most days by going over the AI news.
So NVIDIA's revenue has soared 78% year over year to $39.3 billion in the fiscal fourth quarter, ended January 26, fueled by strong demand for GPUs. The company's data center products accounted for $35 billion of the total revenue, with half of that coming from cloud service providers like AWS, Google Cloud, Microsoft Azure, and Oracle Cloud.
So NVIDIA's Blackwell GPU, which was launched in December, generated $11 billion in revenue in its first quarter, marking the fastest product ramp in the company's history. So yeah, NVIDIA's earnings are out, so pretty, pretty interesting here. And CEO Jensen Huang announced the upcoming launch of Blackwell Ultra in the second half of 2025, promising a smoother transition compared to the Hopper-to-Blackwell shift,
which faced production challenges due to design changes. So Blackwell Ultra will feature advancements in networking, memory, and processors, while NVIDIA's next-generation Vera Rubin architecture combining CPU and GPU technology is set to debut next year in 2026.
So Huang emphasized that NVIDIA's manufacturing partner, TSMC, exceeded expectations in expanding production capacity, helping meet surging demand despite initial hurdles. NVIDIA's revenue from China has dropped by half since U.S. restrictions on chip exports began in 2022, but the company now offers a less advanced processor, the H20, specifically for the Chinese market.
All right, next piece of AI news, Meta is looking to compete with ChatGPT in a completely different way. So according to reports, Meta is planning to release a standalone Meta AI app in the second quarter of 2025, according to sources familiar with the subject, marking a pretty big step in CEO Mark Zuckerberg's push to dominate the AI space.
So the app will reportedly expand Meta AI beyond its current integration with Facebook, Instagram, WhatsApp, and Messenger, allowing users to interact more deeply with the Gen AI assistant. So right now you can obviously just go to meta.ai and use their AI that way, but it looks like Meta is looking to compete more directly with OpenAI as a standalone AI app.
So in April 2024, Meta replaced the search feature in its apps with Meta AI, positioning the chatbot as a central feature for billions of users. The new standalone app will allow for greater personalization, conversational history organization, and integration with Meta's hardware, such as Ray-Ban smart glasses, a direction Zuckerberg publicly agreed with on Threads.
So Meta is also exploring a paid subscription for Meta AI, similar to OpenAI's ChatGPT Plus and Microsoft Copilot, which could generate revenue through premium features and paid recommendations. Interesting, right? A fairly open-source model, but you will have to pay to use it. Meta AI currently has 700 million active users. So yeah.
Pretty wild there. It should be interesting. And, you know, CEO Sam Altman, OpenAI CEO, kind of responded jokingly on Twitter and said, you know, hey, maybe we'll just release a social media app. All right, let's get into it. Let's talk about what's new inside OpenAI's GPT 4.5. And this is part one of two, right?
I understand y'all. Sometimes these shows go way too long. And the other day I'm like, oh yeah, there's some updates. We're going to do a short show. And that show ended up being an hour. Whoops. I got to stop doing that, right? No one wants to listen to me blab when I'm tired and over caffeinated for an hour plus. So we're actually going to be breaking this one down. I'm not going to be doing any live demos today. Those usually take a lot of time to put together. So probably in the future when there's at least big,
new models like this, we're going to break it up into two portions. Just like I always say, hey, we're going to learn and leverage. So today we're going to learn about the model, what's new, who I think it's going to benefit. And then we're going to have a second show probably next week on the best ways to leverage it. Probably do some live demos, some examples, all that good stuff.
All right. So what the heck is GPT-4.5? Well, it is the last non-chain-of-thought model from OpenAI. Sam Altman has said that future systems are going to be hybrid. So what that means is,
reportedly, GPT-5 will be more of a system, and you aren't going to necessarily be choosing between these quote-unquote old-school transformer models like GPT-4o or GPT-4.5 and
reasoning models like o3 and o1. So in the future, he's said, it's going to be more of a hybrid approach, a system that you talk to. Maybe the system is just going to choose which model is best for your query, or maybe it's going to use, hopefully, one of my AI 2025 predictions, which you should go listen to those shows:
moving away even from a mixture of experts and going to a mixture of models. I hope we see that, right? In the future, if you have a very advanced query, part of it might in theory use, under the hood, a GPT-type model, and then part of it might use an o-series model. But it is the last non-chain-of-thought model from OpenAI. And it's
really expensive on the API side. So right now, just FYI, this is only available to Pro users, people on that $200-a-month plan, although OpenAI did say that it will be rolling out in the coming weeks
to all paid users. So, you know, I feel most people listening to the show are probably ChatGPT Plus users on the $20-a-month plan. So you don't have this yet, but I'm guessing sometime early to mid-March, most paid users should have access to GPT-4.5.
But it's super expensive on the API. So developers, or maybe if you are a technical person and something at your company runs on the backend on GPT-4o or GPT-4o mini, you're probably not going to be using this, if I'm being honest. More on that in a bit. But ultimately,
I think humans are going to like this, right? I think humans are going to like this. And hey, live stream audience, thank you for tuning in. I forgot to shout you guys out. But if you do have a question, let me know. So thanks for Big Bogey and Harvey and Samuel joining on YouTube and Woozie Rogers joining us on LinkedIn. Dr. Harvey Kessler doing double time joining us on LinkedIn and YouTube. Love to see it.
Steven, happy Friday to you and Brian and Joe, Michelle, Dr. Scott, everyone can't go through everyone, but thank you all for joining live. If you do have questions, try to get them in now. I'll try to answer them either as we go, as they pop up, right? Yeah, this is a,
unprompted, unscripted live stream, the realest thing in artificial intelligence. So, you know, get your questions in. I'll try to either tackle them as we go or at the very end. So here are some more details on what you need to know on the new GPT-4.5 model. Like I said, it is $200 a month right now to use on the Pro plan. That's if you're using it on the front-end chatbot, right? If you log into chatgpt.com, it's not going to be there unless you're on that $200-a-month Pro plan,
but it should be rolling out in the coming weeks. It's the first major model upgrade in over two years though. So that's important. So we've seen iterations and upgrades over the GPT-4 model, right? But it's been more than two years since this base model was actually refreshed, right? Let me tell you what I mean by that. So GPT-4,
came out back in, gosh, 2023, right? But it was kind of refreshed. So then we went to GPT-4 Turbo. Then we went to GPT-4o, or Omni, right? So it was this Omni model bringing more modalities, but the base, the engine, was still kind of the same. And OpenAI, you know, they did some fancy engineering and tweaked it a little bit, but for the most part, it was an
old engine that was still running this thing. But it was still the most powerful single-use model in the world. And when I'm talking about single-use, I mean non-reasoners, the non-reasoning models. So this is pretty big. It's the first major model upgrade in over two years. But here's the thing. It's built for
Empathy, it's built for relationships. It's built for intuitive conversations, right? People are saying this is a vibe model, right? It's not shooting off the charts on every single benchmark, but OpenAI hopes that when you talk to ChatGPT, you're like, oh, this is very human-like.
Right. They're like, oh, you're going to feel some AGI vibes, some artificial general intelligence, right? But here's the thing, and I started the show by saying this: this is not for us.
If you're a power user, if you're following AI every day like me, depending on the day, I know it changes. I'm spending, who knows, anywhere from three to eight hours a day using large language models. For the most part, ChatGPT, I'm in ChatGPT all day. This isn't for me. This model is not for me. It's really not. I think it's for everyone else. This is for casual users.
This is for my mom, right? This is for my mom. This is for companies that maybe did not get on board with AI previously. And it's not just that. It is a vibes model, right? Oh, it feels good. It feels natural. It feels human. It feels like it understands my emotions, right? Because it's not the best. It's not the fastest. It's not the cheapest, right?
So then it's like, what the heck is this thing then? It's not the best. It's not the fastest. It's not the cheapest. It's not for power users. Like, what the heck, OpenAI? I really focus this on two things. And again, this model literally just came out. I wasn't part of the early testing group, so I look tired if you're on the live stream and the coffee's probably a little stronger. But I think probably if I had to boil this down to two words,
It would be reliable and relatable. And again, for me as a power user, I've never had problems with those, right? I don't run into a lot of hallucinations because I know prompt engineering very well. I know how to refine a large language model, kind of train it up on a smaller skill set, to increase the accuracy and decrease the hallucinations. But now, out of the box, hallucinations are lowered.
Now, out of the box, it's not going to sound like talking to a robot, right? Maybe that's why, for the first time, I'm like, ah, this doesn't really seem like it's for me. It's still probably going to be a model that I use very often, even though it's not the best, not the fastest, not the cheapest, right? But I will assume that, you know, GPT-4o won't be around forever, right?
So I do need to also understand that I need to start using this model. I need to get used to it. I need to adjust how I talk to it. I need to adjust my expectations, right? This is why also I'm updating our free Prime Prompt Polish course. Yeah, I know it's been a few months. Don't worry. We're shooting for a March date. Keep an eye on the newsletter for that, right? This isn't for me, but I'm still going to use it.
Right. It's funny, because I think for probably a good 15 years, you know, my friends and coworkers have called me a computer, right? They're like, oh, Jordan's a computer. Yeah, he's not human. Beep boop, beep boop. That's why, for me, I don't need a relatable chatbot. I don't need to talk to something and be like, oh, this feels human, right? Maybe it's because,
you know, my EQ is not off the charts, right? But if you are someone that really cares about feeling heard, about feeling understood, if you want to feel a relatable relationship with an AI chatbot, not in a weird way, right? But when I think about things like business, like a business coach,
a strategist for your company, for your department, right? A true creative thought partner. GPT 4.5 is going to be much better at those things. Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on Gen AI. Hey, this is Jordan Wilson, host of this very podcast.
Companies like Adobe, Microsoft, and NVIDIA have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use Gen AI. So whether you're looking for ChatGPT training for thousands,
or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com slash partner to get in contact with our team, or you can just click on the partner section of our website. We'll help you stop running in those AI circles and help get your team ahead and build a straight path to ROI on Gen AI. Angie says, Jordan's an AI agent. Sometimes I wish I was.
I think agents don't need sleep and agents don't get tired. Those are two things that I'm both struggling with right now. Nancy, former Everyday AI guest. What's up, Nancy? Says the single reason I couldn't live without the pro subscription of Claude is because I couldn't stand to talk to ChatGPT all day. LOL. Yeah, that's a great point, right? Because I've never liked Claude.
I know people do because, you know, they're like, oh, it spits out more human-sounding content and it feels more like I'm talking to a human. Well, I'm not saying that GPT-4.5 is OpenAI's answer to Claude. It's not, but you will get those vibes.
You'll get those vibes that the output, the written text is going to look and seem much more human-like. It's going to seem like a much less robotic process, both in the output and in the interaction between you and the chatbot.
So yeah, a lot of people are saying, you know, and Nancy is definitely not alone in this, right, that they prefer Claude. A lot of people who use AI just for content writing, and who don't want to necessarily go the extra mile in prompt engineering, just want to be able to get more human-sounding output out of the gate, out of the box. And they want to
have it feel more like the assistant understands you, like the AI chatbot understands you, right? Which is something I think Claude's been great at. Again, for me, I'm a human, or I'm like a human, but I don't know. I feel like I think in bits and bytes. I think in ones and zeros. So I don't necessarily need to feel understood by an AI or anything like that, right?
But that's a great point there. And this is interesting, y'all. OpenAI says GPT-4.5 is not a frontier model, which is wild to think, right? And that means it's a model that does not represent a groundbreaking or revolutionary advancement over its predecessors. OpenAI straight up said this. They're like, yeah, this isn't benchmarking off the charts.
This is not, they literally said, this is not a frontier model, right? Frontier models are those large language models that are supposed to be revolutionary, right? This is not it. This is more of a foundational model. But I think here, what we're actually doing is this is building for the future. I think this is all about the training data. This is about how we're interacting with this model and OpenAI is obviously collecting all of that.
They're not collecting the data that you upload, right? FYI, people always get that wrong. People are like, oh, anything I upload into ChatGPT, it's like I'm printing it on the internet. No, that's not what this is. Turn off your data sharing and then you're not sharing anything, right? But you always have an opportunity. ChatGPT will ask you, sometimes it'll give you two responses: which one's better?
Right, that's being sent to OpenAI. If you say this is wrong, that's being sent to OpenAI. So what I think is actually happening here is there's a big, large, expensive model. They didn't say how many parameters this model is, but apparently it's enormous, because the API costs are insanely high.
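How high? Here's a quick back-of-the-envelope comparison, using the per-million-token prices reported at launch for the API (treat these numbers as a snapshot, not gospel; OpenAI can change pricing at any time):

```python
# Reported launch API pricing in USD per 1M tokens (a snapshot; subject to change).
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4.5": {"input": 75.00, "output": 150.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single API call."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A modest request: 2,000 tokens in, 500 tokens out.
print(f"gpt-4o:  ${request_cost('gpt-4o', 2_000, 500):.4f}")
print(f"gpt-4.5: ${request_cost('gpt-4.5', 2_000, 500):.4f}")
```

For that same modest request, GPT-4.5 comes out more than 20 times more expensive than GPT-4o, which is exactly why it's hard to see developers moving backend workloads over.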
I don't know who's going to be using the GPT 4.5 API. I'm going to show you the prices. It doesn't compute. Using that much compute doesn't compute. So this model has to be enormous. But I do think that this is going to be the last enormous model from OpenAI. Right?
Because I think what this is setting the stage for is to get that data on how users like and interact with the model, including from those that don't turn off data sharing. And I think this is going to lead to better and smaller distilled models. As an example, when we talk about the GPT-5 and the o4, o3 models of the future, I think they're just going to be distilled based off of this super big model.
So it's expensive. And also, this thing maxed out OpenAI's compute. CEO Sam Altman literally said, yo, we're out of GPUs, right? At least that's what he said. He said, hey, we can't bring this out to all ChatGPT Plus users right now because it'll burn us. He literally said, we're out of compute, we're out of GPUs, right? Which means this thing is enormous.
Like I said, the benchmarking improvements, modest, not meaningful, right? A lot of times when you get a new frontier model, it completely shifts the conversation on benchmarks. And you're like, well, this one went through the roof. This is not that.
And it's designed for more natural human interaction. And like I said, I think this is laying the foundation for future smaller models. It combines the EQ with the IQ, right? That emotional intelligence with traditional intelligence. That's what this is. This is a much more human touch. All right, here's what OpenAI said specifically about 4.5. They said: we're releasing a research preview. Yeah, this is a research preview, y'all. Keep that in mind.
Our largest and best model for chat. They didn't say our best model, period. They said our best model for chat, the best model for humans to chat with, right? GPT-4.5 is a step forward in scaling up pre-training and post-training by scaling unsupervised learning. GPT-4.5 improves its ability to recognize patterns, draw connections, and generate creative insights without reasoning.
Early testing shows that interacting with GPT-4.5 feels more natural. Its broader knowledge base, improved ability to follow user intent, and greater EQ (emotional intelligence) make it useful for tasks like improving writing, programming, and solving practical problems. We also expect it to hallucinate less. Remember those two words I said? I think it's going to be more reliable and more relatable.
So who can benefit the most? I told you kind of what's new, gave you some of the bullet points. Who can actually benefit the most, right? Like when are you going to use this?
So I think for more human-like conversations, it's going to be ideal in the long run for companies to use this for customer support or for you to use this for customer support, right? Maybe you're in customer service and you just copy and paste a bunch of information in here and you're trying to work through tough customer support problems. I think it is great for understanding nuances in human language, right? It's probably...
going to be, pretty soon, better than humans at understanding nuances in human communication, which is weird to think about, right? So I think it's ideal for customer support, therapy, education. It has enhanced creativity that I think will help writers, marketers, and designers generate creative ideas. And it has some advanced coding and technical abilities that will benefit developers and data analysts. Not everything, right? This is not going to be something that you, you know, plug in and use to code. I don't think that's what we're going to see here.
Although, interestingly enough, even though the benchmarks did not shoot up, this is pretty interesting. So Cognition Labs, right, so they have Devin, which is an AI programmer, very popular, and they were kind of looking side by side, looking at different models for agentic coding evaluations, right?
And even though GPT-4.5 is not necessarily supposed to be a coding tool, it did very, very well on their evaluation. So as an example, its predecessor, GPT-4o, got a 49% on this agentic coding evaluation, and GPT-4.5 got a 65%, only trailing Claude 3.7 Sonnet, which got a 67%. So
again, programmers aren't going to use this. Developers aren't going to use this, because right now the API costs are high and it's slow, right? Obviously, any time a model first comes out, it's always going to be slower, but I expect this to be slower in the long run too. So it's not the fastest, it's not the best, it's not the cheapest. But it's very capable, even when it comes to agentic coding. So that's according to Cognition.
Let's talk about the emotional intelligence. So: more natural, human-like interactions than GPT-4o. It's going to be better at reading and responding to emotional cues. And according to OpenAI, it was preferred by users in about 56 to 63% of different use-case tests against GPT-4o, showing two responses side by side. So the majority of the time, users preferred this to GPT-4o.
So it will presumably, and in my limited testing so far this is true, be great at storytelling, at generating ideas, and at just generating written content, right? Which,
if that's something you use AI for, and I know a lot of people do... I think there are so many use cases people should be using AI for, but they're not. They're just like, yo, I need help writing this blog post, or I need help writing a paper, right? Ultimately, that's what they're using it for, which is why I think a lot of people flocked to Claude early on. And I'm like, nah, you can do this in ChatGPT. You just got to know how to use it, right? So I think it's going to
become a much better writer. It's going to write more clearly and concisely. And I also think it's going to be much stronger in design and creative tasks. This is good news, y'all. This is good news. Second straight day that the sun is shining in my face.
And I had to close the curtain here. Oh, bless up. There's nothing worse than waking up for a live stream in the months of winter. And it's just dark outside, right? Ah, sunshine, right? Maybe I need to go outside and touch grass. And then I'll appreciate this more human side of GPT 4.5.
All right, let's talk about some of the technical features. So according to OpenAI, this was trained with 10 times more computing power than previous GPT models. Also, a 128,000-token context window for deeper conversations. I'm going to be testing that one ASAP.
Because for years, OpenAI has said, and maybe they just didn't differentiate, and maybe this is just the API, they didn't say, hey, it's 128,000 in the API versus 32,000 when you're using ChatGPT. So I'm going to be testing this, don't worry, and I'm going to talk about that in our part two of the show. Because for so long, when you were using the ChatGPT
version, right, the front end at chatgpt.com, it hasn't had a 128,000-token context window, right? So that means ChatGPT will start to forget things much sooner. It actually had a 32,000-token context window, which is about 26,000 to 27,000 words of back-and-forth interaction with ChatGPT, and then it would start to forget things.
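For a rough sanity check on those word counts, a common rule of thumb (an approximation, not the real tokenizer) is that one token is about three-quarters of an English word:

```python
# Rule of thumb: ~0.75 English words per token. This is an approximation;
# the real ratio depends on the tokenizer and the text.
WORDS_PER_TOKEN = 0.75

def approx_words(context_tokens: int) -> int:
    """Estimate how many words of conversation fit in a context window."""
    return round(context_tokens * WORDS_PER_TOKEN)

print(approx_words(32_000))   # the old ChatGPT front-end window
print(approx_words(128_000))  # the 128K window
```

That gives roughly 24,000 words for a 32,000-token window, a bit under the 26,000-to-27,000-word figure above, because the words-per-token ratio varies with the text.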
So I'm excited to test that out to see if that is just on the API side or if that's going to be also in the chat window. If that is in the front end chat, that's going to be big. Also improved coding, especially in complex tasks.
So, again, according to OpenAI, this was pre-trained simultaneously across multiple data centers, which I believe is a first for a model like this, right? That says something when, you know, OpenAI has access to some of the biggest data centers and the most compute in the world, and they're like, yeah, we can't train this in one place, this is too big. I don't know how big this model is, right? It's got to be multitudes larger than the original GPT-4 model. If it's costing this much in compute, if it's costing this much via the API, if it's causing OpenAI to straight up run out of GPUs, it's got to be huge. I don't know, right? Reportedly, earlier versions of GPT-4 were about 1.8 trillion parameters. I don't know. This thing's got to be double that? Maybe? Maybe more? I don't know. But that's not sustainable in the long run, right? Which is why I think this is actually a foundation for OpenAI to better distill and to make better, smaller hybrid models as they switch to that kind of setup. Also, it can handle tasks involving visual understanding. So I did some tests on this.
Vision capabilities: pretty good so far. We're going to do more on that in our part two show. We'll probably show some comparisons between GPT-4o and GPT-4.5. But out of the box, it does very well with visual understanding, being able to see and synthesize information in photos. So yes, this is multimodal. FYI, right now it has access to all of the tools. I should have maybe started with that, right? Because these reasoning models, a lot of them, like the o1 series,
didn't have access to all the other tools, right? Canvas and, you know, DALL-E and Advanced Data Analysis and ChatGPT search, all these other tools that really make a large language model
agentic, right? So right now you do have the rest of the tools, although I would like, and hope, that eventually we'll see Tasks get GPT-4.5, as well as GPTs. My gosh, OpenAI. I know there's plenty of you all listening, because you reach out and let me know: can we update GPTs, please?
These poor things are like that poor forgotten-about child in the corner, right? This is Macaulay Culkin in Home Alone. You know, we're leaving for the airport without GPTs.
Y'all, enterprise companies hire us and they want us to build them GPTs, right? And I'm like, y'all, I don't know, we might have to build you Projects instead, because poor GPTs are in the corner and they haven't been updated in forever, right? So hopefully we see the GPT-4.5 model eventually rolled out to other things like Tasks and GPTs. Here's the thing: reliability. Let's talk about accuracy and knowledge, because I started the show off by saying,
a lot of companies didn't get on board with AI because they're like, yo, it lies, it hallucinates. Is GPT-4.5 free of hallucinations? Absolutely not. If you know how to use it, you're probably going to see a great reduction in hallucinations. But according to OpenAI, it knows more and hallucinates less. The hallucination rate has gone down significantly, with higher accuracy on factual questions. But what's important to know: the knowledge
cutoff has actually been rolled back. So GPT-4o has a knowledge cutoff of June 2024, which is reasonable to work with. This one is October 2023, right? I'm sure they'll be updating the knowledge cutoff in the future, but just know that
if you're using GPT-4.5 right now in the chat, or when it rolls out to ChatGPT Plus users, you should probably, in many use cases, use our refined cue method that we teach in our free Prime Prompt Polish prompting course. Okay? You need to bring in more accurate and more up-to-date information for whatever it is you're working on to get started with, or make sure you go retrieve that by using ChatGPT search, right? Here's the thing.
I'm going to have to do a dedicated episode just on training data and what this means, right? So people think, oh...
That means it knows every single thing and it's a hundred percent accurate and up to date through October 2023. No, it doesn't, right? A lot of these data sets that companies use to train their models, when they say, oh, it cut off in October 2023... well, what if that data set is updated once a year? What if that data set has some extremely outdated information? You hope that through reinforcement learning with human feedback,
you know, a lot of that older information gets kicked out when they're going through and training the model, but not necessarily. So keep that in mind. The knowledge cutoff is rolled back. If you are using GPT-4.5, you need to do a better job of making sure it has more accurate, more up-to-date, relevant, fresh information if you are relying on it for accurate analysis and
up-to-date outputs. The entire world changes around us every single day. So to work with a knowledge cutoff from 2023, you've got to be careful, right?
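On the API side, one way to be careful about that 2023 cutoff is to ground the model in material you supply yourself instead of leaning on its trained knowledge. Here's a minimal sketch: the messages shape is the standard Chat Completions format, and the model identifier shown in the comment is an assumption you should verify against OpenAI's current model list:

```python
# Sketch: grounding GPT-4.5 in fresh context you supply, so its October 2023
# knowledge cutoff matters less. "gpt-4.5-preview" in the comment below is an
# assumed model identifier; verify it against OpenAI's current model list.

FRESH_CONTEXT = "[Paste up-to-date source material here: recent docs, numbers, policies.]"

def build_messages(question: str, context: str) -> list:
    """Build a Chat Completions message list that answers only from context."""
    return [
        {"role": "system",
         "content": ("Answer only from the provided context. "
                     "If the context does not cover the question, say so.")},
        {"role": "user",
         "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

messages = build_messages("What changed in our Q1 pricing?", FRESH_CONTEXT)
# These messages would then be passed to the API, for example:
# client.chat.completions.create(model="gpt-4.5-preview", messages=messages)
print(messages[0]["content"])
```

The point is the pattern, not the plumbing: you retrieve or paste the fresh facts, and the model's job shrinks to reading and reasoning over them.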
It is computationally demanding. So like we said, OpenAI's ability to scale this out to users is very limited right now because of GPTs... sorry, because of GPUs. Also, it does have weaker performance on complex reasoning compared to specialized models. All right, let's talk a little bit about accuracy and knowledge. All right, so...
This is SimpleQA, which is actually OpenAI's own benchmark, right? I would really like other people to start using this, or something like it. But this is essentially: is this getting things correct, right? So, SimpleQA accuracy, where higher is better. This is just: is it factual? Is it getting questions correct? Can it recall information in the right way? So on this,
some of OpenAI's previous models: GPT-4o scored a 38% on this, where GPT-4.5, not double, but pretty close, got a 62%, and even the reasoning models, o1 and o3-mini, got a 47% and a 15%. So if you're wondering what's the point of this model, boil it down to two words. It's...
relatable and it's reliable. It has a much higher accuracy. And let's be honest, we just sometimes look past large language models and we just assume that it's always accurate and we can always rely on them. That's bad. I don't know why people are trying to take human out of the loop and we expect large language models to always be 100% factual and accurate, right?
They're trained off the internet. Is the internet 100% factual and 100% accurate? Absolutely not, right? I read an article about, I think it was ChatGPT, from a huge publication last week, and it was completely wrong. All their facts were wrong. I'm not going to name and shame them. Maybe I should. But it's a publication we've all heard of. Everyone out here reads it.
I was thinking about roasting them on Twitter and fact-checking it, because this was all wrong, all not correct. But guess what? All this information that people put out on the internet, sometimes it's intentional misinformation or disinformation, sometimes people just don't know what they're talking about, but it all goes out on the internet and models gobble it up. You hope that humans can pick out the information in the training data that's not right versus what's right, through reinforcement learning. But,
GPT-4.5 is much more accurate, almost twice as accurate. And what's pretty interesting for me, at least,
is o3-mini there with a 15% on this SimpleQA accuracy versus GPT-4.5 with a 62%. I love o3-mini. It's probably my most used model, right? Again, I do a good job of making sure I feed it the accurate and relevant information it needs, and I'm not necessarily relying on it to go seek and find the absolute truth on its own. But out of the box,
GPT-4.5, according to OpenAI's own internal benchmarks, is extremely reliable. And let's talk about hallucination rate. Same thing, much lower; in this case, lower is better. So in their tests, it's a 37% hallucination rate. Now, I want you to keep in mind that doesn't mean it hallucinates 37% of the time.
These tests and benchmarks are hard, they're tricky. They are made to get the model to screw up, right? So that's actually a very, very low hallucination rate for 4.5 at 37%, where o3-mini, as an example, is at 80% and GPT-4o at 61%. So again, these are intentionally very difficult questions that are meant to make
models hallucinate. So it's more reliable. It lies less.
All right. Other benchmarks. Again, nothing here is jumping off the page. On many of the major benchmarks, this is not OpenAI's best model, right? In some cases, it's actually about the same or on par with GPT-4o, or it's just behind o3-mini, which, again, is my workhorse model. Live stream audience, let me know yes or no:
should I just do a show where I tell you what models I'm using and for what? I might have to wait a couple of weeks to see how and where I'm using GPT-4.5.
I had some people ask about it recently. I didn't think it was that interesting, but if it is interesting, let me know. Maybe it will be more interesting now that we have like nine models to choose from. But one thing I thought was pretty impressive about these benchmarks: it did score better on MMMU, which is the multimodal equivalent of MMLU.
And it scored fairly well on MMMLU, which is the multilingual equivalent of MMLU. MMLU has historically been one of the benchmarks we talk about most; I say it's like the ACT for AI models, right? So it performed well, better than GPT-4o, on those benchmarks pretty significantly. But SWE-Lancer, I love this. So this is
an actual test that OpenAI developed, and other models use it too, right? And essentially, when Claude came out, Claude was better, and OpenAI said, yo, Claude does way better at SWE-Lancer, right? This is essentially a test where the model goes out and performs the type of freelance software tasks you would see on Upwork.
But this time GPT-4.5 outperformed the other models by far. It completed 32.6% of tasks, whereas o3-mini completed 10.8% and GPT-4o completed 23%. So that's interesting. Also,
o3-mini, I'm guessing, at the time did not have access to all the same tools. o3-mini does have access to the internet, which is huge, because the other o-series models do not have access to the internet. All right, here we go. The costs. You know what? I actually can't wait to talk to companies that are using this on the API.
Because I'm not sure who is going to use it. It costs $75 per million input tokens and $150 per million output tokens. So expensive. So I guess it's going to be those companies that really value a reliable and relatable model.
So this is just if you're using it on the backend, on the API, right? If you're logging into chatgpt.com, you don't have to worry about this. But I would assume that when this does roll out to Plus users, it's got to be limited. I don't see them rolling out
this extremely expensive model, which they're probably going to be losing money on when you look at the API costs, with ChatGPT Plus users getting unlimited access. I would assume there would have to be some rate limits. So let's go ahead and look at some cost comparisons. Ready? So I said $75 per million input. GPT-4o?
$2.50. So we went from $2.50 to $75. Yikes. That's 30 times more expensive. Is that right? I just did that math in my head; hopefully that's right. And then the output is 15 times more expensive: a million output tokens for GPT-4o is
$10, and on GPT-4.5 it's $150, right? And everyone was losing their marbles when Claude 3.7 Sonnet came out less than a week ago and their API pricing didn't change, right? Everyone's like, oh, Claude Sonnet is so expensive. And aside from coding, there's no real need to use Claude 3.7 Sonnet via the API. Now you're looking at GPT-4.5 and you're like, all right, well, Claude 3.7 Sonnet doesn't sound that bad:
$3 per million input tokens and $15 per million output tokens. So yes, we were looking at Claude 3.7 Sonnet versus GPT-4o and thinking, okay, that's 50% more expensive for output, oh my gosh. And then GPT-4.5 comes out and says, hold my GPU, right? You won't believe this price. I don't believe it, but we'll see. We'll see who uses it.
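To sanity-check that head math, here's a minimal sketch in Python. The prices are the per-million-token figures quoted in this episode (a snapshot that will change, so verify against the providers' pricing pages), and the model labels are just dictionary keys for this example, not official API model IDs:

```python
# Per-1M-token API prices in USD, as quoted in this episode (may be outdated).
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4.5": {"input": 75.00, "output": 150.00},
    "claude-3.7-sonnet": {"input": 3.00, "output": 15.00},
}

def times_more_expensive(model: str, baseline: str, direction: str) -> float:
    """How many times pricier `model` is than `baseline` for 'input' or 'output' tokens."""
    return PRICES[model][direction] / PRICES[baseline][direction]

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single API call at these list prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

print(times_more_expensive("gpt-4.5", "gpt-4o", "input"))   # 30.0
print(times_more_expensive("gpt-4.5", "gpt-4o", "output"))  # 15.0
# A single call with a 2,000-token prompt and a 1,000-token response:
print(round(request_cost("gpt-4.5", 2000, 1000), 2))        # 0.3
```

So the head math checks out: 30x on input, 15x on output, and even one modest call on GPT-4.5 runs about 30 cents at these rates, which is why rate limits for Plus users seem likely.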
Clearly someone's going to use it. So let's talk about the strategic impact. Like I said, I think this is a base model for future AI development. I think it moves OpenAI toward integrating soft skills with technical skills, right? That's what we mean when we say this is a vibes model, an EQ model. Previously ChatGPT wasn't that, which is why I never had a problem with it; I don't need vibes, I don't need EQ. But I think a lot of people do.
Looking at it as a soft-skills model is, I think, why it's actually a big step forward. But it's a big step forward in areas we're not used to looking at. Normally we look for big steps forward in AI models in benchmarks and features, not in terms of being more relatable and reliable, like a human. So I think the big thing is that this is probably going to be the most human model out there.
Is it going to be the best at certain tasks? No. Is it going to be the most reliable and relatable model? Probably. And I also think this indicates possible limits to continued scaling with the current GPT architecture versus the o-series architecture. So yeah, next we're going to see this hybrid setup. All right. So
That's a wrap, y'all. Look at that, we didn't go one full hour. Bless up. So we're going to do a part two next week. Let me know what use cases you want to see. Let me know in the comments today and in the newsletter, so make sure you go sign up for it. You can just reply to the newsletter and let me know: what do you want to see?
Do you want to see writing use cases? Do you want to see a creative strategist? What do you want to push the boundaries on? So we'll do this show likely either next week or the week after. I got to look at what we have scheduled. We have some great guests coming up, y'all.
I'm very excited. I know sometimes, you know, the show, I do a lot of the shows. Sometimes we go through periods where it's a lot of guests. Sometimes it's a little bit in between. We have some fantastic guests coming up. But let me know what type of hands-on you want to do. We're going to do it live. We're going to do it hands-on. Let me know what you want to see. Also, go to our website, youreverydayai.com.
Sign up for our free daily newsletter. We're going to be recapping this. But like I said, high level: GPT-4.5 is out. It's only out for Pro users right now on that $200-a-month plan, or if you're paying for it via the API, which is crazy expensive.
This is not a groundbreaking model by traditional metrics. All right. But I do think it could be a groundbreaking model by just the vibes, by how we feel, by how we interact and think about the AI. All right. A couple of questions. Let me see.
Douglas is saying: how would a mixture of models compare to the idea of a reasoning orchestrator and then specialized transformer agents? Oh, Douglas, you're really trying to push this episode to more than an hour. I think I covered most of that in the AI predictions show, so
I'll say go listen to that, Douglas. Maybe I'll leave you a more thoughtful comment on the live stream later and explain that. Yashal, sorry if I got that wrong, is asking: Jordan, would you use 4.5 instead of 4o moving forward? Here's the thing.
It's better. 4.5 is better than 4o, right? It's just not better by the same step that a new model normally would be. There are very few metrics or instances where 4.5 is going to be worse, which is interesting, because a lot of the chatter so far around Sonnet 3.7 is people saying it's worse than 3.5 for certain situations.
It's still too soon to answer that, but from everything I've used it for, unless I need speed out of 4o, which is usually not something I'm looking for (I'm patient enough), I don't think I'll be using 4o much, except in GPTs and with tasks. For the most part, if I'm looking at a non-reasoning model, I'm probably going to be using 4.5.
Samuel is asking: does 4.5 support live voice, Canvas, et cetera? So voice is still powered by 4o, but you can be in 4.5 mode and use voice. I did test that last night. So it's not a new voice model, but 4.5 still integrates and works
with voice mode. It also works with Canvas; I tested that last night too, as well as the combination of the two. So you can be in GPT-4.5, use voice mode, and it will update in Canvas. Pretty cool. Sam Sara from YouTube is saying: why is Google so bad? They're so bad that no one wants to compare their models against Gemini. I think Google's great,
if I'm being honest, right? I think their front-end Gemini chat was really neglected until about five or six months ago. I think the new Gemini models are fantastic. I think their integration into Google Workspace leaves a lot to be desired, and their AI Studio is extremely powerful. But no, I think the Google models are top of the charts for many benchmarks, including LM Arena.
Yashal with another great question here: is there a benchmark to measure EQ for AI products? As far as I know, no, because I was researching the same thing. So yeah, that should be interesting: how can you benchmark these soft skills? I would assume after this model there will be one developed, but right now there isn't one. Douglas is asking: are you going to do a PPP update that covers the transformer model and reasoning model as different methodologies?
Great question, Douglas. So the PPP, and it's still going to be free: the updated PPP is still going to be based on the GPT infrastructure, and PPP Pro, also free, will go over prompting for reasoners as well as some other advanced features. All right.
We got through most of the questions, y'all. Thank you for tuning in. I hope this was helpful. Let me know in the comments. Please share this with your friends. If this was helpful, our team spends a lot of time putting this together. We want you to be the smartest person in AI at your company. So if this was helpful, please let me know and let others know as well. Share this and go to youreverydayai.com. Thanks for tuning in. Y'all, we'll see you tomorrow and every day for more Everyday AI. Thanks, y'all.
And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.