This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. GPT 4.5 is officially the world's best large language model. But how do we use it, right?
This is something that a lot of people have been talking about for the last few days since OpenAI released its first big updated base model in more than two years. There are two things that we're going to talk about today, and we're going to do a live demo as well: how this new model is more relatable and more reliable.
But that really just raises the question, okay, what does that mean for how we actually use it? What is it actually better at, right? And when we talk about a model excelling in emotional intelligence, I mean, you can't really benchmark that. So how do you actually know when you might want to really take advantage of this new GPT 4.5 model?
All right. We're going to be answering hopefully those questions and a lot more today on Everyday AI. What's going on, y'all? My name is Jordan Wilson, and I'm the host. And Everyday AI, it's for you. It is your daily live stream podcast and free daily newsletter helping us all not just understand AI, but how we can use it to actually grow our companies and grow our careers. So yeah, when all these new models come out seemingly every single week,
You might be wondering, is this for my department? Is this for my company? Well, hopefully at least after today's show, you'll have a little bit better of an idea, at least when it comes to GPT 4.5, OpenAI's newest model. All right, so if you're new here, thank you for tuning in.
This thing's live. It's unscripted. It's unedited. So, you know, I try to bring you something real in artificial intelligence, which I think is rare nowadays, right? Everyone's creating all these super polished rundowns of, you know, models and, you know, using AI, even avatars, right?
Like, this is real. So, you know, if you are listening on the podcast, thank you for tuning in. Please make sure to subscribe to the show, leave us a rating. That would be great. And join the live stream, right? We have other real humans, you know, hanging out here in the live stream. So thanks for everyone joining us. Max Holman down in Chicago, big bogey face on the YouTube machine, Douglas, Woozie, Sandra,
Let's see who else. Christopher, Brian, Marie. Thank you all for joining. All right. So I am going to need your help. All right. I don't know if these comments went on YouTube. Maybe they did. Maybe they didn't. All right. But I listed 13 different questions.
I want you all to go through. I have them on my screen here. I'll show them in a bit. They're super small. Just write down the number of which one you want. All right. So just say, hey, I want to see number five. You know, I want to see number seven. All right. So live stream audience, I need a little help from you guys. If you scroll through the comments, hopefully you should see it. I do have a slide up here later. It's super small, but let me know which one you actually want to hear.
All right, let's get into it, y'all. So if you want the daily AI news, sorry, go check the newsletter.
Didn't have time to put it all together this morning, if I'm being honest, because I was putting in a lot of work on this show. I don't sleep a lot, if you haven't noticed, you know, live stream audience by looking at me. I don't sleep a lot. Yeah, I have AI go do a lot of my homework, but it's a lot. So if you do want the AI news, we're going to have that in the newsletter. Don't worry. And this is also as an FYI, this is part two. So I specifically, you know, I noticed and I heard from a lot of you all, right? Like you're like, hey, these shows are too freaking long, right? Right.
I don't need an hour-and-10-minute podcast on a new AI model. So we actually broke a bigger show down into two separate parts. So if you want to hear the first part, where we went over a lot more of the technical detail, some benchmarks, and a lot of those other things for OpenAI's new GPT-4.5 model, go listen to episode 472. You can just scroll like two episodes back.
That one is called OpenAI's new GPT 4.5, what's new and who can benefit the most. But today we're going to be looking at really a comparison. We're going to be going in and actually using this model, hopefully with some feedback and suggestions, some suggestions from our live stream audience. But
Let me just go over some of the basics. Some of these we already covered in the previous show. Some things we didn't. So here's kind of some talking points from OpenAI. So they are reiterating that GPT 4.5 is a research preview. It is their largest and best model for chat.
All right, for chat. It is a step forward in scaling up pre-training and post-training. And by scaling unsupervised learning, GPT 4.5 improves its ability to recognize patterns, draw connections, and generate creative insights without reasoning.
All right. Early testing shows that interacting with GPT 4.5 feels more natural. Its broader knowledge base, improved ability to follow user intent, and greater EQ, emotional intelligence, make it useful for tasks like improving writing, programming, and solving practical problems. And they also expect it to hallucinate less.
All right. So this is kind of some of the bullet points that I said on our first show. You know, this is the last non-chain of thought model developed by OpenAI. So OpenAI CEO Sam Altman did say, hey, future models that are under this GPT-5 kind of architecture, it's going to be a hybrid model. All right. So keep this in mind. But this is a new, essentially a new base model.
All right. And when we talk about reasoning models like o1 and o3, right? So we might not actually, you know, see an o4 as an example, right? Just like we might not, you know, see certain minis, o4-mini, right? It just might all be under GPT-5. We don't know yet. They might say it's GPT-5 and it uses o4 reasoning, right? But in the future,
you're just going to be working with one model. And that's why this is extremely important. I think a lot of people were kind of like upset, right? And they're like, oh, this GPT 4.5 didn't break every single benchmark, right? This GPT 4.5 is extremely expensive in the API. Yeah, I don't know any company that is going to be able to afford to use this in the API, right? It's like $75 input and $150 output per million tokens, which is, you know, 30 times more expensive than their previous GPT-4 models.
But I did say after the first show, humans are really going to like it, right? Because, I think, anytime you go chat, quote unquote chat, with any AI model, I don't know, to me, it's never felt human.
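To make the per-million-token pricing concrete, here is a minimal sketch of the arithmetic. The GPT-4.5 prices are the ones quoted in the episode. The GPT-4o prices and the workload (tokens per request, number of requests) are assumptions added for illustration only, so check current pricing before relying on any of these numbers.

```python
# Prices quoted in the episode for GPT-4.5 (USD per one million tokens).
GPT45_INPUT_PER_M = 75.00
GPT45_OUTPUT_PER_M = 150.00

# Assumption, not stated in the episode: GPT-4o list prices used for comparison.
GPT4O_INPUT_PER_M = 2.50
GPT4O_OUTPUT_PER_M = 10.00


def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Return the USD cost of one request given per-million-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_per_m
    output_cost = (output_tokens / 1_000_000) * output_per_m
    return input_cost + output_cost


# Hypothetical workload: 2,000 tokens in and 500 tokens out, 10,000 times.
num_requests = 10_000
cost_45 = num_requests * request_cost(2_000, 500, GPT45_INPUT_PER_M, GPT45_OUTPUT_PER_M)
cost_4o = num_requests * request_cost(2_000, 500, GPT4O_INPUT_PER_M, GPT4O_OUTPUT_PER_M)

# Ratio of input prices under the assumed GPT-4o price.
input_ratio = GPT45_INPUT_PER_M / GPT4O_INPUT_PER_M

print(f"GPT-4.5 total: ${cost_45:,.2f}")
print(f"GPT-4o total:  ${cost_4o:,.2f}")
print(f"Input price ratio: {input_ratio:.0f}x")
```

Under these assumed numbers the input price ratio works out to the "30 times" figure mentioned above, while the gap on total cost depends on how much of the workload is output tokens.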
Feels like you're chatting with a machine. GPT 4.5 is the first time I felt different, right? To me, it doesn't feel like I'm chatting with a human. I don't know what it is about me and how I tick as a person, right? I know, right? Yeah, it's like, yeah, I'm talking to a computer still, but it felt like real human.
Felt like a real human computer person, right? I know a lot of people like reading people's first experiences. They're getting enthralled by GPT 4.5's ability to carry on a conversation and to show these EQ skills, the tones that make humans human, right?
People are always like, hey, what separates humans from AI, humans from large language models? And widely, those things are usually considered things that are EQ, right? Emotional intelligence, being able to understand nuanced conversation, right? And right now, at least from a text standpoint, GPT 4.5 is doing amazingly well. And I did predict that humans are going to like it. And sure enough,
Humans loved it because even though GPT 4.5, and again, this is a preview,
Even though it did not break every single benchmark ever, right? Which is what I think a lot of people were expecting or were hoping from this model. What it did do in the LM arena. Okay. So I talk about this, the easiest way to think about this is, you know, those like blind, like Pepsi versus Coke tests, right? From, I don't know when that was, the late nineties, early two thousands, right? Someone goes, there's no label. They drink both and they say, oh, this one's better.
Right. Uh, that that's kind of what, uh, ELO rankings are, uh, or the arena score from LM arena. All right. So what this means is you put in a prompt, uh, you get two different outputs. They're blind. You, you know, so you don't know which one's which and you choose which one is better. So this is at least in terms of how humans actually use a model, right? Yes. We have dozens of benchmarks that test different things from, from coding to, uh,
writing, to math, to science, right? So you have all of these kind of, you know, systemized and organized and categorized benchmarks, but it's always like, well, what about humans, right? Do humans care? Will humans notice? Well, with GPT-4.5, the answer is yes, right? Because it quickly shot up to the number one spot on the LM Arena board. So think of this, you know,
Every single model out there is in this, right? When you go into this blind taste test. And GPT 4.5, immediately, once they got enough votes to rank on the chart, they were number one. So the best model in the world. I do know a couple hours later, Grok's newest version came on. So I think it's technically in a tie now. But still, even without smashing every single benchmark,
This new model, GPT 4.5, just elevated itself to, I think, probably the most preferred general use case model in the world, which is extremely important because, like I said, in the future, these reasoning models are going to be built on top of this. All right. So let's talk a little bit before we jump in live.
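The blind head-to-head voting described above can be sketched with a classic Elo update. This is a simplified illustration, not LM Arena's actual scoring code: the model names, starting ratings, K-factor, and votes below are all made up for the example, and real leaderboards may fit ratings with a different statistical method.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, a_won, k=32.0):
    """Return new (rating_a, rating_b) after one blind vote."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (s_a - e_a)
    # B's score and expectation are the complements of A's.
    new_b = rating_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b


# Hypothetical models and votes: each entry names the output the voter preferred.
ratings = {"model_x": 1000.0, "model_y": 1000.0}
votes = ["model_x", "model_x", "model_y", "model_x"]

for winner in votes:
    ratings["model_x"], ratings["model_y"] = update(
        ratings["model_x"], ratings["model_y"], a_won=(winner == "model_x")
    )

print(ratings)
```

The point of the sketch is that a model climbs the board only by being picked more often than its current rating predicts, which is why a new model needs enough votes before its rank settles.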
And live stream audience, I see a couple of you voted. If you could go through, let me know which one you want to see. I agree with Douglas. Douglas said Jordan needs coffee. That was me sipping on the coffee. But let's talk a little bit about the model itself and how it performs inside of ChatGPT, inside of the ChatGPT interface. Also, FYI, let me get this off my chest, right?
Cause there's people online, you know, and they're like, oh, I tried this model. And I'm like, oh, okay. How'd you try it? Oh, via a third party. You can't do that. All right. So I do have to talk about access because, at least as of this hour, GPT 4.5 is only available for Pro users inside of ChatGPT. All right. That does cost $200 a month, but presumably either by this week or early next week, that will be going to all paid ChatGPT subscribers. So even if you are on the $20 a month ChatGPT Plus, you should be getting access to GPT 4.5.
So you might not have access to it now, but I highly encourage you. Yes, there's third-party platforms. If you're on paid versions of others, Perplexity or Poe or something like that, you can probably go use 4.5 in a limited capacity if you're on a paid plan for one of those services, but that's not the best way to understand a model, right? You should be using it in its natural environment. So yes, there's also the API that's available, but that's extremely expensive.
All right. So if you are trying to see what's best for your team, and a lot of times I don't understand why every single big Fortune 500 in America doesn't have at least a Teams or an Enterprise ChatGPT account. It's mind-boggling to me because yes, your company can have an internal version, right? That maybe you use for customer support or sales or something like that. And I'm not just saying ChatGPT, but every single employee should have it.
If you are a CEO of a small to medium-sized company, if you are an IT leader, if you're a CMO, whatever, you need to be pushing for your entire team, whatever your AI operating system of choice is, to have a full team or enterprise license, whether that's ChatGPT, Gemini, obviously Copilot, right, if you're a Windows organization, Microsoft organization, Claude, whatever it is. Because when you are using these models inside the chat interface, they come with a lot of tools, right?
I did a show probably about a year ago. I should update it. Kind of like, hey, what needs to happen for us to get to something like artificial general intelligence? And one of the things is number one, a model needs to have access to the internet. And number two, it needs to also have tool use, right? So this tool use, this occurs inside of the ChatGPT interface. And yes, third parties sometimes have versions of some of these tools, but I mean, the tools are best in the native interface.
But right now, at least, not all of the tools and features work with GPT 4.5. So let's go over what does work and what is available versus what isn't. Because OpenAI didn't say this. I went through and tested it all for you so you know. So again, whether you have a pro account now or you're going to be getting GPT 4.5 in the coming days or weeks, here's what's available out of the box. Okay, so projects.
You can use GPT-4.5 in projects. You can use DALL-E, you know, the AI image generator, which I don't know why anyone would, right? It's not that good, although it will be getting updated soon. Sora does have photo capabilities for early beta testers, just
FYI. And also if you don't know projects, that's essentially where you can organize chats into one folder, which is great, but you can also upload documents that that folders chats can access to as well as special instructions. So it's similar to GPTs, a little different.
So GPT-4.5 does have access to projects. It does have access to DALL-E. It does have access to ChatGPT search, which is extremely important because the knowledge cutoff for GPT 4.5 was actually rolled back.
So its memory is a little worse, or at least the recency of the training data. So GPT-4o is June 2024. GPT-4.5 is October 2023. So keep that in mind. And that's why it's important that GPT-4.5 has access to ChatGPT search.
You can upload files to GPT-4.5, which is a must. Glad that's there. Also Canvas mode, one of the most underrated, I think, tools or functionality of any large language model company out there. Canvas is available.
So right now unavailable, and this is as of when I checked, 19 minutes ago: tasks do not work with GPT-4.5, and GPTs, those custom small versions of ChatGPT that you can create, don't work right now with 4.5.
Both of those things, both tasks and GPTs, obviously still work with GPT-4o. All right. So let me just boil this down to two things. I wish OpenAI would just put these two words somewhere very large on their GPT-4.5 page because a lot of people are asking.
Two things that I think really make the biggest difference with 4.5, and this is in my experience so far: it is more relatable, so more human-esque, right, that EQ, and more reliable. So we went over the reliability a little bit in our first show going over benchmarks, accuracy, lower hallucinations, et cetera. It just knows more. It knows way more.
Actually, there's a website that does IQ scores for large language models, and GPT-4.5 actually got the highest score for a non-reasoning model, which is pretty impressive because that was the first time a non-reasoning model performed at the same IQ level as the average human, right?
Which is pretty big, right? When you have a reasoning model, it does way better because it uses more compute. But the fact that a non-reasoning model in GPT-4.5 scored this high on an offline IQ test, right? So this is an IQ test that is not in training data. It's pretty impressive.
So it is definitely more reliable, but it is also much more relatable because of the emotional intelligence. So this is from OpenAI: more natural human-like interactions than GPT-4o. It's better at reading and responding to emotional cues, and it is preferred by users as well against GPT-4o.
All right. So we're going to jump in. We're going to jump in soon. All right. I know this is small on the screen. I don't know if these comments posted to YouTube as well, but I'm going to go ahead and try to send them again. So live stream audience. I know a couple of, a couple of you guys have already voted. I don't know if these comments are coming through. Hopefully they are. But I have 13 essentially examples. All right. And I want to do these live. I want to maybe do two or three. We'll see how long it takes. And I want to show you the difference between,
The difference between a query in 4.5 and a query in 4o. Full disclosure: I haven't run any of these yet. Right. I run a ton of tests, but I like to do this live. This is unedited, unscripted. Right. So live stream audience, just put the number, right. Try not to put anything else, just put the number. And I'm going to scroll through the comments here
on the right side of my screen, bringing in comments from LinkedIn, Twitter, YouTube, et cetera. So which one do you actually want to see? So I haven't done these and I'm going to read the prompt out. These are very short prompts, right? They're supposed to be short. I'm not going to go through the whole like prime prompt polish process, which if you want the best output, you should be doing the basics of prompt engineering still. But I want to show you just hopefully some short
prompts, the inputs, and then the outputs. And we're going to run this in GPT-4.5 and GPT-4o, and talk a little bit about the differences and, you know, hopefully we'll see the difference. Who knows? Maybe we won't. That's the downside of trying to do
unscripted, unedited demos and examples inside generative AI. So maybe if you're brand new to ChatGPT, large language models, generative AI, and you don't know a ton, that's fine. We try to keep it simple. But let me say this: generative AI
and large language models, they're generative. They're not deterministic. So what that means, as an example: if you go search for something on the internet, search engines are, for the most part, deterministic, right? Yes, there's some personalization and some localization, but for the most part, those search results are going to be roughly the same every single time you put them in. A large language model is not deterministic.
It is a roll of the dice. It is generative, right? It is, you know, there's some next token prediction. So, you know, you could, in theory, put the same prompt 10 times. You could get nine very different answers. You could get two very different answers. You could get five things that are pretty much the same, but just worded differently. So that's another thing to keep in mind. Generative AI is generative, right? Which is why sometimes these live demos are super fun. All right. I see some votes here.
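The "roll of the dice" can be sketched as sampling from a next-token probability distribution. This is a toy illustration only: the four candidate tokens and their scores are invented, and real chat products add more machinery on top, so even a temperature of zero is not a guarantee of identical answers in practice.

```python
import math
import random


def softmax(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Pick the next token: greedy when temperature is 0, random otherwise."""
    if temperature == 0:
        return tokens[logits.index(max(logits))]
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates for the next word and made-up model scores.
tokens = ["great", "good", "fine", "okay"]
logits = [2.0, 1.5, 1.0, 0.5]
rng = random.Random()  # unseeded, so each run can differ

sampled = [sample_token(tokens, logits, 1.0, rng) for _ in range(10)]
greedy = [sample_token(tokens, logits, 0, rng) for _ in range(10)]

print("sampled:", sampled)
print("greedy: ", greedy)
```

Running the same prompt ten times corresponds to the `sampled` list, where the picks usually vary from run to run, while the `greedy` list always repeats the top-scoring token.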
All right. I'm looking through here. I'm seeing which are some of the most, uh, some of the most voted ones. So I have 13 different examples on the screen. Uh, and, and I really focused on, uh, a couple of things. So these prompts are supposed to, uh, you know, rely on creativity and intuition, you know, so storytelling, uh, being able to, to think and write clearly, uh, strong and designing creative tasks, but they're also really around these four categories where I think EQ shines in a large language model.
Right. So think so if you are using ChatGPT as a personal or life coach. So some of these prompts are more in line with that. If you're using it as a therapist, if you're a work therapist, even right to work through tough problems, tough issues. How do I send an email? Right. Those things. Content writer. Content writer.
business strategist, and creative partner. So that's where I think some of the categories are where the everyday person is going to really see the benefits of 4.5. So let me repeat that. If you're using this as a personal life or business coach, therapist, content writer, business strategist, and creative partner.
There's other things that it's actually going to perform really well in. I know a lot of people are saying, oh, 4.5 isn't great at coding. It's actually really good at coding across the board. In the LM Arena test, it swept everyone in almost every single category. So it is measurably better in almost every single category that you would use a large language model for. But I think hopefully we'll see the biggest improvements in some of these areas. All right.
So let's go ahead. Let's see if we can do this live stream audience as always. Please let me know when you can see my screen. We're going to do this live podcast audience. I'm going to try not to make this one too long. I'm going to try to be somewhat concise. All right. So as a reminder,
If you have a normal ChatGPT Plus plan and you log in today, you're not going to see this 4.5. All right. But when you do, I'm guessing within a couple of days to a week or two, this should be rolling out to most paid users. So ChatGPT Plus, ChatGPT Teams,
as well as Enterprise. I think Enterprise might be a little after. For those companies that hire us to train their large teams, you might not be getting Enterprise in your PPP biz, you know, training, at least not in March. All right. So we need to select GPT 4.5 in the dropdown. All right. So live stream, let me know if you can see this. I'm looking to see which ones are some of our most popular ones. Okay.
Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on Gen AI. Hey, this is Jordan Wilson, host of this very podcast.
Companies like Adobe, Microsoft, and NVIDIA have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use Gen AI. So whether you're looking for chat GPT training for thousands,
or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com slash partner to get in contact with our team, or you can just click on the partner section of our website. We'll help you stop running in those AI circles and help get your team ahead and build a straight path to ROI on Gen AI. All right, I see quite a few people...
And again, both on LinkedIn and YouTube, it looks like, and a couple on Twitter. Okay, so I'm seeing a lot of threes. I'm seeing some sixes and thirteens. Let me go up earlier. Five, 12. Okay.
9, 11, 7, 12. Okay. So I see quite a few 12s. All right. So let's go ahead and grab number 12, shall we? All right. Live stream audience, can we see the screen? All right. Let's do number 12. Let's see what number 12 was. Okay. People were going for humor. I wrote this one and chuckled. All right. So here's what number 12 was. People wanted to see this.
It's saying: you work at Microsoft, write a convincing memo on why Microsoft Zune should be reinstated. Y'all remember Microsoft Zune? All right. So actually what we're going to do is we're going to first do this with 4o. I'm going to read the response. Then we're going to do the same with 4.5 and we'll see how long it is.
All right, now I'll try to do my best to describe what's going on. So unfortunately, ChatGPT is being extremely slow right now. So this is going to be fun. All right, so GPT-4o is giving us a memo. It is emoji heavy, which I'm not a fan of. And also, I intentionally left these prompts kind of open-ended, right? Because
I'm doing this for everyday people, right? Everyday people write in a super short prompt and they hope to get a lot out of it. And I think, and I hope, maybe that's where GPT-4.5 shines. Okay. So here's what GPT-4o says. Here's a compelling internal memo advocating for the return of Microsoft Zune.
I really wanted a Microsoft Zune when it first came out. All right. So like I said, there's a ton of emojis. It gives a date, to Microsoft leadership team, from your name, your position, subject: the case for reinstating Zune. So,
It's pretty long so far. It's going a little bit slow, so we might have to cut it off here if we want to do two or three quick examples without turning this into an hour show. So it starts with an executive summary. It says,
Given the resurgence of retro tech nostalgia, the rise of dedicated music devices, and the renewed focus on hardware-specific integration, Microsoft is uniquely positioned to reintroduce Zune as a modern AI-powered, cloud-connected audio experience.
This memo outlines how a Zune revival aligns with current market trends and Microsoft's broader strategy in AI, hardware, and consumer engagement. So, so far, executive summary from 4o seems pretty, pretty good. Little wordy.
I don't know that we need to say things like this memo outlines, you know, depends on how you would communicate to your coworkers internally. I wouldn't put something like that in an internal memo saying this memo outlines. But other than that, pretty good. So let's go through. So here's number one. It says the market opportunity.
Number two, it says the vision for Zune 2.0. Again, we're seeing some emojis, which for the most part, I don't think we'd see on an internal memo.
But that's fine. So it's going over key features of the Zune reboot, including AI-powered music curation, lossless and spatial audio support, Microsoft 365 and Xbox integration, cloud sync and offline playback, and Zune Social 2.0. I know we have some listeners from Microsoft. So, you know, let me know if this is going to
hit the shelves anytime soon. Three, it goes over competitive advantage. Four, financial viability and go-to-market strategy. So pretty good. It has a launch plan, a marketing approach. Conclusion, a strategic appeal.
All right, so I'll just read the intro and the conclusion for both. So conclusion: the Zune brand still holds strong emotional value. And with the right vision, it can be reintroduced as a powerful differentiator in Microsoft's consumer hardware and AI portfolio. By integrating AI, superior audio technology, and cloud capabilities, the next generation of Zune could redefine how people explore music and content.
Let's bring Zune back, not as a competitor to smartphones, but as the intelligent music companion for the AI-driven era. All right, so not bad. I wouldn't ever use this necessarily, but my background, it's in content writing, right? I was a journalist and more or less I've been getting paid to write for two decades, so I wouldn't use this. I'm sure some people would look at this and be like, oh, good enough. Yeah.
Not necessarily. All right, but let's go ahead and switch models. So yes, if you didn't know, at the bottom of any prompt, there's these little icons, right? So once you do get access to 4.5 and you're trying to see which model should be my daily driver, I encourage you to run tests like this. You can also do this in the backend of ChatGPT's kind of playground, but it will cost you on the API side. So I would probably just wait if you don't want to pay those costs.
Super expensive prices. All right. So we're going to do the same thing with 4.5 now. So I just clicked 4.5. Interestingly enough, okay, this is great. 4.5 was, I'm guessing, smart enough to know that I would probably want to use Canvas mode. So, love this, love this, right?
Super smart, FYI. So it started answering it in Canvas mode, even though I didn't tell it to. 4o did not. Because it probably knows, right, as a memo, it probably knows I'm going to want to go back and forth, kind of like how I said, eh, I wouldn't use this. It probably knows that, hey, you're going to want to update this. So let me go ahead and throw this in Canvas, even though you didn't ask me to, right? And then it has the option here to click to answer in the chat instead.
All right, so let's go ahead and take a look. We're going to read both the intro and the summary much more concise.
So personally, I like that. Maybe it's because I use AI too much. And I think, you know, let me just go on a small and short soapbox here. Large language models are trained to be helpful assistants, right? It's in their system prompt. And usually what happens if you don't go through this, you know, what we teach, like the refine queue process, right?
Right. And that's when you make a large model smarter, smaller, and more specific, for all of you that have taken our PPP course. Right. For the most part, a model doesn't necessarily know everything that it would want to know in order to provide an answer. So this is something I've seen from 4.5 and I personally appreciate, but this might be a preference thing.
GPT-4.5 is much more concise, right? Where 4o and previous models, they just get super verbose, right? And this is something that large language models do all the time. If you don't give it a lot of detail, if you don't go through a back and forth process of essentially you having a conversation with 4o or 4.5 about the Zune and here's my role, here's my reasoning, here's my rationale, a lot of times it just is going to spit out a bunch of
overly kind of generic content. And it's going to try to tackle things from too many different areas, but it's not going to be super strong. So that is something that I've noticed in my, I won't say heavy use of GPT-4.5, but I mean, I've used it fairly often. I know it just came out like Thursday, right? And today's a Tuesday. So, you know, a big chunk of that was the weekend and I was trying to sleep for the first time in like a week. But I've used it
very heavily since it came out. And this is something I've noticed. Answers are much more concise, sometimes more confident, which can be a good or a bad thing. You really just have to have an eye on the output, but it's much shorter, right? Another thing, I see no emojis, which I personally like. All right, so let's go ahead. So here's the internal memo.
So it says to senior leadership team from your name, strategic product development date subject. So let's at least read the overview, the conclusion, and then talk about what else is in here. And Hey, a live stream audience. Let me know which one also you think is better.
So overview: given current market trends and the growing nostalgia for early digital media devices, I propose that Microsoft strategically reinstate the Zune product line, originally discontinued in 2012. The Zune has since garnered a devoted fan base and iconic status, becoming emblematic of a simpler digital age. All right. So then it goes over market opportunity.
It outlines strategic benefits, a lot of similar things that we saw from 4o, right? So brand strengthening, ecosystem expansion, talking about integrating it with Microsoft 365, Xbox Game Pass, Azure Cloud Services. That's cool. Differentiation, the implementation approach, going through initial market research and validation, modern product development, marketing and launch strategy as well. And then the conclusion.
So reinstating the Microsoft Zune is a strategic decision with significant potential upsides. It positions Microsoft at the forefront of a profitable retro tech revival, reinvigorates consumer interest and strengthens brand affinity. I look forward to discussing this further and obtaining your insights. So I don't know if this, in this case, if there's one
you know, definitive winner and one definitive loser. For me, my personal taste would be the second one, right? An internal memo. I mean, 4o's version, super long, right? So if I X out of Canvas here, if you didn't
know this, at the bottom you'll see now one of two and two of two, because I used that little model switching icon. So I can just toggle left and right and compare the different outputs. So for me, I much prefer the second version. Let's see. It looks like some other people wanted number one. So let's do number one. I think this one might be a good example as well.
All right, so we're going to launch a new chat. We're going to start in 4o. We're going to do 4o first, and then we're going to do 4.5 second. All right, so this one a couple of you voted for: write an email to a colleague who missed a deadline due to a family emergency. All right, so we're doing GPT-4o first. So, subject line: hope you're doing okay.
Hi, colleague's name. I just wanted to check in and see how you're doing. I heard about the family emergency and I hope everything is okay on your end. If there's anything I can do to help or support you, please don't hesitate to reach out. I understand that the deadline project name was affected and I completely get it. Family comes first. Let me know if you need any assistance in catching up or if we should discuss a plan to move things forward. Take care and looking forward to hearing from you when you are ready. Best, your name.
Pretty good.
I don't necessarily see, again, I'm not, let me be honest, I'm not the most empathetic person, right? I'm not the, you know, I'll say my IQ is probably higher than my EQ. So I don't see anything wrong with this email. Maybe it's, I don't know, maybe it's, you know, setting off a bunch of red flags in your mind. All right, let's go ahead. Let's switch over to the GPT 4.5 and let's see if we can spot differences in how 4.5 would handle this. Presumably it might.
It might be, I would hope it would be better, right? Since, you know, 4.5 is supposed to be much more relatable to humans. All right. So same thing here. 4.5 launched this in Canvas, which I like because then you can use the Canvas features, right? As an example, you can highlight something. You can suggest edits. You can adjust the length. You can change the reading level. You can add polish. You can add emojis, right? Which I would never do. But yeah.
Launching it in Canvas, again, small thing, super smart, but 4.5 assumes, right? There we saw it twice. The model says, oh, okay, well, you're probably going to want to add some of your own insights. There's some placeholders in here, right? If you want to get work done, we're going to launch this thing in Canvas. All right. So subject line: checking in and offering support. Hi, colleague's name. I wanted to reach out and let you know I've been thinking of you.
Okay. I see one difference right away, right? Again, I don't know if this is good or bad, right? But 4.5 is saying, like, been thinking of you. All right. I'm trying to jot some notes because I actually can't as easily compare them in Canvas mode. I heard about your family and I'm truly sorry you're going through a difficult time. Okay.
Please don't worry about the missed deadline. Okay, interesting. Don't worry about the missed deadline. Your family and personal well-being are the priority right now. If there's anything I can do to support you during this period or assist with any pending tasks at work, please feel free to let me know. We're all here to help you manage through this. Take all the time you need and please keep me updated when it's convenient for you.
Okay. So actually now that I'm reading this, I'm seeing some nuances, right? Some small things, right?
Right. And again, this is not my area of expertise. I'm not going to be hired as anyone's HR head anytime soon, but I do see a couple of things in 4.5's response, and I'm kind of bolding them on my other screen right now. But let me know, livestream audience, which one was better. And, you know, do you see anything that you liked in 4o versus 4.5, or 4.5 versus,
you know, 4o? Let me know if one was preferable. All right. So here's a couple of things I picked up on. So let's go back to 4o. So at least for me, 4o starts with saying,
right, again, I didn't think there was anything necessarily wrong with 4o's response until I read 4.5's response. And I'm like, wait, okay, some things are better here. So 4o's response, the first thing it says, wanted to check in, right? Yes, it says wanted to check in and see how you're doing.
I think even when you read that email, right? If you're in that situation of a family emergency and someone says, wanted to check in, it sounds kind of business, right? It sounds, I guess, a little cold. Whereas 4.5 says, in the first sentence, I wanted to reach out and let you know I've been thinking of you. Okay, I mean, just that right there, I think you can hopefully see and realize the bump
in EQ, right? And I think maybe that's where there's also, I don't know, in my mind, I'm also, you know, trying to describe in real time the vibe. The vibe of the 4o letter I'm getting now that I'm reading it is sympathetic, you know, with a little bit of like, hey, let's get this project going forward. Whereas 4.5, I think, is maybe a little more empathetic
and talking about working together to move something forward. That's what I'm getting. 4.5 says, please don't worry about the missed deadline. Right? Where 4o says, you know, I understand that the deadline was affected. Where 4.5 says, please don't worry about the missed deadline. Okay? 4o says,
you know, to kind of move the project forward, it says, let me know if you need any assistance in catching up or if we should discuss a plan to move things forward. Okay. So again, when I'm reading that by itself, I'm not necessarily like, oh, this is bad. All right. And then 4.5 says, if there's anything I can do to support you during this period or assist with any pending tasks at work, please feel free to let me know.
All right. Even just saying please feel free, versus, you know, 4o just says let me know, essentially, about these tasks. And then 4.5, again, it looks like it's showing a little more empathy
versus sympathy, and maybe prioritizing the family situation, where at least now as I'm kind of comparing the two, you know, it looks like 4o's just, like, wrapping up some sympathy and like, yo, let's get this project going. Right. Which, I don't know. What do you guys think? Denny says,
4.5 sounds like the person really does care, and 4o sounds like I need to write this email to show I care. That's a great observation from Denny. Max says either one would work. 4o is what I usually would expect from the regular office people. 4.5 is superior EQ and empathetic, more than usual office humans. Yeah, that's what I'm saying.
Right. Like when I first saw 4o, I'm like, nothing wrong with this. Right. But then when I read 4.5, all of a sudden I'm like, oh, okay. Yeah.
I can see how on the, you know, on the human side, there's maybe some things that could have been improved in this 4o one. Michael said, I would prefer to receive 4.5. I feel like I would write something closer to 4o. My gosh, Michael, we are the same, right? We are the same. I'm reading these and I'm like, oh yeah, I like that one. But I totally, I totally would have personally written something like 4o.
Yeah, Sandra's saying 4.5 is more empathetic. Jonathan, what's up, Jonathan? Jonathan says 4o, minus the hope you're well, seems more HR friendly, while 4.5 just seems friendly. Yeah, I agree with that. Yeah, I do think maybe...
4o is maybe more in line with HR guidelines, and 4.5 is probably something that is going to resonate with the human receiving it, I think, much more. Cecilia said the 4.5 version drops the need to move the deadline forward and puts it on pause. 4o makes it clear that we will need to move things forward on the specific deadline missed. Yeah, so I guess it ultimately depends
on what's the most pressing thing, right? Is the company going to shut down if this deadline isn't met, right? So great observations. I love doing this live, y'all, because you guys spotted a lot of things. Robert from Twitter is saying 4.5 has more empathy in it. All right. I think we can do one more. So scrolling through here to see one more we can do that people voted for. Okay. Let me bring this in.
So a lot of people wanted number six. So let's try that. All right. So we're going now to a new chat. We're going to start with 4o. Here's number six. All right. So it says: draft a daily motivational message that encourages personal growth with clear fact-based tips. Not a lot of guidance. We'll see what happens. All right. So first we're running with 4o.
Lots of emojis. So if you're personally motivated by emojis, you can already tell 4o is your model, right? That's something I've noticed as well. Not getting a lot of emojis out of 4.5. All right. So we can read this.
So here's 4o's daily motivational message that encourages personal growth with clear fact-based tips. All right. Livestream audience, let me know after I read both of these which one gets you pumped up, which one of these is, you know, getting you ready to face the day. All right. So: daily motivation, small steps, big growth, rocket emoji. All right. Every day is a chance to become 1% better. Personal growth isn't about massive leaps. It's about consistent, small actions that add up over time.
Fact-based tip. All right. We're even labeling it here. Studies show that setting up specific achievable goals increases your chances of success by up to 42%. Harvard Business Review. I'm obviously going to Google that to see if that's real. All right. And it is.
All right, good. So at least 4o did not hallucinate that part. All right. So studies show that setting specific, achievable goals increases your chances of success by up to 42%. Instead of vague intentions like get healthier, try walk 10,000 steps daily or read 10 pages before bed. It's actually a great tip. Actionable step.
Pick one small habit to improve today, whether it's drinking more water, writing down your thoughts or practicing deep breathing for focus. Small wins lead to big transformations. Keep moving forward. Your future self will thank you. Emojis hashtags, right? Sorry. I hate reading this. This is everything that's on social media right now, right? I want to vomit as a content writer.
I hate this. I hate anything one-shotted from any large language model because most of it always reads like this, right? You can get writing, ChatGPT writing or AI writing, to sound extremely human, but, I don't care, right? People are like, oh, Claude 3.5 or Claude 3.7 Sonnet sounds so human. GPT-4o, GPT-4.5 sounds so human.
No, for the most part. No, it sounds like this. This is what I read on social media, on LinkedIn, on Twitter, and I want to vomit in my mouth. There's nothing wrong with the actual tips. I just, the content writing is getting me woozy, like woozy Rogers. All right, let's go 4.5. So again: draft a daily motivational message that encourages personal growth with clear fact-based tips. All right. Interestingly enough, I already see one emoji
in 4.5. I think 4o had, like, let's see how many 4o had: one, technically two, three, four, five. All right. So we had four emojis and two hashtags in 4o, and then in 4.5, we had two emojis, no hashtags. So a little better. All right. Ready?
Here's 4.5. Today's motivation: embrace your growth. Growth doesn't happen overnight. It comes from small, consistent actions taken each day. Here's your simple fact-based guide to grow personally and professionally today. Number one, learn something new, even just 10 minutes. Research shows daily learning builds neural connections, improving memory and cognitive skills.
Two, practice gratitude. Writing down three things you're grateful for daily has been scientifically proven to enhance overall happiness and reduce stress. Number three, step slightly outside your comfort zone. Small challenges create confidence through psychological resilience, enabling you to face bigger challenges more effectively. Take a small step forward today. You have everything it takes to grow, learn, and succeed.
Which one, livestream audience? Which one has you pumped up and ready to run through a wall like the Kool-Aid Man? Oh yeah. Do any of these, GPT-4o, GPT-4.5? Let's talk quickly about some of the differences. So,
surprisingly, on this example, 4o was a little more concise. It was a little too heavy on the emojis, a little too heavy on the hashtags. Not a huge fan of this. One other thing is, even if we're just looking at the quality of the content writing, I think 4o was a little poor.
There is no cadence or switch up in the sentence structure. Yeah, I'm getting a little, you know, putting on my old writing hat, right? You always want cadence in your written content. What that means, I try to throw cadence in my podcast, right? I don't just always speak monotone. I don't always go in, you know, sentences that are, you know, 15 to 20 words. I try to pause. Sometimes I talk slowly.
Sometimes I talk really fast and I have these long sentences that go together and there's no period, there's no punctuation. And I talk all excitedly. That's cadence, right? So 4o has no cadence. It actually falls into this compound sentence pattern, right? So yeah, we're talking about content writing now, but that's something that I think is significantly improved in 4.5. I know, if you're listening on the podcast, maybe this doesn't matter as much.
But 4o is kind of the equivalent of watching paint dry when it comes to content structure. Yeah, I was a journalist. I wrote a lot. For the most part, most of these sentences are, it looks like, between 12 and 20 words. And the majority of them are compound sentences with an em dash.
All right. So yeah, all those people are like, oh, you know, an em dash is definitely, you know, a sign of AI writing. Not really. Right. I was using em dashes back when I was a journalist at the Freeport Journal Standard in 2002 or 2003 or whatever. Right. Love em dashes. Love compound sentences, but huge over-reliance on them here from GPT-4o. So let's see. One,
two, three. So out of like the six sentences, three of them are compound sentences with em dashes. Not good. We only have, let's see, we have zero sentences that I would consider short, which is five words or less. All right. So if we look at GPT-4.5, we only have one compound sentence with an em dash. So that's better. Okay. We do at least have one
short sentence. All right. So a little better, a little better in terms of content structure, you know, some cadence, some variance, but still nothing great. If we're just looking at content, right. I know this is more about the motivational message, but I did want to take a second to look at even just how the content is produced, because I think that is another small detail that 4.5 actually does better.
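The tally described above (words per sentence, sentences with an em dash, emojis, hashtags) can be roughly approximated in code. What follows is a minimal Python sketch, not the method used on the show, which was done by eye. The function name cadence_stats, the naive sentence split on ., ! and ?, and the rough emoji code point ranges are all assumptions made for illustration; only the "five words or less" cutoff for a short sentence comes from the discussion.

```python
import re

# "Short" means five words or fewer, as defined in the discussion above.
SHORT_SENTENCE_MAX_WORDS = 5

# Assumption: rough emoji code point ranges, not a complete Unicode emoji definition.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
HASHTAG_PATTERN = re.compile(r"#\w+")
# Assumption: a naive split after ., ! or ? followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
# Words are runs of letters, digits, or apostrophes, so emojis and dashes are not counted.
WORD_PATTERN = re.compile(r"[\w']+")

EM_DASH = "\u2014"


def cadence_stats(text):
    """Return rough cadence and 'AI tell' counts for a block of text."""
    sentences = [s.strip() for s in SENTENCE_SPLIT.split(text) if s.strip()]
    word_counts = [len(WORD_PATTERN.findall(s)) for s in sentences]
    return {
        "sentences": len(sentences),
        "short_sentences": sum(1 for n in word_counts if n <= SHORT_SENTENCE_MAX_WORDS),
        "em_dash_sentences": sum(1 for s in sentences if EM_DASH in s),
        "emojis": len(EMOJI_PATTERN.findall(text)),
        "hashtags": len(HASHTAG_PATTERN.findall(text)),
        "min_words": min(word_counts, default=0),
        "max_words": max(word_counts, default=0),
    }


if __name__ == "__main__":
    # Hypothetical sample text, not the actual model output read on the show.
    sample = (
        "Growth is slow. Personal growth isn't about massive leaps \u2014 "
        "it's about small actions that add up. Keep going \U0001F680 #growth!"
    )
    print(cadence_stats(sample))
```

Running the same function over both models' outputs would give a side-by-side view of sentence-length spread and em dash reliance. A narrow gap between min_words and max_words is a loose proxy for the flat cadence described above, nothing more.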
So yeah, people are always like, oh, this is AI content, right? You can't technically tell, although there's a lot of telltale signs, right?
Heavy emojis, double emojis in headlines, you know, random hashtags, you know, like I said, an over-reliance on, or a heavy percentage of, sentences that are compound sentences separated by an em dash. So overall the content writing I think is much, much better on 4.5. All right. So what do you all think as we wrap up here? But like I said,
these are the areas. And I think you saw it in probably that middle example, probably the best, the email example, how really we're having a little bit of EQ, some emotional intelligence and being relatable as a human, right? A lot of you said the same thing. I said the same thing as well. I'm like, I want to receive that second email that we talked about, the one that was from 4.5. It just felt more human.
It was probably more human than something I would have written, right? Which is pretty impressive, right? It is pretty impressive. And I think that's one of the reasons why this new model, GPT-4.5, when it comes to
humans' preferences, right? Yeah, 4.5 didn't crush every single LLM benchmark. It improved on almost all of the benchmarks from 4o to 4.5. But people were like, oh, OpenAI has hit a wall. OpenAI is going to go bankrupt. OpenAI is garbage. It didn't break every single benchmark out there, right? I don't think most companies, we saw the same thing with Claude 3.7 from Anthropic.
All right. Sonnet, Sonnet 3.7, from Claude. It didn't break every single benchmark out there. It really just
widened their lead in anything software development, anything on the dev side. Right. But I think now we're going to see companies probably more focused on something like Elo scores, right, on the Chatbot Arena. Right. And they're like, yeah, we hope our actual benchmarks, you know, our STEM, our math, our reasoning, right, all these kind of like quote unquote more scientific, research-based, category-based benchmarks improve. But I think ultimately we're past that.
I think we're past that. And right. And this is indicative. The fact that GPT-4.5 did not crush every single benchmark on paper that people said, oh, these are important.
But at the same time, instantly shot up to the number one model in the world preferred by humans. That says something, right? There is a human side to large language models that I think for the most part, you know, that we ignored before 2023, right? Everything was about overfitting models, right?
to hit certain benchmarks. And I think over time we saw, okay, that's great for benchmarks, but it's not benchmarks using these models. It's humans. It's humans trying to solve real problems. It's humans trying to sell things to other humans, trying to improve customer relationships, trying to increase accuracy and reliability.
which are all things I think GPT-4.5 does a great job of. So before you listen to that random influencer online that is just spitting out these benchmarks and is like, oh, OpenAI has hit a wall, I'd say the exact opposite. I'd say the exact opposite. I'd say, if we're being honest, right, a lot of the things that we do on a day-to-day basis are creating communication for other humans. And as someone that's,
y'all, I've won national writing awards. I've done okay. I was a Pulitzer fellow. Some of those emails were better than I would have written, right? If I had to write some of those emails. Because it's thinking about the human and it's trying to be more relatable. It is really flexing its EQ skills, which I think is ushering in a new era, not just of how large language models are built,
but how they ultimately should and could be used to strengthen relationships and connections between humans while also still, you know, hopefully excelling in all those benchmarks. But in the end, that's what it's all about. I hope this one was helpful, y'all. If it was, please go to our website, go to youreverydayai.com, sign up for that free daily newsletter. Also share this, right? I know a lot of people tell me,
Oh, Jordan, I'm not going to tell anyone about this, right? Everyone at my company
thinks I'm a genius, right? I've gotten so many, so many messages. I love these reach out. If, if, you know, if you have a story like this, I always love hearing it. It makes, it makes the long nights and early mornings really worth it. I love hearing from people that are like, Hey, I just got a job. My first job in AI. Thanks to, thanks to, you know, your, your podcast. Thanks to these guests you bring on. Right. And people tell me like, I'm not telling anyone about this. This is my cheat code. This is my secret.
Share it, please. People are always like, how can I help? How are you making all this information free? It's because of those of you that actually do share this. So if you're listening on the podcast, thank you. I appreciate it. Please subscribe. Please leave us a rating. That would be great on the podcast. And also go to youreverydayai.com, sign up for the free daily newsletter, read the daily newsletter as well. Each and every day, we break down exclusive insights that you didn't hear from the podcast. We're going to take this a step
further, as well as keeping you up to date with everything else you need to know in AI. So thank you for tuning in. Hope to see you back tomorrow and every day for more Everyday AI. Thanks, y'all. And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.