This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life.
I hate saying this each Monday, but my gosh, it's been another crazy week in AI developments. I mean, think about it. We've had multiple trillion dollar companies release and update their best AI models and features.
We finally have an AI image generator release that we've been waiting on for more than a year, one that's still a leader in the pack. We got a bunch of ChatGPT updates and news on GPT-5 and another model we weren't expecting from OpenAI. And apparently an AI model has passed the Turing test. Yeah, can't argue. It's been another crazy week in AI development and yeah,
I don't blame you if you can't keep up. I do this every single day and it's hard for me to keep up, but that's why on most Mondays we bring you the AI news that matters. So what's going on y'all? My name is Jordan Wilson and I'm the host of Everyday AI. And this is your daily live stream podcast and free daily newsletter, helping everyday people like you and me, not just learn what's happening in the world of AI, but how we can all take advantage of it to grow our companies and to grow our careers. Is that what you're trying to do?
trying to make sense of all this AI? Are you trying to learn it and then leverage it in your day to day? Well, it starts here. This is where you learn what's going on, but how you leverage it, that happens on our website. So go to youreverydayai.com and sign up for our free daily newsletter there.
Every single day, we recap each day's podcast and live stream, as well as keeping you up to date with everything else that you need to not just keep up, but to get ahead in AI. So if you haven't done that already, make sure you go do that. And you can go listen to now almost 500 episodes. Yeah, I think we're on episode 498 or something like that. So I got to cook up something special for number 500; running out of time. So hope y'all can join for that. I believe that's on Wednesday.
So before we get into the AI news, yeah, because there's a lot. And like I said, we do the AI News That Matters segment almost every single Monday. A couple of things. We extended voting for the Inception Games. So that was our partnership with NVIDIA and their Inception program, highlighting some of the best AI startups in the NVIDIA Inception program. So make sure both in the show notes and on our website, you can go vote.
There's two different ways to vote. So we shared that voting ends Tuesday, April 8th at 11:59 PM Central Standard Time. So if you haven't already voted, make sure you go do that. One other kind of housekeeping thing for us.
We will be out in Las Vegas for the Google Cloud Next 2025 conference. So in partnership with Google, looking forward to that. There should be a lot of updates coming out of that show. So, hey, if you're going to be at the Google Cloud Next conference, make sure you holler at me, you know, whether it's on LinkedIn or email. I always put that information into the show notes as well. All right.
Enough chit-chat. Like I said, so much AI news this week. Huge releases from Meta with Llama 4. Microsoft essentially said copy and paste to every single other cool AI feature that's out there that they didn't have yet. We have GPT-5 news. We have ChatGPT updates. Midjourney V7 is finally here. So much to get to. Let's dive in. But hey, what's up, livestream audience?
Hey, Graham from Ireland. How you doing? Big Bogey joining us on YouTube. Thanks. Dr. Scott on LinkedIn says, congrats on number 500. Dr. Harvey Castro, great to see you. Sandra and Kyle from YouTube. Thank you all for tuning in. All right, let's get to it.
Let's get to the AI news that matters. There's a lot, y'all. First, Microsoft essentially said, oh, there's all these cool new features out there. Let's develop them all and release them all at once. All right. So Microsoft had a celebration of their 50th anniversary and they released so much.
All right. So Microsoft has rolled out a massive update to its AI assistant Copilot, introducing features like memory, personalization, web-based actions, and a lot more. All right. So here's just some of the new updates. I couldn't even include them all because it would take an entire show. But here's, I think, the ones that are going to probably most impact everyday users.
So Copilot can now remember users' preferences, interests, and details to tailor advice and suggestions with its new memory feature. So users retain control over what Copilot remembers or can opt out entirely. Also, there's some new personalization options. So Microsoft plans to offer personalized appearances for Copilot, including the option to bring back Clippy. If you've missed Clippy over the last, I don't know, 25 years, you know, the iconic assistant from earlier Windows versions is making its AI return. Also, there's actions. This one's pretty big. So Copilot can now perform tasks directly via the web browser.
Yeah, Microsoft just silently rolled out agentic AI in the browser. Yeah, so you don't have to download a program. It's just working in the browser with their new actions feature. So you can do things such as booking
tickets, reserving restaurants, and even making purchases. So combined with new shopping tools, Copilot can research products, find discounts, and streamline online transactions. So yeah, big agent play there for Microsoft as well, as well as a huge expansion of their Microsoft Copilot Vision feature, which was previously available in web tools, and it's now rolling out to Windows and mobile apps, which is
a wildly useful feature. So I use that all the time on the Edge browser. It's kind of like Google has something like this in AI Studio, which is great when it works. Google AI Studio's real-time streaming was being a little finicky for me this weekend. So I might be using Copilot Vision a little bit more, where you can literally just tap one button and Copilot sees everything that's on that screen.
And you can talk to it in real time. So pretty exciting there. Also, deep research. Yeah, like I said, Microsoft literally just rolled out every single feature that they didn't have already. So now Copilot can analyze extensive documents and online sources for complex projects, integrating with Bing for AI-powered search responses. Oh my gosh, so much from Copilot here. I should have teased y'all with this in the beginning. You know that NotebookLM thing that is absolutely amazing, how you can generate a podcast on any of your information? Well, now you can do that inside Copilot as well, with audio summaries to explain detailed topics. Also, new updates to Pages. So the new functionality inside Pages enables Copilot to organize notes and research across multiple documents into a single workspace, simplifying project management and collaboration.
That's not even all y'all. I couldn't do like, you know, 30 minutes of Microsoft news, but everything's rolling out at kind of a different time. So many of these features are launching in initial versions already with improvements expected in the coming weeks. So availability will vary by market and platform. So we will continue in our newsletter to keep you informed when these all come out.
So much. Yeah. Yashel says, uh, amazing. Kimberly says, got to try that out. Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on Gen AI. Hey, this is Jordan Wilson, host of this very podcast.
Companies like Adobe, Microsoft, and NVIDIA have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use Gen AI. So whether you're looking for ChatGPT training for thousands,
or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com slash partner to get in contact with our team, or you can just click on the partner section of our website. We'll help you stop running in those AI circles and help get your team ahead and build a straight path to ROI on Gen AI. Big Bogey is loving co-pilots.
Joe says, perhaps a co-pilot update walkthrough episode. Joe, you know what? Maybe. All right. You know, at the end for our live stream audience, I'm going to ask you what we should cover maybe tomorrow or later this week because there's so much. And I do want to do a dedicated show on one of these new updates. So I'm going to let you all choose which one that is. All right. So maybe I'll have you vote at the end. All right. Next piece of AI news.
The king is back. The king has returned, to quote one of the best 90s movies ever. So after a year-plus of waiting, Midjourney has released V7 of its image generator, bringing new features like voice input and a faster draft mode that lets you work more in natural language versus, you know, I'll say Midjourney prompt-ese. So now with Midjourney V7, voice input is available, letting users speak prompts directly to the model, which then converts audio descriptions into text and then generates images.
Also, draft mode, I think, will be pretty popular, as it offers rapid image creation, producing lower-quality images, though, in just seconds, whereas sometimes Midjourney can take a little bit longer. So users can also refine drafts by enhancing or varying them into high-quality outputs. So I think that's what draft mode is really gonna be most used for. Yes, it is faster than the normal full mode inside Midjourney V7. But I think draft mode is more for iterating on images and using more natural language, where the full mode is more if you are really good at prompting Midjourney, right? I'm a big Midjourney fan. I always have been. But, you know, I don't know. I think over the last year or so, it seemed like the interest in AI image generators, at least for our audience, had gone down a little bit. So I don't know, maybe I should ramp it back up now, especially with the new GPT-4o image gen that has gone absolutely viral over the last couple of weeks. Also with Google Gemini's new Gemini 2.0 Flash, which does image generation very, very well, inline and multimodal. So yeah, maybe, I don't know, y'all. Do you guys care, podcast audience? Let me know as well. Should we do more AI image generation? I think now obviously the quality is fantastic.
And it's really good. So, uh, let's talk a little bit. So now there is a personalization feature that's actually mandatory for V7 users. So before using the models, users must rate 200 image pairs to create a tailored style for generations. So older V6 personalization styles can still be used, but mood boards remain unavailable for now.
So there are two modes available in V7. There's turbo mode, which doubles generation costs for high performance, while draft mode costs half as much and is much faster. So some features still use V6 technology, including upscaling, in-painting, and retexturing, though those will transition to V7 in upcoming updates. So far, user feedback is mixed, actually, with some users praising improved realism and artistic quality while others criticize ongoing issues like human anatomy errors and text rendering accuracy. But many feel the update is incremental rather than groundbreaking. So I'll say the same. I think Midjourney
in terms of visuals and aesthetics has always been number one, right? Even as we got the new update from ChatGPT's GPT-4o image gen, even as we got the Imagen 3 model from Google that you can use inside Google Gemini 2.0 Flash, and obviously, like, dozens of other AI image generators, Midjourney has always been the king when it comes to style, when it comes to aesthetics. However, where it has lacked... oh, I also got to mention Ideogram, because I think Ideogram V3 that just came out is really, really good as well. But where Midjourney has always thrived is in visuals, right? It is the most aesthetic thing.
but it struggles in other areas. It still can't do text, right? So if you want text incorporated at all, Midjourney is not really your thing. Also prompt adherence, I think in the very little testing I did, actually got a little worse in V7. So, you know, if you do have much more complex prompts,
I do think even something like GPT-4o image gen is a little better. So it just depends on ultimately what you want. But, you know, as an example, if you are creating, or if your company is, you know, trying to create better multimedia with videos and things like that, Midjourney might be best for that, right? Because I think it's still probably the best starting point if you are trying to create AI video and you are going text to video, or sorry, image to video. I still think Midjourney V7 is probably the best for most use cases, but for everything else, especially when it comes to prompt adherence, when it comes to iterating on an original image, when it comes to text, Midjourney is still not it, y'all.
Kimberly says underwhelmed, underwhelmed. All right. Next. I don't know why no one talked about this. We covered it in the newsletter and I put it out on the Twitter machine. This is actually huge.
We have, like, mini RAG now inside ChatGPT. And I'll explain what that is after I tell you what's new. So OpenAI is starting to roll out its internal knowledge access for ChatGPT Teams users. Right now, it is only available for Teams users. And right now, the only thing available is Google Drive. So the new feature, and this is in your connectors settings if you are on a Teams plan, it just started rolling out this past week, and it allows ChatGPT to retrieve real-time information from internal files anywhere in your Google Drive, and it can summarize content and create tailored outputs like demo scripts or summaries. So Google Drive is the first platform supported, with access gradually rolling out over the next few weeks. Let me just say this: scary good, scary good. And you might be wondering, like, oh, Jordan, wouldn't you just use Google Gemini? It connects to Google Drive as well.
It does... ish. So if I'm being honest, this is one area that Google Gemini still struggles in. I think, you know, even though Gemini 2.5 Pro might very quickly become my most used model over GPT-4o, because, y'all, let me just put this out there: inside Google AI Studio, 2.5 Pro with its million-token context window is the world's most powerful model. And it's available for free with that million-token context window. On the front end of Google Gemini chat, it doesn't have that million-token context window. Also, inside Google AI Studio, you can't turn off data sharing. So, you know, definitely don't use it with anything, you know, sensitive or proprietary.
But it struggles. It really struggles. Google struggles, for whatever reason, to accurately pull information from Google's own Google Drive. ChatGPT Teams does a way better job, and it is extremely impressive. So if you do have a Teams account, you need to be logged in as the Teams admin, go into your workspace settings, and look for connectors. So it takes a while. I actually don't know how long it takes. I just let it kind of sit there in the background. So it might take anywhere from, I don't know, five or ten minutes to a couple of hours to fully sync everything. But then essentially you can click a new button that says internal knowledge, and anything in your Google Drive, instant access. Extremely impressive. And the reason why is because it's all dynamic, right? So yes, inside Claude, even inside Gemini, there are certain instances that work great when you can upload files individually.
But it's not dynamic, right? And this is why I do think this might be the first consumer, you know, true mini RAG system. So what that means is anytime you're using a large language model, the thing you always have to keep in mind is, well, your data recency and just
basic prompt engineering 101. So having this feature inside ChatGPT Teams is huge. So OpenAI does plan to expand support to other tools such as CRMs, project management systems, and data analytics platforms soon. But right now it is just Google Drive. So you do have to be on a Teams plan, which is $25.
per person per month. I still believe you have to have a minimum of two users to have a Teams plan. But if I'm being honest, even if you're a solopreneur or even if you're the only one using it, it's probably just worth it to just pay that extra license just to use this feature alone, especially if you are a power user of ChatGPT.
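If you're curious what "mini RAG" actually means under the hood, here's a bare-bones sketch of the retrieve-then-generate pattern: pull the most relevant internal docs for a question, then stuff them into the prompt so the model answers from fresh company data instead of stale training data. The doc store, the keyword scoring, and the prompt format here are all made up for illustration; OpenAI hasn't published how its connector works, and real systems use embeddings rather than keyword overlap.

```python
# A toy "mini RAG" loop: retrieve the most relevant internal docs for a
# question, then ground the prompt in them. Everything here is illustrative.
from collections import Counter

documents = {
    "q1_plan.txt": "Q1 goal: launch the new onboarding flow and hit 500 demos.",
    "pricing.txt": "Teams plan is $25 per seat per month, minimum two seats.",
    "roadmap.txt": "Connectors roadmap: Google Drive first, CRMs and analytics later.",
}

def score(query: str, text: str) -> int:
    """Crude keyword-overlap score; production systems use embeddings instead."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    return sum(min(q[w], t[w]) for w in q)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents that best match the query."""
    ranked = sorted(documents.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    return [f"{name}: {text}" for name, text in ranked[:k]]

def build_prompt(query: str) -> str:
    """Ground the model's answer in the retrieved snippets."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How much does the Teams plan cost per month?"))
```

The dynamic part is the key: the retrieval step runs at question time, so whatever is in the drive right now is what the model sees, with no re-uploading.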
Yeah, you got to worry about security, as Big Bogey Face says. Yeah, definitely don't just throw docs in there haphazardly. Also a good point: if you have that connected and if you're using it, you know, you really have to increase your personal responsibility as the expert in the loop, right? I think I'm going to stop saying human in the loop,
FYI, because I really think it's about expertise in the loop, but you have to be much more vigilant to see what ChatGPT is using and what it's not. All right.
More ChatGPT news. This one might be some of the biggest news that snuck under the radar. So OpenAI has announced that they're kind of delaying their plans for the much-anticipated GPT-5, but also slipped in that, okay, they're actually going to be releasing two new o-series models in the meantime. So OpenAI has unveiled updates to its AI roadmap, including a new o4-mini model and also details about the now-delayed rollout of GPT-5.
So OpenAI plans to release a new o4-mini model alongside the previously announced full version of the o3 reasoning model within, quote unquote, a couple of weeks, according to CEO Sam Altman's Twitter post. So the o4-mini model is expected to serve as a next-generation successor to the reasoning models we have right now, o1 and o3. So yeah, I'm really interested to see what they're going to do. Are they going to have three versions of their o-series thinking models available? Because for some instances, I love o3-mini-high. That's actually been one of my workhorse models recently. But are we still going to have o1, o3, and o4 available? Because I still use and prefer o1 Pro for certain instances, for which you do have to be on the $200-a-month ChatGPT Pro plan. But o1 Pro is the most powerful model I've ever used. I think even for certain tasks, it's better than Google Gemini 2.5 Pro. But I mean, we'll see how much we actually get to keep.
So what's with this GPT-5 delay? Well, GPT-5 has been described as more of a unified model incorporating all of the other models, you know, so advanced reasoning, voice functionality, canvas, search, deep research, tools, and everything. So at least what we've been told is GPT-5 won't be a new model per se, right? Like GPT-4, GPT-4.5, GPT-4o. It's more going to be a system. And OpenAI has said that they will offer GPT-5 with tiered access: standard intelligence settings for unlimited use, higher intelligence levels for ChatGPT Plus subscribers, and even higher settings for ChatGPT Pro users.
But also, let me mention why it's delayed. Well, at least according to Sam Altman, he noted that the company has found it harder than expected to integrate all the features smoothly while maintaining performance, but improvements in GPT-5 designs have exceeded initial expectations. So kind of telling both sides of the story: like, oh, it's actually going way better than we initially thought, but also, at the same time, we're finding it more difficult than we thought to fully incorporate everything. So previously, essentially, OpenAI said, yeah, we're not going to release any new models before GPT-5 comes out. But yeah, change of plans here. So we're going to be getting o3 full, and we're going to be getting o4-mini.
Personally, I'm not looking forward to this new GPT-5 system, and I don't think power users should be looking forward to it either. That's just me. I don't know. It's not out yet. I would much prefer to not have a system decide what model to use, if I'm being honest.
I know better, right? If you are a power user that has, you know, used every single model, thousands of prompts, you know which model to use for which scenario, right? I know it like the back of my hand. I don't necessarily want a system deciding which model to send it to. I often use three or four models in the same project, going back and forth and switching models. So, I mean, hopefully GPT-5 is smart enough to do an adequate job. I don't have a ton of hope, if I'm being honest. All right, more OpenAI news, just some bullet points here. So Sam Altman also tweeted that OpenAI is officially developing an open-weights model. So they might actually go back to putting the "open" in OpenAI, allowing businesses to customize AI without retraining, but stopping short of full open source, similar to Llama or DeepSeek.
And then other ChatGPT updates. So the very viral and extremely impressive GPT-4o image gen was updated. So there's a new version that rolled out. They didn't say a lot about it, except it takes more time to essentially think about creating the image before it gives you the image. Also, they rolled the image gen out to free users, which was previously delayed. And last but not least, they are giving ChatGPT Plus away for free to university students. All right, through May. So essentially, if you are a college student, you can get ChatGPT Plus, normally $20 a month, for free through May. You know, so we can delve into writing our final papers together with way too many emojis. Have we passed the Turing test?
Apparently, a new study from UC San Diego's Language and Cognition Lab says that OpenAI's GPT-4.5 model has convincingly passed the Turing test, sparking debate about artificial intelligence's ability to mimic human intelligence and its potential societal impact.
In the study, GPT-4.5 was mistaken for a human in 73% of cases during a three-party Turing test, significantly surpassing the random chance of 50%. So this marks a major, literally a major, milestone in AI's ability to simulate human-like behavior. So in this study, participants engaged in text-based conversations. All right, so this wasn't real time; it was text-based, with a human and an AI. Then the participants had to try to identify which was human and which was AI. So GPT-4.5, when adopting a specific persona, outperformed actual humans in being judged as a human. That's wild, y'all. If you follow AI, the Turing test has kind of been this unofficial gold standard of AI development.
And now we might have it. So persona prompts, though, were key to GPT-4.5's success, with instructions to act like a young person knowledgeable about internet culture boosting its win rate to 73%. Without those persona prompts, its success dropped to only 36%. So GPT-4.5 with personas, at least according to this study, passed the Turing test, which is a huge deal.
So OpenAI's GPT-4o model, which powers the default version of ChatGPT, achieved a much lower win rate of 21%. But maybe the most shocking thing of all this: the decades-old, original ELIZA chatbot, that's like 50 years old, right? It was technically the first chatbot. I believe it was from the, what was it, the 60s, right? It performed at a 23% success rate. So actually, ELIZA outperformed GPT-4o by a couple of percentage points. But undoubtedly, GPT-4.5 crushed the Turing test, right?
A 73% win rate. It's extremely impressive. And we've been saying this all along. So when GPT-4.5 came out, a lot of people were confused, and they were like, okay, this thing didn't crush every single benchmark ever. So why is it important? Empathy. EQ off the charts. I also think this goes to show how a little bit of best-practice prompt engineering goes a long way, right? Having ChatGPT, or sorry, GPT-4.5, with such a simple prompt act as a young person knowledgeable about internet culture, having it act under that persona, increased its win rate exponentially.
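Just to make that concrete, here's a minimal sketch of what a persona-style system prompt looks like via the OpenAI Python SDK. The persona wording is my paraphrase of how the study was described, and the model ID is an assumption, so treat this as an illustration of the prompting pattern, not the study's actual setup.

```python
# Minimal sketch: the same request with and without a persona-style system
# message. The persona text is a paraphrase; the model ID is an assumption,
# so swap in whichever model your account actually exposes.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSONA = (
    "You are a young person who is very online and knowledgeable about "
    "internet culture. Keep replies short, casual, and a little playful."
)

def chat(user_msg: str, persona: str | None = None) -> str:
    messages = []
    if persona:
        messages.append({"role": "system", "content": persona})
    messages.append({"role": "user", "content": user_msg})
    response = client.chat.completions.create(
        model="gpt-4.5-preview",  # assumed model ID; adjust to what you have access to
        messages=messages,
    )
    return response.choices[0].message.content

print(chat("What did you do this weekend?"))           # default assistant voice
print(chat("What did you do this weekend?", PERSONA))  # persona voice
```

Same model, same question; the only difference is the system message, and that's roughly the gap between a 36% and a 73% win rate in the study.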
So the implications of this study are significant, with the study's lead author noting that AI's ability to convincingly mimic humans could lead to automation of jobs, enhanced social engineering attacks, and broader social disruption. Yeah, I don't think this is necessarily like a great thing for AI; it's actually a little concerning, right? Because all those, you know, scams are about to get a lot better with GPT-4.5. I guess, luckily, in that regard, if GPT-4.5 were to be used in a bad way, generally you'd use it via the API, because you want to do it en masse, and it's still extremely expensive. But I do think that we're going to see a wave of new models in 2025 and 2026, like GPT-4.5, that are more tailored for emotional intelligence versus, like, standard IQ. And that's what's really going to trick humans. And that's where it gets both useful in many regards, right? Because then all of a sudden, you know, your AI-powered customer support can be a little empathetic and emotionally intelligent, right?
But at the same time, the other side of the coin can be extremely ugly. All right. Amazon, don't forget about them.
They've unveiled Nova Act, a new AI toolkit for autonomous web agents. So Nova Act is designed to create autonomous agents capable of performing tasks in web browsers. So this move signals Amazon's intensified competition in the race to commercialize AI agents and enhance their functionality beyond simple chatbots. Yeah, I think people kind of forget
about Amazon, even though, in the same way that OpenAI and Microsoft had this kind of relationship, right, with Microsoft initially being the biggest investor in OpenAI, hey, Amazon is the biggest investor in Anthropic.
So you can't sleep on Amazon, but Nova Act, their new agentic AI, is part of the Nova AI initiative, which focuses on developing foundation models for various media and input types, including text, images, and video. So the new toolkit allows developers to build AI agents that can complete step-by-step tasks in web browsers, such as submitting time-off requests or placing recurring online orders without relying on APIs.
So Amazon claims Nova Act excels in handling complex interface elements like dropdown menus, date pickers, and pop-up dialogues, which are challenging for other systems.
So the software package available in Python enables agents to follow natural language instructions and operate in a behind the scenes mode for advanced business use. Developers can run multiple agents simultaneously to handle larger workflows, boosting efficiency for enterprise work.
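For a feel of what "natural language instructions" means in practice, here's a rough sketch of driving a browser agent step by step, in the spirit of the Nova Act SDK. The package name, import path, and method names here are assumptions based on public descriptions of the research preview, so check Amazon's documentation before relying on any of it.

```python
# Rough sketch of a browser agent driven by natural-language steps, in the
# spirit of Amazon's Nova Act SDK. The import path and method names are
# assumptions about a research-preview toolkit -- verify against the real docs.
from nova_act import NovaAct  # hypothetical import; the actual SDK may differ

with NovaAct(starting_page="https://hr-portal.example.com") as agent:
    # Each act() call is one natural-language instruction the agent carries out
    # in the browser: clicking, typing, handling dropdowns and date pickers.
    agent.act("log in with the saved credentials")
    agent.act("open the time-off request form")
    agent.act("request vacation days for next Monday through Wednesday")
    agent.act("submit the form and confirm the request was accepted")
```

The appeal of this style is that each step is plain English, so the same script survives small UI changes that would break a hard-coded click-path.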
So Amazon's internal testing, which hasn't been verified by third parties, has shown improved reliability compared to existing systems, but the company will monitor real-world performance closely. So Nova Act positions Amazon among competitors like OpenAI, Microsoft, Google, and Anthropic in the race to develop autonomous AI systems capable of completing real-world tasks. So yeah, if you don't follow the
agent space closely, I'm probably going to do another dedicated agent show or two in the coming weeks, because the agent space has obviously been on fire this week. But I think a lot of people are also confused, like, what the heck is an AI agent? What's different about an AI agent versus, you know, using a large language model that has tool and internet access? Essentially, an agent is usually powered by a certain large language model, and an agent can autonomously make decisions on your behalf without your approval, right? Essentially, you are giving an agent agency, right? That's why they call them agents, right? You're giving it decision-making powers, and it can go off and complete multiple tasks in a single sequence without human intervention, connected to the internet, connected to tools, right? That's a very oversimplified version of agents, but we'll probably do a dedicated agents show soon, just because there's so much new in the space. I don't know if you guys want it.
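In the meantime, if it helps to see that idea in code, here's a stripped-down sketch of the loop most agent frameworks run: the model picks a tool, the tool runs without asking you, and the result feeds the next decision. The tools and the decide() stub are placeholders of my own, not any particular vendor's API.

```python
# A bare-bones agent loop: a planner (stubbed out here as decide()) chooses an
# action, the action runs autonomously, and its result feeds the next decision
# until the goal is met. All names are illustrative placeholders.

def search_web(query: str) -> str:
    return f"(pretend search results for '{query}')"

def send_email(to: str, body: str) -> str:
    return f"(pretend email sent to {to})"

TOOLS = {"search_web": search_web, "send_email": send_email}

def decide(goal: str, history: list[str]) -> tuple[str, dict] | None:
    """Stand-in for the LLM call that picks the next tool and its arguments."""
    if not history:
        return "search_web", {"query": goal}
    if len(history) == 1:
        return "send_email", {"to": "boss@example.com", "body": history[0]}
    return None  # goal considered done

def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):                    # cap steps so it can't loop forever
        step = decide(goal, history)
        if step is None:
            break
        tool_name, args = step
        history.append(TOOLS[tool_name](**args))  # runs without human approval
    return history

print(run_agent("find this week's AI news and summarize it for my boss"))
```

The step cap and the explicit tool list are the two knobs most frameworks give you to keep an agent from wandering off.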
Should I do that show as well? Let me know. But also, Amazon is launching a website to let developers and everyday users experiment with Nova foundation models, which were announced back in December. Our last big piece of AI news: on a Saturday, Meta unveiled Llama 4, its highly anticipated successor in its open-weight, open-source large language model lineup. So the release of Llama 4 features new open-weight models designed to push the boundaries of multimodal AI capabilities. So yes, Llama 4 is now multimodal by default, an open-source, multimodal large language model that actually benchmarks very well.
So there are four new models. Two of them are available now: that is Llama 4 Scout, which is the smallest model, and Llama 4 Maverick. So yeah, apparently we went Top Gun there. So they're available now, while two more are still in training: that is the Llama 4 Reasoning model and then Llama 4 Behemoth. They are slated for release soon.
So yeah, Llama is sticking with its previous kind of Llama 3.2, 3.3 release pattern, having a small, medium, and large variation, now Scout, Maverick, and Behemoth, but then also adding that reasoning model.
So the drop from Llama, the surprise drop, because we had reports coming out that Llama was facing some issues internally catching up with other open-source models in terms of benchmarks. But, I don't know, it looks pretty good to me. The drop has sparked widespread excitement, particularly due to a 10-million-token context window in the small Scout model, setting a new industry standard. All right, so, yeah, 10 million tokens. We're not sure yet how well it's going to perform, right? In the same way, you know, Google Gemini 2.5 Pro with its 1-million-token context window is wildly, wildly useful, but also there's always going to be drop-off with these larger context windows, because it takes longer, and if you're using it via an API, it eats up more compute, right? So the 10-million-token context window,
I think has been probably the most popular piece of what was announced. But we will have to see in actuality how that plays out, because what a lot of people aren't talking about is that it was trained on a 256K context window. So, you know, I would say we really have to wait until benchmarks show how well this small model can actually take advantage of that 10-million-token context window.
So a lot of people are shouting out right away, oh, RAG is dead, retrieval-augmented generation is dead. I don't think it is, but I have been saying now for many months that I think in the future, retrieval-augmented generation as we know it today will be less important than it has been in 2023, 2024, and so far in 2025, because of these longer context windows. Also, I do believe that most AI usage will become agentic and reasoning as well. Reasoning models eat up more tokens, hybrid models as well, because they're reasoning under the hood. And then when you talk about multi-agentic setups, I do think RAG becomes a little less important. But I do think that we're going to see a new and improved version of RAG that's more applicable for hybrid reasoning and multi-agentic models. I think obviously larger context windows have something to do with that, but there is an offset to that, right? So you can't just think, oh my gosh, a 10-million-token context window, right, which is, what is that, like more than seven million words, roughly seven and a half million words, right? So you're like, okay, I can just throw in, you know, dozens of books and countless hours of transcribed videos, and it's gonna remember it every single time, a hundred percent? No. Remember, expertise in the loop is still important.
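Here's the back-of-the-envelope math on that 10 million tokens, using the common rule of thumb that a token is roughly three-quarters of an English word. The exact ratio varies by tokenizer and language, and the book length is just an assumed average, so treat the numbers as estimates.

```python
# Back-of-the-envelope: how much text fits in a 10M-token window, using a
# rough heuristic of ~0.75 words per token (varies by tokenizer and language).
TOKENS = 10_000_000
WORDS_PER_TOKEN = 0.75          # rough rule of thumb, not exact
WORDS_PER_BOOK = 90_000         # assumed length of a typical nonfiction book

words = TOKENS * WORDS_PER_TOKEN
print(f"~{words:,.0f} words")                           # ~7,500,000 words
print(f"~{words / WORDS_PER_BOOK:.0f} typical books")   # roughly 80+ books
```

Which is exactly why the 256K training window matters: fitting that much text in is not the same thing as reliably recalling all of it.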
So Meta CEO Mark Zuckerberg emphasized the company's focus on open-source AI in his announcement video, stating that it aims to build the world's leading AI and make it universally accessible. So he expressed confidence that open-source AI will dominate the field, with Llama 4 marking a significant step in that direction. So Llama 4 models are expected to power AI agents capable of advanced reasoning and action. So these agents will be able to surf the web and perform tasks useful for both consumers and businesses, potentially revolutionizing productivity tools.
So Meta plans to host its first LlamaCon AI conference later this month, on April 29th, showcasing the AI advancements of Llama 4.
All right. So we need to talk about the benchmarks. There's a lot of rumors swirling around. People are doubting Llama's own internal benchmarks. I won't say that, because here's why: humans have confirmed it. Third-party benchmarking services have confirmed it as well. So as an example,
the third-party benchmarking service Artificial Analysis, which is a great resource, looked at non-reasoning models. Okay, so non-reasoning. So, you know, none of the OpenAI o3, o1 Pro, Google Gemini 2.5 Pro, et cetera. So among non-reasoning models, Llama 4 Maverick is in third place, pretty close behind GPT-4o and DeepSeek V3, which were both just updated a couple of weeks ago. So actually, if not for those updates from GPT-4o and DeepSeek V3 that just happened, Llama 4 Maverick probably would have been the number one non-reasoning model on third-party benchmarks, right? So yeah, a lot of people, if you read the hoopla online, right? Because originally there were reports saying that Meta was facing delays, that they weren't able to get the benchmarks that they wanted. But I mean, here you go, an open-source model that is safe to use. That's the other thing. You know, if you want to use a model from China via the API or the web, I would highly advise against it. You know, it's different if you're downloading it and fine-tuning it yourself for safety reasons, or using a version of DeepSeek or other Chinese models that has already been scrubbed by a company like, you know, Perplexity or Microsoft Azure, et cetera. Then it's obviously safe to use. But, you know, Llama 4 Maverick:
pretty impressive on the third-party benches. Also, let's look at the Elo scores from the LM Arena. So this is human preference. So instantly, right away, Llama 4 Maverick, which again is that medium model, is in testing now the second most preferred model in the world. And I do think a lot of times, you know, benchmarks are important, right? But I also think maybe just as important is human preference, right? Because models can essentially be overfit to perform well on certain benchmarks, but humans might not find the same utility that you might expect based on benchmarks alone, because of that overfitting problem, right? So I think Elo scores like the LM Arena's, where you put in a prompt, you get two responses, you don't know which is which, and you vote for one, right? After millions of votes, you start to get some clear winners in terms of which models are best for humans. And that's what matters.
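For the curious, here's roughly how those blind pairwise votes turn into a leaderboard number: an Elo-style update where the winner takes points from the loser, scaled by how surprising the result was. LM Arena's actual methodology is more involved than this toy version, so this is just the basic idea, with conventional Elo constants.

```python
# Toy Elo update for pairwise "which response was better" votes: the winner
# gains rating, the loser drops, scaled by how unexpected the outcome was.
# LM Arena's real methodology is more sophisticated; this is just the gist.

def expected(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 16.0):
    exp_a = expected(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta

# Example: a 1417-rated model beats a 1410-rated one in a blind vote.
print(update(1417, 1410, a_won=True))   # gains roughly half of K, since the matchup was nearly even
```

The takeaway is that millions of small, blind votes like that converge on a ranking of what people actually prefer, independent of any benchmark suite.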
And Llama 4 Maverick very impressively leapfrogs a bunch of very capable proprietary models, right? That's the other thing. So yes, Llama is not true open source. It's not like an MIT license or something like that, like DeepSeek's. It's a little different; they have some restrictions, Llama does. So it's under more of an open-weights, open-source-esque Llama license. But still, an open-source model immediately becomes the second-best model in the world, right? With a 1417 Elo score on the LM Arena. So Gemini 2.5 Pro, so good, by the way, 1439. Llama 4 Maverick, 1417. And then the updated GPT-4o, 1410. All right, so.
Extremely impressive. You know, those are the instant reactions to the new models from Llama. All right. That's a lot. What do you guys want? What do you guys want? All right. I don't know if I can do it tomorrow. Maybe I can, but let me know what you want to hear more of. There was a lot. There was a lot there. So live stream audience, let me know. I'll probably put something in the newsletter as well. So let's do a quick recap, and live stream audience, let me know what you care most about, what we should cover next.
So here is a very quick recap of all the AI news that matters for the week of April 7th. So like we said, Microsoft unveiled just about everything new inside Copilot, rolling out handfuls of new powerful AI features.
Midjourney finally released V7 of its AI image generation model after more than a year of waiting. OpenAI is sneakily rolling out what I think is mini RAG with its internal knowledge access for ChatGPT Teams users, connecting to dynamic data in Google Drive. OpenAI also announced its kind-of-delayed plans for GPT-5. Bummer, but on the good side, they did say that they're going to be rolling out the full version of o3 and a new o4-mini thinking model in the coming weeks.
Next, OpenAI's GPT-4.5, in a study, has passed the Turing test rather convincingly. Amazon has unveiled Nova Act, its new autonomous web agent. And then last but not least, Meta unveiled multiple Llama 4 models. Two are already out. Two should be released soon. So
A lot to cover this week. Let me know what you want to see more of. Also, please don't forget if you're going to be at Google Next 2025 in Las Vegas, let me know. I think I should actually have time to go and talk to a lot of different providers that are there.
at this Google Next conference, maybe attend a session or two, right? So I'm excited about this conference in partnership with Google. And then don't forget Inception Games. We're going to have, yeah, the madness of March might be just about over. I believe the championship game is tonight.
Our AI startup madness continues. We need your vote. It's actually very close. You know, we're going to have our final two back on for a show and, you know, we'll announce the prize and some of those other things live on that championship show of the Inception Games. So if you have not voted already, make sure you go back and listen to episode 497 where we had our
awesome eight group of AI startups in the Inception Games pitch their service to you all. So if you haven't voted yet, make sure to go do that. All right. That was a lot. I appreciate y'all. I would also appreciate you going to youreverydayai.com and signing up for our free daily newsletter. Thank you for tuning in. Hope to see you back tomorrow and every day for more Everyday AI. Thanks y'all.
And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.