
EP 493: ChatGPT’s groundbreaking image update, Google’s chart-topping Gemini 2.5 drop, Microsoft’s new reasoning agents and more AI news that matters

2025/3/31

Everyday AI Podcast – An AI and ChatGPT Podcast

People
Jordan Wilson
A seasoned digital strategist and host of the Everyday AI podcast, focused on helping everyday people use AI to advance their careers.
Topics
This was a transformative week in AI: the world's most powerful large language model and most flexible image model were released, with the potential to upend creative industries. Microsoft released a set of powerful AI agents that can tackle complex problems and combine with rule-based automation. OpenAI is close to securing $40 billion in funding, signaling enormous growth and investment potential. Apple is developing AI-powered healthcare tools, while Musk's xAI acquired Twitter to bolster its AI development. OpenAI updated its GPT-4o model, lifting it to second place on the large language model leaderboards. A court allowed the New York Times' copyright lawsuit against OpenAI to proceed. Anthropic researchers made a breakthrough in understanding large language models, paving the way for safer, more reliable AI systems. OpenAI released its new GPT-4o image generation feature, while Google released Gemini 2.5, its most advanced AI model yet, with a 1 million token context window capable of handling massive data sets.


Transcript


This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life.

This is one of the biggest weeks in AI development like ever. And I don't say that lightly. I've been doing this for like two and a half years, but let's just preview what happened this week in AI news. Well, we have the world's most powerful large language model that was released.

And I'll tell you why I think it's a bigger deal than you might think. We got the most capable and flexible image model we've ever seen, which I think will absolutely disrupt creative industries. One of the largest companies in the world released some breakthrough multi-agentic flows. And maybe even the biggest yet:

There's talks of a certain AI lab raising $40 billion and a separate company made a $30 billion acquisition. This is wild. This all happened in one week. Don't worry if you're scratching your head like what the heck is going on. I'm going to get you caught up and bring you the AI news that matters.

What's going on, y'all? My name is Jordan Wilson, and welcome to Everyday AI. We're your daily live stream podcast and free daily newsletter helping everyday people not just learn AI, but how we can leverage it to grow our companies and our careers. So if that sounds like what you're doing, you're in the right place. So it starts here with this live stream slash podcast, but it continues on

our website. So if you haven't already, please make sure to go to youreverydayai.com and sign up for our free daily newsletter. Also on our website, if you didn't know this, there are now almost 500 episodes. So whatever you're trying to learn in the world of AI, whether it's marketing, agents, or ethics, we've already interviewed hundreds of the world's leading experts, and you can access it all

online for free on our website. It is a free generative AI university, so make sure you go check that out. All right, so welcome to our weekly installment of the AI News That Matters. We do this almost every single Monday. We cut through all the fluff, the BS, the press releases, and just bring you the AI news that matters. So it's live and unscripted. And hey, livestream audience, do me a favor.

Do I sound okay? I was a couple minutes late getting this show started, had some mic issues. So yeah, hopefully y'all can hear me okay. Let me know in the comments. Love to see everyone tuning in. Max from Chicago, Marie, Colby, Pedro, Brian, everyone else. Tons of people joining on the YouTube machine. Sandra's on the elliptical. Hopefully I don't keep you too long. All right, but enough chit-chat.

Let's talk about what's happening in the world of AI news. So first, hardly anyone is talking about this, and I don't know why.

So Microsoft has released a bunch of new, very capable agents, and I think there are two in particular that readers and listeners are really going to like. Microsoft is solidifying its leadership in enterprise AI by unveiling major announcements for its Copilot Studio platform, including deep reasoning capabilities and agent flows.

Microsoft announced two key additions to Copilot Studio: deep reasoning capabilities for tackling complex problems with your own data, and agent flows that integrate AI flexibility with rule-based automations. We did have Ray Smith, the VP of AI,

sorry, the VP of AI agents at Microsoft, on the show on Friday. So if you're interested in this, go listen to episode 492. We gave you a complete breakdown, like the first people in the world to get it straight from Microsoft to you guys. So make sure you go listen to that episode. I think it was a great look from Ray into the future of AI agents and everything that Microsoft is working on with these new agents.

But today is actually the day that Agent Flows is being released. Yeah, today's March 31st, so this is super new. If you are a heavy Microsoft 365 Copilot organization, make sure you check in on Agent Flows, which should be released today.

Microsoft also announced that more than 400,000 AI agents were created in Copilot Studio just last quarter, showcasing rapid adoption amongst enterprise users.

So there's a lot of new agents. I mentioned two of them, but another one, the new analyst agent, I think is a standout feature. So it's functioning as a personal data scientist capable of processing Excel files, CSVs and embedded tables to generate insights via Python code and visualizations.

The deep reasoning agents, which I think are grabbing a lot of headlines, have new and improved capabilities that allow them to perform methodical analysis, enabling use cases like generating RFP responses or conducting due diligence for mergers and acquisitions, whatever you might be using it for. And then there's

Agent Flows, which should be released today. That combines, and this is huge, deterministic business logic with AI reasoning, addressing customer needs for industries all over the place, from fraud prevention to operational optimization.

So here's why that's important, the whole deterministic piece, right? That means there's a piece of this new agent flow that's not generative, right? I've talked a lot in the past about NotebookLM and how it's grounded in your own data, and if you ask it something that's not in the data, it's just going to be like, yo, I don't know. So this is a feature of the new Agent Flows: it's deterministic, and it lives just inside of

essentially your Microsoft Graph integration. Any of your live, dynamic data within Microsoft 365 Copilot, that's what this new agent is drawing from. There's a much lower likelihood of hallucinations in the way that's set up.

So other industry players, including Google, OpenAI, Salesforce, and Amazon, are intensifying competition with their own agentic platforms. But Microsoft's approach prioritizes accessibility, offering tools for both technical and non-technical users to create custom agents through natural language interfaces and low-code environments. That's the huge thing here, y'all. Everything I just said: natural language, right?

You don't have to be a developer. You don't have to know Python. You don't have to know JavaScript or any other programming language. The language that this accepts is human language, right? So you can build these very impressive multi-agentic flows that use deep reasoning. This uses OpenAI's o3-mini model

to run, which is wild, right? So a lot of these agents that we talk about,

whether it's Google, whether it's OpenAI, you know, there's the new Manus, right? They're great, don't get me wrong. But one of the problems is that a lot of times when you set up these agents, they're not necessarily working with your dynamic data. So you might be uploading some data, but that data changes, right? That document, that report, that quarterly draft you keep updating, it might change dramatically

monthly, weekly, it might change every single day. That's one of the downsides, one of the cons, when you're working with some of these other agentic flows that aren't Microsoft and don't have access to your up-to-the-second data and information. These were huge announcements from Microsoft that I don't think a lot of people were paying attention to, which is why I brought Ray on the show on Friday. Make sure you go listen to episode 492 for that.
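To make that deterministic-versus-generative split concrete, here's a minimal sketch, my own toy illustration and not Microsoft's actual Copilot Studio internals. The helper names like route_refund and the stubbed call_llm are hypothetical, purely to show the pattern of fencing rule-based steps off from an LLM reasoning step:

```python
# Toy sketch of an "agent flow": deterministic business rules decide the
# routing, and a generative model is only consulted for open-ended steps.
# `call_llm` is a hypothetical stub standing in for any real LLM API call.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (OpenAI, Azure OpenAI, etc.)."""
    return f"[LLM draft based on: {prompt!r}]"

def route_refund(amount: float, customer_tier: str) -> str:
    # Deterministic rules: the same input always yields the same decision,
    # so there is nothing here for a model to hallucinate.
    if amount <= 50:
        return "auto-approve"
    if customer_tier == "enterprise" and amount <= 500:
        return "auto-approve"
    return "needs-review"

def handle_refund(amount: float, customer_tier: str, reason: str) -> str:
    decision = route_refund(amount, customer_tier)
    if decision == "auto-approve":
        return f"Refund of ${amount:.2f} approved automatically."
    # Only the fuzzy part, drafting a review summary, goes to the model.
    return call_llm(f"Summarize this refund request for a human reviewer: "
                    f"${amount:.2f}, tier={customer_tier}, reason={reason}")

print(handle_refund(25.0, "standard", "damaged item"))
print(handle_refund(900.0, "standard", "changed mind"))
```

The point is that the rule-based branch can never make something up; the model only ever touches the open-ended drafting step.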

All right. What's that $40 billion number I teased? Well, OpenAI is reportedly close to securing a massive, record-breaking $40 billion, with a B. I know I kind of have the sniffles, the allergies got me today, but that's not a sniffle. That is a $40 billion funding round, signaling some significant growth and investment. This is according to Bloomberg.

So reportedly, SoftBank is leading this $40 billion round with an initial investment of $7.5 billion, followed by an additional $2.5 billion from a syndicate of investors. A second tranche later this year is expected to see SoftBank contribute another $22 billion, along with $7-plus billion from other investors, which is roughly what gets the total to $40 billion.

So according to Reuters, though, OpenAI must first finalize its transition to a for-profit entity by the end of 2025 to secure the full $40 billion funding round led by SoftBank.

That's huge. Failure to meet that deadline could result in SoftBank reducing its investment to only $20 billion, significantly impacting OpenAI's growth plan. This funding round follows OpenAI's previous $6.6 billion raise in October of last year, which was

led by Thrive Capital and valued the company at $157 billion. So with this new round, OpenAI's valuation is projected to soar to $300 billion, showcasing its rapid ascent in the AI industry.

So yeah, this one's obviously going to get very interesting, specifically with Elon Musk and xAI seemingly doing everything they can to try to slow down OpenAI's plan to convert from a nonprofit, which is how it was originally set up 10 years ago. They've been trying to transition to a for-profit now for the better part of a year,

but Elon Musk and some others are trying to delay their plans in doing so. So now the stakes are extremely high. We're talking $20 billion, potentially, that they could not receive if they do not transition to a for-profit by this end-of-year deadline.

Wow. Wow. I mean, talk about high stakes, right? We all think that the work we're doing on a day-to-day basis is high stakes. And don't get me wrong, it is, if you're saving lives, helping people. But man, I'm glad that I'm not in the CFO seat over there at OpenAI. We're

talking $20 billion in funding in the balance if you cannot complete this transition to a for-profit in time, and that's according to reports. So yikes, I would not want to be in that role.

Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on Gen AI. Hey, this is Jordan Wilson, host of this very podcast.

Companies like Adobe, Microsoft, and NVIDIA have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use Gen AI. So whether you're looking for ChatGPT training for thousands,

or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com slash partner to get in contact with our team, or you can just click on the partner section of our website. We'll help you stop running in those AI circles and help get your team ahead and build a straight path to ROI on Gen AI. Next, this little company that is a little behind in AI named Apple.

Right. They've faced class action lawsuits in the past couple of weeks because they've been promoting this Apple Intelligence thing that doesn't really exist. Well, now there's some new AI news from Apple: they're reportedly developing an AI-powered doctor and revamped Health app.

So according to Bloomberg reports, Apple is advancing its healthcare technology with an AI-powered doctor, a redesigned Health app, and a personalized health coach under the codename Project Mulberry. I don't know why they always go with berry codenames, but...

I don't know. Anyways, Apple is creating an AI-powered tool that will analyze health data from devices like the Apple Watch to provide tailored healthcare recommendations, such as dietary advice for users showing signs of high blood pressure.

So the redesigned Health app, which right now is unofficially being called Health Plus, will feature food tracking, a first for Apple, placing it in competition with platforms like MyFitnessPal and Noom.

The app may also act as a personal trainer by using the iPhone's camera to assess workout techniques and suggest improvements, potentially integrating with Apple's Fitness Plus service. According to reports, Apple is collaborating with its in-house physicians and plans to expand its team with specialists to produce educational content, possibly featuring a celebrity doctor to enhance engagement. That's just what we need, more celebrity doctors.

So Apple CEO Tim Cook has emphasized Apple's commitment to health and wellness for the better part of half a decade, calling it the company's quote-unquote greatest contribution to mankind.

We also had a story in our AI News That Matters segment last week that said Apple was reportedly trying to jam some AI-powered cameras into its Apple Watch, into Apple AirPods, essentially trying to put cameras...

everywhere, right, not just on your phone. And now this kind of makes a little bit more sense when we see that Apple is really just trying to prep an AI doctor and make an even bigger investment in the AI health space. Livestream audience, how do you feel about this, right? Do you want AI-powered cameras

on your Apple Watch or on your AirPods? Do you want all of this AI technology on every single other thing? For me, I'm torn. I have an Apple Watch, I have AirPods. I don't want cameras in them because I think that's super weird and probably intrusive. But I love this idea of an AI doctor.

Right. So it's one of these things that we constantly have to grapple with, not just as consumers but as business leaders, right? Like, how much of our data do we want to offer up for the sake of increased productivity, for the sake of potentially increased revenue, for the sake of increased health? It's interesting. It's something I'm always personally grappling with.

Marie says, seems like Apple is scrambling for a win. Astute observation there, Marie. Yeah, Apple has been getting crushed in essentially the large language model race, bringing artificial intelligence features to its iPhone, which has been an abysmal rollout recently.

I do think there has to be some sort of movie made one day about how Apple potentially lost trillions of dollars in market cap by screwing up Apple Intelligence so badly. So yeah, it should be interesting. Most people are just saying no. Richard from YouTube says bring back the oxygen level.

Max from LinkedIn says, AI doctor is cool. More cameras. I don't know. Yeah. Same thing. Right. Yeah. Like we all want the capabilities, but yeah. Do we all want, you know, 10 cameras, right? If you have a smart ring, do you want a camera on that? Do you want an AI powered camera in your tennis shoes? I don't know. All right.

Here's the multi-billion dollar story that really no one is talking about. Yeah, there was so much AI news this week that some of these stories just got no real attention. But Elon Musk's xAI has acquired Twitter, or X, for $33 billion to strengthen his company xAI. Yeah, confusing, right?

So Elon Musk's AI company has acquired Twitter, now known as X, which is also his company. This was a little bit more of some paperwork and official acquisition legalese. But essentially now,

Elon Musk's AI company is the official new owner of the social media platform X, formerly known as Twitter. So Elon Musk's artificial intelligence company, xAI, has purchased X, formerly Twitter, in a $45 billion all-stock deal, which includes $12 billion in debt.

So the social media platform itself is valued at $33 billion in this transaction, according to Reuters. I believe Elon Musk originally acquired Twitter for $44 billion, but we saw reports previously that the valuation had dropped to about $10 billion. So this was actually,

at least for me, a personally shocking piece of reporting. We had seen the valuation fall from the original price of $44 billion, with reports earlier this year valuing it at only $10 billion, yet in this acquisition, which is kind of an acquisition, kind of not, but definitely still an acquisition, I know it's weird, we saw it valued at $33 billion.

The deal, though, positions Musk to merge resources between xAI and X, consolidating data, computing infrastructure, distribution channels, and talent to enhance AI development. So Musk emphasized that integrating X with xAI

will improve training data for its chatbot Grok, potentially accelerating advancements in AI models and capabilities. So this acquisition follows xAI's recent funding success, where it raised $10 billion at a valuation of $75 billion, solidifying its position as a key competitor to OpenAI and other global AI firms.

So the merger could allow X to serve as a distribution platform for xAI's products while leveraging real-time user data from X to improve AI training capabilities. My gosh, too many Xs. I still just like calling it Twitter.

To bolster AI capabilities, xAI has been expanding its infrastructure. Its Memphis-based supercomputer cluster, called Colossus, is reportedly the largest in the world and is designed to train next-generation AI models like Grok 3. I don't know. You guys want a hot take? I know this is a news show, but...

It's going to take more than tens of billions of dollars and all these acquisitions for anyone to actually use xAI or Grok products. So I don't necessarily understand this, right? The problem with this merger, this

beautiful partnership between xAI slash Grok and Twitter, is, well, Twitter, or X, whatever you want to call it, has been shown in many recent studies to be the number one worst platform for disinformation, for bot activity, et cetera. And one of the biggest concerns for many companies when it comes to using large language models

is having trust in the models that they're using. So I've been saying this on the record since Grok 1 was a thing: no one's going to use it. It doesn't matter how powerful it is, right?

We saw Grok 3 released earlier last month. From a benchmarks perspective, it did fairly well. Yet I literally do not know a single enterprise company that is using it as its main large language model driver, nor do I think any enterprise company will,

right? It's not yet available via the API, so you can only use it at grok.com or within the Twitter platform. So I'm not really sure what the long-term plan is here for profitability from Elon Musk and xAI. But that's just me.

I just don't think it is a smart idea for enterprise companies to be using a model that is now even more tightly ingrained with the social media platform that has one of the highest instances of misinformation, disinformation, and bot activity, and that's been shown across multiple studies. So, you know, do with that

as you will. Marie says trust plus transparency equals more customers. Yeah, good equation there. But yeah, without more trust and transparency in your training data, people aren't going to want to use it. So from an enterprise perspective, I still wouldn't touch Grok or xAI with a 100-foot pole. I don't care about the $33 billion valuation.

More large language model news. So OpenAI has updated their previous flagship model, GPT-4o, and it has jumped up to the number two spot on the LM Arena leaderboard.

Yeah, another small story that's actually pretty big. OpenAI announced an updated version of its GPT-4o model, highlighting major improvements in coding, instruction following, and creative capabilities. Most impressive, though, is that this updated

version of GPT-4o shot up the LM Arena leaderboard from fifth place to second place, only trailing the just-released and extremely impressive Gemini 2.5 Pro from Google.

Also, you might be wondering: yes, that means the now-updated GPT-4o model has surpassed OpenAI's newest model, GPT-4.5, at least when it comes to head-to-head human preferences. That is what LM Arena measures, and I think it's an extremely important

kind of leaderboard or measurement for all these large language models, especially recently, because I think they're overfit, right? What that means is I think the engineers building these models, especially in 2023 and 2024, really tweaked them to perform well on certain industry benchmarks, but they weren't necessarily

recognized by humans as being better. So I think it's important to look at both traditional benchmarks and these Elo scores from LM Arena, which is essentially: you put in one prompt, you see two outputs, you don't know which models the outputs are from, and you choose which one is better. It's the blind taste test, Pepsi versus Coke. And this updated version of GPT-4o has shot up, and it's actually really, really good.
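And if you're curious how those leaderboard numbers fall out of blind votes, here's a minimal sketch of an Elo-style update, the chess-rating math these arena leaderboards are usually described with. Treat it as illustrative only; LM Arena's actual statistics are more involved, and the K factor and starting ratings here are arbitrary:

```python
# Elo-style rating update for pairwise "which output is better?" votes.
# K controls how fast ratings move; 32 is a common illustrative choice.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b - k * (s_a - e_a)

# Two hypothetical models starting at the same rating:
model_a, model_b = 1300.0, 1300.0
for a_won in [True, True, False, True]:  # simulated blind votes
    model_a, model_b = elo_update(model_a, model_b, a_won)
print(round(model_a), round(model_b))
# Ratings drift apart only if one model keeps winning, which is why a
# 40-point gap reflects a big, consistent edge across many votes.
```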

So the update has been described as making the AI more intuitive and flexible with some users referring to its responses as unhinged due to its ability to generate less restricted content. So yeah, OpenAI CEO Sam Altman in the announcement

said the new version of GPT-4o is particularly good at coding, instruction following, and freedom. So yeah, there's kind of this low-key unhinged mode that a lot of people are talking about. I tested it out a little bit as well, and the guardrails on the new GPT-4o are down a little bit. That's not talking about the new image model, which actually

became a little more restrictive over the weekend, which we're going to be talking about soon. But with its actual base GPT-4o model, it's a little less restrictive.

Also, OpenAI, bless up, talked about some of the updates: better at following detailed instructions, improved capabilities to tackle complex technical and coding problems, improved intuition and creativity, and, bless up, finally, fewer emojis by default. So maybe we can stop seeing all the social media posts and emails that have like 42 emojis. Thank you.

I know I use emojis sometimes, sparingly, but I'm tired of seeing like a dozen emojis at one point on my screen. It's just like, y'all. So thank you, OpenAI, for getting rid of that, since this is how everyone just writes now anyways.

All right. The next piece of AI news is also about OpenAI, but on the legal side. A federal judge has ruled that the New York Times' lawsuit from December 2023 against OpenAI can advance.

The lawsuit accuses OpenAI of siphoning the Times' articles, millions of them, allegedly without permission or payment, to train its GPT models in violation of copyright law. That is

according to the New York Times. Attorneys for the Times claim the newspaper's content is one of the largest sources of copyrighted text used to build ChatGPT, alleging the AI sometimes regurgitates articles verbatim.

So the judge rejected OpenAI's request to dismiss the case but, in at least a small victory for OpenAI, narrowed the scope, allowing the primary copyright infringement claims to go forward while promising a detailed opinion soon. OpenAI argues that its data collection practices are protected by quote-unquote fair use, citing research and innovation.

But the Times claims its reporting was neither transformed nor lawfully reused. A key legal issue is this term of market substitution, with publishers fearing chatbots summarizing news could divert readers away from their websites,

which obviously impacts their ad revenue. OpenAI claims the Times manipulated prompts to force verbatim outputs, which it says are atypical for regular users of ChatGPT. Evidence gathering in pretrial hearings will now proceed with depositions expected to remain confidential while public disputes over evidence are settled. This one could be huge, y'all.

I've been talking about this a lot since December of 2023.

Another small detail of this whole case is the New York Times literally asked for the GPT technology to be destroyed in its lawsuit, right? That's not an exaggeration. That is something they actually asked for because they said, okay, well, the New York Times and all of our paid articles were one of the big pieces that was in this data set. So I don't think that'll happen, but it's going to be interesting to see

what actually happens here. It could be a monumental ruling that could impact

millions of businesses worldwide, right? Because so many people now, hundreds of millions of business professionals like us, are using AI models in their day-to-day work. And it's not just ChatGPT, because there are probably thousands of other AI apps that use the GPT technology. So in the, I think, very rare case, you know, I don't

want to put percentages on it, but it has to be less than a 1% chance that they say, okay, the GPT technology has to be destroyed because I don't even know if that's feasible. It's already out in the wild, right? It's already been used to distill other models. So you can't exactly just take it away. But I mean, this would be a huge impact to everyone, especially in the US. So it's definitely one to keep an eye on.

All right. Another thing to keep an eye on is this new study from Anthropic. Researchers at Anthropic have made a significant breakthrough in understanding how large language models work, potentially paving the way for safer, more reliable AI systems. In a newly released study, Anthropic created a new tool akin to an

fMRI scan for AI, enabling researchers to trace how large language models process information and make decisions. So yeah, this is pretty cool. It's really interesting. If you want to better understand how large language models work, I highly recommend you go read this study from Anthropic. We shared about it in our newsletter last week. This new tool that they developed and detailed in the study is called a cross-layer transcoder,

or CLT, and it identifies circuits of neurons linked to specific reasoning tasks, which offers new insights into the internal logic of AI models. So yeah, people largely call generative AI and large language models a black box, right? People don't necessarily understand how they work. So this new paper from Anthropic is very, very

telling. The study also revealed that multilingual models like Anthropic's own Claude share conceptual reasoning across languages. So instead of reasoning separately for each language, the model uses shared neural circuits to process universal concepts and translates the output into the desired language. That's wild to think about, right?

It's not that they have created their own language. But according to this new research from Anthropic, when the model is thinking in a multi-language capacity, let's say it's using English, Spanish, and French because, for whatever reason, you're working on translations,

it's not translating back and forth each time, right? Instead, it's using these kind of neural circuits to process universal concepts. So it is doing the work

almost outside of normal language capabilities, which is also pretty weird and wild to think about. A little bit more about this CLT approach: it allows researchers to trace reasoning processes across layers of the neural network. This could improve auditing of AI systems, which is huge for safety concerns, and help

develop better guardrails to prevent hallucinations, jailbreaks, or just erroneous outputs. Right now, though, this technique has limitations, including its inability to capture dynamic attention shifts in large language models. Attention mechanisms play a crucial role in how models prioritize input while generating responses, which this CLT does not fully address.

Scaling the method to longer prompts remains a challenge as well. Analyzing circuits for prompts of even tens of words, never mind hundreds or thousands or millions, requires several hours of expert work, raising questions about the practicality of this kind of CLT methodology for more complex outputs.
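For intuition only, here's a toy numpy sketch of the transcoder idea: decomposing a hidden activation vector into a wider, sparse feature basis and reconstructing it. This is my own simplification, not Anthropic's actual cross-layer transcoder, and the weights here are random rather than trained:

```python
import numpy as np

# Toy "transcoder": map a model's hidden activations into a wider feature
# space, then reconstruct. In real interpretability work the encoder and
# decoder are trained so each sparse feature tends to align with a human-
# readable concept; here the weights are random, purely for shape intuition.

rng = np.random.default_rng(0)
d_model, d_features = 64, 512   # hidden size vs. overcomplete feature basis

W_enc = rng.normal(0, 0.1, (d_features, d_model))
b_enc = np.zeros(d_features)
W_dec = rng.normal(0, 0.1, (d_model, d_features))

h = rng.normal(0, 1.0, d_model)  # a hidden activation vector from some layer

features = np.maximum(0.0, W_enc @ h + b_enc)  # ReLU keeps features nonnegative
h_hat = W_dec @ features                        # reconstruction of the activation

active = np.flatnonzero(features > 1e-6)
print(f"{len(active)} of {d_features} features active")
# A trained transcoder would make `active` small and each index meaningful,
# letting researchers trace which "circuits" fire for a given reasoning task.
```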

But this breakthrough could encourage businesses to adopt AI more confidently. By making the inner workings of large language models more transparent, companies may feel safer integrating AI into their operations. Yeah, Sandra says blowing my mind. Yeah, same. I've read this multiple times, and each time I'm just kind of silent, thinking like, huh?

This is weird, right? The more you use large language models... I remember using the very early, pre-ChatGPT versions of GPT-3 technology and early BERT, right, pre-Gemini, and seeing how much they've improved, seeing these reasoning models now, and then reading this study.

It was really eye-opening. I'll just say that, right? I don't want to take away all the goodies of going to read it yourself. So we'll make sure to link it again in today's newsletter. If you haven't already, make sure you go sign up for that at youreverydayai.com. All right. Undoubtedly the most talked-about thing, at least on the internet this week in AI news, was the new

OpenAI GPT-4o image generation. Yes, that's the new name. So DALL-E is dead. Well, technically DALL-E is still around in some of the older models if you really want to go use it. I've never really used DALL-E. It's not good at anything. But OpenAI has officially launched the native image generation capabilities

of its multimodal GPT-4o model for ChatGPT users, marking a major milestone in AI technology. So the name of this is just 4o image generation. All right.

I'm sure some unofficial name will catch fire and people will be calling it that. But right now, like I said, this isn't a new version of DALL-E. This isn't Sora Photo. Right now, it's just called 4o image generation, and it is bonkers. So the new multimodal GPT-4o model is now capable of handling text, code, and images.

It is currently available for paid users. Originally, it was supposed to be released to free users as well. But over the weekend, the company announced that access to free users would be delayed. And they also instituted rate limits on paid accounts for image generation. As they said, the new feature was, quote unquote, melting their GPUs due to biblical demand.

So the new feature went mega viral as the entire internet was scrambling to create Studio Ghibli-style visuals, which I don't necessarily understand, right? But it's this kind of anime-esque style, and everyone's taking their family photos, uploading them, and getting these Studio Ghibli outputs. I didn't do it. I don't care about that kind of stuff. But I mean, literally

every single AI media outlet, every single social media platform, even LinkedIn, was being overrun by everything Studio Ghibli from OpenAI's new 4o image generation.

So unlike the older DALL-E 3 model, GPT-4o's image generation is integrated directly into the same system. The O in GPT-4o is for Omni, so it is a true multimodal large language model, right? Whereas before, when it was just GPT-4 or GPT-4 Turbo,

even when we're talking about text-to-speech or voice, there were technically multiple models under the hood. Now with GPT-4o, now that we have the new image generation model, it's all under this Omni model, making it more accurate in interpreting prompts and producing detailed, lifelike images. So yeah, I'm curious, livestream audience, did any of you use this over the weekend? I'd love to know your thoughts.

I'm personally blown away, but a couple more details. Users can refine images in real time through conversational edits, that's huge, achieving higher precision and flexibility compared to previous models. Key features of this new 4o image generation include accurate text rendering within images. That's big.

Because so many models, aside from ones like Ideogram, which does great with text, really struggled with it. Earlier versions of things like Midjourney, obviously DALL-E, Google's earlier Imagen

photo models, they all really struggled with text. And that's what a lot of people sometimes want to do, whether they want to create a photo with text on it, or infographics, or things with branding and logos, having words

mixed into these images. I mean, it was pretty abysmal prior to the end of 2024. But this new model, the new 4o image generation, is extremely, extremely good at working with text. And that really opens up the capabilities.

Because now it can handle complex prompts and support different artistic styles, right? And now there are some great practical applications for it: marketing with social media graphics, invitations, recipes; education, creating scientific

diagrams and infographics; game development with consistent character design; things with consistent branding, with logos and advertisements. So it's really impressive. The other thing is it has really improved contextual understanding. So you can, as an example, upload 10 different photos

and say, hey, mix these together, right? You can upload an image of a backdrop, images of three people, six products, and say, hey, combine all these. And it does it.
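One caveat if you want to script any of this: at the time of this episode, 4o image generation lived only inside ChatGPT. The OpenAI API's Images endpoint still served DALL-E 3, so a sketch of the closest scriptable equivalent, under that assumption, looks like this:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# At the time of this episode, 4o image generation was ChatGPT-only; the
# public Images API still served DALL-E 3, shown here as the nearest
# scriptable stand-in until the newer model reaches the API.
result = client.images.generate(
    model="dall-e-3",
    prompt="A flat infographic summarizing three benefits of a large "
           "context window, with short labeled sections and readable text",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # hosted URL of the generated image
```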

Again, this is an early version, but it is extremely impressive. Y'all, I am someone, I am not easily impressed. Yes, I cover AI every day. I've done 500 episodes. I've been lucky enough to partner with big brands like Microsoft, Adobe, and others. I get to use a lot of these AI tools even before they're publicly released. I'm not easily impressed, if I'm being honest.

Very impressed with this new GPT-4o image generation. Right now, there are still limitations, right? It's obviously not perfect. It's been widely reported: there are cropping issues, aspect ratio issues, challenges with non-Latin-based fonts and scripts, and still difficulty retaining details in small text.

OpenAI CEO Sam Altman did describe the launch as a quote-unquote new high watermark for creative freedom, with the company actively refining the model based on user feedback. So this release positions OpenAI to compete with the new and also extremely impressive multimodal capabilities of

Google's Gemini 2.0 Flash model, which introduced similar but not as robust multimodal capabilities earlier this month. Yeah, Pedro, I like this. Pedro just says it's brutally good. Douglas, what's up, Douglas? Douglas said, I uploaded my headshot and had it make a South Park version of it. Results were spot on.

So, yeah, I think there's a lot of fun, you know, cutesy, you know, things that you can do with this model, right? But...

As someone that has worked in, you know, MarTech and comms for 20 years, right? I was lucky enough to spend a good chunk of my career working with, you know, not just the marketing and comms departments from Nike and Jordan brand, but working with dozens of the largest creative agencies in the world, right? So I really have seen a lot of behind the scenes work.

of how big brands essentially create their marketing and their advertisements. And y'all, I cannot overstate what this does right now. Anyone with a $20-a-month ChatGPT Plus account,

anyone that knows how to work a computer, can literally produce advertisements and marketing campaigns that are on par, and I kid you not, with the biggest multi-billion dollar advertising and marketing companies in the world. It's been very impressive: single-person studios, just over the weekend since this has been released,

have been releasing some behind-the-scenes of how they're creating these campaigns. And they are mind-bogglingly good, right? Especially, I think, for product advertisements, things that are obviously very visual. But this new model's ability to accurately take multiple

images that you upload, work with text, but also work with the context window, that's the thing most people aren't taking advantage of right now. When I was just playing around with this, I uploaded an entire transcript

of one of my interviews from last week. And I said, hey, make me an infographic that explains some of these more complex topics, right? And it did it. Whereas previously, if you were working with something like Midjourney or Stable Diffusion or some of these other diffusion-based AI models, that's not how it works. You kind of had to talk in prompt language and describe things to a T. No, you can just dump a bunch of context, a bunch of text, and say, hey, make me something, right?

And if it's not good enough, you can talk to it in natural language. That's the promise here and the power of this new update. Extremely, extremely impressive. Sandra is saying, can you do a show showing us how to do that? I don't know, do you guys want that? Let me know, yes or no, livestream audience, if you want more of a behind-the-scenes on how to do all this. Like I said, with my background, I've done this. I know how this works.

I'm personally blown away. I don't know if you guys want something like that. Let me know here in the livestream, just say yes, or do a visual. If you're listening on the podcast, I always keep my email on my LinkedIn, so you can just reach out to me as well.

All right, our last story of the week. I saved, I think, the best for last, even though people weren't talking about this release from Google, just because of the viral nature of OpenAI's GPT-4o image generation. But literally the world's most powerful large language model was released this week, and hardly anyone's talking about it. It doesn't make sense, but...

Google has introduced Gemini 2.5, its most advanced AI model yet.

So Gemini 2.5 features a massive 1 million token context window, enabling it to process extensive data sets, including text, audio, images, video, and even code repositories. So an upgrade to 2 million tokens is expected soon, further expanding its capabilities. So if you're not super technical, you might be saying like, okay, what does this mean? In theory, let's say you have a PDF of a book.

You can copy and paste or drop that whole thing into Gemini 2.5, and it's going to be able to go through and answer any questions you have. Even great models like ChatGPT and Claude have had decent context windows, Claude at around 100,000 tokens. But a lot of times, the more you work with a large language model, it might start off really great,

and it's like, hey, this is fantastic, this large language model is remembering everything. Then the more you use it, it starts to get a little dumb. That's because a lot of the time, some of the information you share, or the prompt you're trying to refine, gets lost. Eventually,

that initial information you shared falls outside of the context window, which is why sometimes when you're using an AI chatbot, it starts out great and then it starts to just stink. It's because of the context window. So this is extremely impressive: a 1 million token context window for Gemini 2.5 Pro.
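If you want to see why that early information "falls out" in code, here's a minimal sketch of the sliding-window budgeting most chat apps do, using OpenAI's tiktoken tokenizer for counting. The tiny budget number is made up so the effect is visible:

```python
import tiktoken  # pip install tiktoken

# Keep only the most recent messages that fit in a fixed token budget.
# Anything older silently falls out, which is why a long chat can
# "forget" the instructions you gave it at the start.

enc = tiktoken.get_encoding("cl100k_base")

def fit_to_window(messages: list[str], budget: int) -> list[str]:
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = len(enc.encode(msg))
        if used + cost > budget:
            break                    # older messages get dropped here
        kept.append(msg)
        used += cost
    return list(reversed(kept))

chat = [
    "System: always answer in formal English.",   # the turn most at risk
    "User: summarize chapter one of this book...",
    "Assistant: chapter one introduces...",
    "User: now compare it to chapter two...",
]
print(fit_to_window(chat, budget=30))  # tiny budget: the system turn is gone
```

A 1 million token budget just pushes that cliff so far out that, for most everyday work, you never hit it.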

Also, it instantly took the number one spot on the LM Arena leaderboard, and it was not even close. I believe it was almost a 40-point lead, so to speak, in Elo scores. Normally, when a new large language model gets dropped and, oh, it's the best model on LM Arena, it might be by like one or two points, maybe three.

The new Gemini 2.5 Pro came in almost 40 points higher than its nearest competitor, which is now the updated version of GPT-4o from OpenAI. Also, maybe even more important news:

related to this, over the weekend, Google again quietly shipped, which I am impressed with, right? They had a bad rollout originally of Bard. I won't get into that; I've covered it a lot. But over the last six months, I love what Google's doing. They're not

investing heavily into the marketing. They're not making this a big show. They're just shipping, shipping huge releases, shipping impressive updates. And another impressive update is that over the weekend, Google also made this available to free users. So you don't even have to go into AI Studio. Google has their AI Studio, which is more for developers,

and then their kind of front-end Google Gemini chatbot. For the first year or so of Google Gemini, they didn't put their most powerful models in Google Gemini; you had to go inside Google's AI Studio, which, unfortunately, does not protect your data, whereas Google Gemini does on the front end if you are a paid user. But

now you can access Gemini 2.5 Pro even if you are a free user. Also, what's very important: the model now has a thinking mode. Gemini 2.5 Pro is more of a hybrid model, because it allows the model to reason through its thought process before delivering responses, potentially, again, inching closer and closer to the ever-moving goalposts of artificial general intelligence, or AGI.

So obviously, aside from Elo scores, or human preferences, which Google Gemini 2.5 cleaned up on, it also did fantastically, not surprisingly, on all of the normal benchmarks, including the newest trending benchmark, Humanity's Last Exam, a challenging data set designed to test the limits of human-like reasoning and knowledge.

The previous high score from a large language model was OpenAI's o3-mini at 14%, and DeepSeek R1 had an 8.6%. Yet the new Gemini 2.5 Pro scored 18% on that, more than double DeepSeek R1 and comfortably ahead of OpenAI's o3-mini.

So Gemini 2.5 Pro, like I said, is now available both for everyday users on the front end of the Gemini chatbot and for developers and enterprises inside Google AI Studio. And a rollout to Vertex AI is planned in the coming weeks.
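And if you'd rather poke at it programmatically than in the Gemini app, here's a minimal sketch using Google's google-generativeai Python SDK. The model ID string is my assumption based on the experimental preview naming around launch, so check AI Studio for the current one:

```python
import os
import google.generativeai as genai  # pip install google-generativeai

# Configure with an API key from Google AI Studio.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Model ID is an assumption based on the experimental preview naming
# around launch; verify the current ID in AI Studio.
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

# The huge context window is the point: you could paste a whole book's
# worth of text into a single request and ask questions against all of it.
book = "(imagine the full text of a book pasted here)"

response = model.generate_content(
    f"Here is a book:\n\n{book}\n\nIn two sentences, what is chapter 3 about?"
)
print(response.text)
```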

So according to Google, the model's advancements in reasoning, personalization, and coding could significantly impact industries ranging from software development to research, offering businesses and developers tools to innovate faster and more effectively. One of the things I was personally most impressed by with Gemini 2.5 Pro is its ability to one-shot just about anything when it comes to coding and software development.

Extremely impressive, right? Not that you should ever ship anything one-shot; you should always go back and refine. But what did I make, just to see its ability? I made a 2D side-scrolling runner game where a Chicago deep dish pizza is running through the city, or something like that, in one shot. And it got it right. It was extremely impressive. But I really think

we should be paying attention to Gemini 2.5. All right, that's it, y'all. Let me quickly recap the biggest AI news stories that matter for the week. First, Microsoft released some pretty groundbreaking new agentic capabilities in Copilot Studio: no code, being able to just talk

to Copilot Studio, and use its new reasoning model and deterministic capabilities. Next, OpenAI is reportedly nearing a $40 billion funding round led by SoftBank, though it

could be less than that if OpenAI is not able to successfully convert from a nonprofit to a for-profit by the end of the year. According to reports, Apple is developing an AI-powered doctor and revamped Health app.

Elon Musk's xAI acquired the social media platform X, formerly known as Twitter, at a $33 billion valuation, technically a $45 billion all-stock deal because it included $12 billion in debt.

OpenAI updated its GPT-4o model, which shot it up, actually passing GPT-4.5, into second place on the LM Arena leaderboard. A federal judge has allowed the New York Times' copyright lawsuit against OpenAI to proceed. A new breakthrough study from Anthropic was released that helps everyone better understand the black box

of large language models, going over its new tool and technology called a cross-layer transcoder, or CLT.

OpenAI released their new, very viral GPT-4o image generation. And then, last but not least, we have the world's most powerful AI model, which was also released this week. My gosh, that was a lot to cover in a very short amount of time. Y'all, the AI world is straight up en fuego. So much going on. So if you thought AI was hitting a wall, if you thought capabilities were near the ceiling,

not even close. Another exciting week in AI. I hope this was helpful. If it was and you're listening on the podcast, I'd appreciate it if you hit that little subscribe button. Go find it, right? If you're listening on Spotify or Apple, I would appreciate you leaving us a review as well. If you're watching on social media, let me know what you want to hear more of, and also click that repost button if this was helpful. A lot of people tell me Everyday AI is their cheat code, and I'm like, yo, don't keep it to yourself. Share this with

someone, right? I'd appreciate it if you did that. Also, I'd appreciate it if you tune in tomorrow and every day for more Everyday AI. Thanks, y'all. And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up for our daily newsletter so you don't get left behind. Go break some barriers, and we'll see you next time.