
EP 494: Gemini 2.5 Pro Unlocked: Inside the world’s most powerful AI model

2025/4/1

Everyday AI Podcast – An AI and ChatGPT Podcast

People
Jordan Wilson
A seasoned digital strategy expert and host of the Everyday AI podcast, focused on helping everyday people advance their careers with AI.
Topics
As the host of the Everyday AI show, I consider Gemini 2.5 Pro one of the best large language models I've ever used. It not only performs strongly across benchmarks, it also leads by a wide margin in human preference. It is technically a hybrid model with built-in thinking, using chain-of-thought reasoning to work through problems. Its context window reaches 1 million tokens, letting it handle enormous amounts of text. It also has advanced coding abilities, scoring highly on many coding benchmarks, and supports multimodal input, understanding text, images, audio, and video. The free release of Gemini 2.5 Pro also makes it widely accessible. Overall, it is a powerful, easy-to-use AI model with broad applications. Google is changing its AI update strategy: instead of big marketing pushes, it simply ships updates, which kept the Gemini 2.5 Pro release relatively quiet even though its capabilities deserve attention. Gemini 2.5 Pro excels at complex logic and math without external tools, and it scores very highly on human preference, leading other models in Elo rating. On market impact, Google is ahead of other companies with its hybrid model approach and is working to become the leader in thinking models. The model still has some bugs, but Google is working to improve it. Going forward, Google plans to keep strengthening Gemini 2.5 Pro's reasoning and coding and to integrate it more deeply into Google's ecosystem.


Chapters
This chapter introduces Gemini 2.5 Pro, highlighting its superior performance compared to other LLMs. It also briefly covers AI news, including Runway Gen 4 and OpenAI's funding round, before diving into the details of Gemini 2.5 Pro's capabilities.
  • Gemini 2.5 Pro is considered the best LLM by the speaker.
  • Runway unveils Gen 4 AI video generator.
  • OpenAI secures a record-breaking $40 billion funding round.
  • The episode will be split into two parts, focusing on high-level features in part one.

Transcript


This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life.

In the two plus years that I've been doing the Everyday AI Show, I don't know if there's ever been an instance where an AI update this big, especially a large language model update, has been talked about so little. I think that there's a reason for it, but we're going to talk about it today because I think the new Gemini 2.5 Pro

model from Google is probably the best single large language model I've ever used. And I don't think I'm alone in that, because it has not just broken just about every single benchmark, but in terms of human preference, it is quite literally off the charts. So today we're going to be going over Gemini 2.5 Pro Unlocked: inside the world's most powerful AI model.

All right. I'm excited for today's conversation. I hope you are too. What's going on, y'all? My name is Jordan Wilson and welcome to Everyday AI. This is your daily live stream podcast and free daily newsletter, helping us all not just keep up with AI, but how we can use all these advancements to get ahead, to grow our companies and our careers. If that's what you're trying to do, welcome. This is where you learn on the podcast or the live stream. But

This is only half the battle. You need to leverage what we talk about today, and where you do that is our website. So if you haven't already, please go to youreverydayai.com. Sign up for the free daily newsletter. Each day in our newsletter, we recap each day's podcast or live stream, as well as keeping you up to date with literally

everything else in the world of AI. So it is your one-stop shop to stay ahead, just like this podcast. I always like to remind people, this is unedited, unscripted, trying to bring you all something real in the world of artificial intelligence.

All right. So I am excited to get into today's topic and talk about Gemini 2.5 Pro, by far the most powerful AI model I've used. But before we do, let's first start off as we do some days, well, most days with going over the bullet points of the AI news.

All right, so first, Runway has unveiled Gen 4, their newest AI-powered video generator capable of creating consistent characters, locations, and scenes with realistic motion and physics. So the new model allows users to generate videos using reference images and textual descriptions, offering superior prompt adherence compared to previous models and style consistency without additional training.

So backed by investors like Google and Nvidia, Runway does, though, face some legal challenges over copyright concerns while aiming for $300 million in annual recurring revenue and a $4 billion valuation. So a study warns that AI tools like Gen 4 from Runway could disrupt more than 130

So yeah, Runway Gen 4 is going to be a big hit.

I think it's probably in Sora territory, maybe a little bit better. I mean, we'll see, it just dropped, so I'm sure the reviews are going to be coming out. But I think, you know, Google Veo might have some competition. And hey, in terms of availability, Runway Gen 4 is available to everyone like OpenAI's Sora is, whereas Google's Veo 2 tool is not available to everyone, at least inside of their platform. You can access it from third-party platforms, though.

All right, our next piece of AI news, another record breaker. OpenAI has officially secured a record-breaking $40 billion, with a B, $40 billion funding round, valuing the company at $300 billion. So OpenAI has closed that historic $40 billion funding round, making it the largest private tech investment ever.

All right. So it does value the ChatGPT creator at $300 billion. And the round was led by Japan's SoftBank, contributing $30 billion of that amount, with additional investments from Microsoft, interesting there, Thrive Capital, Alipay, and

others. The funding does come with a condition, at least from SoftBank. Their investment could drop from $30 billion to only $20 billion if OpenAI does not fully transition into a for-profit entity by the end of 2025. And that would require approval from both the California Attorney General and Microsoft, and a resolution of the ongoing legal challenges from Elon Musk, which I think are pretty much theater.

All right. And this also comes as OpenAI did just announce that their weekly active users have jumped up to 500 million. And OpenAI CEO Sam Altman did just say on Twitter that they added literally a million people in an hour, probably with all the Studio Ghibli AI photo generations. Also, this comes on the heels of OpenAI just announcing that they would release an

model. So pretty exciting news there. So make sure to follow along. We'll be following that news. All right. Last but not least, some big news from Amazon. They've unveiled Nova Act,

a new AI agent to compete in the agentic race. So Nova Act from Amazon is an AI agent capable of independently navigating web browsers to perform basic tasks such as filling out forms, making reservations or ordering food. So the Nova Act SDK is a toolkit for developers and it's available now as a research preview on nova.amazon.com allowing developers to prototype agentic applications.

So Nova Act is developed by Amazon's AGI lab, co-led by former OpenAI researchers. So according to Amazon, Nova Act outperformed OpenAI's and Anthropic's agents in internal tests. But despite those claims, Amazon has not yet benchmarked Nova Act on some more widely recognized agent

evaluations like WebVoyager. Also, reportedly, Nova Act will play a critical role in Amazon's upcoming Alexa Plus upgrade, a generative AI-enhanced version of Alexa, potentially giving Amazon a competitive edge through its massive user base.

All right. So we're going to have a lot more on those stories and everything else you need to get ahead on our website at youreverydayai.com. So make sure you go there and sign up for the free daily newsletter if you haven't already. All right. Enough chit chat. Let's get into Gemini 2.5 Pro. No one's talking about it. It's...

wild. Like, the fact that we have a large language model available now with the capabilities that Gemini 2.5 Pro has, and hardly anyone's using it, no one's talking about it, is pretty telling, right? Of a couple of things. I think this is a case of shiny AI syndrome, as I like to call it, right? I think that

What Google has released can change fundamentally how we all do business, yet so few people are using it just because there's a new shiny AI object in the room, which is the new GPT-4o image gen from OpenAI,

really a groundbreaking visual model. And yes, we are going to be doing a show on that sometime soon. That one's going to require a lot of research. And speaking of that, even today's show, we're actually going to break it up into two chunks. So today we're just going to be talking about kind of the bullet points,

high level, what's new. And then we're going to be doing a show maybe later this week or next week, kind of a part two. So let me know what more you want to see, what you want us to test from Gemini 2.5. So, you know, our second part is going to be more hands-on and use cases where today we're really just going over the bullet points of what's new. So make sure to let me know, live stream audience, what you want to hear from Gemini.

part two use cases, you want to see, all that good stuff. Speaking of, hey, good to see you, you know, our YouTube family here, our LinkedIn. Thanks for tuning in. Michelle, Samuel, Jose, Shares, Kyle, Sandra, Gene, Big Bogey, Christopher, Brad,

Brian, I can't get to you all. Thanks for joining. But do let me know what questions do you have on Gemini 2.5. But let's start here. What the heck is new in Gemini 2.5? Well, there's a lot. And it's also a little confusing because, you know, you might be having deja vu. You know, you might be saying, okay, wait, there's new Google Gemini updates.

you know, out of nowhere. Didn't this just happen? Yes, it did. So we're going to be doing a quick recap also of what was released like literally two weeks ago. But first, let's talk high level of what's new in Gemini 2.5. And then we're going to be going over all of these kind of piece by piece.

So one of the biggest things is that it is now, technically, a hybrid model, although Google did not choose to call it a hybrid model. But what that means is it has built-in thinking. So Gemini 2.5 Pro is a thinking model. It uses chain of thought reasoning kind of under the hood. You know, we've been talking about this a lot over the last few months, and we'll continue to talk about it a lot in 2025.

kind of this is the new direction that large language models are going. So Google following suit here with Gemini 2.5. So think of it this way, you kind of have your quote unquote old school, you know, transformer models. And then you have your reasoners that essentially use more compute

to kind of do this chain of thought thinking, or chain of thought reasoning, under the hood. So Google Gemini 2.5 kind of combines both. If you have simpler tasks, at least in my testing, it still goes through those reasoning or thinking steps, although it's pretty quick. So Gemini 2.5 essentially decides how much compute, how much thinking, it needs to use. That's probably one of the biggest things that's new.
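For readers who want that intuition in code, here is a toy sketch of the hybrid idea: easy prompts go straight to a fast answer, harder ones get an extra reasoning pass first. This is purely illustrative and not how Gemini actually decides internally; the FakeLLM stub and the looks_hard heuristic are made up for the example.

```python
# Toy illustration of a "hybrid" model: decide per prompt how much thinking to spend.
# This is NOT how Gemini 2.5 works internally; it's just the routing intuition.

class FakeLLM:
    """Stand-in for a real model client so the sketch runs on its own."""
    def generate(self, prompt: str) -> str:
        return f"<answer to: {prompt[:40]}...>"

def looks_hard(prompt: str) -> bool:
    # Crude stand-in heuristic; a real hybrid model learns this decision.
    return len(prompt.split()) > 40 or any(w in prompt.lower() for w in ("prove", "debug", "plan"))

def answer(prompt: str, llm: FakeLLM) -> str:
    if looks_hard(prompt):
        # Extended "thinking" pass: draft a plan first, then answer using it.
        plan = llm.generate("Think step by step about how to solve: " + prompt)
        return llm.generate(f"Using this plan:\n{plan}\n\nNow answer: {prompt}")
    # Fast path: simple asks get a direct answer with minimal extra compute.
    return llm.generate(prompt)

print(answer("What's the capital of France?", FakeLLM()))
```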

The other is the context window: an enormous 1 million token context window, which is roughly 750,000 words or 1,500 pages. So we are talking literally multiple books. I mean, we're talking 33,000 lines of code as an example.

So if you are brand new and you're like, what the heck is a context window? That's essentially how much a large language model can remember at any one given time. This is different than memory, right? But essentially, think if you're having a chat with a large language model and you're giving it some information and you're going back and forth, right? With older models, let's even talk ChatGPT, they're kind of a little behind in terms of context window. They had a roughly 32,000-token context window on their front-end chat products. So that means, hey, after roughly 26,000 words, ChatGPT is going to start forgetting.

So with this, at least in AI Studio (I did not see Google clarify anything on the front end, if you're using this inside of Google Gemini in the front-end chatbot; we will be testing that, though, and we'll probably share it in the newsletter), essentially a 1 million token context window, 1 million tokens, is wild. That means that the chat is pretty much not going to forget, right? Not until you use it like crazy, incessantly, until you are going wild and you're not leaving that chat and you're dumping hundreds and hundreds of pages. It's still going to remember, which is huge.
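If you want a gut check on what a 1 million token window actually holds, here is a rough back-of-the-envelope sketch. The 4-characters-per-token and 500-words-per-page ratios are common rules of thumb, not Gemini's actual tokenizer numbers.

```python
# Back-of-the-envelope sizing for a 1,000,000-token context window.
# Ratios below are rough rules of thumb, not Gemini's actual tokenizer stats.
CONTEXT_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75       # roughly 3/4 of a word per token for English prose
WORDS_PER_PAGE = 500         # dense single-spaced page

def fits_in_context(text: str, context_tokens: int = CONTEXT_TOKENS) -> bool:
    """Very rough check: about 4 characters per token for English text."""
    estimated_tokens = len(text) / 4
    return estimated_tokens <= context_tokens

words = CONTEXT_TOKENS * WORDS_PER_TOKEN            # about 750,000 words
pages = words / WORDS_PER_PAGE                      # about 1,500 pages
print(f"~{words:,.0f} words, ~{pages:,.0f} pages")  # matches the figures above
print(fits_in_context("hello world " * 100_000))    # a ~1.2M-character dump still fits
```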

Another thing: advanced coding. Some of the top benchmark scores, SWE-bench as an example, and complex code generation. So if you are big into software development, if you're big into coding,

Or even vibe coding, right? This whole concept of, hey, I'm just going to open a large language model, have it code something for me, have it code a Chrome extension for me, have it code a little desktop application, have it code a simple CRM, right? This was one of my bold 2025 AI predictions: that everyday people like you and me would just be using AI to code our own little pieces of software.

Gemini is great for that, right? The good thing is you don't have to know anything. You don't even have to tell it what coding language to use. Just be like, yo, Gemini, I want a Chrome extension that does this. Build it for me and then give me simple step-by-step instructions on how I go ahead and install and deploy it. So, fantastic for advanced coding.
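If you would rather script that same ask instead of typing it into the chat, here is a minimal sketch using the google-generativeai Python SDK that AI Studio hands out keys for. The model ID string and the prompt are placeholders, so treat this as an assumption about setup rather than official sample code.

```python
# Minimal sketch: sending a "vibe coding" prompt to Gemini via the AI Studio SDK.
# pip install google-generativeai ; the model ID below is a placeholder, check AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_AI_STUDIO_API_KEY")           # key from aistudio.google.com
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")   # assumed model ID at time of writing

prompt = (
    "I want a Chrome extension that highlights every date on a web page. "
    "Write all the files I need, then give me simple step-by-step instructions "
    "for how to install and test it locally."
)

response = model.generate_content(prompt)
print(response.text)  # the generated files plus the install walkthrough
```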

Already, I would say it's not the top coding model in the world. I still think Claude 3.7 Sonnet edges it out a little bit. There's a lot of different coding benchmarks, but, you know, essentially Claude from Anthropic was so far ahead it wasn't even close, right? It was like they were 1A, 1B, and 1C, and they were probably even number two, right? And everyone else was so far in the distance. Now Google has closed that gap, and they're essentially 1B with Gemini 2.5 on the benchmarks.

Human preference is huge. So, you know, I've talked about this a little bit. I think a lot of the kind of the AI labs, especially in 2024, were kind of overfitting models. So what that means is when they were building them, going through post-training, all that, is they were doing it to get certain scores on benchmarks, right? So,

Google Gemini 2.5 does that, right? Not saying they overfit it to get certain benchmarks, but it cleans up on benchmarks and, you know, essentially has top scores, either number one or number two, on every important and telling benchmark that there is. However, the big one is the Elo, kind of the Elo score. So in the LM

Arena. This is essentially, and I talk about this a lot on the show, think of it as a blind taste test, Pepsi versus Coke. You put in a prompt, you get two outputs, and you choose which one is better. Those outputs are not named, right? And that kind of gives you an Elo score. Generally, when a new model comes out, right, so a Grok 3 or a GPT-4o latest, or,

you know, or Claude 3.7, right? Usually the new state-of-the-art model will go into first place, generally, on the Elo scores, but maybe only by like two points. Generally, it's usually like a two to four point jump anytime a new state-of-the-art model comes out. And it's like, oh, it's the most powerful model in terms of what humans prefer, because that's extremely important, right? In this case,

Google Gemini came out with a 39-point margin, which is literally unheard of; it has not happened. So yes, it checks the box in terms of benchmarks, but it definitely checks the box in terms of human preference, which I think is usually more important, right?
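For a rough sense of what a 39-point lead means, here is the classic Elo math. LM Arena's actual methodology is more involved than this (ratings are fit over huge numbers of votes), so treat this as the intuition only.

```python
# Classic Elo intuition behind arena-style leaderboards (simplified).
def expected_win_rate(r_a: float, r_b: float) -> float:
    """Probability model A beats model B given their Elo ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """One blind head-to-head vote: winner gains, loser loses, scaled by how surprising it was."""
    e_a = expected_win_rate(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b - k * (s_a - e_a)

# A ~39-point gap corresponds to roughly a 55-56% head-to-head win rate:
print(round(expected_win_rate(1439, 1400), 3))  # ~0.556
```

In plain terms, a gap that size suggests people picked Gemini's answer in a noticeable majority of blind matchups, not just by a coin-flip margin.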

And the LM Arena has, I believe, multiple millions of votes, right? Not millions of votes yet for Gemini 2.5, but already, with enough qualifying votes, it is at the top and humans prefer it by far. The other big thing:

It's free, right? Google snuck this in actually over the weekend. So they announced Gemini 2.5 last week. A couple of days later, they're like, oh, guess what? We're going to make it available for free. So if you do have a Google Gemini account, so you can just go to Gemini.Google.com. You know, you can use your Gmail or Google workspace credentials and you'll find Gemini 2.5 in there and you can start using it for free right now. All right.

So that is the high level. All right. And hey, live stream audience, let me know what your thoughts are on Gemini 2.5. Sandra's asking, can you use it to code a widget for you? Yeah, you can use it to code anything, Sandra. But yeah, you do have to, as an example, with a Chrome extension or something that runs on your desktop, you still have to execute that yourself, but it will write the code for you and tell you how to install or execute it.

So let's go over because you might be thinking, didn't this just happen? I'm confused. Wasn't there just new Gemini 2. something updates? Yes, there were. Okay, so about two weeks ago, mid-March, if you go back and listen to episode 482, if you want the full updates, we gave it to you there. I love Google's new strategy here.

Right. I think they had their original, you know, kind of December 2023 snafu where, you know, they put out this fancy marketing video about their AI, and it turns out a lot of it wasn't true and it didn't work, and they kind of got dragged through the mud, and they spent the better part of 2023 and 2024 way behind. Ever since, I love what Google's doing. They're not coming out with flashy advertising, flashy marketing, big announcements, big hype. They just ship.

They just ship updates that are pretty amazing. So they did two weeks ago announce some pretty impressive updates that I still don't think people talked about. So if you want to know about that, you can go listen to that in episode 482 again for free on our website. Yeah, if you didn't know on our website, you can go and listen to every single episode we've ever done interviewing some of the world's top experts on AI. But here's essentially what was announced in the mid-March version so we can get this out of the way.

So Gemini 2.0 multimodal, which was huge. I think that kind of set the stage for this whole GPT-4o image gen, multimodal-by-default thing. Amazing. I went over that. You can literally create a blog post with inline images, you know,

wild, right? You can also edit images with natural language, kind of like what you can do now with GPT-4o's image gen. So mid-March, Google announced Gemini 2.0 multimodal. They announced deep research was updated to the 2.0 model, whereas previously it was running on 1.5. They announced personalized Gemini, which I think some people like, some people don't like, right? But it essentially takes into account your search history. So it's a mode that you can select.

So that was new. They also announced Gemma 3, which is wildly powerful for a super small open source model. So you can run it locally. They announced Gemini Robotics running on Gemini 2.0, as well as big updates to my favorite AI tool, Notebook LM. Also, they upgraded that as well under the hood to the Gemini 2.0 integration.

versus previously it was running on Gemini 1.5. All right, so if you're scratching your head and being like, wait, is Jordan like a month late on this? No, Google did just have a ton of big updates a couple of weeks ago.

All right, let's get into it now. Let's go over it kind of point by point here. Again, this one's not going to be a super long one because we are going to have a part two. But here's kind of what's new in the Gemini 2.5 Pro launch.

Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on Gen AI. Hey, this is Jordan Wilson, host of this very podcast.

Companies like Adobe, Microsoft, and NVIDIA have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use Gen AI. So whether you're looking for ChatGPT training for thousands,

or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com slash partner to get in contact with our team, or you can just click on the partner section of our website. We'll help you stop running in those AI circles and help get your team ahead and build a straight path to ROI on Gen AI.

So launched late March by Google and Google DeepMind as their most intelligent AI. The biggest thing, like we talked about, it focuses on built-in thinking or using that chain of thought. This is huge. This is huge for reasoning, coding, context handling. I mean, there's a lot of new capabilities and it does change what's possible for businesses. All right, so how can you access Gemini 2.5?

Pro? Well, like I said, over the weekend, Google, just with a tweet, said, oh, by the way, we're making this free. So it is available for free to Gemini app users. So if you're using this inside the Gemini chat, so gemini.google.com, right, especially if you're on a paid account, you do have the option to turn off model training. So, you know, you don't have to worry about the data that you share being used to train Google's model. So on the

front end, you can access Google Gemini that way. You can also access it for free in Google AI Studio, which is kind of a more experimental version and more of a sandbox.

And I'm glad that Google has shifted their strategy after, I don't know, I feel like I did so many rants in 2023 and 2024, because Google for like a year kind of quote unquote hid their most powerful and capable models inside Google AI Studio, which is more for developers. And then they didn't even label or tell you what was powering their Gemini chatbots. You had no clue, but usually it was running a model that was up to six months old.

So not anymore. I love Google's new strategy here. Put the newest, latest, greatest model inside the front-end Google Gemini chatbot, but you can still use Gemini 2.5 inside Google's AI Studio. And that is where you're going to be able to get that full 1 million token context. Just keep in mind, in Google's AI Studio it is free, but there's no data protection on that end. So yeah, don't go and put, you know, confidential or

proprietary company data inside Google AI Studio. It is more of a sandbox. Also, the enterprise path: it will be coming soon to Google Cloud Vertex AI in the coming weeks, right? So there's technically, I know it's a little confusing, right, so many different ways that you can access Google and Google Gemini, you know, as well as inside their apps, right? So they didn't say yet,

As an example, if their Gmail Gemini integration has been upgraded to 2.5, I'm not sure, but at least for right now, you can go access it, gemini.google.com, even for free. If you have a paid account, you have higher limits, as well as you can access it for free inside Google's AI Studio. And it will be coming soon, kind of across Google's family of products via the Google Cloud Vertex AI.

All right, let's talk a little bit about the reasoning. So it does have that built in chain of thought like we talked about. So what the heck does that mean? Well, it kind of plans steps internally before it gives you an answer. And the cool thing is, all right, you can click that show thinking.

I don't know why, but I feel a lot of people don't read that, at least people that I talk to. I highly encourage you: if you want to get better outputs out of any large language model that shows its kind of chain of thought, you should be reading that, right? Because you'll see what happens a lot of times,

especially with these kind of hybrid models that can reason, they take a little bit longer, which is okay, right? Because outputs in general are exponentially better, more accurate, more robust, more complex, much better. However, it does take a little longer. So what I always do while I'm quote unquote waiting, right? It might be 10 seconds. It might be two minutes, depending on how complex of a query you're giving the model, read the chain of thought.

always read it, right? If you want to be future-proof in your job, right, if you want to be the smartest person in AI and in your department, read the chain of thought and then accordingly make updates to how you use that model, how you use the prompt, right? All the thinking models work a little bit differently, right? So you have Claude 3.7 Sonnet. It's a hybrid model with thinking, although if I'm being honest, I think that was more of a marketing thing, because you still have to click the extended thinking.

Anyways, you also have OpenAI's models, all of them: o1, o1 Pro, o3-mini, o3-mini-high; o3-mini Pro should be coming out via the API soon, right? So always, no matter what thinking model or reasoning model or hybrid model you're working with, look at the chain of thought, see what's going right, see what's going wrong. I always tell people: have a conversation, re-prompt to get better results.

Also, we have to talk about one more thing. Maybe we'll do a dedicated show on this, I don't know. Livestream audience, let me know if you want to know more about this: the Humanity's Last Exam benchmark. It is a newer benchmark put together by, I think they said, hundreds of subject matter experts. Essentially, it's a benchmark that, in theory, shouldn't be in any training data yet.

So the new Gemini 2.5 got an 18.8% on the score, which you might think, like, oh, 18% out of 100, AI is dumb. All right, humans: I doubt any human out there listening, any single human, could get a 1% on this Humanity's Last Exam.

Let's be honest, right? But the previous high score was OpenAI's GPT-4.5, which got a 14%. Anthropic's Claude 3.7 got an 8.9%. I believe DeepSeek was shortly behind in the mid-8% range. So yeah,

Gemini 2.5 at 18.8%. So, hey, in terms of it being able to solve and tackle very complex problems that the single smartest human in the world could never solve, right, you'd have to get hundreds of people working together to be able to make a dent in this Humanity's Last Exam. You know, Gemini did a great job. Also, it excels at complex logic and math without needing external tools.

So we did talk a little bit about some of these benchmarks, but like I said, right away out of the gate in the LM Arena, which is just human preference, being 39 points above the next best model is super impressive. Some other top math and science scores: on AIME 2025 it got an 86%, and on GPQA Diamond, the science benchmark, it got an 84%.

Very impressive. And then an 81% on the MMMU, which is the multimodal equivalent of the old standard of AI testing, the MMLU. And y'all, even though Gemini 2.5 has only been out for a couple of days, they've already made multiple updates to it since launch. All right. So a couple of things. Number one, they made it

open and available for free users. So that was not available at the time of launch. It was only available to paid users inside Gemini.Google.com. So on the front end chat bot. So now it's available to all free users. Also just hours ago, y'all.

This is why sometimes I don't sleep and why I don't always, you know, sometimes I'll do prerecorded shows. But literally just hours ago, CEO Sundar Pichai did kind of announce that the new Canvas mode

is available in 2.5 Pro. So I did use it actually while planning this show, going through my notes and having it put together kind of some interactive elements to help me better learn and understand what was new. So there are some things in the Canvas mode that worked very, very well. There were some things that were buggy, if I'm being honest, right? It is experimental. Keep that in mind. Also,

They added support for third-party tools like Cursor AI. That's huge. So we should see, because Anthropic's Claude has been making a living, right, essentially by being the coding LLM of choice for the top software developers.

So we'll see. I do see that potentially changing, especially when you look at API costs. The Claude models are rather expensive and the Gemini models aren't. So we'll see what happens, and whether Anthropic is still kind of the de facto model chosen by software developers. Also,

Sundar Pichai hinted at future MCP support for Google Gemini. Pretty big news. So that's Model Context Protocol. I know we're going to do a dedicated MCP show soon, but essentially, you know, if you've been seeing this little acronym floating around and you're like, what the heck is it? Right. So you have APIs, right? In the SaaS world and the software world, an API is essentially a language that lets pieces of software talk to each other.

APIs can sometimes kind of work for AI tools and large language models, but not necessarily. So they're a little different. So this Model Context Protocol was actually developed by Anthropic, but it's being used now and supported by just about everyone. OpenAI last week announced support for it. And so Google and Google Gemini may support MCP as well, which is essentially, and I like to think of it this way, even though it's a little more complex than this:

Think of it as the API for large language models. It allows different AI systems and different large language models to talk to each other, and to talk to other APIs and other software.
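To make that "API for large language models" framing concrete, here is a tiny tool server sketch using the official MCP Python SDK's FastMCP helper, following its quickstart pattern. The server name and tool are hypothetical; the point is that any MCP-speaking client, whether Claude Desktop, Cursor, or a future Gemini integration, could discover and call the tool the same way.

```python
# Minimal MCP server sketch: expose one tool that any MCP client can discover and call.
# pip install "mcp[cli]"  (the official Model Context Protocol Python SDK)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("everyday-ai-demo")  # hypothetical server name

@mcp.tool()
def latest_episode(show: str) -> str:
    """Return the latest episode title for a show (stubbed data for the example)."""
    return f"Latest episode of {show}: EP 494 on Gemini 2.5 Pro"

if __name__ == "__main__":
    mcp.run()  # serves over stdio so an MCP-capable client can connect to it
```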

All right, next: the coding abilities. So we already talked a little bit about this. This is one of the more unique angles, or at least one of the angles, that Google is taking with Gemini, really pushing and promoting its proficiency in coding. So, very impressive. You know, they put some demos out there, but also in agentic coding,

you know, it scored a 63.8% on that SWE-bench, which, when it comes to agentic coding, I do think is the benchmark to look at. You know, go play with it, right? And the good thing is that now you have that Canvas mode inside Google Gemini 2.5.

So you can literally go code anything you can think of with natural language. Code me this, build me this, right? And you can render it or run it in the new Canvas mode. So it's a little different than OpenAI's Canvas mode, which I think is more of like a Google Docs-esque

collaborative environment. You can run certain coding languages inside OpenAI's version of Canvas or ChatGPT's version of Canvas. But I'd say in my limited testing of Canvas so far, which came out a couple of weeks ago, I'd say it is more like...

It is more like Anthropic's Artifacts feature, in terms of it can render and run a lot more languages, right? And go have fun, right? This whole vibe coding thing, right? It's been kind of this trending topic.

Go vibe code yourself something, see if you can, right? And then if you can get it to run inside Canvas, that means, okay, it's working, and you could go deploy it somewhere else, whether you need to have it running as a full-stack kind of app, you know, running on some service online, or whether you would run it on your desktop, or maybe as a Chrome extension, et cetera. Right. I one-shotted one, which was pretty fun.

And I shared it in the newsletter yesterday. I don't know if anyone saw it. I did a little simple Chicago-inspired game, right? A little side-runner, the very early Nintendo-esque type of game. But just one shot. I said, hey, do it like this, you know,

working in all these Chicago elements, you know, hot dogs and pizza and potholes, right? You know, make it kind of, you know, bring in elements that I like from Super Mario, right? It was one shot and it worked. Very amazing, right? So like I said, instantly, from a coding and software development standpoint, you know, we're going to put it through some more testing, and maybe we'll do that in part two if that's something you want to see, but it's very proficient in coding.

Next, we have to talk about the multimodality and the context window. So Gemini in their 2.0 versions, everything is multimodal by default.

So what that means is it understands not just text, but images, audio, video, code inputs, or a mixture of all of those things together, which is pretty amazing. So we are getting that as well, and getting close to that, from big models like Anthropic's Claude and

OpenAI's ChatGPT, but they're not quite there yet, specifically with video, right? That's kind of a different modality for, at least, OpenAI's ChatGPT. Claude, I don't think it's really going to play too heavily in the multimodal-by-default space, although I feel they should, right? I think Claude was really hoping they could carve out their niche just with software, just with coding. But, you know, Google's like, hey, hold my, hold my MCP.
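Here is a hedged sketch of what multimodal by default looks like in practice with the same google-generativeai SDK: one request that mixes text with an image and an uploaded audio file. The file paths and model ID are placeholders, not tested sample code.

```python
# Sketch: one multimodal request mixing text with an image and an audio file.
# Paths and model ID are placeholders; requires pip install google-generativeai pillow
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_AI_STUDIO_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")  # assumed model ID

chart = Image.open("quarterly_chart.png")            # local image, passed inline
audio = genai.upload_file("standup_recording.mp3")   # larger media goes through the File API

response = model.generate_content([
    "Summarize the chart, then transcribe the key decisions from the audio.",
    chart,
    audio,
])
print(response.text)
```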

I mean, the 1 million token context window, amazing. Like I said, we're going to be putting that to the test inside of the Google Gemini front end. I do know and have done some testing on the back end of AI Studio context window. Super impressive. Also, Google did announce that they're planning soon for a 2 million token context window. I mean, that's...

wild, right? So one thing I'm going to probably do is get together transcripts, right? Like, I have almost 500 episodes of the Everyday AI Show. That's

thousands of pages of transcripts, so that's probably something I'll do for a test: upload everything. But y'all, as we get to multiple-million-token context windows, I should have put this in my, you know, AI 2025 roadmap series. So, you know, if you haven't listened to that, make sure you go on our website and listen to those for free. There's a five-part series.

I don't know. I think that, you know, RAG is going to become a little less important in 2025 and 2026. I'm not saying it's not going to be needed. It's still going to be needed, right?

But I think so many, especially smaller companies with small use cases, you know, they heard this RAG terminology, really in late 2023 and 2024, and everyone's like, oh, I need to build, you know, retrieval augmented generation, right? But okay, what if you don't have a ton of data, right? What if you don't actually have a ton of files and it's not a lot, right? You might just be able to work in that 2 million token context window. So, you know, the context window is actually extremely important to the future of AI development.
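Here is a small sketch of that decision in practice, using the same rough 4-characters-per-token estimate as earlier: if everything fits in the window, just put it in the prompt; only reach for retrieval when it does not. The retriever below is a keyword-counting placeholder standing in for a real embeddings-based search.

```python
# Decide between "stuff it all in the context window" and retrieval (RAG).
from typing import List

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough rule of thumb for English text

def retrieve_top_chunks(question: str, docs: List[str], k: int) -> str:
    # Placeholder for a real retriever (embeddings + vector search).
    scored = sorted(docs, key=lambda d: sum(w in d.lower() for w in question.lower().split()), reverse=True)
    return "\n\n".join(scored[:k])

def build_prompt(question: str, docs: List[str], window_tokens: int = 1_000_000) -> str:
    corpus = "\n\n".join(docs)
    if estimate_tokens(corpus) + estimate_tokens(question) < window_tokens:
        # Small corpus: long context alone is enough, no RAG plumbing needed.
        return f"Here are all my documents:\n\n{corpus}\n\nQuestion: {question}"
    # Big corpus: fall back to retrieval, keeping only the most relevant chunks.
    relevant = retrieve_top_chunks(question, docs, k=20)
    return f"Here are the most relevant excerpts:\n\n{relevant}\n\nQuestion: {question}"
```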

All right. Let's talk a little bit about some of the early feedback. So like I said, we've shared about this in our newsletter, but there's been some very impressive one-shot generations, people building video games, precise image analysis, 3D simulations, extremely impressive. And we'll be doing some of those in our part two of this series. Audio skills, being able to

instantly get accurate transcriptions, very impressive. And just positive, right? Just positive. You know, people are always like, vibes, right? The vibes on Gemini 2.5 Pro are pretty positive so far. Let's look at the market impact. So, Google right now,

they're trying to be the leader in thinking models, right? They're kind of beating OpenAI to the hybrid punch. Like I said, technically Anthropic was first with Claude 3.7 Sonnet, but I don't know. I actually talked to

a couple of people about this at the NVIDIA conference at GTC in the few, you know, two minutes free time I had between like the 15 interviews I did out there. We still do have like one or two more shows dropping from GTC, by the way. A lot of people were like, you know, I'm like, hey,

what do you think about this new hybrid approach from Claude? And they're like, oh, is it really hybrid, right? You technically have to click if you want the extended thinking or not. But with this, I think Google is in the driver's seat, at least right now, when it comes to this new hybrid model approach, which we also heard from OpenAI is going to be their approach moving forward as well. So they've said when we get GPT-5, it's going to be more of a system, right? And you're not

necessarily going to be able to choose which model you use, which some people might like, right? If you look at OpenAI's ChatGPT, you know, in my Pro account, I think I have nine different models to choose from. Some people might be intimidated by that. So, you know, at least

GPT-5 is going to be more of an architecture that's going to kind of use this mixture of models or mixture of experts or using kind of traditional, you know, quote unquote, old school transformer models and hybrid or, you know, these reasoning and thinking models. But Google with this, they're the leader in it right now. I don't think Anthropic did a good job with it.

If I'm being honest, I don't. I think a lot of people were not super impressed with Sonnet 3.7. I know a lot of people defaulted back to Sonnet 3.5. They weren't very impressed and they didn't feel they had kind of enough control, right? At least on the front end users, that's what we're talking about, not on the back end. But I mean, this play right here. So aside from being a leader in terms of the thinking model, also, I mean,

with the enterprise game, right? So this isn't released for Vertex AI yet, which is probably a good idea, right? Because it is buggy. I should probably say this, right? I will say, when OpenAI releases a model, at least in my personal experience, and I'm using, you know, the main models every single day, multiple hours a day,

OpenAI's models, when they're released, yes, they're throttled, they may go down, right? Gemini has better availability. Right, you know, a lot of times, especially if you're on a free plan or, you know, the basic $20-a-month plan with ChatGPT, and a new model comes out, you know, it might be very slow or availability might be impacted. But when it is there, it works fairly

well, I will say. So I will say that Gemini 2.5, although, you know, there's no slowdowns, there's no real outages, the availability is there, it has been a little buggy, right? So the Canvas mode, although it's only been out for a few hours, has been a little hit or miss.

The same thing with just the general Gemini 2.5. It's been a little buggy, but it's experimental, right? I always run a series of tests, and sometimes we were getting, I wouldn't say hallucinations, but some misdirections, right? One thing I always do to test its internet capabilities: I say, hey, what's the latest episode of the Everyday AI podcast by Jordan Wilson? So I see, okay, is it

actually able to navigate to the web and find the latest episode. And instead it gave me the weather, right? The weather was accurate, but that's not what I asked for. So, you know, hit or miss so far. But I think once Google irons out some of those things, it's going to, you know, be a very impressive and reliable model. But I think that's

honestly why they haven't really released it for Vertex AI yet. Right. So that's when you can, you know, when you'll start seeing it deployed at scale, you know, across many large enterprise organizations. But I do think that Google is taking more of a tiered approach and making sure individual users, people kind of using their sandbox in AI studio have a good experience. They're going to want to squash some of those bugs before they release it to the masses.

All right. And then kind of last but not least, and hey, live stream audience, thanks for sticking with me. We are going to have a part two. If you have any questions, get them in now. I'm going to scroll through, see if I can answer any. But last but not least, we have to look at the future outlook and updates.

We are going to see some pricing updates probably soon for the API because I do believe there's going to be some heavy usage. Google has said that they're working on enhancing the reasoning and coding even further. So there will be some under the hood updates. That's another important thing to think about, right? So even though we saw this jump from Gemini 2.0 to Gemini 2.5, that

doesn't mean that Gemini 2.5 won't be updated until we get something like Gemini 3, right? Yeah, you have to kind of keep up with sources such as Everyday AI, right? To see when some of these more under the hood model updates come out. But I do see it, I think they're gonna squash some of these bugs, make some improvements, but the biggest thing is the ecosystem, right? I'll be interested to see when and if Google announces

if Gemini 2.5 Pro or Gemini 2.5 is going to be rolled out with deeper integration into its ecosystem. So that means, right, I expect better and deeper integration across, you know, Google Sheets, Google Drive, Gmail, Docs, et cetera. I would love to see if

we're going to get 2.5 in NotebookLM, in Google Gems, which is kind of their version of, you know, GPTs, right? Creating kind of this personalized version of Google Gemini. Also, you've got to get ready for the clap back now, y'all. That's what I think I'm going to end on, aside from your questions, because here's what happens: anytime a model like this comes out and it is met with fanfare, and by fanfare, I mean a combination of

traditional benchmarks, Google Gemini's got it. Human preference, they got it in the Elo. And then just overall vibes, like I said, overall people are loving Gemini 2.5, but no one's talking about it. No one's talking about it because everyone's on, you know, OpenAI's new GPT-4o image gen, creating Studio Ghibli pictures of their family, right? And don't get me wrong. That is actually, I'm more impressed

by, you know, if I had to compare the two, even though they are two unrelated things, I'm more impressed with the update from OpenAI, actually, because it is really driving the multimodal conversation. And, you know, even though we did get this multimodal-by-default with Gemini 2.0 a couple of weeks ago, being able to create and work with images inline, being able to edit them, I think the execution was just much better with the GPT-4o image update from OpenAI.

But from a pure large language model standpoint, Gemini 2.5 is just getting completely overlooked, and it's extremely powerful. So we're going to be...

We're going to be testing this in a part two. So make sure, and if you are listening on the podcast, thank you. You can always reach out to me or just respond when you sign up for the newsletter. Tell me what you want to see in our part two. How do you want to see us put Gemini 2.5 to the test? What use cases, demos do you want to see us run? Big bogey face here asking, let's test its coding skills. We can definitely do that, right?

Kabari is asking on YouTube: in terms of capabilities, on a scale of one to 100, where is AI now? Oh, that's a good question. I don't know. If we're talking about Gemini 2.5, I mean, you have to say it's in the 90s, right? And it's ahead of everyone else. If you're asking in terms of AI

As a whole, I don't know, right? Because that 100 or the ceiling is constantly being raised, right? Again, if you would have told people two years ago that we would have models this capable, this powerful available for free, I think you'd say no.

Like, oh, that's not possible, but it is. Here we are. So the ceiling keeps getting raised. Denny asking, what about content creation for writing needs? Most of what was mentioned are video or tech kinds of needs. So I will say this, Denny, great question. And maybe that's something we can test as a use case, just kind of creative writing. But I do think that Gemini has always had a nice knack

for creative writing, right? I think ultimately, with proper prompt engineering, OpenAI has always been best. But if you're talking about zero-shotting and trying to get some good kind of creative writing, I think people have always preferred Claude. I think

Gemini is right there in terms of, you know, what you can get out of the box with just, you know, hey, here's five examples, go mimic this. I think Google Gemini is actually great for that. And that's something I did do

some testing on, a little bit. Jose's asking, when did the live stream start? Yeah, we start at 7:30 Chicago time. So yeah, if you're on the podcast, if you didn't know, this is unedited, unscripted. You can come in here, hang out, network, ask questions. We try to tackle everything as best as we can. All right, y'all. So,

I hope this was helpful. Got a couple of questions, couple of comments in here at the end. Again, we're going to have a part two where we're going to be breaking all of this down, go over use cases, do some things live, really push it to its limits. So make sure you join us for that and let me know what you want to see, what you want to hear. So thank you so much for tuning in. If you haven't already,

Go sign up for that free daily newsletter at youreverydayai.com. We're going to be recapping the highlights and what you need to know from today's live stream podcast. If you didn't catch everything, don't worry. It's going to be in there, as well as

everything else you need to get ahead, to grow your company and career with generative AI. So if this was helpful, please subscribe to the podcast. Please leave us a rating. I'd appreciate that. I'd also appreciate, and it always makes me smile a little bit, you know, if you are listening on LinkedIn, click that repost button if this was helpful. We spend so many hours cutting through the BS, bringing you

you know, hopefully unbiased and just real information to help you make better decisions on your AI strategy and implementation. So if you could repost this, if it was helpful, I'd appreciate that. So thank you for tuning in. I hope to see you back tomorrow and every day for more Everyday AI. Thanks, y'all.

And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
