
Grok 3: The New Face of AI Competition

2025/4/19

AI Education

People
Jaeden Schafer
Topics
I watched the Grok 3 launch live stream and was excited to see it beat ChatGPT and other competitors across multiple benchmarks. Grok 3 performs impressively in math, science, and coding, thanks to its innovative training approach and massive compute. Although my real-world testing surfaced some shortcomings, such as incorrect judgments about car-part information, that doesn't change my overall positive assessment of its performance. Training Grok 3 was enormously costly, using 200,000 GPUs and overcoming challenges like power supply and cooling. More importantly, Elon Musk announced that Grok 2 will be open sourced, which will have far-reaching effects on the AI industry, may push other companies to follow suit, and could accelerate the spread and development of AI technology. This move may also put pressure on OpenAI to reconsider its business model and open-source strategy. In short, the launch of Grok 3 marks a new phase in AI competition, and its strong performance and open-source strategy will significantly shape the future of AI.

Chapters
The new flagship model Grok 3 from XAI has launched, boasting impressive metrics that seemingly surpass ChatGPT and other models. The host discusses a personal experience testing Grok 3, highlighting both its strengths (like providing detailed information and understanding context) and weaknesses (inaccuracies in some responses). The episode then transitions to a deeper dive into Grok 3's capabilities and benchmarks.
  • Grok 3 launched with new metrics beating ChatGPT and other models.
  • Live demo shows both impressive capabilities and some inaccuracies.
  • The host tested Grok 3 for car part information with mixed results.

Transcript


Welcome to the AI Chat Podcast. Today on the podcast, we're going to be talking about some breaking news out of Grok, aka XAI. There's been a ton of beef going on between OpenAI and Elon Musk, Sam Altman and Grok, and all that kind of stuff. And today, the new flagship model, Grok 3, has just launched. Actually, last night, I had to stay up and watch the live stream. It was actually pretty interesting, and they unveiled a bunch of new metrics that pretty much have Grok 3 beating ChatGPT

and every other model, not by an insane leap, but by some significant numbers. I'll be breaking down all of that and also showing you a live demo, because I have the XAI premium subscription or whatever it is you need to get Grok 3. So I'll be breaking down

all of that. Before we get into the episode, I wanted to mention: if you've ever wanted to start an online business or use AI tools to grow and scale your current company, I have an exclusive Skool community called AI Hustle where every single week I record

videos that I don't post anywhere else that essentially show you the AI tools I'm using to grow and scale my companies and the different side hustles that I'm doing. My co-host Jamie made over $25,000 last year doing a side hustle with Amazon, and he's using AI this year to scale that up. We break that down, along with dozens of other videos, in a classroom section, and there are over 300 members that all talk and share their ideas. So I'd love to have you as a member of the community. It's $19 a month.

In the past, we had it at $100, so it's at a discount right now, and if you lock that in, it won't be raised on you. The link is in the description if you want to check that out, and I'd love to help you take your business to the next level using AI. All right, let's get into the episode. So what's happening with Grok? This is, of course, their latest flagship model. They did this whole live stream last night, which, okay, this is just a side tangent, but whenever they do these live streams, they always say they start at a specific time. I've just noticed this with Elon Musk and all of his companies, and...

Literally, for the Tesla live stream, I know they had some sort of issues, but I waited on the live stream for 50 minutes before it actually started. For this one, I think I only had to wait an extra 20 minutes beyond when they said it was going to start, but it always drives me crazy. I will say maybe it's a good marketing thing, because the number of viewers on the stream went from 100,000 to 200,000 to 400,000, and 20 minutes in, there are a million people watching a live stream that hasn't even

started yet. So I guess maybe there's a marketing strategy involved in all of that. But anyways, that's my only criticism of the whole thing. So this thing is really impressive, with a bunch of new capabilities. The big one, which I think they pushed the launch back a little bit for, was reasoning and some of these deep research models. And of course, when DeepSeek came out and totally swamped the whole field,

OpenAI and Google Gemini both really quickly, within like two weeks, released their own updates to their reasoning models and their deep research models. So Grok obviously couldn't launch without that when all the other top players have it. So they've actually launched that as well, which has been pretty interesting. Now, if you go over to grok.com or the mobile app, these are the two places where it's updated first, you will literally see

a dropdown where you can switch to Grok 3. This is their latest model, and they have something called Think. They're going to be releasing their deep research, I believe, later. But I was actually testing out Grok 3 today, and I will admit I had mixed results with it. Now, I'll show you, if I have my history here, that it does some pretty impressive things. So I was like, I've got to test this out. I was going to Walmart this morning to get some new stuff for my car because

my wife got pulled over last night because one of the brake lights on our truck, I guess, is out. And so, you know, traumatic experience for her. It's actually the first time she's been pulled over in her life, which is hilarious because she was driving my truck. So, you know, go figure. So I was testing it out, and I asked it, what kind of blades do I need for the windshield wipers on my truck?

And it told me, and I was like, I'm just going to trust whatever it says, and I'm going to go. It said for a 2006 Toyota Tundra, you'll need 19-inch windshield wiper blades. All right. I took it on faith. And guess what? It lied to me. I bought them, I came back to the car, and they were way too short. I needed 26 inches. So this thing was definitely off on that. And the annoying thing was, when I was in the store, I actually also asked it for...

the bulb that I needed. I was like, what type of brake light bulb do I need? And it gave me one. And when I was in the store, I was kind of doubting it, because there's just this random bulb that says 7443 on it, and I'm like, I don't know. So I googled it, and it was right. And I'm like, okay, if it was right about the bulb, it's probably right about the blades. Oh man, I picked the wrong one to verify on Google. So it turns out I needed to go back and get different blades. Okay, here's the thing that I did think was quite impressive with this, though.

First of all, I'm like, okay, I have this truck, what kind of blades do I need? And then I just said, what type of brake lights do I need? It automatically jumped to the assumption, like, I'm assuming you're talking about the same truck you just referred to, so this is what you need. It tells me the bulb type. It also tells me what wattage and voltage I needed to look for, which was fantastic,

pretty useful. Then it also told me, look, you're probably going to want to get two of these, because the passenger side also needs one. It also goes and tells me common brands, which was useful, because I think I actually ended up buying a Sylvania, and I knew, because of what it was saying, that it was probably the right one. So anyways, then it goes and gives me a bunch of other information, like, if you want to replace it, these are all the steps. This was cool because these weren't questions I was even asking. I was just like, what

bulb do I need? It, you know, guessed what kind of truck because of my last question. And then it's like, and if you're doing this, you probably want to change it yourself, so here are the steps to change it. So, like,

I guess I could have changed my prompt to be like, only tell me the name of the bulb, no other information, and I probably could have gotten a faster response. But for me, as someone that was actually using it, it was very useful to get all these additional details. And by the way, if you're just listening on Apple, I'm explaining everything, but if you're on Spotify or YouTube, I'm sharing my screen with a video to break it all down. So anyways, I wanted to give you the anecdote of me actually testing this thing out so you know.

One other thing that I did with it: I tried the image upload to help me in Walmart. There are these two different windshield wipers, and I never know if I'm getting scammed by these companies. Anything in auto mechanics, I swear there's like a gimmick. So there's two kinds of windshield wipers from the same brand. One was like 15 bucks, one was $10.

The more expensive one was called something like Optimum Plus. And I'm like, is there any difference between these in reality? And Grok was a pretty good salesman and told me that apparently one of them has equal pressure throughout the whole blade and is less likely to get wrecked, and blah, blah, blah. So I ended up buying the more expensive one because Grok told me it was good. But I mean, it's kind of useful. It was cool that I was able to just snap a photo on my phone while in Walmart, have it upload, and it was actually really quick, and I don't think the internet was super fast at Walmart. So anyway,

this was my actual use-case test of Grok. Let's go into what the updates were and why I think this is impressive. You guys are probably all sick of it, like, okay, that's enough of your stupid trip to Walmart for your car, what is this thing actually capable of? So I'll break all of this down for you.

The first thing that was really impressive about this is how they actually trained it. They said they wanted to start with first principles, which is something that XAI has been really good at. And they essentially went, hey, we've got to build a facility with enough GPUs to train this AI model. So they go to all of these people that could build

facilities, all these data center companies, and ask, how long is it going to take you to build us a data center? And they're like, yeah, we can build you a data center, it's going to take us about 24 months. And they're like, okay, well, we'd be screwed, because that's two years, and if that's how long it takes to do the data center, then we've got to train on top of that. Where is ChatGPT going to be in two years? They'd be completely smoked. So they said, screw it, we're just going to buy a pre-built factory. So this wasn't something built to be a data center. They literally went and found, I think,

they said they had to find a factory that was new enough that it was still good, but where someone had just gone out of business, which is kind of hard. Anyways, they found some electrical company that had just gone out of business or moved locations. They grabbed their factory. It wasn't big enough, so I think they actually had to add on to it, but they grabbed it, and they were doing every hack in the book, trying to essentially

hack and get this thing built faster. So the first thing they wanted to do was put in a hundred thousand GPUs. Everyone said this was impossible. With some engineering feats, they did some crazy stuff and were able to attach a hundred thousand GPUs. I think that took them about 120 days, roughly four months. Then, halfway through the training, they added another hundred thousand GPUs, which was like,

it took them another 90 days. So really, in a matter of months, they had this entire thing up and running. And people are like, how the heck did you pull this off in a factory that was not built for data centers? Because data centers are notorious for a bunch of different reasons. Number one, they're absolute power hogs, like 200,000 GPUs. And when you think of a GPU, this isn't the little one in your computer. You're talking about a brick, just this massive,

hunking thing. And 200,000 of those is a complete power hog. In addition, cooling that many GPUs is insanity. So what they said they ended up doing was, they didn't have enough power from the grid while they were getting it hooked up, so in the meantime they just bought thousands and thousands of generators and lined them up along an entire side of the factory, and had all of these generators going. On the other side of the factory, they said they literally purchased 25%

of the entire United States' mobile cooling capacity. So pretty much, all of these GPUs have to be liquid cooled, with water going through pipes and circulating. And people have trucks that do liquid cooling for big events or concerts and things like that.

But there's not that many of those. They literally had to get 25% of the capacity of the entire United States. I'm sure that was probably a great business to be in. So they cooled this entire thing. They said they had so many problems, like one cable would be disconnected, because what they did that was different was they actually connected all 200,000 GPUs together. And they had to build in redundancy so that if one cable got pulled out, or there was an issue with one, all the rest of them would keep working. There are a lot of really impressive things they were able to pull off for this.

All of this to say: Grok 3, this current model that they have, was trained on 10 times more compute than Grok 2, the earlier one. And it might be one of the biggest amounts of compute used on any AI model. So what was the output? And I know, like you heard at the beginning, I'm complaining that it told me the wrong length for my windshield wipers or something like that, which ChatGPT or any other model probably could also do. And I'll go and research...

why that was. But overall, with the other questions I've asked and tested, it's very thorough, it's very in-depth, it shows you its reasoning process, and it can do a lot of really impressive things. So what are the benchmarks? How did this actually perform?

For the math benchmark, AIME 24, it scored 52. Grok 3 Mini, which is their smaller version, scored 40. This is really impressive. The only model that got close was Claude at like 39, and that's...

still worse than their mini model. It's still completely beating GPT-4o, which scored, I think, the worst, and DeepSeek. Gemini, I guess, also did fairly well. So anyways, they completely beat everyone on the math one by a long shot with 52. Then on science, they scored 75, and the next runner-up

was 65. A bunch of models scored 65, so they're a solid 10 ahead. And then when it comes to coding, they completely crushed it again, scoring 52, and the next best model that wasn't from Grok was like 40. So they really, really crushed it on math, science, and coding. And it seems like ChatGPT, I've heard from a lot of people, notoriously

sort of struggles in this area. Claude does really well. Most of the developers I talk to use Claude, even though they haven't come out with an update in forever, because they just say it's better at coding. And so sometimes you find these use cases where a model had better training data, or was trained or fine-tuned better. And it seems like Grok might be the winner now with code. So in the live stream I watched last night, they literally had it do this. They said,

build a game for us that's a cross between Bejeweled and Tetris. And it literally wrote all the code, they ran it, and it was an actual functioning game where you had these Tetris blocks, each one a different color, and if you got three of them in a row, it would, like Bejeweled, destroy the line or the blocks or whatever. So it was interesting, and it was able to spit it out pretty quick, so...

this was pretty impressive. Reasoning, test-time compute, it crushed it. You're essentially able to tell it, think longer about this prompt. You can just put that in your prompt, and there's also a button for it. If you tell it to think longer, it bumps up its score from like

78 to 93. So you tell it, use more compute, think longer about this. And we kind of saw the same thing with ChatGPT; they ran similar experiments with similar results. If you tell it to think longer and use more compute, it'll essentially try to solve the same problem like 10 or 15 or a hundred times, and then it's like, what is the most common answer across all 100 tries? So if I had said, what windshield blade do I need for my truck?

Think as long as you want about this. Then, instead of just going and grabbing the first couple of results from Google, it probably would have looked at like a hundred results, tried to solve it a hundred times, and then realized, oh, actually you need the 26-inch blade. So, I mean, maybe that's a user error on my part. I needed to tell it to do that. So,
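The "solve it a bunch of times and take the consensus" idea the host describes is usually called self-consistency or majority voting. Here is a minimal sketch of that loop; note that `sample_answer` is a hypothetical stand-in for a single model call, not a real Grok API:

```python
from collections import Counter
import random

def sample_answer(prompt: str) -> str:
    # Hypothetical stand-in for one model call. A real version would query
    # the model at a nonzero temperature, so answers vary run to run; here
    # we simulate a model that is right about 3 times out of 4.
    return random.choice(["19 inch", "26 inch", "26 inch", "26 inch"])

def majority_vote(prompt: str, n_samples: int = 100) -> str:
    # Ask the same question many times and keep the most common answer.
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What size wiper blades fit a 2006 Toyota Tundra?"))
```

The point of the loop is that the occasional wrong sample (like the 19-inch guess) gets outvoted as the number of samples grows, at the cost of proportionally more compute per question.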

one thing to note: subscribers to the premium tier, which is like 50 bucks a month, get Grok 3 first. Although I think I'm paying like $17 a month. Maybe I'm grandfathered in because I've been paying for a couple of years, but on the $17 a month tier, I'm getting Grok 3. The one thing that did not release, though, was...

the voice. Elon Musk said the voice was a little spotty; it should come out in about a week, and that's where you can talk to it. And apparently it's just like OpenAI's voice thing, which is phenomenal, where it's really dynamic and you can say, talk really fast, talk like you're running on a treadmill, talk like you're singing, talk like you're yodeling. You can tell it all this crazy stuff. The voice mode should be good, but it's not going to be coming

for a little bit. In the next few weeks, the Grok 3 model is also going to be available via their API, which I'm stoked about, because I can then integrate it into AI Box, my software startup. So, a lot of really cool things. The biggest thing, okay, this is the biggest W of the entire night: everyone's like, what happens to Grok 2 once Grok 3 is fully rolled out and everyone can use it? Because they took a Q&A on Twitter with people responding. And Elon Musk said,
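The episode doesn't cover what the Grok API will actually look like, but most chat-model APIs share the same request shape, so wiring it into an app would plausibly look something like this sketch. The endpoint URL, model name, and `XAI_API_KEY` environment variable here are assumptions for illustration, not confirmed xAI details:

```python
import json
import os
import urllib.request

API_URL = "https://api.x.ai/v1/chat/completions"  # assumed endpoint

def build_chat_request(model: str, user_message: str) -> dict:
    # Standard chat-completion payload shape used by most providers.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.7,
    }

def ask_grok(user_message: str) -> str:
    # Assumed model name; check the provider's docs for the real one.
    payload = build_chat_request("grok-3", user_message)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Only `build_chat_request` is pure; `ask_grok` performs a network call, so you'd swap in real credentials and the documented endpoint before using it.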

the older version, once the new one is fully rolled out, will get completely open sourced so anyone can use it. This is amazing. And I think OpenAI could solve all of their controversial problems, of going from a nonprofit to a for-profit and everyone hating them and stuff, if they did this. Sam Altman did, in my opinion, a sneaky poll on Twitter yesterday where he was like, what do you guys want? Do you want, like, the best o3 model open sourced,

because they're about to come out with a new model? Or do you guys want us to make the best phone-sized model we possibly can? And the way he phrased it, everyone, even I, said, oh, I want the best phone model, because I was like, oh, this would be cool to have an open-source model on my phone. But what I'm realizing is, people can take the best model and make phone-sized versions of it; we could do that after. Really, what we want is the best possible model open sourced. They're not going to do their flagship model, because that's how they make their money, but they could do their older model, because,

now that Grok 3 is out, as a consumer I'm never going to go into my XAI app and just choose to use the older model. I'm always going to try to use Grok 3.

But Grok 2 is still capable of doing a lot of things. And for developers, it saves a ton of money if you can use the open-sourced model, not have to pay API fees, and host it yourself or run it locally on your own computer. Super, super cool. So I think the biggest win of this entire announcement, other than, okay, they made a model that beat everyone on the benchmarks, that is cool, but I think the biggest win is that they're saying they're going to set a precedent where the older model will always be open source. They're just giving that for free to everyone, to the public. So

that was really cool. I would love to see OpenAI do that, since that was the purpose of their company, to be an open-source AI company, and now they're closed source.

I would love to see them follow suit, and I think this will put some pressure on them to potentially do that. You already see Sam Altman kind of talking about it, and I think if this becomes the precedent at Grok, they'll essentially be forced to, which I'd be thrilled about. Overall, super excited for everything happening. I'll keep you updated on all the latest news going on with XAI. Thanks so much for tuning into the podcast. And again, if you want to grow and scale your current business or side hustle using AI tools,

Make sure to check out the link in the description to the AI Hustle School community. Thanks so much for tuning in and I'll catch you next time.