
OpenAI’s New Model, Jensen’s Bold Claim, Alexa+ Is Here

2025/2/28

Big Technology Podcast


Let's break down what the release of GPT-4.5 means for OpenAI and the future of generative AI. Plus, Anthropic also has a new model, NVIDIA CEO Jensen Huang makes a bold claim, and Amazon introduces a better version of Alexa. That's coming up right after this.

Hi, I'm Jonathan Fields. Tune into my podcast for conversations about the sweet spot between work, meaning, and joy. And also listen to other people's questions about how to get the most out of that thing we call work. Check out Spark wherever you enjoy podcasts.

Welcome to Big Technology Podcast Friday edition, where we break down the news in our traditional cool-headed and nuanced format. We have so much to talk you through this week. It feels like this week, among many crazy weeks, has been one of the craziest. We have a new model from OpenAI, a new model from Anthropic, a new Alexa, NVIDIA earnings, and Skype is dead. So...

It was a very, very promising week for a lot of companies, but not for Skype, which will forever live in our memory. So we will say goodbye to Skype at the end of the show. But in the meantime, joining us as always on Friday is Ranjan Roy of Margins. Ranjan, great to see you. Happy new model week, Alex. How have all these new models changed your life as of today, February 28th?

Not at all. But we will talk about whether that will matter in the long term, because of course, I put your is it the model or is it the product question to OpenAI head of research, Mark Chen. We talked about it. And now you're going to get a chance to respond. But first, let's just break down the news because...

Yesterday, we had the release, of course, of GPT-4.5. OpenAI for the first time ever put a spokesperson on this podcast. We broke the news here with Mark. And now we're going to analyze what it means, because we've sort of left the fog of war and we have some perspective on whether this is disappointing for OpenAI, whether this is promising for OpenAI, and whether this means something:

that generative AI can continue to progress or not, now that we've seen some more reactions outside of Mark Chen saying, yes, scaling is still alive. So this is from The Verge. OpenAI announces GPT-4.5. GPT-4.5 is the largest and newest large language model from OpenAI. It's going to be available as a research preview for ChatGPT Pro users to start.

And here's like a weird thing, though, that happened. There was some documentation. We're going to get right to it right away. There was some documentation that OpenAI released about this model and then removed. And it's very mysterious. They said GPT-4.5 is not a frontier model, but it is OpenAI's largest LLM.

Improving on GPT-4's computational efficiency by more than 10x. It does not introduce net new frontier capabilities compared to previous reasoning releases, and its performance is below that of o1, o3-mini, and deep research on most preparedness evaluations. OpenAI has since removed this mention from an updated version of the document. So they did remove it. I don't think they disputed it, though.

And I found what was interesting was, yes, this was a step change improvement over GPT-4. It was not over the reasoning models. So you would think maybe you could build reasoning on top of this. We're going to talk about that in a moment and it will be even better. But for the meantime, OpenAI has a new model that does not exceed the reasoning models in certain benchmarks and seemed to admit that in a document. So Ranjan, you've been following along this whole way. What do you think the implications of this are?

This week had me thinking, I feel with iPhone releases in recent times, a lot of us have been saying, do we really need a big event to release every new iPhone? Certainly the 16E was not exactly the iPhone launches of yesteryear. I'm starting to feel like that with all of these large language model releases, Claude 3.7, GPT-4.5.

Even as you're listing out all of the kind of release notes around this and then there's some ingredients that are not listed or these things are removed from the actual release documentation,

It's not that exciting. It's not exciting enough to have to try to launch a live stream and get everyone hyped up around it. GPT-4.5, and we're gonna get into this, there's some elements of emotional intelligence or emotional quotient that are around it. Perhaps creative writing is a little bit better. Perhaps there's a bit of computational efficiency introduced to it. Even Sonnet 3.7, and I was trying this,

Claude Code is a pretty big release and a pretty big step change, but it's not revolutionary. So I think a lot of these companies have gotten caught in this hamster wheel of needing to do these big model launches. And there was a time where the step change was so big that it was actually exciting for all of us.

But now I think 4.5 is probably the least interesting model release from OpenAI to date. Because even o1 and adding reasoning models to the overall suite was a pretty big deal. 4.5, I still cannot tell you what the big deal is. Maybe you can tell me. We are going to get some commentary

from Andrej Karpathy about this that he put on Twitter yesterday, which does answer that point. But even from OpenAI itself, there was some very interesting communication, shall we say, around this model. So Sam Altman came out with this tweet and he said GPT-4.5 is ready. The good news: it is the first model that feels like talking to a thoughtful person to me. I've had several moments where I sat back in my chair and I've been astonished at getting actually good advice from AI.

The bad news is it's giant, it's expensive. We really wanted to launch it to pro and plus users at the same time, but we've been growing a lot and are out of GPUs. We'll add tens of thousands of GPUs next week and roll it out to the plus tier then. And there's hundreds of thousands coming soon, so I'm pretty sure you'll use every one we can rack up. This isn't how we want to operate, but it's hard to perfectly predict growth surges that lead to GPU shortages. Remember, ChatGPT has gone from...

100 million to 300 million users in a very short amount of time. Heads up, this isn't a reasoning model and won't crush benchmarks. It's a different kind of intelligence and there's magic to it I haven't felt before. Really excited to have people try it. Look, it's very interesting because, again, we're going to go into my interview with Mark Chen very quickly. But Mark, I was like, you know, hey, listen, does this show that, you know, we're getting diminishing returns from scaling? And he said, absolutely not.

But then you have these endorsements from Altman and it's fairly muted. So it makes sense of that for me, Ranjan. The world's greatest product marketer cannot market his own product. I mean...

It's not as great at some things, but trust me, there's this magic which I felt, but I'm not going to actually tell you what that magic is. I think it kind of – actually, his statement really captures the overall feeling I have of 4.5. It just tries to put a positive slant on it. But I think that's exactly it. They have to keep pushing –

new models, new narratives pushing towards GPT-5 whenever, if and when that will come. But to me,

And actually, this is going to get back to our product versus model debate. They need to show more product. Again, Operator, deep research, those were exciting moments. 4.5, as any kind of announcement, is not incredibly interesting to me. You had asked Mark Chen during your interview, what are the new use cases? Or what are the use cases this will be better at?

And I was actually sitting there waiting with bated breath, ready to hear, okay, this is how this is going to help me or other people. And there was a somewhat generalized answer around how

with creative writing tasks, whatever that might mean, this is better. And that was kind of it that I got out of the interview. So I think, and which lines up with the whole idea around emotional quotient, emotional intelligence, more creative writing, more thoughtful answers. And I've seen a lot of examples of

out there of 4.5 answering and being a bit funny and people saying, this is the first time AI has made me laugh. But if you're just trying to get a little bit more groggy with your model, I don't know. That doesn't seem like that's going to fill in that SoftBank valuation for me.

Yeah, look, I didn't find it groggy at all. So because, you know, as we've talked about on the show, but, you know, more on the trying to make it funny or interesting as opposed to just giving you information. So having experimented with it, you know, as I was going to say, both of us have paid for that two hundred dollar a month

upgrade, because we wanted to try deep research. And I guess mine is still live. I think yours just dinged. So, but I'll say that I spent a good amount of time chatting with GPT-4.5 yesterday, and

And what they're saying is real. Like it is definitely much more pleasant to talk to. And I spoke with Mark about this a little bit yesterday. The responses are shorter. They're more human-like. Like it doesn't feel the need to print out, you know, a master's thesis for each answer. Like you can actually have a back and forth with it. And it was actually one of the more enjoyable conversations I've had with a bot to date.

That's okay. I'll give you, that is a good point. If the big functional change is we've all gotten very used to this idea that you, you know, query a chat bot and you get this

really overly thoughtful answer that tries to both hedge itself from any kind of safety consideration and lists out 10 bullet points and as you said, a master's thesis. So maybe there is something very important there where it actually starts to be able to answer you correctly in a concise way, in a more conversational human way. Maybe there's something there.

But to me, again, why not just put that out there in the model? Why have a big event around it? Why make a big press push around it? Why not just put that in the product? Well, here's why I would say it's important to do that is because, and this is what Mark was saying, that you have linear progression of the model's capabilities based off of what you predict. If you put this much compute in it, you get this much output.

And I think OpenAI is saying that this 4.5 is the next step on that progression. And it's met with the amount of compute that they've put in, the benchmarks that they've expected to hit. And that's why I said to him, did you find the scaling wall? And he said, GPT-4.5 is really proof that we can continue the scaling paradigm. So basically, I think that is sort of like, that is the march. But I also think it's important to kind of talk about like what it's going to feel like

to all of us and then this gets to Karpathy's comments. And it's basically here he describes really well the progress from the original models because

It's going to feel like less as the models get better. So he says GPT-1 barely generated coherent text. GPT-2 was a confused toy. 2.5 was skipped straight into GPT-3, which was even more interesting. And GPT-3.5 crossed the threshold where it was enough to actually ship a product and sparked OpenAI's ChatGPT moment.

He says, I went into testing GPT-4.5, which he's had access to, and he says everything is a little bit better and it's awesome, but not exactly in ways that are trivial to point to. Still, it is incredibly, incredibly interesting and exciting as another qualitative measure of a certain slope of capability that comes for free just by pre-training a bigger model.

He says, we actually expect to see improvement in tasks that are not reasoning heavy, tasks that are more EQ, as opposed to IQ, related and bottlenecked by world knowledge, creativity, analogy making, general understanding, and humor. So these are the tasks that he was most interested in during his vibe checks. Basically saying that, like, you use this model, it's a little bit better. And that matters a lot, because we've already come so far from the barely coherent part to where it is today.

I think I'm going to nominate you as the new product spokesperson for OpenAI because I think you just convinced me right here. I think you just turned my entire view of 4.5 in this moment. So basically...

I've been talking a lot about AI has a branding problem. The idea that people say that's written, quote unquote, written by AI. Everyone has this really narrow view of what AI text generation is. And that's because of this very dry, weird, almost inhuman way that it responds to you. And every model, whether it's Gemini or Claude or ChatGPT, everyone has this view of this is what an AI response looks like.

So actually, if the real advancement here is it can move beyond that and make things more human and conversational, that actually could be very interesting overall in terms of

getting people to use these products. So I think if that's the real change here, I'm surprised that they didn't hone in on that, that this is going to be what takes ChatGPT to the next 700 million people outside of all early adopters and makes people comfortable and happy with it and makes AI much more natural within all types of mediums and channels and outputs. If they positioned like that, and if that's what's really happening here,

That is kind of exciting for me. I think that is how they're positioning it. They are talking about the fact that this has great EQ, and that is where they seem to want to focus people with this release. And you look at some of these benchmarks, and so I'll just read a few of them. SimpleQA accuracy: GPT-4.5 has 62.5% compared to,

let's see, 47% for the closest model, which is OpenAI o1. The hallucination rate is 37.1%. Again, lower is better. GPT-4o has a 61.8% hallucination rate, which seems high. So those are like the standard benchmarks. But then you get into the everyday queries. And they say that for everyday queries, people prefer...

it 57% of the time over GPT-4o. For professional queries, they preferred it 63.2% of the time over 4o. And for creative intelligence, 56.8% over 4o. So that's not nothing. No, I think...

If Kantrowitz and Roy were behind this marketing campaign and launch, we could have just come up with a simple make AI less AI. What about that one? Something just pushing the idea that that's what this is really about. Not getting caught up in the scaling law side of it, the compute efficiency side of it, and really saying this is the first model that makes AI less AI. It makes more people...

feel comfortable using this on an everyday basis. I think that I would have been, it would have been a little more,

more exciting for me. - Definitely. And so there's a very interesting debate that's going on about like, where did it get this more EQ oriented positioning? Was it pre-training? Like, is it because of its abilities or was it post-training where like they just added this personality after the model was built? We don't fully know. And actually, if I was gonna have one question that I'd wanna ask Mark Chen, if I could get him on the phone for like another five minutes, it would be that question and I feel bad.

having left that out yesterday. But I have seen some very interesting debates about it over the past couple of days. There's this one Princeton academic, Arvind Narayanan. He says, apparently the main thing we're getting with GPT-4.5 in exchange for a 30x price increase is fuzzy stuff like EQ. The ironic thing is this is an aspect of behavior, not a capability. My bet is that any difference in EQ is

due to post-training, not the parameter count. Okay, so that's an interesting thesis.

Ethan Mollick from Wharton slides into his mentions. And Ethan Mollick, of course, he's a professor. He's been on the show. He's been pretty good at sort of following the pulse of AI. He's pretty positive. So he tends to take the sunny side of things. But he says disagree on this one. Stuff like theory of mind or EQ are deeply rooted in abilities, not behavior in humans. And I would bet the same for AI. But again, we don't know yet. So...

Basically, if this did come out of like just training the model, making it more able, and then it all of a sudden produces like a more human style of communication, I think that's pretty interesting. Well, yeah, I do think and I was thinking about this as well after reading these articles.

On one hand, it could be essentially kind of a party trick. It could be more instruction level after the actual core training, where it's just: speak in this voice, give concise answers, try to lean your behavior towards a certain way. I think that would actually be very sad, because that would be easy.

What Ethan's saying, I think, is the more interesting part. And I have to say, if it's OpenAI doing this, I have to imagine for this kind of product and model, they're not going to be going the party trick route. And genuinely changing the way the model thinks and produces knowledge

would be a very big deal, as Ethan's saying. But again, we don't know what that means or what it looks like. Is it in the supervised fine-tuning layer? Is it in the base training layer? We don't know. I'm actually surprised. Yeah, we got to find out. You got to ask Mark Chen again, because to me, again, that is the really interesting stuff they should loudly be talking about rather than Sam Altman just saying it's kind of magic

and not giving us any more. Exactly. And so there's been this other thing that's happened, though, which is that people have taken a look at the evaluation scores and have noted that this is not as good as reasoning models in a lot of different fields.

So I think we should talk about that, because it has been used as a discussion point about whether OpenAI has lost the magic. So let me just go through some of these, whatever they're going to mean to you. I'm just going to read them out. There's GPQA, which is science: GPT-4.5 gets 71.4% compared to 79.7% for OpenAI o3-mini.

So it's down by about eight percentage points there. There's AIME '24, which is math: GPT-4.5 gets 36.7% compared to o3-mini's 87.3%, less than half the performance. It's amazing. It just beats it on this multilingual test, and it is a little bit, no, it is a little bit better on one coding benchmark,

And then a little bit worse on another coding benchmark. But basically, people have taken this, and I think this was also something I saw afterwards, and I was like, oh dear. You know, there's a reason these reasoning models are outperforming this on a lot of benchmarks. And I think we should say that the reasoning models use the intelligence of these, you know, standard models and they learn how to attack things effectively,

step by step, which is like, yes, the reasoning models are doing the things that they're supposed to do. And it just shows you how impressive the reasoning is. But then there's also just like, why is it lagging? People have been like, all right, that's really disappointing.

Let's hear from trusty Bob McGrew, former chief research officer at OpenAI, who always seems to hop into the discussion at an opportune time. He says, don't be disappointed that GPT-4.5 isn't smarter than o1. Scaling up pre-training improves responses across the board. Scaling up reasoning improves responses a lot.

If they benefit from thinking time, and not much otherwise. Wait to see how the improvements stack together. I think this is really important, right? It's that this 4.5 is going to be the basis of the next reasoning model that OpenAI is going to put out. And I think Mark hinted at this, is that GPT-5 will bring

both of those capabilities together, where you're going to have the smarter basic foundational model, which is going to be GPT-5 or something built off of GPT-4.5, and then you're going to add the reasoning in, and then it should even further outperform the stuff we're seeing with like o1 and o3. So what do you think about that? Yeah, trusty Bob McGrew making sense of it, I think. That makes sense that

Building this more apt, able, emotionally intelligent foundation model and then building, incorporating that with the reasoning model and ideally that getting us to GPT-5 seems like something ambitious enough to actually push forward on.

I guess I still have such a difficult time again, though. When we're looking at the GPQA benchmark, AIME '24, even when we're looking at what you had shown earlier, there's like, on everyday queries, GPT-4.5 beat 4o around 60% of the time.

What does that actually mean? What does that look like? What kind of real life problems? Because I'm so fascinated by what is an everyday query in one of these tests that if you have an AI researcher creating a benchmark, what is their everyday query versus your or my everyday query? I think that's the part that's still...

worries me about OpenAI, that so much focus is on that research house part of it, like the very, very research-oriented approach to all of this, going back to product versus model. But it feels like we're still locked in that rat race here. Okay. Well, that just takes us to our model versus product question again, because I did bring this up to Mark and I said, all right, you're the head of research at OpenAI. You're a model guy. So just like

I am trying to figure out how to argue this to Ranjan. Maybe you can help me figure it out.

And he did, he gave an explanation, basically saying that as the models get smarter, these products, like for instance deep research, get smarter. We talked last week about how if they're hallucinating, they become useless. So the fewer hallucinations, you would imagine, the better, unless maybe you're Benedict Evans, who wants zero hallucinations. So I'm kind of curious to put that to you and get your thoughts on what it means. I listened to it and it still felt like

a relatively generalized statement for something that shouldn't be a generalized statement. Even going back to what are the real use cases, is it creative writing? Is that really what you're pitching me with 4.5, that it's going to be better? Is it

everyday users will have a better experience with a chatbot and feel more comfortable? Is it AI therapists are going to get a lot better because now it can actually talk to you in a more emotionally connective way? To me, that's the part that the hallucination rate side of it, I think obviously matters. But

If the idea is, like, to me, the 99% versus 98% versus 97%, for most AI use cases in the world, I think it will probably be okay. To me, again, it's more...

It still doesn't answer that question. Like, deep research can get better and better. But does that mean financial analysts will actually trust everything that is put into a deep research report in a week, in a month, in a year? What does that actually look like? Yeah, I think we still don't know. I mean, we can definitely say for sure that like improving the model from GPT-1 to

to where we are today has mattered. But I think that the question is, yes, what are these incremental improvements going to really lead to? And like, yeah, I mean, Mark was like, it's all about getting to the frontier of knowledge in AI. The smarter these things are, the more they can do, just like a smarter human can do more. I think it's great. I love that they are pushing, you know, the cutting edge on this and that every AI lab is trying to get, push the cutting edge on it.

But I cannot staunchly sit in my position for much longer unless I see some tangible impact.

comes from this. But anyway, I'll still be on team model for the time being. All right. It makes for better Fridays knowing we still have product versus model. Yeah. I don't really see myself going away from that position anytime soon. I want to see the better models. Thanks for shipping the better models. I'm waiting for GPT-5 to show up and magically solve every use case perfectly. And I will, uh,

eat my hat, or whatever one does on that day. Well, the last thing I'll say about Mark is I did say, like, so aren't you setting expectations too high? And he said, I don't think so. So. Okay. All right, Mark. GPT-5, baby. I don't know what trusty Bob McGrew would say about that, but let's see. All right. So now that I've become the sort of de facto product spokesperson for OpenAI,

Let's go to Gary Marcus, because I feel like we should at least give some time to those who've said that the GPT-4.5 launch actually shows that OpenAI is toast, and sort of discuss their points. And one of those people is Gary Marcus, longtime critic. Well, he's been on the show as well. I'm sure we'll have him back soon. He messaged me on LinkedIn after he saw my Mark Chen interview and said,

Allow me to give a rebuttal. I said, all right, send me something. We'll read it on the show. I haven't got anything back, but I will read a LinkedIn post from him and we can discuss it.

So he says, OpenAI is in serious trouble. They still have the brand name, a lot of data, and tons of mostly unpaid users. But GPT-4.5 is hugely expensive, and even so it offers no decisive advantage over competitors, and zero moat. Scaling hasn't gotten them to AGI. The GPT-5 project was a failure. There is already starting to be an is-that-all-they-have reaction, including from some people who've said they have to push out their prediction of when we hit AGI.

He said DeepSeek led to a price war that cuts potential profits. There is still no killer app. OpenAI is still losing money on every prompt. A bunch of investment turns to debt if they can't make the transition to for-profit fast enough, and Elon has perhaps upped the cost. Many, many top people have left. Some have started serious competitors with similar IP. Because OpenAI's burn rate is so high, they have limited runway. Microsoft no longer fully has their back.

Altman's credibility has diminished. Sora went nowhere. Whatever lead they had two years ago has been squandered. And if Masa changes his mind, they will have a serious cash problem. And Elon is right that they don't have all the money for Stargate. Man, what do you think? Respond to that list. What takes Ed Zitron about 5,000 words to write, I think Gary Marcus did in about one tweet.

Actually, that reminded me of Sora, that it exists, which I played. Have you used it recently? I have not. The text-to-video or image-to-video model, that one definitely went nowhere. Could have been a good product demonstration. I think overall, they are making this bet. It's what we keep talking about, but GPT-5 has to be

oh my God, this solves everything. Like this is where there's no hallucinations, it's reasoning, it's a huge foundation model, it's relatively low cost somehow. I think it really, the way they're positioning their entire business

is that it's going to be the kind of silver bullet to everything. Otherwise, I really don't see, again, the zero moat part of it, you're seeing more and more. Which is to me, maybe that is why they push so hard on these constant model releases because they have to stay relevant. Because the moment they're just an API in the background, then you're the most commoditized thing imaginable and then that will kill you anyway. So that's a pretty compelling case right there.

Yeah. And one point I should make here, about the Mark interview: I spoke to him about starting and stopping. And he said, that's a normal part of training any model. But if you're starting and stopping on a model that's this expensive to make, then your costs go way up. So I think Gary is right that the errors, or whatever the changes, the tweaks that you have to make, become very expensive tweaks when you're starting to work on projects this size.

Yeah, the cost of the model training. And we're going to get into how Anthropic's new Claude was supposedly much less expensive. DeepSeek, we know, whether it was 6 million or 60 million or whatever it was, was significantly less expensive. I think overall, you have one side of the industry showing us that it actually can be cheaper and cheaper and cheaper. But then those with the best interest, remember, OpenAI's competitive advantage

could be talent to an extent, even though a lot of talent's left, they have a pretty deep bench that's pretty impressive. Or it could be resources, cash, and access to compute. So they almost have to make that their game, because if that's not their game, they're not going to win.

So we talked about the competition, you teased Anthropic. We have so much more to talk about, including the new Anthropic model, what Jensen Huang has said about how expensive reasoning is, and NVIDIA earnings. And of course, the new Alexa. We're going to do that right after this.

Will AI improve our lives or exterminate the species? What would it take to abolish poverty? Are you eating enough fermented foods? These are some of the questions we've tackled recently on The Next Big Idea. I'm Rufus Griscom, and every week I sit down with the world's leading thinkers for in-depth conversations that will help you live, work, and play smarter. Follow The Next Big Idea wherever you get your podcasts.

We're back here on Big Technology Podcast Friday edition, talking about all the latest

AI and tech news, including the fact that Anthropic has a new model, Jensen Huang has a stance on how much compute reasoning uses, and the new Alexa. And by the way, Skype is dead. So let's see if we can get to all that in the second half. The first is that GPT-4.5 wasn't the only model here. We have Anthropic's Claude 3.7 Sonnet. It's here. This is from TechCrunch. Anthropic is releasing a new AI frontier model called Claude 3.7 Sonnet,

which the company designed to think about questions for as long as users want it to. So like we've been talking about, it's a hybrid AI reasoning model, a single model that can give both real-time answers and more considered thought-out answers to questions, and you just choose, do you want the quick answers

response, or do you want the thinking response? And the model represents Anthropic's broader effort to simplify the user experience around its AI products. We're long-time Claude heads, I would say, on this show. I've gotten a chance to use it, you've gotten a chance to use it, I believe. The thinking toggle that we talked about is pretty good. It's almost as good as DeepSeek's. What is your response that we have

Another model from Anthropic and the fact that we went not from 3.5 to 4, but from 3.5 to an incrementally better 3.7. What about 3.6? I was waiting for 3.6. That was going to be the big one, but we just skipped straight ahead to 3.7. On to 3.7, baby. I think, as a Claude head, I've been using 3.7 regularly.

Again, from the model side, the thinking toggle mode, which I'll still categorize a bit as product, maybe that one lives between product and model, is good. Claude Code is definitely a very new offering from them. And I think it's going to be very interesting, because still, coding to me is the most...

monetizable, direct to actually productive use case for generative AI as of today. So I think the way they approach this is kind of how I want these model launches to be approached. There's a blog post, there's some tweets, there might be an explainer video here and there, and that's it. And we keep getting improvements. And as we wait for 3.9, maybe, not 4.0, because that's AGI probably. 4.0 is AGI, so.

So, we're going to get into how they trained the model because it's interesting, but I will say I did an experiment this morning. I've been using, I mean, I think I've mentioned this on the show.

I've been using Claude every day as a diet coach, where I basically write down the meals I've had, weigh in, and it will give me a letter grade on how I did based off of the prompt that I gave it about the way that I want to be eating. And it will count up the calories and grade the foods. It's very good. And it has lots of memory. And so I copied the history, which goes back probably a month and a half at this point in the latest chat,

and dropped it into OpenAI's GPT 4.5, Claude 3.7 Reasoning, and DeepSeek. And unfortunately, I'm here to report that DeepSeek did the best job of all of them. You didn't try Grok and have it yell at you and make fun of you for- No, I'm good on that. Thank you, though. See, that's, to me, I want that as its own standalone benchmark. The Alex, what did I eat?

benchmark, that is the leading benchmark for all frontier models going forward. I mean, that's the real-life stuff that's actually interesting to me. I do that very regularly. I'll have three tabs open, try the same question across three, and see where I get a good answer. Those are the use cases and the ways that I think everyone, all of our listeners, approach these models: try different

models and try the same question, just see what happens and see what you like better. I think that's the real way to try to decide what's really happening in terms of progress here versus the more theoretical stuff. Can I just say one of the big takeaways for me this week is just that reasoning is just freaking unbelievable. Like it's a true breakthrough. And when you use those models, you just get better stuff.

And that to me has been sort of discounted, I think, in some of the conversation here, but not on our show. I think we've always talked highly of reasoning. But in the broader "the wall's been hit, OpenAI is toast" discourse, reasoning is both useful and better.

I'll agree it's better, but it's better for many things, not all things. Again, simple queries, back and forth, analyze this text, something where all the information is right there in front of you and doesn't require a great deal of complexity, you don't need reasoning for that. And reasoning is more expensive,

or that's more complicated, or it's more time consuming. So I agree, I'm still wowed, but there's also the UI element of it. DeepSeek, again, listing out the questions of the chain of thought as they're coming up was the kind of, I'm using the term party trick again, but it's just a UI feature that makes it so much more real. And now everyone's doing it. It's amazing how quickly. Sometimes it's almost annoying now on ChatGPT, where it starts walking me through what it's doing and how it's thinking when I actually don't want it to, when I'm like, I'm good. I'm good.

Just give me an answer. I'll switch to another tab and come back. But it is like your very talkative friend who's like, let me tell you exactly how I got to this. And you're like, nah, it's good. We're just going to go with your answer. I'm going to start on this tab and then I'm going to go here. It's almost, I want the log afterwards, if something's wrong, to go back. But we've talked about this before.

The problem that remains is if something is broken in that reasoning process, you can't simply fix it. It's not like I can go back and say, okay, on step three of eight, I would rather you have done this than this. That does not exist yet. So at that point, the reasoning, the show of reasoning, is nice, but

you can't utilize it in any meaningful way. That's fair. So Ranjan, I want you to talk a little bit about this cost efficiency that Anthropic seems to have found in training 3.7, because I think that's pretty significant when we think about how these businesses will operate and whether they need to spend as much money as they are training their latest models.

So on the cost side, 3.7 Sonnet apparently cost just a few tens of millions of dollars to train. We already talked about DeepSeek. I think, again, it goes back to showing what the real costs involved are. There's gathering up some large amount of data.

If it's a reasoning model, there's a supervised fine-tuning side of it. There's a reinforcement learning side of it that could involve bringing in lots of humans. And again, that literally is, what is the correct way to get to this answer? Is the answer correct? Ranking these outcomes, and actually going through hundreds, thousands, tens of thousands of times and training the model that way.

Obviously, that's time intensive and it's expensive, but I think it's important to recognize that even Anthropic, who has kind of been in the whole big models, expensive models game so far, the fact that they are moving towards this, it almost means, I guess, OpenAI is probably the only player left that's still kind of trying to sell, you need big expensive models to win. Yeah.

So then what do you think about this comment from Jensen where he talks about now we're going to go to reasoning and inference and that's going to be

more expensive. So NVIDIA earnings came out this week. They had revenue jump 78% from a year earlier to $39.33 billion in the quarter. They're projecting $43 billion in the next quarter. They delivered $11 billion of their Blackwell chips. So life is good for NVIDIA, but everyone's getting the sense as to, how is your business going to look if we get more efficient, if we go toward

these reasoning models. And this is a very interesting statement from Jensen Huang, where he says AI has to do a hundred times more computation now than when ChatGPT was released, basically talking about how the reasoning approaches are more expensive.

Next generation AI will need 100 times more compute than older models as a result of new reasoning approaches that think about how to best answer questions step by step. The amount of computation necessary to do that reasoning process is 100 times more than what we used to do. So it is interesting to me because, I mean, you look at what DeepSeek did and they found a way to not only do reasoning, but do it more efficiently and better.

Jensen is saying this thing that seems to disagree with this a little bit. Well, I'm curious what you think, Ranjan. I mean, never to speak ill of Jensen Huang. I think he's saying what he needs to say. I mean, if the thesis that things are going to get much cheaper and require less compute holds...

We could have the Jevons paradox, which I haven't heard in a little while, but we all heard about that one week. Again, the idea that the more ubiquitous AI gets because it's cheaper, the more aggregate compute it would actually require.

But it still feels like NVIDIA has to tell that story. And again, the company blew out numbers again. Even though it's getting caught up in the larger stock market rout as of today, this is still an insane company in terms of its ability to produce and deliver. But it still hurts their longer-term story, at least with the expectations that have been set by the market.

Yeah, I mean, it's just one of those things where I see his logic and I see where he's going, but I don't really see how. I mean, yes, they've talked about how inference is 40% of their revenue. But I just don't really see how it's going to cost 100 times more to do reasoning. Maybe I'm missing something.

No, I think it's very difficult to try to calculate out. But I guess the more complex the use cases get, maybe we'll start unlocking use cases that we haven't even imagined, or AI is going to be applied to areas where we haven't even started, and those will be the ones that really soak up all that compute.

But I agree with you that the idea that it's going to require 100 times more compute, especially as the trend is everything's getting cheaper, doesn't make sense to me either. Let me ask you this one thing that I saw from earnings, and I'm curious if you think that it's right. I mean, the fact that they shipped $11 billion in Blackwell chips when the expectation was like three and a half billion. So

Clearly, there's a huge amount of demand for the Blackwell chips, which are the latest generation of NVIDIA chips. All the hyperscalers are saying we'll take as much as we can get, including Andy Jassy at the Alexa event this week. Does that show that there's already enough tangible progress within AI that merits this further investment in chips? Or do you think we're just still in the finding out phase? You never want to be in the find out phase, I feel like.

I think we all know what happens after. But from the hyperscaler side, no one backed down. Actually, Microsoft seemed to hedge a bit, and I believe there's some reporting that they're canceling some data center leases. But overall, the hyperscalers are playing the same game: it's an arms race for compute and we're going to continue down this road. And we're going to get into the Alexa event

And maybe it does start to seem like the more complex Alexa gets, if every single person who has an Alexa is actually actively engaged all day with Alexa Plus,

then you start to see that, okay, it's going to require a lot of compute. So if it really lands in the way that it's being promised to, maybe that does make sense. But I think as of today, all the hyperscalers are taking the exact same bet.

Right. So we're still in the "scale up infrastructure and maybe this will work" phase, not the "this is working enough that we're going to keep investing" phase. Exactly. So I think, I mean, I'm still bullish on NVIDIA, but if you take this kind of dubious proclamation about reasoning being much more expensive, combined with the fact that, yes, they're still ordering, but there's a big if at the end of the tunnel, right?

I do wonder a little bit, like, if there's, like, a potential nasty surprise for NVIDIA coming in a couple years. No one ever won that prediction in the last few years, at least, but...

I'm not disagreeing with you, but it's one of those things that I'm almost fearful of saying out loud. Yep. I'm sure I'll eat the words there. And maybe Mark Zuckerberg will be the one that continues to keep NVIDIA running, taking all that ad money and pushing it right into this chip and server company, right? Because now Facebook is going to potentially spin off a Meta AI app in an effort to compete with OpenAI's ChatGPT. This is according...

to CNBC, Meta AI will soon become one of the social media company's standalone apps, joining Facebook, Instagram, and WhatsApp. The company intends to debut a Meta AI standalone app during the second quarter. Of course, they're going to have all the app install power that you have on Facebook to get people to use it via their devices,

ad slots, and new slots they're going to put in. And Mark Zuckerberg is really intent on basically taking over OpenAI's lead with ChatGPT. He sees it, we've talked about it in the past, he sees this as a big consumer app. He sees that it's growing fast and he doesn't want somebody else to do it, the same way that he sort of cut off Snapchat and did it with some success against TikTok and Reels. Very funny response from Sam Altman when he sees this. He says, okay, fine, maybe we'll do a social app.

He says, it's funny if Facebook tries to come at us and we just uno reverse them. It would be so funny. I mean, I think you get that response from Sam Altman when he's not completely sure of his footing, and I don't really feel that he's sure of his footing on this one, because you don't want to go up against Facebook when it comes to a consumer app. It doesn't usually end well. So what do you think, Ranjan? I think, actually, it's been so long since I played Uno, I had to look up what an uno reverse was first. I was so surprised. Who speaks like that? But anyway, Sam Altman speaks like that. But I...

The same guy who's coming up with everyday queries for our AI benchmarks. But I thought this was a very interesting one, because I have been using Meta AI more for image generation, just because it's very easily accessible. And it's still in a weird place because it lives in the search bar for Instagram and WhatsApp or Facebook.

And living in the search bar, I've also accidentally used it when I'm searching for something on Instagram and somehow it pushes me towards Meta AI and gives me a weird chatbot response. So I think spinning it out is a very interesting idea. And being Meta, they would be incredible at

quietly guiding people towards that app from all of their other apps. But still, is it needed? Are there other ways to integrate it more, as a tab in existing Facebook Blue and Instagram itself? That part, I would think it would just be another tab on the regular app versus having people download something. But I see this one as another Threads. They'll get some big numbers, but I don't think it's going to be

anything too impactful. Yeah, I don't think it's going to work. I would like to see an OpenAI social network, and not for nothing, but OpenFace is out there for the taking. The AI-first social network where all of your posts are commented on extensively, where you have a million friends who are all fawning and love you very much. I think they could go down that road. OpenFace.

OpenFace. OpenFace. I'd sign up. I'd be a day one user. Social networking needs an entire remake, and maybe Sam Altman is the one to bring that to us. I mean, if anybody can do it, maybe it is OpenAI. They're the best at product in AI. So you don't even need a better model for that. I'll give you that, Ranjan.

That's the product that we've all been waiting for. So we've got about 10 minutes left, and I've saved this for last. I don't want to spend too much time on it because I'm going to have a podcast next week covering it. But the new Alexa is here. Well, it's not out, it's been introduced. The new Alexa revamp is called Alexa Plus. It is conversational. It is able to accomplish things in the real world. It seems to have evolved

an awareness of what happens between your Amazon services. So you can ask it to play a song and then say, all right, can you take me to the point in this movie where that song is on? And it will do that, based off of Amazon Music and Prime Video.

It will search your Ring cameras for you. It will potentially order you an Uber. You can use it to control the sound in your apartment if you have Echo devices, with conversational tones, like, can we have this song play in that room, or, I want to hear it over there. It was a very, very impressive demo, I felt. And it was live, unlike Apple Intelligence, where Apple Intelligence was a promise, and it seems like a lot of this Alexa stuff is going to work.

So I do want to preface that we're going to have Panos Panay along with Daniel Rausch, the head of Alexa. So it's going to be a fun conversation that's coming up on Wednesday. You're going to hear a lot more about that. But Ranjan, I'm very curious what your reaction was. In our chat, I think in the Discord, you dropped this Gurman tweet where he talked about how this was ChatGPT voice on steroids and

Apple has to be embarrassed at this point for how bad Apple Intelligence is. But what was your takeaway looking at the Amazon news? And if you want to, you can say what it might mean for Siri. Man, I...

spent so much time rewiring my entire house for HomePods and I watched that event and I want to just go back. I want to switch all to Alexa and I'm sure we'll definitely talk about it more after your next episode, but like

It looked good. It looked exactly like what it should be. It looked like putting ChatGPT voice mode on a device, or Gemini voice on a device, or just what basic voice interaction should do right now. And I'm avoiding hitting my table right now, so there's no feedback on the mic, because you can tell I'm worked up right now. No, do it this time. Sorry, listeners.

Oh, my God. That's all it should be doing right now. And it did what it's supposed to do in voice. We know generative AI voice is that good right now. I think the only thing that could be a little problematic for Alexa Plus is Amazon does not have a great reputation, right,

in terms of privacy or just overall. Like, it can still be a little creepy. The reason I got rid of my Alexas was it would always ask these follow-up questions, which you could not turn off. You'd be like, what's the weather? Oh, here's the weather. And can I interest you in these three other things? And you couldn't turn it off. So the way they portrayed it, it becomes your really trusted companion that you're sitting there sharing yourself with,

that's a big ask in terms of trust. So in terms of the technology, I'm pretty confident they're there. In terms of getting people actually comfortable with interacting with their voice device in that way, we'll see, but.

Oh man, this is going to cost me a lot of money. So yeah, we did talk about it yesterday. I'll just give a quick preview of what this is going to look like. They are aware that the talking back to you and being proactive can be pretty interruptive and annoying, and I think they're paying attention to that as they roll this out. So that'll be at the end of the conversation for those that listen. But yeah, I thought it was really interesting. I think Amazon has a shot here. I wrote about this in Big Technology:

Basically, all big tech companies want to build a universal, contextually aware assistant that helps you get things done. And Amazon has a pretty good shot to be the one that pulls it off, especially because they have a working demo of this and it seems like it's going to go live next month. And

I don't know. I mean, they don't have a mobile operating system. On one hand, that's a curse, because that default matters a lot. We know Google pays Apple $20 billion a year to be the default search engine on the iPhone. However, that does let them...

support other productivity services and not privilege their defaults. And I think these companies privileging their default productivity services has been sort of the downfall of the modern AI assistant. Like, if I'm using an iPhone and I can't use Google Calendar or Gmail in there because Apple is so dedicated to Apple Mail, that's not going to work.

That ruins Siri for me, but Amazon doesn't have that problem. And so I was speaking with the head of Prime, actually, who was mentioning that, yeah, I use my Google Calendar on my Echo devices and it works just as well. So that could be a blessing for them. Yeah, I 100% agree. Though, speaking of Prime, the way they rolled out the pricing, I think, was the most...

savage and amazing Amazon move ever. Again, I forget what the monthly subscription is. Is it like- $19.99. $19.99. Or was it $14.99? Let me see. It's something that most people would not pay as of today, but as an Amazon Prime member, you get it for free. So it kind of assigns this additional benefit to being a Prime member, which, if you're a Prime member, you shop on Amazon

X percent more. So they're going to kind of assign this incredible value to it on day one. And you're going to feel like, oh, well, if I was questioning, should I renew my Prime membership? Well, it's a gimme. I mean, I'm getting Alexa Plus for free. Ranjan, listen to this. Alexa Plus costs $19.99 a month. Prime costs $14.99 a month.

Oh, wait. I thought it was free for Prime. No, if you have Prime, it's free. So basically, you could pay an extra $5 to just get Alexa Plus, or $5 less to get all of Prime and Alexa Plus. Oh, okay. That is savage. That is savage. I mean, if Lina Khan was still around, I don't know what she would say, but my goodness. But she's not. But she's not.

Okay, before we go, we need to talk about Skype. Microsoft is killing Skype. This is hot off the presses and makes me very sad. From TechCrunch: after kickstarting the market for making calls over the internet 23 years ago, Skype is closing down. Microsoft, which acquired the messaging and calling app 14 years ago, said it will be retiring it from active duty on May 5th

to double down on Teams. Skype users have 10 weeks to decide what they want to do with their accounts. It's not clear how many people will be impacted. The most recent numbers that Microsoft shared were in 2023, when it said it had 36 million users, a long way from Skype's peak of 300 million users. We do look at tech with a critical, sometimes hopeful eye.

I will say that Skype is one of the products that I've loved the most on the internet. I just have good feelings about it, helping me make international calls and calls to friends. And you could play different games on it back in the day. And that little squeak that it makes when you get a message will forever remain in my heart. Rest in peace, Skype. We bury you the week after the Humane Pin goes the way of the Neanderthals. And I'm much sadder about losing you than the wearable AI device.

I had my first remote job interview on Skype. I agree, international calls. It opened up the world. Sold for $8.5 billion to Microsoft in 2011. And sorry you had to get caught up in a corporate battle with Microsoft Teams, which clearly won. I think the last few times I ended up on Skype,

I had all these messages that were clearly phishing and scam things just barraging my Skype account, and I did not open it after that. Goodbye, Skype. Goodbye, Skype. And goodbye to all of you, but hopefully just for a couple of days, because I'll be back on Wednesday with those two Amazon executives, and Ranjan and I will be back on Friday. Ranjan, thanks so much for coming on the show. See you next week. All right, everybody. Thank you for listening and we'll see you next time on Big Technology Podcast.