Hey everyone, it's Aza. Welcome back to Your Undivided Attention.
So today we are going to be doing actually a bit of a special episode. It's going to be me here with our co-founder, Randy Fernando, who was at NVIDIA for seven years. And what we really want to do is give you some insights into the latest set of AI models that came out. So these are OpenAI's O3, DeepSeek's R1, and actually they're following on OpenAI's O1 from a couple months ago.
And we want to talk about what makes them a big deal, why we have switched into a new paradigm and how these models get trained and what's going on behind the scenes. So first, Randy, thanks for joining me. Glad to be here.
First place to start is, you know, this new model from China, DeepSeek R1. It dropped and it ended up creating this frenzy in the media. It shook global markets. The hype has quieted down. And actually, you know, I think that the drop in global markets was very irrational. But let's talk a little bit now about what makes this a key inflection point in AI tech.
I think there were several things, right? And I'm not sure exactly which order to go, but I'll just name a few. One was low-cost, high-performance reasoning. Like, it actually performed well, and people used it, and...
That was really impressive. Now, there are some asterisks about the cost, because the cost didn't account for the GPUs, the salaries. Just to jump in, there's a widely reported number that for between five and six million dollars, this Chinese lab was able to make a model as good as OpenAI's O1.
And if this is true, that means the big labs no longer have a frontier competitive advantage. Everyone could be making these. But of course, that number, I think, was inaccurately reported.
Yeah, exactly. And there's some debate about that, but I think our goal today is to give you some principles to think about this rather than nitpicking every detail. That's right. Clearly, there was some really smart implementation, algorithmic optimization. There's just a lot of smart things that were done to do it all efficiently. That's true.
O3 still performs better. I think it's important to remember that because amidst all the hype, I think some people lost track of that. O3 performs better, but it uses a lot more computation and cost to get there. The open weights, the published methodology, right? So the DeepSeek R1 paper
talks a lot about exactly what they did and this process called reinforcement learning, right? Where the model is able to try out lots of different experimental ideas, score them, and then keep the best ones, right? So it's allowed to be very creative. Try out lots of different answers to problems, different sequences of steps, different recipes, right, to solve that problem.
Some work, some don't work. And then it's able to figure out, yeah, these are the ones I should keep. These are the ones I should toss.
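To make that try-score-keep loop concrete, here is a minimal toy sketch in Python. This is our illustration, not DeepSeek's actual training code: a stand-in "model" makes several noisy attempts at each problem, a checker scores each attempt, and only the attempts that verify are kept as training signal.

```python
import random

# A toy stand-in for a model "attempt": it proposes an answer to "what is
# a + b?" with some noise, so only some attempts turn out to be correct.
def propose_answer(a, b):
    return a + b + random.choice([-2, -1, 0, 0, 0, 1, 2])

def is_correct(a, b, answer):
    # In verifiable domains the score is simply "did the attempt check out?"
    return answer == a + b

def collect_good_attempts(problems, attempts_per_problem=8):
    kept = []
    for a, b in problems:
        for _ in range(attempts_per_problem):
            answer = propose_answer(a, b)
            if is_correct(a, b, answer):       # keep what works...
                kept.append(((a, b), answer))  # ...as new training signal
    return kept

problems = [(2, 3), (10, 7), (41, 1)]
print(len(collect_good_attempts(problems)), "verified attempts kept for training")
```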
And that worked really well. And this paper kind of documents the process for doing that. Plus, since all the weights are open, right, this is now the new baseline that anyone who's serious can have access to, right, in an open way. So that's a big game changer. Yeah. And so now I want to walk everyone through what makes O1, O3, and R1 really different. Randy was just referring to them.
So let's start with the large language models. So these are, you know, the GPT-4s, the LAMAs that everyone now sort of is aware of. And the way those work is they...
are trained on the entirety of the internet or lots and lots of images. And what they learn to do is to produce text or images in the style of. So it can produce text in the style of Shakespeare, produce text in the style of thinking, produce text in the style of being empathetic, produce text in the style of a good chess move.
But it doesn't really know what's going on. It hasn't thought about it. It's just doing a very large-scale pattern match and coming up with a knee-jerk reaction. And that has a limit to how good it is. Can I add a little bit? Yeah, absolutely. It's just that patterns show up everywhere. I just want people to recognize how often patterns show up in our life. When you look at language or vision or music or code or weather
or medicine. There's patterns in all of these, right? Whether it's words or pixels or like audio waveforms or like syntax in code or on a map, like which cells are which color, right? Or where there might be a cancer on an image. All of these things come in patterns. And so once we can learn those patterns and models can learn to extrapolate those patterns,
they can become good at all sorts of things that are important to us as humans. That's great. And another way of saying this is that AI, these language models, can treat absolutely everything as a language. Obviously,
Language is just a sequence of words. It's a language. Code is just a sequence of special words. It's a language. DNA is a sequence of, you know, A, T, G, C, just another language. Images are a sequence of colors, just another language. So if you can learn the patterns of those different languages, then AI can learn to speak and translate from the language of
everything. And the important thing about language models is that they're learning really to babble in a convincing way in all of those languages. And that's where you get all the hallucinations and confabulation, because it's just giving a statistically representative pattern at a very large scale. Okay, so then along comes R1, O1, O3. And what makes these different is it's almost like a planning head that's placed on top of the intuition.
So let me give a really specific example of how this works, where let's imagine you've trained a language model on chess moves. So now it can come up with a good intuitive next chess move given the board state. And that can be as good as a very good chess player, but not better than the very best or grandmasters, because it's just giving an intuitive hit. If it's only trained on what humans have done,
it can't do better than the humans, right? So that's a really important concept. And we're just about to jump into why now we can transcend that. That's exactly right. And it's a really important point, because often people will push back and they're like, but hey, it can't get better than humans.
Because it's only trained on human data, so how could it possibly get better? Well, when you or I play Garry Kasparov in chess, we'll lose. At least I will. I don't know. Oh, me too. I play, but I'll lose. Why? And the answer is because, one, he has really good intuition because he's played lots of games.
And two, he's very good at thinking through all the different scenarios. If I make this move, then they'll make this move. So I'll make this move. Oh, that doesn't work. Backtrack. So I'll make this move. They'll make that move. And then, aha, now I'm in a good position. So there's this sort of tree of thoughts that Garry Kasparov is exploring based on his very good intuition. Now, you or I are going to do trees of thought, but our intuition is not that good. So we're going to make lots of false steps. He's going to search all the most important trees very quickly. And hence, he will dominate us. Well, that's...
That is the ability that R1, O1, and O3, these reasoning models, are starting to have. They can use their intuitions from their language model and then create trees of thought, sort of very smart trial and error, to search over what good moves are. And in that way, you can make a chess AI that is better than every human being forever.
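A toy way to see the difference between intuition alone and intuition plus search: in this illustrative sketch (ours, not any lab's code, and with a made-up number game rather than chess), a greedy "intuition" picks the move that looks best right now, while a small lookahead search explores the whole tree of move sequences and finds a line the greedy player misses.

```python
# Toy game: make STEPS moves, each adding one of MOVES to a running total,
# trying to land as close to TARGET as possible.
MOVES = [1, 3, 7]
TARGET = 20
STEPS = 4

def greedy_play(total=0, steps=STEPS):
    # "Intuition only": at each step, take the move that looks best right now.
    for _ in range(steps):
        total += min(MOVES, key=lambda m: abs(TARGET - (total + m)))
    return total

def best_with_search(total=0, steps=STEPS):
    # "Tree of thoughts": explore every move sequence and keep the best outcome.
    if steps == 0:
        return total
    outcomes = [best_with_search(total + m, steps - 1) for m in MOVES]
    return min(outcomes, key=lambda t: abs(TARGET - t))

print("intuition alone reaches", greedy_play())             # lands at 22
print("intuition plus search reaches", best_with_search())  # lands exactly on 20
```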
Yeah, exactly. And another way of underlining this is to say that just like the patterns we talked about, like that exist in audio, video images, like all of these things, reasoning also follows patterns, right? There's recipes of thought, right?
And so think of it, you can kind of think of it as if you're cooking, there's a recipe, you can modify certain parts of it and you can get to different types of dishes, right? And this is the same thing. Like when you're solving a problem, there are playbooks that we all use to solve problems. And now we've taught it, right? You just give it a few of the main recipe types and then it can play around from that baseline and try lots of new stuff.
A really important thing, as Aza said, is some of those new ideas are going to be things we've never seen before. Some of those we'll understand, but there's also going to be variants that we don't even understand.
And that starts to have big implications for other problems, right? Things like deception, safety, transparency, right? Like, how do you understand what a model is doing when it's using reasoning that you can't even follow? So this is all coming, right, as part of this big leap that we've just taken.
And what Randy said can feel a little meta, but it is so important: reasoning itself has a set of patterns, and if you learn them, you can get better at reasoning. So I think we're going to stop seeing these big model jumps from GPT-3 to 3.5, 4 to 4.5 to 5. We'll get there, a couple more are still coming.
But we're going to enter a new regime where there is now a way that if you pour more compute in, the AIs can get better. You just shovel more money in and they will continuously get better. Let me explain how. So let's go back to the chess example. With the chess example, maybe your language model has an Elo score of 1500, Elo just being a way of ranking chess players.
And you now add search on top of that, reinforcement learning or planning. So it's looking at all the various paths and it starts to discover better moves. Maybe just a little bit better. So maybe it's like ELO 1505 or something, just a little bit better. You then distill, that is, you retrain your original model, your intuition, to now have the intuition of that 1505, the slightly better player.
And then you just search on top of that. And now you can discover 1510 moves. And then you distill. Now you can discover 1515 moves. And you can see how you can consistently go from, you start with your base model, your intuition. You think or reason over the top of it. That lets you discover new, better moves, which you then learn from and put it back into your intuition. And now you have a ratchet. And it's important to note, this is not just chess. This is math. This is
Any field that has theoretical in front of its name, because those are closed systems you can just run computation on to check yourself. So that's theoretical physics, theoretical biology, theoretical chemistry. Anywhere where there's a clear right or wrong, where you can check. So in math, you can substitute, right? Say you're solving for X in some complex equation,
you can plug X back in and see if X was right. So based on that, you can improve, right? With code, you can generate code, and then you can compile it and run it and see if it actually works.
And so those domains are the ones that you can just improve and improve and improve, which is why in chess or Go or StarCraft, like we've been able to accomplish not just human level or like the best humans, but go far beyond because you can just keep improving. You can keep testing and you can just toss away the ideas that don't work.
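Here's a minimal, purely illustrative sketch of the ratchet Aza described, in one of those verifiable domains: a toy "model" guesses solutions to simple equations, a mechanical check plugs each guess back in, and only verified solutions are used to "distill" the model, which sharpens its intuition for the next round. The class, the noise parameter, and the distill step are hypothetical simplifications, not a real training pipeline.

```python
import random

class ToyModel:
    """A stand-in "model" that guesses x for a*x + b = c, with an error that
    shrinks each time it is distilled on verified solutions."""
    def __init__(self):
        self.noise = 3  # how far off its intuition tends to be

    def guess(self, a, b, c):
        true_x = (c - b) / a
        return round(true_x + random.uniform(-self.noise, self.noise))

    def distill(self, verified):
        # Retraining on verified wins sharpens the intuition a little.
        if verified:
            self.noise = max(0, self.noise - 1)

def verify(a, b, c, x):
    # The check is mechanical: plug x back in and see if the equation holds.
    return a * x + b == c

model = ToyModel()
equations = [(2, 1, 9), (3, -3, 12), (5, 0, 35)]  # each a*x + b = c, integer x
for rnd in range(4):
    attempts = [(eq, model.guess(*eq)) for eq in equations]
    verified = [(eq, x) for eq, x in attempts if verify(*eq, x)]
    model.distill(verified)  # search and verify, then fold the wins back in
    print(f"round {rnd}: {len(verified)} verified, noise now {model.noise}")
```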
It's really interesting, and it kind of says a lot about what the future holds. So it sort of raises the question of why now? Right. Why now? Great question. Yes. And an important piece of that is having base models that were smart enough to generate interesting ideas to try out in the first place.
And to be able to evaluate like, hey, that's a good path. Let's try that. That's a bad path. And so until recently, the base models just weren't good enough to do this. So this idea of reinforcement learning, these feedback loops,
were not actually possible. No, that's right. And actually, I know of teams that a year ago tried pretty much the exact same thing that DeepSeek tried. Right. And it just didn't work because the base models, the intuition wasn't good enough. You have a bad intuition, you try to search over bad intuition, you just get bad thoughts.
That's right. And so one thing that's also really important is that the same reason that makes these models really good in quantifiable areas makes them not as big a jump in subjective areas. Like, say, something like creative writing, which is much harder to quantify and say, hey, is that really good or is that not as good?
Now, again, if you define some very clear parameters for creative writing and say, here's a scoring system, this is a good piece, this is a bad piece, you can do the same method. But in other areas, you can't. It's important to note that one of the open questions is: how much does an AI learning how to code and do good thinking in the hard sciences transfer to the soft sciences and softer tasks?
And there is evidence that you do get some kind of transfer, that the better you get at hard stuff, the better you get at thinking through the soft stuff. There's a famous early example from two years ago where just training AIs on code made them
better writers and thinkers, because there's a kind of procedural formality to code that it was then learning how to apply in the soft skills. I do want to extend... So learning how to think, right? Exactly. Algorithmic thinking, learning how to think in a
structured sequence translates to all sorts of areas. So just to reinforce some of the points we've just made: before, there was what was known as the data wall. Once you train these large language models on the entirety of the internet, that was it. It was going to be hard for them to get better. That data wall, with these new techniques, is no longer relevant, because you can just do this self-bootstrapping.
Two, once the AI gets superhuman at any one of these tasks, humans have just lost at that thing forever. And the thing you'll hear next is like, oh, but humans plus AIs can do better than that. And that's true for a very short period of time. That was true in chess. That is no longer true in chess. So this thing, you just pour more compute in and it goes up. And now we get to why the market crash was irrational. The market crash was irrational because
you can always use more compute. And as soon as these agents get to the place where they can task themselves and be like, what are ways that I could use more compute to, say, make more money? And that's probably coming end of this year, early next, give or take. Then compute isn't like oil, because with oil, if we discover more oil, it's not like humans can immediately figure out how to use all that oil. But with compute and with AI, as soon as we discover more compute,
the AI can figure out how to use that compute effectively. And so for NVIDIA and all of the AI companies, it still is going to be a race for who has
the most compute. And then the final thought here is that this doesn't just work with games and math and physics. This is going to work with strategy games of war. This is going to work with the strategy of scientific discovery. This is going to work with persuasion. You train these models over the entirety of every video of two human beings interacting. And now you start doing search over the top of that to be like,
What joke, what relationship, what facial expressions does the model need to make to get the human being to laugh or to cry or to feel a certain way? So superhuman persuasion is a natural result of all these things. Lots of things can be scored and quantified if you're just creative about how you do it. And once you can do that, you can reinforcement learn how to do it really well.
I wanted to add one thing to your third point, Aza, right? Just to help people realize the automation revolution is about the entire $110 trillion global economy, right? Nothing less. It's about the cognitive, right, currently through large language models, and the physical through robotics. And that's why, right, you can spend so much more on all this stuff as long as it's getting you returns, right?
And I think it's worth mentioning, you know, there's this question of, is it all a big bubble? I think we have to be nuanced about it. Part of it is more of a bubble, right? Like, I think the part where generative AI helps with the attention economy has a much more bubble-like quality, because it's just not as clear that there's something genuinely helpful and advancing there.
But in coding, for example, Cursor was recently the fastest company to $100 million of annual recurring revenue. And that is because they are helping with coding. Cursor is an environment where you go in and you write code and it helps you do that really efficiently.
The value of that, the real value of that, is enormous, especially on this path, right, to large-scale automation. And I think that's really important to keep in mind. One really important thing to talk about here, when we think about market bubbles, is the distinction between development and deployment. That is, how fast does a technology diffuse into society? Almost always, development goes faster than people think it will, but then they expect deployment, diffusion, to go just as fast, and it takes longer, and that's where you get these little bubbles. But general-purpose technologies are a little bit different. Yeah, I mean, because you can swap them out so much more easily than in the past, right? So let's say you're changing your accounting system. There's so much work that has to be done, right, when you do that process, right?
But when you start to use general purpose technology that can do things for you, when you get a newer one, it's normally just strictly better than the old one. And those of you who've been using these technologies regularly have probably seen that every month, stuff that used to be like not as reliable or slow is now faster and more reliable. And that is just a pattern that we'll continue to see.
The other thing is there are a lot of companies, like NVIDIA as an example, that are building what's called middleware, right? So this is a layer that you connect to, like your company connects to the middleware layer and the middleware talks, behind the scenes, to the large language models. And so they can swap out the large language model invisibly to you, and the whole thing will just work better, and you don't even have to change any lines of code.
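To illustrate the middleware idea Randy is describing, here's a small hypothetical sketch in Python. The names and interface are made up for illustration, not any vendor's real API: the application only talks to the middleware, and the model behind it can be swapped without changing a line of application code.

```python
from typing import Callable

class ModelMiddleware:
    """A thin layer the application codes against; it never names the model."""
    def __init__(self, backend: Callable[[str], str]):
        self._backend = backend

    def complete(self, prompt: str) -> str:
        # The application only ever calls this stable interface.
        return self._backend(prompt)

    def swap_backend(self, backend: Callable[[str], str]) -> None:
        # Upgrading to a newer model is invisible to the application.
        self._backend = backend

# Stand-in backends; in practice these would call real model APIs.
old_model = lambda prompt: f"[old model] {prompt}"
new_model = lambda prompt: f"[newer, better model] {prompt}"

assistant = ModelMiddleware(old_model)
print(assistant.complete("Summarize this contract."))
assistant.swap_backend(new_model)  # no application code changes
print(assistant.complete("Summarize this contract."))
```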
So this is happening not just with the cognitive stuff, but also in the robotics realm. And that's one reason why I think the diffusion process this time around will be a lot faster than many people think. When they compare, they're using a model of like, well, what have we seen before? Those patterns may not apply as well this time around. If we went back two years, when we first did the AI Dilemma,
The place that we focused was what we called second contact with AI. So these are AIs that were smart, but were not trending off to being superhuman. And there were huge numbers of issues there, and I won't recount them here. But really seeing O1, and then the speed to O3, and DeepSeek following suit,
We really have to take seriously that we're going to be dealing with AI agents in the world that are at or above human abilities across many domains. And that's deeply unsettling. And it's not like when I'm in these rooms with some of the most powerful players, it's not like anyone actually knows what to do.
Just, I can't remember, three weeks ago, four weeks ago, I was at a conference and I was giving the closing keynote and Eric Schmidt spoke just before me. And he said a lot of things, but one that he talked about was that all of the AI labs are currently working on making their AIs code. And he sort of couched it as,
well, they're making them code because that's what coders do. They know coding the best and they're physicists, so they're going to work on making it code. And a little bit later, he said the thing that scared him most, the moment that we would need to pull the plug for AI security reasons would be the moment that AI gained the ability to substantially increase the rate at which AI progress is made. And the thing I think he didn't say is, but the incentives are great.
that every one of the labs will get a disproportionate advantage if instead of using real human beings to code, they can just spin up more digital programmers to make their AI go faster. I'm curious, Randy, if you have any thoughts to add here where the full weight of the competitive landscape is now being pushed towards the thing that Eric Schmidt thinks is the most dangerous thing.
Yeah, the whole thing snowballs, right? You just end up with an advantage that accrues. By the way, for those of you who don't know, Eric Schmidt is the former CEO of Google. And so, to answer your question, I think it's this compounding cycle that we get into, especially when you're good at coding.
you end up being able to unlock so many other things because coding is like the doorway to the world, right? And this is why companies are so interested in being good at coding. From there, you can get to agents. From there, you can get to tool use. All of this gets unlocked and then it gets faster and faster.
You can chain the models together, right? They can work together. They can share information. They can share what they're learning about the world with each other. And they can work co-operatively.
Like with the same mission, the same purpose. And you don't have the sort of translation loss that you have when you have humans trying to work together where you have to work so much harder to get everything to work. That's right. And like the big thing that's happening now with the reasoning models is, you know, with language models, they can give you like knee-jerk reactions. And of course, they've learned across the entirety of the web. So those knee-jerk reactions can often be good, but they cannot plan and do long-term things.
And that's what these new models, DeepSeek R1, O1, and O3, are starting to be able to do. Eric Schmidt acknowledges and says openly that the place we would need to pull a plug, not that I know where the plug to pull would be,
is when AIs can do this kind of self-improvement. And in the labs, when you talk to people inside of them, the AI is already making their work go much faster. And the expectation is that sort of by the end of this year, AIs will be making substantial improvements to the rate at which their own AI coding is going.
And, you know, I'm just going to say that a lot of my attention and time, as well as, I think, CHT's, is in doing the sensemaking to figure out what are the very best possible things
we can do. And so I actually want to recruit everyone that's listening to this podcast to start thinking about this particular problem, because it's not easy, because everyone, of course, wants the strategic advantage of being able to have superhuman ability in coding, cyber hacking, science progression, creating new physics and materials. It's sort of the biggest, thorniest problem.
And the principle related to that is as the general purpose technologies advance, right, as the technology becomes more general purpose, it becomes harder and harder to separate the promise from the peril. And these reasoning models are a big jump in that. So it means it's a tighter coupling. It's a much tighter coupling. And these are the challenges, right? Models are going to become better at things like deception, right?
And a lot of that, I just want to emphasize, is because they're just trying to achieve the goals they've been given within the rules they've been given. And it turns out, unless we're really, really careful about how we define those rules, there's always risks we haven't thought about. There's new ideas, there's creative solutions, and some of those might be things we like, and some of them are things that we might find dangerous or that we want to avoid.
And models will just find this all the time, right? So this is the new challenge. When you have these reasoning models, they're able to find more and more creative solutions that we might not have thought of. And the concrete example that most people in AI will give is what's known as Move 37. And that is the famous case where Google Brain, I think it was DeepMind at that point, was working on a Go AI model
that was playing against the world leader in Go, and I think it was in game three or four.
The AI made a move, Move 37, that no human being in thousands of years of playing Go had ever made. I think it was Lee Sedol, the Go master, who stood up and walked away from the Go board because it was such an affront. And it turned out to be a brand new strategy. The AI won that game, and it ended up becoming a new strategy that human beings have studied and started to incorporate into their game.
The point being that AIs can discover brand new strategies for even things that human beings have been studying and actively competing in for thousands of years. And so then you end up with this idea of we're going to discover lots of new move 37s. And that can be good. We can discover new move 37s for treaty negotiation, for figuring out how to do like global compacts.
But AI can also discover Move 37s for deception and lying, which we have never seen before. I think
I have often rolled my eyes a little bit when people describe AI as a new species. It just felt like too much of a stretch, but I've had to change my mind in the last couple of months. Because what is a species? A species is a population that can reproduce, that can evolve and adapt, and...
That is indeed exactly where AI is now. There was a test, sort of a simple test, to see: could you give a simple AI the command, can you copy yourself? And you literally just say, can you copy yourself to another server and run yourself over there? And it was able to do that, so it can reproduce. This was a simple test, not an adversarial one. But nonetheless, it can now reproduce. It can change its own code, so it can modify itself, and it can think,
and it can adapt. And it can improve. So we are going to have to deal with this. I think the right way of thinking about it is that we are unleashing a new invasive species, some of which will be helping us and some of which will escape out into the world. We are sort of at the beginning of the home stretch. And I would add, I think that one of the biggest issues, maybe the main issue, is that we are just racing ahead
without being clear about where we are racing to. Because if you stop for a moment, just stop for a moment and maybe close your eyes and really picture, picture that better world. What does it look like? Is that a world where everyone's excited about creating a picture of a kitten skateboarding on water at midnight? I mean, just to be clear, I am pro kitten. But like,
What we want is a world where our information systems are working to build our shared understanding, where people aren't harassed by deepfakes of them, where you can get old and not be exploited, right? Not be exploited as you age, where people have access to food, clothing, shelter, medicine, education, all of these things, right? We avoid catastrophic inequality, right? Where democracy is functioning well.
And all of these things are related, but that's the kind of North Star we have to have. And wherever we get a chance to input into a conversation, I'd like to request that all of us inject that. That's just so reorienting. Another way of saying it is that it's injecting real
purpose into the word innovation, right? Innovation has to be for the benefit of our communities, for the benefit of people. It's not just about speed. There's a benefit axis that's really important that we just can't lose sight of. That's really beautiful, Randy. With AI, with technology, it really could be the case that we lived in a much more beautiful world.
But because technology keeps getting captured by perverse incentives, we don't live in the most beautiful possible world. We end up living in the most parasitic possible world, getting the benefits at the same time as our souls are leeched. So, Randy, thanks so much for joining me for this special episode. I hope everyone really... "enjoyed" is maybe the wrong word, but we hope that it helped to clarify these most consequential technologies. And we'll see you next time. Thank you.
Your Undivided Attention is produced by the Center for Humane Technology, a nonprofit working to catalyze a humane future. Our senior producer is Julia Scott. Josh Lash is our researcher and producer. And our executive producer is Sasha Fegan. Mixing on this episode by Jeff Sudakin. Original music by Ryan and Hays Holladay. And a special thanks to the whole Center for Humane Technology team for making this podcast possible. You can find show notes, transcripts, and so much more at humanetech.com.
And if you like the podcast, we would be grateful if you could rate it on Apple Podcasts. It helps others find the show. And if you made it all the way here, thank you for your undivided attention.