Take one. All right, here we are saving humanity. Take one. Having, say for instance, the United States disrupt China's ability to develop a superintelligence: the main way in which they would develop it is if they get the ability to automate AI research and development fully and take the human out of the loop.
Then you go from human speed to machine speed. We've had people such as Dario, for instance, the CEO of Anthropic, talk about how a recursive process like that could lead to an intelligence explosion, and that would lead to a durable edge where nobody will be able to catch up. And last week, Sam Altman discussed how this process could telescope a decade's worth of AI development into a year or potentially a month.
There's been this very seductive argument that has appealed to all of these people, which is basically, well, it's probably going to happen anyway. If we don't do it, someone else will. Basically, all of these people sort of trust themselves more than they trust everyone else and have therefore convinced themselves that even though these risks are real, the best way to deal with them is for them to go as fast as possible and win the race.
Why did they make OpenAI? Well, they were worried that they didn't trust Demis to handle all that power responsibly when he was in charge of the AI project and all the fate of the world rested in his hands. So they wanted to create OpenAI to be this countervailing force that could do it right and distribute it to everybody and not concentrate power so much in Demis's hands. And in fact, the emails that came up in the lawsuit, they were talking about how they were worried that Demis would become dictator using AGI.
On the proto-ASI things, I think that they follow the instructions fairly reasonably. There are some other parts of it. Yeah, but that scares the shit out of me, right? Fairly reasonably? You know, when these things are still in our hands, kind of, that's okay. But if you have them guiding weapons or something like that, there are many circumstances where fairly reasonably is not good enough.
This episode of MLST is sponsored by Tufa AI Labs. It is a research lab headquartered in Zurich, and they're moving to San Francisco as well. These guys are number one on the ARC v2 leaderboard. They're genuinely fascinated by building the next generation of technology, the next innovation, which will take large language models to the next stage. If that sounds like you, please get in touch with Benjamin Crousier. Go to tufalabs.ai.
This podcast is supported by Google. Hey, everyone. David here, one of the product leads for Google Gemini. If you can dream it and describe it, Veo 3 and Gemini can help you bring it to life as a video, now with incredible sound effects, background noise, and even dialogue. Try it with a Google AI Pro plan, or get the highest level of access with the Ultra plan. Sign up at Gemini.Google to get started and show us what you create.
All right, so I'm Gary Marcus. I'm a cognitive scientist and an entrepreneur. I've written six books, most recently Taming Silicon Valley. And here we are in the heart of Silicon Valley, recording this in San Francisco. I think we all want what is best for humanity and hope that AI can be positive for humanity and not negative. The three of us are maybe not known for agreeing on everything, but we actually have a lot of shared values around that. And I'm looking forward to the conversation.
Thanks, Gary. My name is Daniel Kokotajlo. I'm the executive director of the AI Futures Project. We are the people who made AI 2027, which is a comprehensive, detailed scenario forecast of the future of AI. I'm Dan. I'm the director of the Center for AI Safety. I also advise Scale AI and xAI.
I also have done research in machine learning; for instance, I coined the GELU and the SiLU, which are common activation functions. More recently, I've built evaluations to measure capabilities, such as MMLU, the MATH benchmark, and Humanity's Last Exam. And I've been focusing on trying to measure aspects of intelligence, and also the safety properties of AI systems, for most of my career.
The common things that we care about are: what is going to be the outcome when AGI comes? How soon is it going to come? How do we forecast these things methodologically? One person wrote in a question that maybe we could actually start with, which is: what is the positive view? I take it that
all three of us in the room are united in thinking AGI could be a good thing. None of us is sort of standing in front of the construction machine saying, stop it all, period. I think we all think there's at least a possibility of a positive outcome, but you can stop me if I'm wrong, and that we're hoping to steer towards that positive outcome. So our first question is,
What could be the upside here? And why aren't you just saying let's just forget about AI altogether? Yeah, so first of all, I actually think it's quite reasonable to basically stand in front of the bulldozer and say stop. I think that eventually we want to build AI because it is possible to get it right and it is possible to make it in such a way that's beneficial to everyone and massively beneficial to everyone in fact. But
If we are currently not on a track to getting it right and we're currently on a track to make it in a way that's going to be horrible, then it makes sense to just sort of stop until we figure out a better way. But yeah, as for what the benefits could look like.
Well, AI 2027 has the slowdown ending in which things go really well for almost everybody. And-- Since not everybody read-- I think more people probably read the darker scenario than the positive scenario, lay out for people who might not have read the positive scenario a little bit about what the spirit of that is. Sure. So in the slowdown ending of AI 2027,
they manage to devote just barely enough effort and time and resources towards the technical alignment problem that they solve it just barely in time, so that they can still beat China as they have wanted to do. By they, I mean the leader of the leading AI company and the president and maybe some other companies, that group of people. So the slowdown ending is not our recommendation for what we should actually aim for.
To aim for this sort of scenario would be to expose the world to incredible amounts of risk of various types. So we don't think it's a recommendation, but nevertheless, it's a coherent, plausible scenario for how things could sort of muddle through and end up pretty well for everybody. And pretty well is, I mean, like Peter Diamandis abundance scenario, for example. Is that what you have in mind? I haven't actually read that, but I would say...
When you get the AIs that are super intelligent, so what I mean by that is better than the best humans at everything and also faster and cheaper,
you can just completely transform the economy. You know, superintelligent-designed robot factories constructed in record speed, producing all these amazing robots and all this amazing new industrial equipment, which are then used to construct new types of factories and new types of laboratories to do all sorts of experiments, to build all sorts of new technologies, and blah, blah, blah. Eventually you get, and by eventually I mean in only a few years, you get to a completely automated, wonderful robot economy full of all sorts of new technologies that have been iterated on and designed by superintelligences. Material needs are basically just met for everybody. There's just an incredible abundance of wealth to distribute. And things like curing all sorts of diseases, putting up new settlements on Mars and so forth, all that stuff becomes possible.
So that's the sort of like potential upside. And then, of course, there's the question of can we actually achieve that, right? And if we do have the technology to achieve that, who's in control of the technology? And do they actually use their power over the army of super intelligences to make that sort of broadly distributed good future for everybody? Or do they do something that's more dystopian? So I think we...
Well, first of all, we'll come back to the bulldozers and whether you think we're actually at that point or not. But I think we can agree probably, and we'll let you chime in too, that there is a technical alignment problem, and we should talk about that. And a political, if not alignment problem, maybe that's too close a pairing of words, but a political issue about, if this technology exists, who controls it? That's deeply important. And so we should talk about both sides of that, sort of:
political control and technical alignment. Let me get your answer, though, on that first question. Do you think we need to stand in front of a bulldozer now? And if you don't, what do you think is the positive?
Yeah, so I think there are two concepts. There's AGI and there's ASI, artificial general intelligence and artificial superintelligence. I think AGI doesn't have a clear definition, so it's difficult for me to say whether, for every definition of it, it would be worth standing in front of the bulldozer. Some people would say we have AGI already. Let's just start with what we have now. There could be an argument that right now we should just stop the train. Sorry, I've moved from bulldozers to trains. There'd be an argument that
I'm not making this argument, but some people have made it. The argument would be: we're already so far along this path, and if we don't resist it massively right now, it's going to be bad, and there isn't a good outcome here, a good enough outcome to offset what seems inevitable. I'm not making that argument, but I've heard people make it. And so, partly as a grounding exercise, where are we on that? So one argument would be,
Fuck it, we've already screwed up. We need to have every person on the planet get in front of this train or bulldozer and say, just stop it because there isn't a positive outcome to warrant it. Another position would be, there's actually a really positive outcome here. And if we can guide it in the right way, we can get to that positive outcome. And here's why I think it's positive. That's the kind of set of issues I wanted to speak to first.
So I primarily think about stopping superintelligence. The main reason for that is because that is much more
implementable, geopolitically compatible with existing incentives and so on. So it's something that I can actually foresee. I think things stopping tomorrow is not something I can as easily foresee, so I don't really think about that. The reason that super- But is that just a fatalism then? I mean, like if you're worried about superintelligence,
And you might think, if we move even towards AGI, which presumably is closer in time, that that's a bad idea because we won't be able to stop the superintelligence. Like, maybe we don't want to risk going further. I mean, I couldn't conceive of how we would coordinate now to do that. I think the asks are instead
having, say for instance, the United States disrupt China's ability to develop a superintelligence. The main way in which they would develop it is if they get the ability to automate AI research and development fully and take the human out of the loop. Then you go from human speed to machine speed. And we've had people such as Dario, for instance, the CEO of Anthropic, talk about how such a recursive process could lead to an intelligence explosion, and that would give the USA a durable edge where nobody will be able to catch up. And last week, Sam Altman discussed how this process could telescope a decade's worth of AI development into a year or potentially a month.
And I think that's extraordinarily destabilizing for two reasons. One, if they control it, or if a state controls it, such as, say, China, then all the other countries are at substantial risk, because that superintelligence could be weaponized and used to crush other countries. And if they don't control it, which I think would be fairly likely with a nearly unsupervised, extremely fast-moving process,
then everybody's survival is also threatened. So either way, this very fast automated AI R&D loop is quite destabilizing whether a state controls it or not. And so I think it makes sense, not just because AI is scarier in some vague sense, but I think that there are very strong geopolitical incentives for a state's self-preservation to prevent that.
I want to come back to the geopolitical aspect of it. And I read, or I skimmed, the paper that you just did with Alexandr and Eric. But I don't think I got an answer for the first part, which is the upside. I don't think you quite laid out the upside. Yeah, yeah, yeah. So, for the upside, yeah.
I think people generally think that if we have AI, it will necessarily hollow out all values, or that there will be extreme pressure toward one of them. That is, we'll be zoo animals, or we'll be all pleasured out and just, you know, in our VR experiences constantly, having superficial experiences.
I think you could set a society up so that people have the autonomy to choose between different ways that they would want to live their life, given the resources that AI-increased GDP could provide. So I think that there is a way you could have a list of objective goods being met in society. So I don't think there are no equilibria where things can work out. I think there's a way of achieving a variety of goods and still preserving human autonomy and all that. Sort of like what we have now: imagine we had what we have now, where we have the autonomy to choose how we want to live our lives, but with more resources. Sorry.
Some of us in the West have that. Yes, that's right. That's right. That's right. And there's other sorts of longer-term questions about how you might do that, like how are you going to distribute power? Maybe that would be with compute slices that people would rent out, for instance, where they would have the unique cryptographic key to activating that compute slice so that you're not just distributing wealth, but you're also distributing ways of generating that wealth.
There's a lot in this deep future that one could pull out, but I think there are some paths where things work out well. I like the term equilibria. I think we probably all agree that there are multiple equilibria here and that we're all trying to steer towards
positive equilibria. And I think we would also all agree that there's lots of different ways to get to both the positive and the negative. We might disagree about some of the paths, some of the likelihoods, and so forth. But I think we're all three operating under that general assumption that there's multiple ways this could go. And some of them are good, and some of them are bad. I'm a little darker than I used to be even a couple years ago because of the
economic inequality sorts of issues. So a couple of years ago when I trusted Sam a lot more than I trust him now,
I heard him talking about universal basic income. And I've always thought that that was an important part of the equation. And it seems to me that the more that data has driven things, the more acquisitive the companies-- and it's not just OpenAI-- have been, the less realistic it has seemed to me that we're going to have anything like a universal basic income. I think there is some scenario under which there's so much wealth that everybody's subsistence gets met.
I don't expect that the wealthy people are going to give up the beachfront property under any circumstance, or their power. And the dynamics that we've seen in Washington lately, though I know our politics may not be the same, have made me less optimistic about any kind of
relatively equal distribution or even just not a relatively extreme distribution. I think that the positive outcomes do depend on getting some answers there to what would be the incentives, the mechanisms, the dynamics such that things are well distributed. And I guess I'm sort of making this up as I go, but I think there's an argument here for stopping the train if we can't see any solution to the political thing. Most of my
own personal concerns are really about technical alignment, which we'll talk about soon enough. But if we don't have even an envisionable political solution, that does make me nervous. And I could see an argument for let's stop the train now because we just don't see how to do it. And going back to something you said at the very beginning, there's an interesting argument about delaying the train versus stopping the train. And I think all of us signed the pause letter, maybe? I didn't. You didn't sign the pause letter.
Okay, only I signed the pause letter. That's interesting. I didn't call that. That's fine. I don't know whether I did or not.
The part of the pause letter that made me sign it that I liked the best was it said, let's pause this particular thing that we know is problematic in certain ways. I mean, it was really about delaying the development of GPT-5, which ironically still doesn't exist two and a half years after some of us signed the pause letter. But the notion was we would pause the development of GPT-5 because we knew that GPT-4 had certain kinds of problems.
around alignment, and that we would spend that time instead working on safety. So it was explicitly constructed as a delaying tactic. It wasn't saying never build AI. It wasn't saying don't do any more AI research. It was in fact saying do more AI safety research and wait. And that does still seem like
maybe not a politically viable thing in this moment, but at least a strategy one might consider is, you know, maybe we should be constantly updating our estimates on, you know, how likely are you to get the positive outcomes versus negative outcomes? And how much would that change as a function, for example, of putting more resources into safety research as opposed to capabilities research, et cetera? And what's your take on what I just said? Totally agree. Like, I think, yeah, like I...
I like the pause AI framing more than the stop AI framing for the reasons you just mentioned. And then
There's a lot more to say about more nuanced proposals for what is to be done. I do think we should maybe save a lot of this for the later part of our discussion, after we've gotten through the technical stuff, like timelines, alignment, et cetera. But you're the moderator. I'm going to use the moderator's prerogative for now, but you can push me a little bit later if we don't get there. I'm trying to find some sort of common ground. I think we have lots of common ground. Yeah.
And then we'll get to the differences. Do you buy the notion that this should also be on the table, some kind of pause, or no? As for making time for technical research, I'm not expecting much return on investment from that. I don't think most success strategies in the foreseeable future particularly go through AIs being fully controllable. So I do technical research. I'll give you a comeback on that. An extreme version of the argument I just placed comes from the cognitive psychologist and evolutionary psychologist Geoffrey Miller, who had a tweet that I really liked, which was: we should wait until we can build this stuff safely, even if that takes, I forget what he said, I'll say 250 years.
It was an interesting framing because everybody's thinking, what should we do next week or should we sign this pause letter? It was deliberately extreme. I think it was maybe even 500 years. If it takes us 500 years, we should just wait. I could see an argument for that. In fact, I wonder what the counter arguments are. I think that the process that I described earlier, that recursive loop, I guess you could call it intelligence recursion, which if it goes fast enough is an intelligence explosion.
That is not something you can research your way out of. You can't just write an eight-page paper and then we've solved it; it's not as if tomorrow's news will be that we've figured out how to fully de-risk some extremely fast-moving process that we've never run before and that all of its unknown unknowns have been anticipated.
Well, I mean, we could politically, I mean, it would take a lot of willpower and it's probably not likely, but we could try to have a global treaty. Don't go there. Don't work on these kinds of things. Let's report it if you do. I mean, you could at least imagine that kind of scenario. I don't see any technical way of coping necessarily with that set of problems right now. But if we were
I mean, not just the three of us in the room, but if we as a society were convinced that, let's say, that was a red line. You suggested one red line. There are others. So we could decide as a society there are certain red lines. Maybe recursive self-improvement might be a reasonable one to consider. We could say we won't cross any of the red lines. We will make treaties around it. And we just shouldn't do it until we have answers to them.
So I think it would be worth states clarifying that we don't want anybody doing that type of fully automated intelligence recursion.
because it would be destabilizing. And now that doesn't necessarily immediately take the form of a treaty. So initially you have them exchanging words about that and articulating their preferences potentially explicitly or potentially in internal policy at CIA and things like that. And the other parties come to learn it
through leaks or directly. And you eventually want to gain more confidence that nobody's trying to trigger something like this. And this process has multiple stages. It requires a conversation; it may even require a skirmish
for people to think, okay, we need to do a verification now. And then maybe you get some type of treaty. But you can still have various forms of coordination through deterrence without anything like a treaty, just as we've had strategic stability on multiple issues without treaties necessarily. So that's something that...
is potentially a later-stage thing. But you have to have the conversation advance far further. Daniel's giving me a look on deterrence. No, no, I'm just looking back and forth. You're just looking back and forth. Why am I in this tennis match? I have to rubberneck because I'm right in the middle. Yeah, in the middle. OK. So let's separate for a moment the deterrence stuff, although I know you're keen to talk about it and know more about it than I do. Let's try to get to it.
You could sort of separate what the dynamics are for which we would form treaties. Maybe we need some deterrence and skirmishes, as you just talked about. But from the question of, like, would the rational thing for civilization to do right now be, in fact, to sign these treaties? Because we're basically pretty close. I maybe have a longer timeline than you, but we're reasonably close to recursive self-improvement of at least some sort.
And so maybe we don't want to go there. Maybe that's what the rational thing to do would be right now to make those treaties. Or even if we thought that was 25 years away, because we know treaties take 8, 10 years, maybe we should be putting all our intellectual capital or political capital or whatever into doing that right now. What do you think? Yeah, so I think three red lines would be no recursion, where that's fully automated, not just some AI-assisted one, but one that has that explosive potential.
No AI agents with expert level virology skills or cyber offensive skills made accessible without some safeguards.
And model weights, once they pass some capability level, need to have some good information security for containing them and making sure that they're not exfiltrated or stolen by rogue actors. In a minute I'm going to ask you how close we are on these various things. But what's your take on what he just said?
I don't know about the other two, but definitely the first one. I think that if somehow we could coordinate on that, that'd be great. I'm not sure if that's the best red line to coordinate on, but it's an excellent place to start at least. Do you want to throw in any others now or you can later in the conversation? I think that the type of thing that I'm probably going to end up advocating for is going to be more of a like, rather than like, here's a line that we're all not going to cross, something more like,
we are going to gradually develop AIs with these capabilities, but we're going to do it in a way that's mutually transparent to each other and that proceeds slowly and cautiously where we all debate whether it's safe to go to the next level. And then after we get there, we study it for a little bit and then debate whether it's safe to go to the next level and so forth. So the sort of thing that we're probably going to end up advocating for is going to look something more like that rather than...
But yeah, in terms of the thing that you really need to stop from happening in the short term, that sort of recursive self-improvement thing is, I would say, the number one thing. So this conversation is a little bit depressing, in the sense that many of the things that we seem to be worried about actually seem fairly close. Maybe the person in the room who's most pessimistic, if that's the word-- the most extended in time on this-- is me.
But all of us, I would say, think that-- well, let me rephrase this question. It's a dark set of answers relative to the reality right now, I think, in the following sense. Even if you think it's going to take a while to get to AGI or ASI or something like that-- and we'll talk about that in a little bit-- the things that are our red lines-- and I like your red lines--
people are already pushing against them. They may not be breaking through them; depending on your definition of recursive self-improvement I might give you different estimates, and mine might be a little longer than yours, but people are already trying to do that, right? And transparency is, like,
1990s talk. It's in the rear-view mirror. OpenAI was open originally. It is not open anymore. There are elements still pushing for transparency, but there are certainly elements pushing against it. I'm somewhat more optimistic about increased transparency and more situational awareness from governments.
The reason for that is, I think it's incentive-compatible for the U.S. to be more transparent about what's going on at the frontier, because China already knows. Meanwhile, for PLA, or People's Liberation Army, developments, there's somewhat less transparency. So the people who would gain more from information would probably be the rest of the world. China has somewhat more of an advantage there, and this sort of levels things.
And having high transparency into the frontier is very useful for making deterrence more credible. Because then-- Yeah. Let me interrupt for a second with a time-frame question.
The notion of a durable advantage in LLMs, if that's what the technology is, is a myth. Like if we stay on LLMs, nothing's going to be durable. But I think you think about these things in a little bit different way than I do. I think with the current paradigm, I am expecting them to continue to leapfrog each other. So roughly parity is what I'm seeing. Yeah.
Right? So there could be a different technology. My personal favorite would be neuro-symbolic AI; maybe somebody gets a durable advantage out of that, but there's going to be espionage, people are going to share ideas or whatever. But if the paradigm stays roughly like it is, I don't see anybody getting a durable advantage. So that's part one: do you agree or disagree with that? I think it depends on what you mean by paradigm. So I would say, like,
LLMs are just a subset of this overall AI research. They're not the only possible AI design. Definitely not. And progress is going to continue being made. I would say, in some sense, arguably, we're already seeing a move away from LLMs with things like these so-called reasoning agents that have access to tools and can write code and so forth. So there's going to be this continuous shift towards...
I would say rather than like discrete paradigm shifts, it'll be more like a continuous paradigm shift. And the AIs of 2027 are just going to be like quite different from the AIs of 2023, you know. But taken as a whole, the AIs, I do think that the United States could potentially end up having a durable advantage over China. And one particular tech company within the United States could end up having a durable advantage. I see this as...
I see that as unlikely unless somebody approaches the problem in a pretty different way. Like, you can imagine a neurosymbolic approach. It would be just so different from what anybody else has that I could see, at least for a while, an advantage. In the current way of people are doing things, I don't even know what that would look like. Perhaps I should clarify. By durable advantage, I don't mean...
they never figure out what you already figured out. I mean that by the time they catch up, you've already moved ahead. It's like, you know, you keep running as fast as they're running, so that even though they're only six months behind, they can never quite catch up, because by the time they do, you're six months ahead again. And that was part of the other part of that question I wanted to get to, which is: do the six months matter? Is that enough to change the world or not?
Right now, it doesn't seem huge, but if you are the first to trigger a recursion, that can matter a lot. And that goes back to the other question, which is on the recursion thing, the fact is people are trying it as far as I understand it. It's the plan.
It is, it seems, the plan. Yeah, but it's still highly AI-assisted, and nobody could spin this up right now and have that foreseeably lead to something substantial. But, yeah, I mean, what I see right now is like
You could do your hyperparameter search faster or something like that. And you could call that recursion if you want, but that's not really what we're talking about here. We're talking about a system that finds a fundamentally new idea, one that heretofore would only have come from humans, and that leverages something else. And it's doing the whole thing of not only coming up with the ideas, but testing them, validating them, working out the tweaks, implementing them, scaling them. I think that's exactly what people like Sam and Dario are now
hinting that they're going to be able to do soon. I'm not really buying those claims, but I think that's what they want to do now. 100%. Yep. And that is destabilizing. And scary and dangerous. Let's say potentially destabilizing. I mean, if it's just hype and they can't really do it, it's not destabilizing. But if one of them achieves it. If it were technically feasible, that would be destabilizing, yeah.
And I mean, I think we can agree it probably is technically achievable. It's just a question of whether you can do it with current technology or later. Or far in the future. There's surely no argument that this can't ever be done. I mean, I think we all think that's going to happen.
And so one advantage for transparency then is if there were very high visibility as to what's going on at the frontier, if they're basically triggering this sort of process and the public is kept reasonably informed as it's happening, I think the world would be very much freaking out. So I think that possibly sunlight might be...
a way of providing substantial pressure to prevent that. Freaking out might be too strong, but it's worth mentioning that polls of the American people-- not necessarily in Asia, but American people are pretty worried. It's maybe not at the top of their set of worries, which might be economic and so forth. But you look at these polls, and 75% of the American public is already worried.
And I expect that that worry will increase, not decrease. At least I don't see any active thing that is going to decrease it in time. The thing that might decrease it is propaganda from the tech companies, I think. The tech companies seem to like scaring people as far as I can tell. Oh, I disagree. I think that, I mean, there's probably some of that there to some extent for sure. But at least my experience at OpenAI was that
there was more pressure to, like, not talk about the risks in public and to, you know, sort of downplay that sort of thing. There was no pressure to play it up, at least not as far as I could tell. And, you know, if you look at the public messaging by Sam over the years, he certainly started to talk about it less and less and less.
Well, when he-- I mean, partly because I pushed him. But when we talked at the Senate, he didn't answer the question, what were you most worried about? I mean, I made Blumenthal come back to him. Blumenthal says it's jobs. And Sam gives his explanation why he's not that worried about jobs.
Right. And I say to Blumenthal, you should really ask him the question. And Sam there said, my biggest worry is that we do-- I can't remember the exact words-- but substantial harm to humanity. And when he was back at the Senate a few weeks ago, his biggest worry was, in so many words, that we don't make all the money and extract all the value that we could, or something like that. Right. So-- Right. So I mean, I think it's true that he used to talk more about this kind of risk stuff and talks less now.
So, I mean, that's an interesting data point. Around the time that I did my Senate appearance, which was May 2023, so, reconstructing the timeline, a little bit after that there was a second letter related to the pause letter, which I did not sign. What was that one? That was the one about, and maybe you wrote it, I don't know. Was that the CAIS letter? The extinction letter. Yeah, yeah. I signed that one. You made it happen. That's what we did. So I didn't sign that one.
And we should soon get to extinction risk, but we'll come back to it in a minute. Sam did, I think, sign that, right? So at that time, that was maybe June 23 or July 23 or something like that. It was after the Senate appearance, I know that. At that time, it was still popular among CEOs to express concern about this. And maybe it's true that they do a little bit less. Dario still alludes to this, right? Yeah.
So a lot of
people that I have spoken to think that it's a deliberate mechanism to try to hype the product, to talk about the risk. Like, look how great our stuff is. It might kill us. Yeah, so I have lots of... A lot of scientists signed that. I know, I know. I didn't, but many did. Yeah, I understand. So first of all, yeah, loads of people have these views who aren't conflicted and trying to hype up the companies. I agree about that too. But then to your point about the hype, my take on what's been happening is that
DeepMind, OpenAI, and Anthropic have been full of people who were thinking about superintelligence from the very beginning at the highest levels of leadership. And therefore, they have been considering, at least somewhat, both the loss of control risk and the concentration of power risk, you know, who gets the control of the AIs from the beginning. And this is documented in all sorts of ways. You can look at their old writings and so forth, some of the leaked emails, and so on.
And then you might ask, well, why are they building it? If they've been thinking about these risks, can't they see? Loss of control. Loss of control, concentration of power. So loss of control is what if we don't solve the alignment problem in time and the AIs take over? And then concentration of power is if we do solve the alignment problem, like who
who controls the AIs, what goals do we put in them, do we risk becoming a dictatorship or some sort of crazy oligarchy or whatever. So you can find writings from people at these companies, both senior researchers and also the CEOs and so forth, going back decades talking about these things. And then you might ask, well, why are they doing this? Why are they racing to build these things if they were... I mean, Dario seems like the most extreme version of that question. He seems still very much...
publicly saying that there's very serious risk, and he seems very much to be publicly pushing the models forward. I understand, though I didn't see it myself, that at some point he was saying we won't build frontier models because of these risks, and now he's obviously building frontier models. And so the key here, I guess I would say in a single word, is rationalization. So
There's been this very seductive argument that has appealed to all of these people, which is basically, well, it's probably going to happen anyway. If we don't do it, someone else will. And it's better for us to do it first than for someone else to do it, because we're the responsible good guys who will...
wisely solve all the safety issues and then also beneficently, you know, give UBI or whatever to make sure that everything works out well. So basically, all of these people sort of trust themselves more than they trust everyone else and have therefore convinced themselves that,
even though these risks are real, the best way to deal with them is for them to go as fast as possible and win the race. And this was DeepMind's plan, so to speak. Demis Hassabis's plan was basically: get there first with this big corporation, Google,
And then, because you have such a lead over everybody else, you can sort of slow down and get all the safety stuff right and sort of make sure that everything goes well before everybody else catches up. That seems naive. That plan was torpedoed when, you know, Elon and Sam and Ilya made
OpenAI. Why did they make OpenAI? Well, they didn't trust Demis to handle all that power responsibly when he was in charge of the only AI project, you know, when all the fate of the world rested in his hands. So they wanted to create OpenAI to be this countervailing force that could do it right and make it, you know, distributed to everybody and not concentrate power so much in Demis's hands. And in fact, in the leaked emails, or the emails that came up in the lawsuit, they were talking about how they were worried that Demis would become dictator using AGI. And, well, we can see how well that's worked out. All the Anthropic people basically split off from OpenAI because they didn't think OpenAI was going to handle the safety stuff responsibly. So then they're claiming that they have the technical talent and they'll be able to sort out the alignment issues better than everyone else. It's a mess.
Do you have anything to add? I agree that it's a mess. Not to that it's a mess. And I mean, I guess, I don't know, you may not want to answer this on camera, but do you feel confident that we should trust any of these particular partners? We definitely should not.
I think pushing for things like transparency, as well as the government having people whose job it is to be keeping track of these things, coming up with contingency plans, interviewing these labs for their plans and coming up with an internal assessment of their probability of success, all of that seems useful. And I think at least some of the players here would probably be willing to push for that type of stuff, and I think others wouldn't.
So the question is, are they willing to help solve these collective action problems, or are they going to continue defecting? There's been a lot of defection. To get back to your question, I realize I never actually answered why this ties into your question. So
Early on, when these companies were fresh and young and idealistic, their founding mythology was basically: yes, the risks are real, and that's why you should come work at our company, because we're the good guys. And when that was still fresh and still, you know, the main thing they were saying, they were talking about it a lot. But now their sort of founding myth is kind of laughable and is very much not something they can say with a straight face. It's eroded some. It's not something they can really say with a straight face so much anymore. And also they're under lots of political pressure to get investors and to ward off regulation and stuff like that. So the rationalization wheels are continuing to turn, and they're coming up with new narratives to justify what they're doing. Yeah.
And there's also a lot more players at the table, both within the US industry, which is most of what we were talking about, but also in China. Then we have the whole, I don't know if we have time to go into it, but maybe we will, maybe we won't, the mechanisms of open sourcing or at least open weight models, which means that essentially anybody can get in this game to some degree. I disagree with that to some extent. Oh, go ahead. I mean, to some degree, sure. Walk me through the disagreement. Well, hypothetically, suppose that we...
Suppose that we achieved AGI, someone achieved AGI, and immediately opened the weights for everybody.
It's not going to happen like that, but suppose it did happen like that. Then for a brief, glorious moment, anybody with enough GPUs would be able to run their own AGI right at the frontier of capability. However, AGI isn't the end of it; there's going to be AGI-plus and AGI-plus-plus and so forth. And whoever gets to AGI-plus is going to be the one who had the most GPUs, so they can run the AGI to do the research fastest, you know? So even if you gave everybody exactly the same starting point at the same level of capability, the people who had more GPUs would pull ahead slowly but surely over everybody else. So I think that there's this unfortunate-- I'm not sure about that, but go ahead. So there's this unfortunate sort of inherent-- I don't know if I want to say winner-take-all effect, but there's this returns-to-scale sort of thing inherent in the dynamics of an intelligence explosion. And that's part of what makes this so scary.
All the actors are incented to push it as fast as possible and to have the maximal resources in order to do so. And they're also incentivized not to open it up and to not be transparent about it and so forth, unfortunately. Yeah, I think we have a little disagreement there about transparency and how much we might expect. I think you're a little bit more optimistic. And I'm...
with Daniel in being a little bit less optimistic about transparency. I'm especially less optimistic about transparency as we get closer to having actual AGI, or let's say differentiated AI. So right now I would say that what's there is not very differentiated. Everybody's kind of using LLMs with reasoning models and so forth. There's some differentiation in how people do their RL and what their data sets are. But I think there's
not so much value in transparency until somebody has something that really is unique that they think other people aren't going to just reconstruct very rapidly. And as one gets to that point of having some unique piece of intellectual property, which I think will happen, there's even more reason to be less transparent.
So I think I'd be optimistic about being transparent about the numbers of the best models internally, not necessarily the methods that were used to create them or the weights for them, but at least the public knowing what's going on
what the peak capabilities are that the models are exhibiting, and then being aware of that. Are you being optimistic that they'll do this by default, or that this would be good? Obviously, it'd be good. I think it's good and that there's more tractability for this than for a lot of other asks.
I agree. More tractability, like people might agree to do it. Yeah, yeah, yeah. I agree with that. That's why I've been making these asks. But I'm just saying that they're not going to do it by default. Like if nobody asked them to be transparent about these things-- I agree. So I mean, we can agree that voluntary self-regulation is probably not enough to get to that point. Yeah, yeah. I wouldn't bank on it.
Yeah. All right, let's talk a little bit about time and forecasting and scenarios. We've talked a bunch about strategy, what we should do going forward, what policy should be. Some of that depends on timelines. So if we thought that we had 1,000 years before AGI, then we might make different choices than if we thought we had six months.
If we thought that LLMs were the answer to AGI, we might make one set of choices. If we thought they were definitely not the answer to AGI, we might make a different set of choices, maybe focusing more on research. So let's talk about our forecasts around that.
And also, maybe I'll throw in there our forecast around when we might sort out the safety problems, the alignment problems. I will make the argument that we've made some progress towards AGI and very little towards alignment. And we can see if we agree on that. So I want to start by using a notion that many in the audience will know but not all, which is a distribution of probability mass.
which is to say that you could make a simple prediction. You could say, I think AGI will be here in 2027 or 2039 or whatever. But I think we all understand that that's the unsophisticated thing to do. You can certainly say your best guess for which particular year you think it might come. But as...
people who are either scientists or know something about science, we know there's what we call a confidence interval around that. So it might be this, plus or minus that. The most sophisticated thing to do is to actually draw out a curve and say: I think some of the probability mass will come before 2027, and some of it will come before 2037, and some of it will come after.
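(For concreteness, here is a minimal sketch of what a forecast expressed as probability mass over years looks like; all of the numbers are invented for illustration and are not anyone's actual forecast.)

```python
# Minimal sketch: a subjective forecast expressed as probability mass over years.
# All numbers below are invented for illustration, not anyone's actual forecast.
forecast = {
    2027: 0.10, 2028: 0.15, 2029: 0.15, 2030: 0.12, 2032: 0.10,
    2035: 0.10, 2040: 0.08, 2050: 0.05, 2075: 0.05, 2100: 0.05,
}
forecast["after 2100 / never"] = 1.0 - sum(forecast.values())

def p_by(year, dist):
    """Cumulative probability that the milestone arrives by the given year."""
    return sum(p for y, p in dist.items() if isinstance(y, int) and y <= year)

print("P(by 2027):", round(p_by(2027, forecast), 2))          # some mass before 2027
print("P(by 2030):", round(p_by(2030, forecast), 2))          # the median sits around here
print("P(after 2040):", round(1 - p_by(2040, forecast), 2))   # and a long tail after
```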
In what I guess is sort of an appendix to your AI 2027, you go through this in a fair amount of detail. I'm not sure how many people got to the appendix. I think it was maybe a little hard to find, but it was there. And it did a good job of that. And it gave the forecast for four different people in a sort of qualitative way. I don't know if it showed the full curves, but it said, like, this is the chance that they think it will come before 2027.
I think for three of the four forecasters, or something, you remember better than I, some of the probability mass was, like, after 2040 or something like that. So maybe I'll start with you, because I think you have maybe put the most work into trying to get detailed probability mass distributions. And maybe you can talk about what your own are, and different techniques people have used, and where you are on that. Thank you for that excellent introduction
to this. Feel free to stop me if I ramble too long, because it's a huge topic, lots to talk about. Okay, so first the exciting part, the actual numbers. When we were writing AI 2027, we had our different medians, or 50% marks, for AGI; or rather, we didn't use the word AGI, we divided it up into different milestones: superhuman coder, full automation of AI research, superintelligence.
But for those things, let's say for superintelligence, I was thinking 50% chance by the end of 2027.
And that's why AI 2027 depicts it happening at the end of 2027, because that was sort of illustrating my median projection. Now, the other people at the AI Futures Project tended to be somewhat more optimistic than me. They tended to think it would take longer to get to superintelligence. Let's clarify. Optimistic. Is that how you flip it, depending on how you think it's going to be? Sure, yeah. So the other-- They thought it would take longer, like 2029, 2031, something like that. Let's say more conservative. A few more years, yeah. That's right.
But, you know, I was the boss, so we went with my timelines. But happily, by the time we actually published it, I had lengthened my timeline somewhat. So these days, I would say 50% by end of 2028.
But yeah. It doesn't give me that much comfort. I mean, I think you're wrong, but if we have an extra 12 months, I'm not sure that's enough to handle all of it. It's still quite scary. But in terms of what the shape of our distributions look like, they tend to have a sort of hump in the next five years and then like a long tail. Why is that? And the reason for that is because...
Well, the pace of AI progress has been quite fast over the last 15 years. And we understand something about the reasons why it's been so fast. And basically, the reason is scale. They've been scaling up compute. They've been scaling up data. They've been drawing ever more researchers into the field. Compute is probably the most important input that they're scaling up, and that's sort of turbocharged progress. But they, the companies, simply won't be able to scale things up at the same pace
after a couple of years. Is that a function of power, like electrical power? What's the rate-limiting step by which you think that kind of scaling won't continue? It won't be like a sharp cutoff, but it'll be a couple of things. So partly it'll be, you know, power supplies. Partly it'll just be
compute production: even after building new fabs and even after converting much of the world's chip production into AI chips, they'll have to produce 10 times more fabs in order to scale up by 10 times, right? Whereas previously they could just take chips designed for gaming and repurpose them for AI. So in a bunch of little ways that are going to add up, there are going to be all these frictions that start to bite and make it harder for them to continue the crazy exponential rate of scaling. I think we're already seeing that with data.
I don't know the actual numbers. But let's say that GPT-2 used maybe 10% of the internet or something like that, or 5% or something like that. Maybe you guys know the actual numbers. And GPT-3 used a significantly larger fraction. GPT-4 used most of the internet, including transcriptions of videos and stuff like that. And so you can't just keep 100x-ing that because there just isn't enough data. There's new data generated every day.
You can always eke out a little more, and people are turning to augmented data, as well they should, but that's not a kind of universal solvent. It works better for things like math, where you can verify that the augmented data are better. And so I think we're already running up against that kind of bottleneck on one of the, let's say, raw resources that go into at least the current approaches. And another thing I would add is that
You can also just think about money, which can be used to buy many of these things. And you can say, well, they've been scaling up the amount of money that they're spending on research, and on training runs in particular, over the last decades. But it'll be hard for them to continue scaling at the same pace. You know, they're probably already doing something like billion-dollar training runs, but the biggest training run in 2020 was, like, what, $3 million, something like that, $5 million. So they've gone up by like two and a half orders of magnitude in five years. If it's another two and a half orders of magnitude, we're doing a $500 billion training run in 2030. Like, just
There starts to be not enough. Yeah, there's just not enough money in the world. You know, like the tech companies just won't be able to afford it. Even if they've grown bigger than they are today, the economy just won't be able to afford it. And so that's why we predict that like...
If you don't get to some sort of radical transformation, if you don't get to some sort of crazy AI-powered automation of the economy by the end of this decade, then there's going to be a bit of an AI winter. There's going to be, at the very least, a sort of tapering off of the pace of progress. And then that sort of stretches out a lot of probability mass into the future, because that's a very different world. It's like, you know, it could take forever. Well, not forever. It could take quite a long time to get to AGI once you're in that regime.
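(A back-of-the-envelope version of the money argument, using the rough figures quoted in the conversation; the numbers are illustrative, not audited.)

```python
import math

# Rough figures quoted in the conversation (illustrative, not audited).
cost_2020 = 3e6    # roughly the biggest training run in 2020, in dollars
cost_2025 = 1e9    # roughly a frontier training run today

ooms_so_far = math.log10(cost_2025 / cost_2020)   # orders of magnitude in ~5 years
cost_2030 = cost_2025 * 10 ** ooms_so_far         # if the same pace held for 5 more years

print(f"OOMs of cost growth, 2020 to 2025: {ooms_so_far:.1f}")        # ~2.5
print(f"Implied 2030 training run: ${cost_2030 / 1e9:,.0f} billion")  # hundreds of billions
```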
I'm going to ask you one or two more questions. I'm going to insert mine, and then I'm going to come to Dan. Although, if you want to-- I mean, one note on data was, yeah, we sort of ran into that bottleneck, I think, maybe two or so years ago. And the main things that have been continuing the pace would be this thinking mode type of stuff outside of the trends that were existing previously. So I think it's sort of picking up slack in some ways for the fact that most of the internet's been trained on.
The methodology by which you came up with these curves, can you just tell us a little bit about them? Yeah. So again, these curves represent our subjective judgment, which is very uncertain. It's just our opinions, you know. But the way that
I guess the way I would like to say it should be done is rather than just sort of pulling a number out of your ass, so to speak, you should come up with models and look at trends and then, you know, have little calculations that attempt to give numbers. And then you should stare at all of that.
And then pull a number out of your ass based on all that stuff that you've just looked at, you know? And so that's what we did. And the main arguments and pieces of evidence that we found most moving to inform our overall estimates were what we would call the benchmarks-plus-gaps argument. Perhaps I should also go into the, like,
compute-based forecast. Have you heard of the BioAnchors framework by Ajeya Cotra? I don't think I have. Okay, well, we can... I'll briefly mention that before getting into the benchmarks-plus-gaps thing. So the BioAnchors framework...
It's called BioAnchors because it references the human brain, and I think that part's actually the less exciting and plausible part of it. The part that I think is more robust and more worth using is this core idea that you can think of this trade-off between more time to come up with new ideas and do AI research and more compute with which to do the AI research. And you can sort of think,
You can make a big two-dimensional plot and you can imagine, okay, 10 more years, 20 more years, 30 more years. How does the probability that we get to AGI go up with more time? But you can also imagine...
not more time, but just more compute. Could we get to AGI today if we had, you know, five orders of magnitude more compute, 10 orders of magnitude more compute, 30 orders of magnitude more compute, right? And the insight there is that the answer is, yeah, probably. Like, for example, if you had 10 to the 45 floating point operations, you could do a training run that's basically just simulating the entire planet Earth and all life evolving on it for a billion years, you know, with that amount of compute. And the thought there is that you don't really need to understand how intelligence works at all if you're building it with that type of training run, because there's no insight coming from you. You're just sort of letting nature do its thing and letting
evolution take its course. And so the thought is that we can make a, not guaranteed, but like a soft upper bound at something like 10 to the 45. And then you can make other sorts of soft upper bounds. You can think, well, what could we do with 10 to the 36? And you can lay it out; I wrote a blog post about this in 2021. You know, suppose we had 10 to the 36 FLOP. What are some really huge types of training runs we could do? And then, like, what's our guess as to how likely that is to work? And what you can do is you can sort of
you can start to smear out your probability mass over this dimension of compute. And so you sort of have a soft upper bound. And then you have, of course, a lower bound, which is the amount that we already have done. We clearly haven't done it right now with this amount of compute.
And so that gives you this smeared probability distribution over compute. And then you think, OK, but now we're also going to get new ideas. And so as new ideas come along, we're going to be able to come up with more efficient methods that allow us to train it with less compute. So you can think of your probability distribution as shifting downwards while also the amount of actual compute increases. And then that gets you your actual distribution over years. And I think this is the right sort of basic framework for calculating these sorts of timelines.
But it's a sort of relatively abstract, like low information framework that doesn't really look at the details of the technology today and the details of the benchmarks. So I think it's like a good way to get your prior, so to speak. But then you should update based on actual trends on the benchmarks and so forth, which is what I'm about to get to. But the reason why I mentioned this prior process is that
10 to the 45 floating point operations isn't actually that far away from where we are right now. Right now we're at like, what, 10 to the 26 or something for training runs. And we're going to be crossing a few orders of magnitude in the next couple of years. And so even if you just smeared out your probability mass with maximum uncertainty across the
orders of magnitude from where we are now to 10 to the 45, there'd be a non-negligible amount that it's going to happen in the next few years. And so even on priors, you should think it decently plausible that it could happen by the end of the decade. And then you should update your prior based on the actual evidence, which I'll now get to. So for the actual evidence, I would say let's look at agentic coding benchmarks. That seems to me to be
the most informative thing to look at. And the reason for that is because I don't think that the fastest way to get to superintelligence is in a single leap where humans come up with the new paradigms in their own brains. I think rather it's going to be this more gradual process where humans automate more of the AI research process, and then that gets us to the new paradigms fast. And so I think that the lowest-hanging fruit in the AI research process, the part that's going to be automated first, is the coding.
So I'm looking to see when we will get to the point where the coding is basically all handled by LLM-like AI assistants. And we have benchmarks for that, sort of. Places like METR, M-E-T-R, have been building these little coding environments, doing all these coding tasks. The companies themselves have been doing this, of course, because they are racing as fast as they can to get to this automated coder milestone. And...
So we look at those and we extrapolate trends on them, and we forecast that, well, in the next couple of years, they're basically going to saturate. You know, we're going to have AIs that can just crush all of these coding tasks. And they're, you know, not something to scoff at. They're not just multiple choice questions. They're the sort of tasks that would take a human like four hours or eight hours to do.
But that's not the same thing as completely automated coding. So first we extrapolate to when they saturate the benchmarks, and then we try to make our guess as to what the gap is between the first system that can completely saturate these benchmarks and the first system that can actually automate the coding. And that's probably the more speculative part, but we do our best to reason about it. Now we're ready for our first full-on disagreement of the day.
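As a rough illustration of that two-step recipe, here is a minimal Python sketch: fit a saturating curve to benchmark scores, read off a saturation date, then add a hand-estimated gap. The scores and the two-year gap are made-up placeholders, not METR or AI 2027 numbers.

```python
import numpy as np
from scipy.optimize import curve_fit

# Step 1 ("benchmarks"): extrapolate an agentic-coding benchmark to saturation.
years  = np.array([2022.0, 2023.0, 2024.0, 2025.0])
scores = np.array([0.05, 0.20, 0.45, 0.70])   # fraction of tasks solved (made up)

def logistic(t, t_mid, k):
    return 1.0 / (1.0 + np.exp(-k * (t - t_mid)))

(t_mid, k), _ = curve_fit(logistic, years, scores, p0=[2024.5, 1.0])
t_saturate = t_mid + np.log(0.95 / 0.05) / k   # year the fitted curve crosses 95%

# Step 2 ("gaps"): a judgment-call lag between saturating the benchmark and a
# system that genuinely automates the work the benchmark is a proxy for.
gap_years = 2.0   # pure assumption for illustration

print(round(t_saturate, 1), round(t_saturate + gap_years, 1))
```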
But I understand your logic there. I think it's well thought through, but it's missing the cognitive science for me. And my approach to this is more from the cognitive science. I see a set of problems that a cognitive creature must solve, many of which I wrote in my 2001 book, The Algebraic Mind.
And I don't feel like we've solved any of those problems, despite the quantitative progress that we've made. And those include generalizing outside the distribution, which I think still remains a huge problem. I think the Apple paper was—there are actually two Apple papers I discovered today. But the Apple paper with the Tower of Hanoi stuff, I think, is an example of—
problems with distribution shift. I think we've seen many of them over the years. We see that these systems have trouble doing multiplication with large numbers unless they call on tools and so forth. I think there's lots of evidence for that. I think that there's a problem of distinguishing types and tokens that leads to bleed-through when you're representing multiple individuals from some category, which leads to hallucinations. So I wrote an essay recently about the hallucinations that
I think it was ChatGPT made about my friend Harry Shearer, who's a pretty well-known actor. And it misnamed the roles of characters that he played in the movie Spinal Tap and said that he was British when he's American and so forth. And I think this blurring together that we see in hallucinations remains a problem. And I could go on with a list of others. I think there are several having to do with reasoning, planning, et cetera.
the way I look at things, which is not to say that there isn't some value in what you're doing, is more on these cognitive tasks. And so what I say to myself is,
What would AI look like two years before we achieved AGI or ASI or something like that? Certainly two years before we achieved ASI, we would have full solutions to all of those things. If we specified an algorithm for something, we would expect the system to be able to follow it. Current systems can't even play chess reliably according to the rules. So, you know, O3 will not... Sorry, O3 will...
sometimes make illegal moves. It can't avoid illegal moves. Another thing I would expect is that current systems, when we're close, would be basically the equivalent of their domain-specific counterparts, or at least be close. AGI means artificial general intelligence.
And I would say the reality is that domain-specific systems are actually much better than the general ones right now. The only general ones we have are LLM-based. But for example, AlphaFold is a very carefully engineered hybrid neurosymbolic system that far outperforms what you could get from a pure chatbot or something like that.
Somebody just showed that an Atari 2600 beat, I think it was, O3 in chess. So even sometimes very old systems will beat the modern domain-general ones. On Tower of Hanoi, Herb Simon solved it in 1957 with a classical technique that generalizes to arbitrary length, whereas the LLMs do not generalize to arbitrary length and face problems. And so I could go through more, but the gist of it is,
I don't see the qualitative problems that I think need to be solved.
And I'll just go to your 10 to the 45th, because it's really interesting to me. I wrote a piece once with Christof Koch. I don't know if you guys know him, the neuroscientist. We wrote a science fiction essay. It's the only published science fiction essay I've ever written. And it was in a book that I wrote called The Future of the Brain, which I guess we wrote in 2015. I was the editor. And we wrote the epilogue to it in, I'll call it 2015. And so it was set in something like 2055,
or I think it was 2045. And the notion was-- this is a book about neuroscience-- that by that point, we would actually have created an entire simulation of the brain, but it would run slower than the human brain and still
not have taught us anything about how the brain really works. So we'd have this simulation, but we wouldn't understand the principles of it. We'd have a neuron-by-neuron simulation, maybe even a protein-by-protein simulation. But we could find ourselves in a place where we'd replicated the whole thing without really understanding how it worked.
And I do wonder with the 10 to the 45, even if you'd sort of trained on everything, would you have solved the distribution shift? And would you have abstracted principles that allow you to run efficiently and effectively and usefully in new domains and so forth? I think it's a really interesting question. I had never thought of the 10 to the 45, even though I read at least some of your paper. I admit I didn't get to that level of detail. Well, this part wasn't in AI 2027. So it wasn't in that appendix.
I admit I didn't read the whole appendix. I read some of it. I think it's a really fascinating thought experiment.
So maybe I've said enough. So just to lay it out, there's one way to extrapolate on the basis of things like compute. And I think you've done a masterful job of doing that as well as can be done and acknowledging that there's still an element of pulling things out of one's behind, which is true on any account. Nobody can really do this in a closed-form way. And then I have a different slice on it, which is like, where are the qualitative things that I want to have solved?
I think Yann LeCun, who I often disagree with about many things, would actually be closer to mine. He would probably give a different set of litmus tests that he's looking for. We would both emphasize world models. We have slightly different ideas about world models. But he would say, I don't think we're close because we don't really have world models. And neither of us, I think, is satisfied with the current thing that some people call reasoning, but neither of us thinks is robust enough. And so...
I think he and I both take an architectural approach or a cognitive approach. Great. So I'll list a bunch of bullet points of things we could discuss, and then hopefully we can get through them each. And if we miss some, well, at least I put them out. So in reverse order, 10 to the 45 scenario where you just sort of brute force evolved intelligence. Indeed, you would not understand how it works at all. But nevertheless, you would have it. And you could sort of take those evolved creatures out of their simulated environment
and then start plugging them into chat products and stuff and using them in your economy. And they would be smart. They built their own civilization in there. So they're pretty smart and they're pretty good at generalizing and so forth. So you wouldn't understand how it works, but you'd still nevertheless have the AI system. And indeed, that's kind of what I think is happening today for us. We don't understand.
how these AI systems work very well. We're sort of just throwing giant blobs of compute at giant data sets and training environments and then playing around with them afterwards and seeing what they're good at. That's part of my darkness about it is I think there's too much alchemy and not enough principles.
100%. And this is part of why the alignment issue feels so looming to me, is that we don't even know what we're doing. How are we supposed to craft a mind that has the right virtues and the right principles and so forth when we don't even like... Right, anyhow. So there's that. Next thing. So you mentioned...
Tower of Hanoi, math problems, these sorts of ways in which the current AI systems seem limited, in particular limited and fragile and prone to hallucinations, in ways that you think we're not really making progress on. So there I would say, well, you know, I also can't solve the Tower of Hanoi and I also can't do large math problems in my head. I need tools. I need to be able to program a little bit or use a tool, like you do. I'm going to come in on that briefly. One is that, you know,
relatively young children can actually do Tower of Hanoi really well if they care about it. Even really big ones?
Even pretty big ones. There's a video, I think, of a kid doing seven disks lightning fast, like in two minutes, on YouTube. A kid who enjoys it. He's probably 12 or 15 or whatever. I'm sure he could do eight disks if he wanted to, because it's a recursive algorithm. I'm sure he's learned it. And so I think some humans can do a problem like that. Some humans, if they want to, can do arithmetic. We humans do get into memory limitations. But you shouldn't expect that an AGI would.
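For reference, the recursive procedure in question really is just a few lines; a minimal Python sketch of the classical solution (peg labels and the seven-disk example are only illustrative):

```python
def hanoi(n, source, target, spare, moves=None):
    """Classical recursive Tower of Hanoi: move n disks from source to target."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, target))
    else:
        hanoi(n - 1, source, spare, target, moves)   # park n-1 disks on the spare peg
        moves.append((source, target))               # move the largest disk
        hanoi(n - 1, spare, target, source, moves)   # restack the n-1 disks on top
    return moves

# Seven disks takes 2**7 - 1 = 127 moves, and the same code handles any n.
print(len(hanoi(7, "A", "C", "B")))   # 127
```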
There is actually like a...
We should pause for a moment on definition of AGI, right? It has had varied definitions. The one that I always imagine is it should be at least as good as humans in a bunch of things and better in certain ones. So I would not be satisfied with an AGI that can't do arithmetic. I'd be like, yes, okay, it's equivalent to people in this respect, but I would actually expect more, and especially if we're talking about the risks that we're facing
I think we're really talking about a form of AGI that probably can do all the short-term memory things that people can't and has the versatility and flexibility of humans. You can argue about that. And I would say that the weakest AGI would be it's as good as Joe Sixpack, who's really not very good at reasoning, is full of confirmation bias, has not gone to graduate school, doesn't have critical reasoning. You'd say, OK, that's AGI because it does what Joe Sixpack does.
I think most of the safety arguments are really around AGI that's at least as smart as, you know, most smart people. Smart people can, in fact, do things like Tower of Hanoi. And certainly, you know, another example I gave you was chess, right? Six-year-olds can learn to follow the rules of chess. O3 will do things like have a queen jump over a knight, which is not possible. And so, you know, the failures there are quite striking, and not something you'd even need an expert to see.
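As an aside, move legality is exactly the sort of thing a symbolic rules engine gets right by construction. A minimal sketch using the open-source python-chess library (the library choice and the helper names are assumptions for illustration) shows how a proposed move can be checked, or an LLM's suggestion constrained to the legal set:

```python
import chess  # python-chess: a purely symbolic implementation of the rules

board = chess.Board()  # standard starting position

def is_legal(move_uci: str) -> bool:
    """True iff the UCI-notation move is legal in the current position."""
    try:
        return chess.Move.from_uci(move_uci) in board.legal_moves
    except ValueError:
        return False

def constrain_to_legal(proposed_uci: str) -> chess.Move:
    """Accept a proposed move only if legal; otherwise fall back to some legal move."""
    if is_legal(proposed_uci):
        return chess.Move.from_uci(proposed_uci)
    return next(iter(board.legal_moves))

print(is_legal("e2e4"))   # True
print(is_legal("d1h5"))   # False: the queen cannot jump over its own pawn
```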
Yeah. So I guess I would say it feels to me like we're making progress on these things, or at least that these barriers are not going to be barriers for long. I think, for example, that maybe the LLM-- That's assuming your conclusion, that last little piece. Well, let me tell you more. Let me tell you why I think this. So I think that, yeah, maybe the transformer by itself might have trouble doing a lot of this. But you should think about the transformer plus the system of tools and plugins that you can build around it, like Claude plus Code Interpreter and things like that.
And I think that that system could solve Tower of Hanoi and things like that, because Claude might be smart enough to look up the algorithm and then implement the algorithm and then do it. And so actually, I'm curious for your immediate reaction to that point.
I mean, look, I've always advocated neurosymbolic AI. And I think that that is actually a species of neurosymbolic AI. I don't think it's the right one. But the point of the neurosymbolic AI, the arguments that I made going back to 2001 was that you need symbols in the system to do abstract operations over variables. There were several other arguments. But that was the core argument. And what you're doing when you have Claude call an interpreter is you are doing operations over variables in the Python that it creates. So you're
moving it to a different part of the system. So the pure neural networks don't have operations over variables and fail on all of these things. Symbolic systems can do them fine. And here you're using the neural network to create the symbolic system that you need in order to solve that particular problem. So it's absolutely a neurosymbolic solution. I think that the rate-limiting step is that they don't always
call the right code that they need. If you could make that solid enough, that would be great. Okay. You could think, I'll just say a couple more sentences there. One of the first attempts at this strategy was to put an LLM into Wolfram Alpha, which is a totally symbolic system, right? Not Wolfram Alpha, into Mathematica. The results were
Hit or miss, right? And the problem was on the interface getting to the tools. If you can really reliably get to the tools, then you have a neurosymbolic system that works. If you can have the neural networks reliably call the tools that they want, or that they should be calling relative to the problem, I should say, then you're golden. I think empirically it is hard to get the tools to work reliably.
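A minimal sketch of the pattern being discussed, in which the neural side only writes code and a plain interpreter performs the operations over variables. Here `call_llm` is a hypothetical stand-in that returns canned solver code (echoing the Hanoi routine above); in practice its output, and whether the right tool gets called at all, is exactly the reliability question Gary raises.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a code-writing model; returns canned solver code.
    return (
        "def solve(n):\n"
        "    def hanoi(n, a, c, b):\n"
        "        return [] if n == 0 else hanoi(n-1, a, b, c) + [(a, c)] + hanoi(n-1, b, c, a)\n"
        "    return hanoi(n, 'A', 'C', 'B')\n"
    )

def neurosymbolic_solve(task: str, n: int):
    code = call_llm(f"Write a Python function solve(n) that solves: {task}")
    namespace = {}
    exec(code, namespace)          # the symbolic half: exact operations over variables
    return namespace["solve"](n)

print(len(neurosymbolic_solve("Tower of Hanoi", 8)))   # 255 moves for eight disks
```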
So that's helpful because I think that sort of collapses a lot of your barriers into one barrier, which is reliability. Because perhaps you would agree that if we can get them to... We should come back to the world models piece of it. But yeah, go ahead. Go run with that.
Well, yeah. So there I would say it does seem to me like the AIs have been getting more reliable over the last couple of years. And one piece of evidence I would point to is the horizon length graph from METR, which you've probably seen me talk about. I wrote a whole paper where I critique it. Oh, okay. Interesting. Yeah. On my Substack. So the way I would interpret this graph, and I'm sure many of our audience has already seen this, but they have this
you know, they have this suite of agentic coding tasks that are sort of organized from shortest to longest in terms of how long it takes human beings to complete the tasks. And then they note that, you know, the AIs of 2023 could, generally speaking, do the tasks
from here to here, but not do the tasks above this length. But then each year, the crossover point has been lengthening. And a very natural interpretation of what's going on here is that the AIs are getting more reliable. You know, if they have a chance of getting into some sort of catastrophic error at any given point, then, you know, if it's like a 1% chance per second, then after like 50 seconds, they're going to
get into an error. But then if that goes down by an order of magnitude, then they can go for more seconds and so forth. And so the thought here, I would say, is this is evidence that they are just getting, generally speaking, more reliable, better not only at not making mistakes, but at recovering from the mistakes they make. Not infinitely better. They're still less reliable than humans, but there's substantial progress being made year over year. I see the argument you're making.
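One way to make that reading concrete: under a constant-hazard model, an agent with an independent per-step failure probability p completes a T-step task with probability (1 - p)^T, so the 50%-success horizon is about ln(2)/p. A minimal sketch with illustrative numbers (none of them METR's):

```python
import math

def fifty_percent_horizon(p_fail_per_step: float) -> float:
    """Steps completed with 50% success under a constant per-step failure rate."""
    return math.log(2) / -math.log(1.0 - p_fail_per_step)

for p in (0.01, 0.001, 0.0001):
    print(p, round(fifty_percent_horizon(p), 1))
# 0.01   -> ~69 steps   (roughly the "1% per second" case)
# 0.001  -> ~693 steps  (each order-of-magnitude drop in hazard ~10x the horizon)
# 0.0001 -> ~6931 steps
```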
With the particular graph, I think there's a lot of problems. And some are actually relevant to this argument. One is it was all relative to coding tasks. It wasn't tasks in general. The coding tasks, I'm very concerned from a scientific perspective that we don't know how much data contamination there is. We don't know how much augmentation is done relative to those benchmarks and so forth. Which leads me to my next point about it, which is
I'm blanking on the guy's name-- from 80,000 Hours posted something on Twitter yesterday, which was a parallel to that graph. I think it was also by METR, looking at agents. And so the y-axis for the coding thing was sort of minutes to hours to days or whatever. And for agents, it was like seconds to tens of seconds or something like that. So it was a totally different y-axis. What type of agents? With the coding thing, it was like coding agents.
This was some-- maybe like web browsing agents or something like that. And so the argument was that you're getting the same kind of curve, but the scaling was completely different on the y-axis. So there's something there for both of us, right? So what's there for you is a general--
You know, a curve of reliability over time. What's there for me is that the performance is still, you know, pretty poor on those agent things more generally. I don't like the axis at all. I think that it's very arbitrary. In my critique we give a bunch of examples, but like, how long does it take to do this task? And also, it's all to the point of 50% performance. It's just a very weird measure.
And you can find many things that a human can actually do in three seconds that, according to the graph, should have been doable by a 2023 model. Here's one just off the top of my head: choose a legal move on a chessboard. This takes a grandmaster, I don't know, 100 milliseconds or something like that. But O3 still can't do that task with 100% reliability. It can do maybe 80% reliability or something like that. So it's just--
It's weird to read anything off of this graph. I'll send you the Substack essay on it. Yeah, thanks. One other thing I was going to mention, which is a new topic sort of, is that I think in a lot of these cases,
My defense of the AIs is that, well, they were never trained on this task. I mean, that's the essence to me, right? I understand. Is that they fail on things that they're not trained on. I mean, that's an oversimplification. But if we're talking about AI risk and... I understand that the really dangerous AIs of the future have to be able to do stuff without having trained on it.
At least I would say to the same extent that humans can do things without having trained on them. So to take your grandmaster example, grandmasters can do that task in less than three seconds with greater than 80% reliability. But also, they've trained on it a lot. They've done lots of that. Whereas O3, we don't know the details of O3's training process, but it's possible that there was not a single playing of a game of chess at all in the training process for O3. Yeah, so I don't think that's very plausible. Yeah.
They may have seen transcripts of games of chess. They've seen transcripts. They've seen the explicit rules in Wikipedia. I know they've been trained on Wikipedia. They've probably read, like, Bobby Fischer Teaches Chess, which is how I learned to play chess. So they have books on chess and so on. They've read everything related to chess. They've read everything in, what is the famous phrase, publicly and privately available sources or whatever.
That wasn't quite the famous phrase, but you get the idea. So I think they probably actually have a lot of data on chess. And just as a side note on transparency, from the scientific perspective, it's very hard to really evaluate these systems because we don't know what's in the training set. I agree. The core question, going back to my work in 1998, is how do they generalize beyond what they've been trained on? And we just don't have transparency on that. So then-- so first of all, I totally agree about the transparency thing. Back to the training thing.
I want to talk about Claude Plays Pokémon.
And I want to, again, talk about the chess thing. So I wasn't aware of this 80% statistic for O3, but I'm curious what the statistic is. I don't know the exact number. I mean, that's a ballpark figure. So I haven't heard about this. I had a friend look into this, and he was very easily able to do it. I can give you some other examples. Let me just put one other example on the table, similar in flavor, which is I played Grok in tic-tac-toe, and I came up with some variation. And then I might write this up this weekend. And so I
proposed some variation. We played it, and then it offered another variation, which was: let's play tic-tac-toe where you can only win if your three in a row is on the edges. And I said fine. We had some moves back and forth. And then it suggested some moves for me at a point where I could, in fact, make a three in a row. So it had failed to develop the right strategy.
I hadn't played this variation before either. But it was obvious what to do. So it failed to identify what a three in a row was, which I would say goes to a conceptual weakness. Like, you should have had enough data to understand what three in a row is. And so it suggested other moves and didn't recognize it. It's a pretty bad failure. And then we played again after I corrected it. You know how you get these sycophantic answers? I'm sorry.
So we played again and it lost to me again. And then a third time. I posted all of this on Twitter a couple weeks ago. So if you move these things off the typical problem, they're even worse. If they had a robust understanding of a domain like tic-tac-toe or chess, it wouldn't be too hard. I posted another on Twitter.
which was like, I had some variation on chess. Like, give me a board state where there are three queens on White's side, but Black can immediately win. And performance was not great. So you take knowledge that for an expert would be outside of what they've specifically practiced, but they understand the domain well enough, and experts will have no trouble at all. And these systems still do.
So I totally agree with those limitations about current systems. But the thing that I would say is that it seems like the trends are going up. So I would predict that if you measured this sort of thing for the last five years, even though the current systems might still have weaknesses, they would be less weak than the systems of two years ago, which are less weak-- Well, chess is an example. And then I'm going to go to you. I was going to say the last thing, which is I first noted the problem with chess, I think, two years ago.
The illegal move problem, really, I think Mathieu Acher was the first to point it out, and I spread it on Twitter and said, look, this is a serious problem, and it persists. There are a lot of problems that I feel like have persisted. But I'm done. Go ahead. Yeah, yeah. One thing is on benchmark-based forecasting: I think that has limitations, in particular a streetlight effect where,
well, when it gets to 100%, that suggests that it has solved the task. The issue is that these benchmarks often have some structural defects that are only obvious later, when you go fairly far out into the curve.
So, for instance, in video understanding, you'd think, wow, if you looked at the benchmarks from a few years ago, the models are totally at the top of them. But then you can come up a year later with several other sorts of benchmarks that challenge them. So I think that there tends to be a gravitation toward what's very tractable for AI systems and where some interesting action is happening. That's a selection pressure on the sorts of benchmarks we get.
If one's looking at cognitive tasks, such as those that you would give kids if you're testing their intelligence, for instance, if you randomly sample many of those, the models don't do that well. Maybe on a double-digit percentage of them, maybe almost half of them, they don't do that well. For instance, count the number of faces in this photograph. O3 can't do that very well.
or connect the dots or fill in the colors in this picture. Just an example for visual ability. So there's...
I don't think, when Gary's pointing these out, that he's just running the cherry-picking program, and that what he's going to do is just keep cherry picking and doing a God-of-the-gaps thing for AI until it basically is AGI. I do think that there is a non-adversarial distribution of tasks, from the cognitive science angle, on which you would see many of these sorts of issues.
And I think that there's potentially some speaking past each other, in part because he's not viewing intelligence as some unidimensional thing entirely. A lot might be correlated together, but there are various other mental faculties that are important that aren't really online that much. We saw with the GPT series that by pre-training on a lot of text,
we got some sub-components of intelligence. We got reading-writing ability, and we got a lot of crystallized intelligence, or acquired knowledge, from that. And that process took several years. So it's not the case that, as people often sort of suggest, once it gets traction on something, it will solve it immediately or very shortly thereafter. For a lot of these core cognitive abilities, it took multiple years. Mathematical reasoning would be a recent example with Minerva. Google's Minerva system got 50 percent on MATH, a benchmark I made some while ago, in 2022. And I think we've only recently crushed it, in 2025. So it took a good three years. And I think the reading-writing ability got to an interesting state and is now relatively complete, four years later. Crystallized intelligence as well, I think, is now relatively complete. However, there are still a variety of others. I wouldn't expect visual processing ability
for video to be taken care of in a year or two. Maybe audio ability will be taken care of in a year or two, though. We see almost no progress on long-term memory, basically. I think that they can't really meaningfully maintain a state across long periods of time over complex interactions. And so once we start to get traction on that, then we can start to forecast that out. The reason this is...
And for fluid intelligence, which is what we see in the ARC-AGI thing and Raven's Progressive Matrices test, that still seems very deficient as well. So I think when thinking about these, it's important to split up
one's notion of cognition because if there is a severe limitation on any of these dimensions, then the models will be fairly defective economically or at least for many tasks. For instance, if a person doesn't have good long-term memory, it will be very difficult to enculturate them and teach them how to be productive in a working environment and load all that context.
If they have very slow reaction time, that's also a problem. Or if they have low fluid intelligence, that will limit their ability to generalize substantially. So I think the numbers are going up substantially or continually on many of these axes for these benchmarks.
But at the same time, when those benchmarks reach their conclusion, there might be another peak behind them, and there are many of these. And for it to be an AGI, you're going to need all of them as well. And for some of them, we're fairly early on in their capabilities, such as with long-term memory. So if one's doing forecasting, I think understanding intelligence somewhat more and having a
more sophisticated account, such as one would find in cognitive science, can help foresee bottlenecks that will actually be fairly action-relevant.
I mean, I completely agree with you. And I'll just mention physical intelligence and visual intelligence. You mentioned visual. We left out physical. I skipped physical. Physical and spatial intelligence are really very serious limits right now. So if you ask O3 to label a diagram, for example, it would be quite poor at it. If you ask it to reason about an environment, it's going to be poor at it.
So, I mean, I agree with... like, you made it sound like I think intelligence is a single dimension. I don't. I don't know where you got that impression. I agree that there's all these limitations. Well, you said reliability in an all-things-considered type of sense. I think that your forecasts don't
make a lot of contact with the kind of stuff that the two of us just talked about. This became two against one in a way that I entirely did not-- Bring it on. I'm usually a bridging type person. But anyway-- I understand. This is why we call it benchmarks plus gaps: it's that we understand that just because-- You didn't actually explain that phrase. Can you just explain benchmarks plus gaps? So yeah, the benchmarks part is you take all the benchmarks that you like, and you extrapolate them, and you try to see when they saturate.
And then the gaps is thinking about all the stuff that you just mentioned and thinking about how just because you've knocked down these benchmarks doesn't mean that you've already reached AGI. There's all this other stuff that the benchmarks might not be measuring. You have to sort of try to reason about that. Another way of putting my argument is I identified a set of gaps in 2001.
rightly or wrongly, but I identified them. I think that they were real in 2001. I mean, those were applying to multilayer perceptrons, not to transformers, which hadn't been invented yet. But I still see exactly the same gaps. And I see some quantitative improvement, but no principled solution to any of the gaps. So there were three core chapters in that book. One of them was about operations over variables. One was about structured representations. And one was about types and tokens, or kinds and individuals.
And I just don't see that the things that I described then have been solved. I see certain cases where they can be solved, but all of the problems that I see seem to be reflexes of those same core problems. And so if I seem like a grumpy old man, it's 'cause for 20 some years, and really I pointed these things out in 1998, 27 years, I have not really seen them. And so the notion that we're gonna solve those gaps all in three years seems weird. Like I thought your estimates
of solving a few of the things you just mentioned were generous. You say we solved reading and writing in three years. Video is much more uncertain. Let me finish the sentence. But the kinds of numbers that you gave were like three or four years for this one, three or four for that one.
And I think, really, you know, my estimate is really 10 years; most of the distribution is past 10 years. It's because I see several problems. We'd be really lucky if we solved, let's say, the visual intelligence problems, the video kind of visual-intelligence-over-time problems, let's call it, in three or four years. I just see too many problems, too many gaps, to use your phrase, to think we'd get all of those at once. Like in the movie Everything Everywhere All at Once. To me, to get to 2027 would be everything everywhere all at once for a set of things that I've been worried about for 25 years.
Excellent. First of all, yeah, fun point. It sounded like you were disagreeing with me, but then you were saying like, yeah, three years, three to four years. That's what I'm saying. In terms of the overall, basically, there's maybe a caricature, which maybe isn't exactly you, but the caricature would be things will keep scaling. That will automatically solve the problems. That's what's been happening. You point out an issue. It's the word automatically that I bristle at.
Well, I'm just interested in the bottom line numbers for the years. Like we had this potential dispute about whether it's just LLMs or whether it's neuro-symbolic. I don't want to fight you about that. Like we can say it's neuro-symbolic if you like. We can say it's not automatic but rather involves some new ideas. But the point is I think it's going to happen in like three or four years. Yeah. Okay, great. So I just think new ideas –
are hard to project. I'll give you my favorite example. In the early 20th century, everybody thought genes were made of proteins. Somebody even won a Nobel Prize on that premise. They were all wrong. And it took a while for people to figure out that this is wrong. And Oswald Avery did these experiments in the 1940s, which really ruled it out.
and discovered by process of elimination that it was DNA. And it didn't take that long to move very fast in molecular biology once people got rid of the bad assumption. I think we're making some bad assumptions. It's hard to predict when people will move past a bad assumption and find the fix we need, which is where I think we are now. I mean, proving that is very hard. I can give a lot of qualitative evidence to point in that direction. But there's at least a possibility that I'm right about
that. If we don't have the right set of ideas, it depends on when we come up with new ideas. It could be that we automate the discovery of new ideas, but most of what LLMs have done is not discovery of new ideas. It's really exploiting existing ideas. So I would claim that
On the timelines, there are many paths to it being shorter, before 2030, for instance. But the picture that you point out, if there are these various other cognitive abilities that are long unaddressed, and this isn't a cherry-picked distribution, this is actually
what you would do if you're inspired by cognitive science or psychometrics. I think basically both positions are legitimate. But I think if you'd integrate them, there could be a bullish case that you'll be able to knock off some of those core cognitive abilities by the end of the decade. It's not totally certain. It's funny you say it. I think that the absolute most bullish, if we want to use that word, case is 2030. Like, in the very fastest case, it'd be 2030. It would require solving multiple problems that I think we're just not quite positioned for yet.
I don't think it's very likely, but I think that that's the maximum possible. I just can't get my mind around 2027. It just does not seem plausible because of the number of problems. And I think also what you said is right. We want to integrate these two approaches to make the forecast. And nobody's really quite done that. Maybe that's an adversarial slash collaborative collaboration that we could do. I haven't really seen--
something dig into that side of it, of sort of new discoveries. Maybe your gaps analysis does it a little bit. So let's talk about the superhuman coder milestone, the automated coding aspect, which is not all of AI R&D, but just part of it. I think I'm excited to try to drill down into that and for us to try to make bets or predictions of what the next few years are going to look like, basically. So from my perspective, it feels like
You have often over the last couple of years been pointing to limitations of LLMs that were then like overcome in the next year or two. Which? I mean, there are specific examples. Yes. Right. So like, for example, any particular set. Well, chess hasn't been solved. So let me clarify what I mean by specific examples and I'll come back to you, which is I give, you know.
on Twitter, here's a sentence where-- here's a prompt and it gets a weird answer. Those are always remedied. I was told-- I don't know if it's true-- that people in OpenAI actually look on social media, for example, including mine, and people patch them up.
There are specific examples in that sense, like literally this prompt gets this weird answer. A lot of them have been solved. These more abstract things, like playing chess, actually have not been solved. And the more abstract problem of, can I give you a game with variations on the rules? I talked about that in Rebooting AI in 2019, in the first chapter. That hasn't been solved. Are you then saying that, like, for all you know, six months from now,
the AIs will be able to solve the legal move in chess thing? Or are you saying... I don't think LLMs will solve it. I mean, I think someone could come up with a scheme to get LLMs to play chess without making illegal moves by calling a tool. I think that's possible. Like...
Without calling a tool, I don't think it will be possible in the next six months. I'd be very surprised. And you can ridicule me in six months if it happens. And that's fine. But also, let me add a twist to it, which is what we talked about in Rebooting AI was, like, for example, if you train a particular system on, I think we used Go as an example,
what is it in Go, 19 by 19? If you train it on the ordinary 19 by 19 board, will it be able to play on a rectangle instead of a square? Different variations. I had a conversation with a guy at DeepMind who did chess in a really interesting way without using Monte Carlo tree search, which is what the AlphaZeros and so forth do. And he trained it on an 8 by 8 board. And I said, if you just had to do 7 by 7, would it work? And he said, no.
So he did a version where there was no tree search, so it's not neurosymbolic; the tree search is the symbolic part, or at least without it it's much less neurosymbolic. And astonishingly, it played pretty well, trained on a billion games or something like that. But even then, it was sterile knowledge, in the sense that it was purpose-built for this, but it wasn't general knowledge of chess. If you asked it to put three queens on a board,
still three white queens, but make sure that black wins on the next move, it wouldn't be able to do that at all.
There's a flexibility to human expert knowledge that we emphasized in Rebooting AI that I see no evidence of progress, or maybe a little evidence of progress, but relatively little progress on that. My tic-tac-toe example is just like that. So can you give me more examples of things that a year from now I won't be able to get an off-the-shelf model to do without giving it tools? I think this...
I'll put it this way. There'll be many, many examples I can come up with involving distribution shift, all of what I was just doing. We'll still find failures next year. I mean-- You'll be able to come up with new examples, but you can't come up with any example now that will still be true a year from now. If I make it in a slightly more generic fashion, then yes. I am sure that I will be able to come up with variations on chess
that are not orthodox variations, but maybe there's some already known. Chess actually has lots of variations, or chess problems and things like that, that these systems won't be able to do. I'll probably be able to come up with variations on tic-tac-toe. So five by five board, but you can only win on the edges. There will be tons of problems like that. They're all kind of outlier-ish, but I think that a year from now, these systems will still be very vulnerable to those outliers.
I think that you'll probably still be able to come up with some things like this a year from now, but it'll be harder. And then it'll be even harder the next year and so forth. It should be monotonic, of course. I mean, here's another way to put it, is the AGI that I think we're afraid might be unconstrained or whatever is...
There really shouldn't be a lot of edge cases like that. Yeah. Right? That's why I focus on the superhuman coder milestone. Again, I don't think we just leap straight into this. Do you know about the LiveCodeBench Pro benchmark that just came out? Not much. Tell me about it. So I don't know what the human data on it is, but I know that the machine data on it was 0%. And what was interesting about LiveCodeBench Pro was that they were all brand-new problems.
And so they ruled out data contamination. And I've seen two careful studies ruling out data contamination. The other was with the USAMO, the US Math Olympiad problems, six hours after testing. Performance was terrible on them. I think if you rule out data contamination, the performance on
problems that are new is not good. And for programming, especially programming AGI, that's super relevant. If you're using these things to code up a website, that's not new. There are so many examples. But if you're using it to do something that's new, you get out of distribution, and there are still a lot of problems. What about FrontierMath? Wasn't that also-- There are some contamination issues there. OpenAI had access to the problems.
They said they didn't train on it, though. I know they said that. But what augmentation did they do that's relevant to those problems, right? OpenAI is just not, I mean, you can agree with me on this, it's not an entirely forthcoming place about what they did. Sure, but don't Claude and Gemini now have, like, okay scores on FrontierMath? I haven't actually checked.
But like-- I think they have-- OK, but everybody is teaching to the test right now. And they're all doing augmentation. We don't know what the augmentation is. The point is, if you have AGI, it's not just about teaching to the test anymore. You need to be able to solve things that are original. I mean, we're lucky if AI can never do that, because then it has an Achilles heel.
I guess you're pointing at the idea that superhuman coding may still be around the corner if we extrapolate out the benchmarks. I mean, I also just-- for the record, I disagree with you guys. I think that progress is being made in this sort of hard-to-measure dimension that you're pointing to. It's hard to measure. The best ways we have to measure it are for you to come up with examples.
Let me give you a parallel and then we'll come back. But I do think progress is being made. Let me give you a parallel and then we'll come back to-- Yeah, I mean-- Sorry, let me just insert very quickly.
The parallel is to driverless cars. In driverless cars, we know there's been progress made every year, absolutely. But the distribution shift problem still remains. So even Waymo, who I think is maybe ahead in this, they just announced New York City. But when they go to New York City, they're going to have a safety driver there. They're going to have geofencing and so forth. The general form of the driving solution would be Level 5. There'd be no geofencing. You wouldn't need specialized maps, et cetera. So there's been progress
for 40 years, every year, in driverless cars. But the distribution shift problem is really what is still hobbling that from being a thing everywhere, as opposed to a thing in San Francisco. Well, we're also getting better at distribution shift. We have gotten better. I took a Waymo yesterday. It was great, but I took it here in San Francisco. There's still the problem that, even if it's getting better, there's a question of whether it's able to do full automation, which is more the key thing. Yeah.
And unfortunately, even for the abilities the models have the most of, well, their two main abilities would be crystallized intelligence, or acquired knowledge, and reading-writing ability. But even for reading-writing ability, when you ask the models to write an essay, if you score them like the GRE, out of six, they get like a 4.5 or so. They're not particularly great writers, and you can't automate writers that well
if they're doing somewhat complicated writing. People still need to be doing that. And that's somewhat surprising, because they've had so much data. They've had all the data in the world, basically. And likewise for coding, they have had GitHub. So that at least potentially
makes the case that as you extrapolate them out, they'll get a lot of the low-hanging fruit, but crossing some sort of threshold for doing more automation or full automation could actually require resolving some of these other bottlenecks, the other cognitive ability bottlenecks that Gary's alluding to. I'm going to
put a hold on this discussion. I think we did a good job. We're not going to convince each other, but we laid out the issues, and it's great. I want to ask at least one other question because we've somewhat limited time. Do you think that we've made any progress on alignment? Do you think there's
a conceivable solution to technical alignment? Are we close to it? Let's talk about the technical alignment problem. I'll just put one thing out there, which is I think even though I don't think there's been as much progress as you do on capabilities, there's obviously been some. There's no argument there. I would say that a lot of it is interpolation rather than extrapolation, and we haven't solved the extrapolation problem. But interpolation, we've made huge progress, mostly just in virtue of having more data and more compute.
But there's no question that new systems are much better at interpolating than previous systems. And that has lots of practical consequences for labor, for example. There's a whole bunch of things that you can use these systems to do that you couldn't use them for before, and that's already affecting labor markets. Whereas my intuitive sense is, on alignment, all we have is maybe that reinforcement learning from human feedback
helps a little bit, so that if you ask these systems the most obvious question, like, how do I build a biological weapon, they'll decline. That's a little bit of progress. But we all know that those things are easily jailbroken. My view, and you can agree or disagree or whatever, is that on alignment, systems still don't really do what we want them to do. There are still kind of sorcerer's-apprentice-style problems. There are kind of
problems of, just, they don't quite fit, they don't do what we want. We have system prompts that say don't produce copyrighted material, and they still do. Don't hallucinate, and they still do. Like,
they, you know, they can approximate what we ask of them, but they don't really do what we ask them. I take even don't make illegal moves in chess to be a form of the alignment problem, like a very simple microcosm of the alignment problem, and they're still struggling with that. That's my view. How about you guys? Why don't we start with you? Yeah, so there are some different notions of alignment or alignability, and I would distinguish between aligning proto-superintelligences and aligning a sort of recursion that gives rise to superintelligence. Those are very qualitatively different. One is more model-level and one is more process-level. So I don't think you're going to solve that process-level
one, of doing a recursion: fully de-risking that, anticipating all the unknown unknowns, and solving a wicked problem in a nice, clean way beforehand. Which points to the necessity of resolving geopolitical competitive pressures and giving them an out to proceed with a recursion more slowly, or to substantially forestall that. On the proto-ASI things, I think that they follow the instructions
fairly reasonably. There are some other parts of it. Yeah, but that scares the shit out of me. Fairly reasonably? Like, I mean, when these things are still in our hands, kind of, that's okay. But if you have them guiding weapons or something like that, there are many circumstances where fairly reasonably is not good enough. Yeah, no, I agree. There are definitely safety-critical domains for it. And
I think, for instance, in bio, for refusal in some high-stakes contexts, or given some high-stakes queries, it depends. There are some cases where I think you can actually get multiple nines of reliability, such as with bioweapons refusal.
However, for other types of refusal, like don't take any criminal or tortious action, that's a lot fuzzier. And I don't think that's there yet. Even on the first one, though, can I rewind? Yeah. You probably saw Adam Gleave posted something a couple weeks ago. I can't remember whose work he was leaning on.
But he described someone else's work saying that it was a very easy jailbreak. I think it was with Claude. Yeah. So in production, they're not using the techniques that are as adversarially robust because they come at a cost of maybe a percent or two in MMLU. And so they're not doing it. That's going to be the epitaph for humanity. If they hadn't squeezed out that last bit of MMLU, we would have been OK. Yeah.
But actually, so I think there are some types of solutions for some types of malicious use where you can have some nines of reliability. But for the general problem of refusal, including for everyday criminal and tortious actions, which would be extremely relevant for agents and making them feasible, I don't see substantial progress. I mean, think about Asimov's Laws, by the way, which were written right in the
40s or something like that. And, you know, number one was don't let harm come to humans, or whatever. And the cleaned-up version is actually the law: the law kind of cleans it up by saying don't cause foreseeable harm, because harm is way too strict, but foreseeable harm is what we demand of humans. And so now, why are we not
doing it that way? Yeah, yeah. That one we're not... Yeah, yeah. And so I think for most of these reliability issues and safety issues, we'll keep seeing new symptoms crop up, and we'll have specific solutions that partly target them, if there's willingness by these... As we've seen in cybercrime, a kind of constant cat and mouse. That's right. And I expect that
this game will keep continuing, and we won't get to a state where that is basically mostly managed in time, because the risk surface will keep evolving with agents that will present new things, and we'll have to deal with those current cases, and that will create a substantial backlog, and we just won't have the adaptive capacity. And so, consequently, on both fronts, for aligning recursion and aligning proto-ASIs,
the geopolitical competitive pressures make it such that we're probably not going to solve either problem. So this is dark. And going back to the beginning of our conversation, it's a reason to stand in front of the train, especially a particular train. So let's say that there is one train that's about chatbots and people having fun with chatbots and using them for brainstorming and whatever that are not mission critical and safety critical.
Maybe it's fine. We just let people do that. And there are already some risks, like around the delusions that Kashmir Hill wrote about in the New York Times the other day, which you probably saw. But the really safety-critical things, or the maximally safety-critical things, if things are as dark as you said, that is a reason to stand in front of that train right now and say, look, if we can't do alignment well on the refusals, et cetera, and the causing harm to humans, foreseeable harm even to humans,
that's a reason to say, hey, we got to wait until we have a better solution here. If that takes 500 years, if it takes five years, like in the safety critical stuff, isn't that a reason to slow things down a bit? Yeah.
I think that that's a direction, or an interpretation of those facts, that is reasonable. But there's a question of incentives and, you know, there might be a somewhat different approach that's better. Sure, you could go at it by incentives. Yeah, yeah. So that's why in Superintelligence Strategy we
suggest some different things, like making the threat of preemption if you're crossing these sorts of lines, as opposed to stopping today, because that is a lot harder to get institutions around. I'll weaken my claim and say, is that a reason to intervene, I think is the word. Yeah, that would be a reason. Yeah, of course. Right. And yeah, to continue your metaphor with stopping the train,
Right now, if I were to step in front of the train, it would just sort of run me over. And you need to have quite a lot of people in front of the train for the weight of the bodies to really slow it down. I've been kind of trying on Twitter. There's some pushback when I try to hold the train back. I'm not actually advocating holding...
the train per se. But on these safety critical things, I really could see an argument for some kind of intervention. As you note, it could be incentives rather than more time. But my view is that we are not going to solve the technical alignment, the sort of nearer term technical alignment problem with-- let me finish the sentence.
with LLMs, that they're just not up to the task. Maybe with neurosymbolic we might have a chance. So neurosymbolic gives you a chance to state explicit constraints that is very difficult in pure LLMs. And so I think there's reason to explore that avenue as a way-- I don't know that it helps with superintelligence, but at least with the near-term use of these systems for safety-critical measures, there might be some mileage to be gotten there.
One last point on this for Daniel is I think in both cases, I think it's a problem to use the phrase solve like that because it's not something you can sort of do beforehand. In both cases, you want adaptive capacity and slack in the system, some sort of safety budgets and people who can put out the fires faster than they're emerging.
Because as AIs continue to develop, there'll keep being new problems as they evolve. And likewise with recursion; we just see that, but on steroids. So with both of it, it's a resource thing. And I think it's less of a technical thing. The technical things can increase the capacity to deal with these problems or provide more efficient solutions for some of these particular symptoms, or these new failure modes that crop up. But I'm not expecting a total,
monolithic solution that a pause would necessarily give. I think you have to have the background context, in both cases, be that you're able to proceed with development under some risk tolerance that's much lower than what there is today. I agree. We're totally not on track to have figured out the alignment stuff in time.
I can pontificate more on that, but I've talked a lot. So I mean, we should probably actually wrap up. So maybe some final words. I'll start with some final words. Maybe I'll make some very last words. I think we actually agree on a lot here. Our clearest disagreement is on
forecasting. Even there, we're probably not as far apart as maybe people thought that we were. So I push all of my probability mass five years out and have basically none before. And you've got some at three years out or two years out that I don't. We both have some out at 2045. I have some even past then, and you may or may not.
We have slightly different methodologies that we have talked about. You were surprisingly coming to my aid, which I love. Happy moment for me. But we're not hugely apart. And I think we've acknowledged the value in some of the forecasting techniques that the other has used, even if we don't. So we're not--
hugely apart there. I think we're completely agreed that we're not doing a great job on the alignment problem and that we need to do much better and that there's a temporal dimension to that, as you were just saying, which is like, you know, it's not great for humanity if we solve that problem in 200 years and we have AGI or ASI, you know, in the next decade or two. And I think
We agree also that the current companies are not entirely trustworthy. Those are some of the things that we agree or don't disagree so much on. Like, in the broader picture, we're remarkably aligned. Yep, I would agree with that. Also, I just realized I forgot to ask you my big question about scenarios. Do you still have time for a brief foray into that? If you guys can sit for a few minutes, I'll do that. So here's the way I would pose the question. You know, you've read AI 2027. Thank you for reading it.
if you were to write your own scenario of the future, what would it look like? And perhaps, where would it start to branch off from AI 2027? For example, would it look basically the same until 2027, or would it branch off earlier than that? And when it does branch off, can you sort of--
So, there are a couple of different pieces. I think the speed at which things unfold in AI 2027 is not plausible to me. I think by the end of the year we will already be less far ahead on agents than you hypothesize there. I think the first couple of months were actually kind of in agreement, but there's a divergence in speed for sure, I would say.
So take AI 2027 and then double the length of everything, maybe, or triple it. What would you say? Well, I mean-- Or at least-- yeah. I think that the level of intelligence that you attribute to the machines towards the end of the essay-- I remember some of that actually happens after the year 2027, I think.
you know, I don't really think is very likely in the next decade. And I don't think it's super likely in the decade after that, but it's certainly possible. What about the superhuman coder milestone? Do you think that that won't be happening in the next decade, probably? I don't remember how you define the milestone. Basically, think about like...
these coding agents like Claude and so forth, but imagine that they actually work to the point where you can just treat them like a software engineer: chat with them, give them high-level instructions, and they'll do as good a job as a very professional, excellent software engineer would have done. So I think for an apprentice engineer, we're actually close already. But a top software engineer, I don't think we're close.
I think that requires an understanding of the problem and of what humans actually want solved. It requires a deep understanding of various domains.
I just don't think that we're that close to that. So I do think these things will continue to improve regularly. We will get more and more value out of them. They will improve programmer productivity with some asterisks around how secure is the code, how maintainable is the code, et cetera. But I don't think that they're going to replace the best coders. You're not going to get a machine that is Jeff Dean.
anytime in the next decade. I will be really surprised if you get a Jeff Dean-level coder, right? You probably know some of the stories about him, the Chuck Norris-style stories that aren't really true. But he really was able to look at problems nobody had seen before, about distributing search and serving advertisements at a scale nobody had ever handled before,
and pretty rapidly prototype and then build, maybe with some help, production-level solutions. So take Jeff Dean as kind of our example; he deserves to be our example here. I don't see a Jeff Dean coming out of these systems soon. I just don't. And then a little bit more on scenarios.
I think that the human mind is bad with scenarios, in that it takes them very seriously. The vivid details-- and there's lots of psychological literature on this-- overwhelm people's ability to see the other possibilities. What I would like to see, actually, would be a distribution of scenarios.
So when you put Scott Alexander, who's a brilliant writer, or at least a very compelling writer of a certain sort, onto making one scenario vivid, everybody goes home and thinks that that scenario is real. But you and I know that that was one scenario of many. There are reasons to consider that scenario, and it's sort of-- the darker one is a very vivid version of the dark scenario. But really, we want to understand the distribution of scenarios.
And that's a lot more work. I'm sure it took you a few person years or something like that to put together that report. There were multiple people involved. You probably worked on it for a while. And so it's a big ask. But what I would like to see is really a distribution of scenarios.
We're working on it. I agree with the problem you're pointing out. We currently have a project to make a good ending, so to speak, at a similar level of detail to what we already have. And then also a mini project to make a more scrappy spread of possible scenarios illustrating different stuff at lower levels of detail, just like a few pages each. Yeah, I think that that would be helpful. My own personal scenario is like,
in three or four years, neuro-symbolic AI starts to take off. I already see signs of this. AlphaFold just won a Nobel Prize. That's a nice thing for neuro-symbolic AI. The conferences for it are getting bigger and so forth.
And I think eventually there will be a state change. I find it very hard to know when it will come. But I think in 2035, we will look at LLMs and be like, nice try; we still use them for some things, but that wasn't really the answer. I think Yann LeCun would say the same thing, again, despite our differences. I think we both think that
LLMs are not really the route to AGI, and that when we get there, it's going to look pretty different. It might use LLMs. They're great at kind of distributional learning. It might replace them because they're very inefficient in terms of energy and data. So somebody might find a better way to do the same kind of thing of learning the models of distributions of things, which is a super helpful cognitive skill. It's not the only one, but it's super helpful. But we'll have much better ways of doing reasoning and planning. We'll have much more stable world models.
I think it will take five or 10 years to develop that. I think the semantics that the current models have is very superficial. It's really about distributions of words. And we need a deeper one. Like if you talk about three in a row, you should understand what a three in a row is. And I think we're missing something to get that. We will get it. Like I don't think it's impossible. And you don't think we'll get it after automating AI R&D? We'll get it in a-- like humans will come up with the ideas that get us there. I mean, I guess it relates--
So for me, with automating R&D, there's a plausible version and a much more distal version. The near-term version is that a lot of people do a lot of experiments on LLMs, and I think you can automate a bunch of that. But genuinely new ideas have not been the forte of these systems. Somebody just did a paper, I'll try to dig up the reference, in which they looked at whether these systems are able to develop new causal laws, or infer causal laws.
And they're not that good at it. I don't expect that an LLM is going to do what Einstein did, look at a problem and come up with a completely different kind of solution, anytime soon. All the solutions they have seem to me to be kind of inside the box. And I think that getting to AGI is going to require outside-the-box solutions. They might automate the stuff inside the box. And inside the box...
They may even do better than people there. Like the famous move, was it Move 37, in one of the Go matches? That's kind of inside the box. It was still within the realm of Go. It's not outside the box the way relativity was, where there's a whole different way to think about physics than we had before. I think that we will need some Einstein-level innovations in order to get to AGI, and certainly to get to ASI. And I don't expect that, at least,
automating the current machines will do that. Okay. When we do get to AGI, how fast do you think the takeoff will be? To ASI? Yeah. Like for example, you just mentioned now that like the current systems are very inefficient in terms of how much energy they need. That's a bit scary. If you really think that in the next, you know, 10 years, human scientists doing neuro-symbolic research will come up with much more efficient systems that are also better at generalizing. Holy cow, that's
That's going to be orders of magnitude better in various dimensions than what we currently have. Whether I'm right about neuro-symbolic AI or not, I think that there are orders of magnitude more data efficiency to be carved out than we have now. I mean, you just look at human children; they don't need that much data. Ah, okay. So you're saying it'd be orders of magnitude more data efficient, but still only about as data efficient as humans? Or...
It's possible you could find better. I mean, humans are pretty good data efficiency-wise, but I doubt that they're at the theoretical limits. There are some things in psychophysics where people really are at the theoretical limits: we can notice the presence or absence, I think, of a photon, and you can't do better than that. So there are things where we're at the theoretical limits and things where we're not. We are constrained very much, I argued in my book Kluge, by a lack of location-addressable memory. And so, like,
you know, my daughter just memorized 105 digits of pi, which I could never do. But it's still nothing compared to what a computer can do: memorize billions of digits of pi. And so there are some advantages to machines, places where they should really be doing better than people if we had the right way of writing the software.
So at least we should be able to get to human levels, because humans are an existence proof of data efficiency. And humans are fabulously data efficient on many problems, though not all. And then we have cognitive impairments, like confirmation bias and motivated reasoning. And motivated reasoning is like--
I want this argument to be true, and so I kind of play little games. And we all do this. I mean, scientists are better because we recognize the behavior in ourselves and try to self-correct, but scientists do it too. Machines shouldn't need that. To use Freudian terminology, even though I'm not a Freudian, we have ego-protective ways of reasoning. Machines should not need that.
Right? Like, I believe in my political party, and so when my political party does a dumb thing, I go and try to come up with a rationale for it. Machines shouldn't need to do that kind of stuff. And so in that way, certainly the upper bound is going to be way beyond where humans are.
I did a panel once with Daniel Kahneman, who's not with us anymore, and he was very fond of these studies showing that in certain domains, machines were already better than people. These are problems basically of multiple regression weighing multiple factors, and he was right about those. And I think I came back with some where the machines were not very good and people were better. And in the end, he said something like, humans are a really low bar. And his whole research--
well, not his whole research-- he had many research lines, but one of his major research lines was showing that humans, in fact, were pretty bad at all kinds of reasoning. So he says, humans are a low bar, while we're doing this panel. And I said, yeah, and machines still haven't met that bar. They will someday. They will exceed it; there's no question about that in my mind. It's a question of when and how and so forth.
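For context on the studies Kahneman had in mind, the "machines" there were often just simple linear models weighing a handful of factors, as described above. Here is a rough, self-contained sketch, with synthetic data and made-up factor names rather than anything from those studies, of fitting such a multiple regression with ordinary least squares:

```python
# Sketch of "multiple regression weighing multiple factors": fit linear
# weights to a few predictive features with ordinary least squares.
# Data and feature meanings are synthetic, for illustration only.

import numpy as np

rng = np.random.default_rng(0)
n = 200

# Three made-up factors (e.g., test score, interview rating, experience).
X = rng.normal(size=(n, 3))
true_weights = np.array([0.6, 0.3, 0.1])

# Outcome = weighted factors + noise.
y = X @ true_weights + rng.normal(scale=0.5, size=n)

# Ordinary least squares, with an intercept column prepended.
X1 = np.column_stack([np.ones(n), X])
weights, *_ = np.linalg.lstsq(X1, y, rcond=None)

print("fitted intercept and weights:", np.round(weights, 2))
print("prediction for a new case:",
      np.round(np.array([1.0, 0.5, -1.0, 2.0]) @ weights, 2))
```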
And we really are a low bar, because of all the cognitive biases and illusions, the problems with memory; my book Kluge was all about this kind of stuff. There's absolutely room to do better. And yes, it could happen in 10 years. Probably it will happen on some of those dimensions and not others. It already did with math,
60 or 80 years ago. It will happen dimension by dimension, maybe several all at once when there's a breakthrough or something like that. But it will happen, and it could happen in 10 years. Again, I don't think it can happen in two. I think we're missing some critical ideas right now. But when we get those critical ideas, things could go fast, just like molecular biology: once Watson and Crick figured out the structure of DNA, things moved pretty fast. In 40 years--
well, not 40 years, but 70 years-- we've made remarkable progress, and now we can do CRISPR and things like that. There will be, I think, periods of AI progress that exceed the last few years. I know that the last few years feel like a lot to a lot of people. But I think in hindsight, 30 years from now,
AI will be enormously ahead of where we are now, on almost any of these kinds of projections, right? And we will look back and say, yeah, a bunch of stuff happened and they were really proud of themselves, the way that we look back at flip phones: yeah, those were kind of cool, but they didn't know about smartphones. I agree with everything you said about what we agree on. We continue to disagree about what the next couple of years are going to look like. I think that
Well, I think it's going to look more like AI 2027: rather than the limitations that you're talking about becoming bottlenecks that the companies can't work around, I think they're going to be more like a series of road bumps that the companies bash through. I mean, that is the question. So we could have a second edition of this two years from today and see how that sounds. Yeah, yeah.
Okay, so one thing worth clarifying, since some people may be confused: I'm sort of in the middle of thinking through AI timelines, largely because I'm trying to reflect on what intelligence is in a multidimensional way. We maybe got a little bit spoiled with reading and writing ability and crystallized intelligence; the main progress over the past decade--
really the past two years or so-- has been in mathematical ability, and short-term memory is also better. So I'm still wrapping my head around that, and I'm being more noncommittal about some of these different forecasts. But I think it still seems very plausible, more than plausible, that by 2030 there is some system that has the cognitive abilities of a typical human.
There are some differences in how things might play out at a technical level. In AI 2027, I don't think much really goes through mechanistic interpretability or through technical solutions solving much of it. I think you really need to ease the geopolitical competitive pressures. I think the main dynamics that make way for that are
transparency and how easy it is to do espionage, as well as sabotageability, which I think are very important dynamics that are in some ways reflected in there but not totally captured. On sabotageability, for instance: if China were interested in stopping the US, they could do some sort of cyberattack on some power utilities, but say that doesn't work.
They can also-- there are basically a lot of other vulnerabilities that they can exploit. For instance, they could, from a few miles away, snipe the power plant's transformers, and that would take down the data center. And there's lower attributability: was it Russia? Was it Iran? Was it China? Was it some US citizen? So I think that affects the strategic dynamics; I talk about that in Superintelligence Strategy.
I also think the transparency that China has into the US would be relatively high. Right now, it's a matter of just hacking Slack, and then you can see Anthropic's Slack, OpenAI's Slack, XAI's Slack, Google DeepMind's Slack. So you can have very high transparency there, and hack the phones of top leadership as well.
So this paints, in some ways, a different picture, but I think we would agree that we want to work toward a verification regime, so as to have red lines around things like intelligence explosions.
I won't really say anything more substantive, but I will say this. I thought this was a fantastic conversation. I hope that it won't be cut too much because it was really interesting and I salute anybody who made it through watching the entire thing. We got pretty technical at times and really laid out, I think, where the state of play is today, which was my fondest hope. I think we did a great job with that. Thank you, gentlemen. Shake hands for the camera.
All right. All right.