So what is in my mind these days? The different scenarios that have been discussed that could lead to catastrophic outcomes, either because of humans using very powerful AI or because we lose control to AIs that have malicious goals. How could that even be possible? I think a lot about that and we need to understand these things so that we can maybe ignore them if they're non-existent or mitigate them otherwise.
Another big idea that is in my mind these days is we're all focusing on agency as the path to intelligent machines. I'd like to suggest that there's an alternative, which is non-agentic AI.
Many people are saying, "Oh, we can't slow down to take care of safety, because that would allow the Chinese to leap forward." And the Chinese are thinking the same thing, of course. So you can see that's a dangerous game. But there's also a real danger that other countries will do something dangerous. And until enough of the leading countries understand the existential risks, it's going to be difficult to negotiate treaties.
And then for those treaties to work, there'll be a need for verification technology. Like, okay, we don't trust each other because the same AGI could be used as a weapon or to build new weapons, right? So how do I know that behind my back, you're not actually using your AGI for something that would be bad for me? If we were to freeze the scientific and engineering advances in AI, there would be no moat. It would be quickly eaten up.
But of course, that's not the reality. The reality is we're continuing to accelerate towards AGI. And there's this possibility, at least, of the rich getting richer: as we advance, for example, the programming abilities of AI, we can help
advance our AI research faster than otherwise. So the companies, for example, that are building the frontier AI, they have these models that haven't been deployed yet. So they have some number of months during which nobody has access to their system except them, and they can use them to design the next generation. Eventually, when we approach AGI, that means we start having AIs that are as good as our best AI researchers. Now there's an interesting thing that happens.
So Tufa Labs is a new AI research lab I'm starting in Zurich. In a way, it is a Swiss version of DeepSeek. First, we want to investigate LLM systems and the search methods applied to them, similar to o1, and so we want to reverse-engineer and explore those techniques ourselves.
MLST is sponsored by CentML, which is the compute platform specifically optimized for AI workloads.
They support all of the latest open source language models out of the box, like Llama, for example. You can just choose the pricing points, choose the model that you want. It spins up, it's elastic auto scale. You can pay on consumption, essentially, or you can have a model which is always working or it can be freeze-dried when you're not using it. So what are you waiting for? Go to centml.ai and sign up now. How do we, you know,
build these things in the first place? How do we build systems that are like scientists and doing epistemic foraging and doing a good job of exploring the world of ideas so that they can collect gems for us and help us solve the challenges that humanity has? Professor Bengio. Hello. Welcome to MLST. It's such an honor to have you here. Pleasure. Wonderful.
Yes, indeed. Indeed. What's your take on the bitter lesson in 2024? Well, I think that there's something true to it. I've always been attracted by trying to understand the principles, and the principles may not be that complicated, of course, after we find them, but they could provide huge leverage.
Of course, when you're building a product in industry, I think it might be a different game. But in terms of the trajectory to understanding intelligence and building intelligent machines, it has a lot of truth to it. Sutton was talking about eschewing design. And do you think that we are missing a fundamental component or just scaling up could get us there? I don't know.
But my bet would be that we're missing something. How big it is and how easy it's going to be to figure it out, I think there are many different views. I don't have a very strong opinion. Now, if I consider the choices I could make as a researcher, it would definitely be around how we can maybe go back to the drawing board, how we train neural nets that would
reason and plan and be safe rather than just hope that little tweaks are going to get there. But maybe they will. It's also plausible. I mean, tweaks and scaling. How important do you think physical embodiment is to get to AGI? I think there's a very simple answer to this. It depends what you want your AGI to do. An AGI that is a pure spirit...
and is able to advance science, solve medical problems, help us deal with climate change, or be used in bad ways, like for political persuasion and things like that, or design viruses. I mean, all of these things could be extremely useful or extremely dangerous, and they don't need an embodiment. Of course, there are lots of things we'd like
machines to do in the world that would require an embodiment. So that's what I'm saying. It depends what you want to do. What I also think is that if we figure out the principles of what's missing at an abstract level to build intelligent machines, then we'll figure out the embodiment part as well as a side effect.
Other people think otherwise, that we first need to figure out the embodiment because that's central to intelligence. I don't think it is central. I think intelligence is about information processing and learning and making sense of the world. And all of these things can be, I think, developed
for some time without solving the embodiment problem. At some point, we will want to solve it anyways, because-- or maybe we shouldn't, because it's dangerous. But either way, yeah. The question is, do we need to go through the embodiment to get to really very dangerous superhuman or very useful superhuman machines? I don't think so.
It's interesting, isn't it? Because when we have embodied agents, it feels that they can interact with the world. They can learn all of the micro-causal relationships and so on. So they learn a better world model. But that's the data. That's the information. But the way to process the information is more abstract.
So, if we figure out an efficient way of exploring the world in an abstract sense, not necessarily our world, it could be the internet, it could be scientific papers, it could be chemical experiments. If we find the right principles, that would work across the board. I think the sensorimotor loop is maybe special, but not that special.
Is the direction of travel, though, understanding the world as well as we can, so that we can build increasing abstractions? Or, another school of thought is rationality and logic: we have this perfect AI that can reason really, really well. One interesting thing with François Chollet's ARC challenge is that in the beginning,
we were doing discrete program search and coming up with, you know, logical ways of doing it. And when humans intuitively look at the puzzles, it's almost like they're doing something different. They have this intuition. Where does the intuition come from? Our experience? Or schooling? Exactly, yes. So it's almost as if our experience in the world is a significant component of our cognition, right? Sure, yeah.
But what I'm trying to say is that there's a more abstract principle, and that is my belief, right? It could be that there's a separate set of recipes for embodied AI and for high-level cognition. My instinct is no, there's just one set of principles that are about information and learning, and
they can be derived in different settings and give rise to different solutions, but the principles I think are general. And if we make sufficient progress on the principles, then dealing with the embodiment issues will follow, and maybe the embodiment issue is not even that complicated. Maybe it is just a matter of scale, for example, of data. I mean, a lot of people think that the only reason we're not making so much progress on robotics
is we don't have enough data and we don't have enough speed. The loop has to be very quick. But these are almost like engineering issues. Maybe there is no new principle needed. I don't know. I don't have the answer, obviously. But science is an exploration. I'm going to keep saying I don't know for many of your questions. There's a good reason, because nobody knows. And people who say they're sure of X
have too much self-confidence and can be dangerous because we are going to take very important decisions about our future, about society, about democracy. And we need humility there in order to take the wise path. Indeed. Indeed.
On this matter of test time training, you know, we've got the o1 model, for example, and that is really uplifting the benchmarks quite a lot. And it's just kind of spinning the wheels and iterating, even though it's built on an inductive model. What do you think about that? Yeah, I think it's what we should have been doing for a while, but we didn't have the compute or the guts to spend all that compute on it. And, uh,
I and others have been saying for many years that we've made progress with neural nets to the point where we really have systems with very good intuition. So that's system one, but we were lacking the system two. We're lacking the internal deliberation, the reasoning, the planning, and other properties of higher level cognition, such as self-doubt. And so the internal deliberation part
is a kind of internal speech. It's not always verbal, but based on what I've learned from neuroscientists and some of the work we've done, a large part of it has a dual symbolic and continuous nature. And right now in neural nets, we don't have the equivalent.
The only part where there are symbols now is the input and the output, but there's no internal symbols. So with the chain of thought and all of these things, we're like cheating a bit to try to put some of that internal deliberation using the output to input loop. Is that the right way to do it? I don't know, but it has some of the right flavors.
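As a rough illustration of that output-to-input loop, here is a minimal Python sketch; `generate` is a hypothetical stand-in for any language-model call, and nothing here is specific to o1 or to any particular API.

```python
# A rough sketch of the "output to input" deliberation loop discussed above.
# `generate` is a hypothetical stand-in for a language-model call; replace it
# with a real API of your choice. This is an illustration, not any lab's method.

def generate(prompt: str) -> str:
    # Placeholder model: in practice this would call an LLM.
    return f"[model continuation of ...{prompt[-40:]!r}]"

def deliberate(question: str, steps: int = 3) -> str:
    """Feed the model's own output back in as input for a few rounds,
    approximating internal deliberation with an external loop."""
    context = f"Question: {question}\nThink step by step.\n"
    for i in range(steps):
        thought = generate(context)
        context += f"Thought {i + 1}: {thought}\n"
    # Ask for a final answer conditioned on the accumulated "thoughts".
    return generate(context + "Final answer:")

print(deliberate("Is 7919 prime?"))
```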
On that, I think that humans invented a lot of rational thinking as a tool to overcome weaknesses in our cognition. And in a sense, we've done that with LLMs, right? So we can give them tools, we can give them chain of thought and so on. And at the moment, the networks are really bad at basic things like copying and counting and so on. Do you think in the future- And most humans are as well. Exactly.
But in the future, do you think we could do away with chain of thought and tool use and just build better models? Or do you actually think scaffolding all of these meta tools is the way to go? Well, it seems necessary for us. Yes. I'd like it if we get to system two in a more intentional way, rather than taking what we have and making a small step, which...
I understand is very reasonable from the point of view of commercial competition, and you can't afford to take big risks because the other guy might be going faster. But I'd prefer to see system two by design as well as safety by design, rather than let's patch it so that we move in the right direction. Maybe that's going to be fine and maybe this is how we're going to figure it out.
We're seeing lots of work coming up now on transductive active fine-tuning. In doing the prediction, it might retrieve a bunch of data relevant to the test examples and do an inference in situ. Do you think we're going to have a very diffuse form of AGI rather than these big centralized models doing induction? That's possible. So if you think about not just human intelligence as individual intelligence, but collective intelligence, it's clear that we have a decentralized way of
computing collectively with culture, with all the work we coordinate and do together in various organizations. Companies are like AIs, right? With the good and the bad. So it's one way to break the communication limitations. So we can't communicate a lot of bits between humans.
And at some point communication between machines, even though it's much higher bandwidth than between humans, also has limitations. So decentralizing some of the effort is a reasonable path. One thing that clearly works because we see it in culture is decentralizing the exploration. So if you think about the scientific community as a body of people all exploring different regions, building on each other's work, it's a very decentralized
search in the space of explanations for how things work. So clearly it's a pattern that works. In this framing, so we're doing epistemic foraging. We've got this big distributed process to find new knowledge and explanations. And right now, I mean, I think of AIs as tools, so they're supercharging us. But increasingly, we're starting to think of these things as agential, you know, almost as if they have some privileged status.
Is that a transition? Is it a dimmer switch? Is it just like the light going on suddenly? How does that pan out? No, I think it's a transition. I mean, there are systems like GPT and Claude and so on. They already are agentic to some extent, just not as competent as agents and not as competent at planning as humans typically are.
Even if you get rid of the RLHF part, just the imitation learning, the way we pre-train, basically behave as humans would have behaved, at least on text. That's already agentic because humans are agents. So the AI learns to imitate humans. In fact, most of the agency that we find in current chatbots comes from that. The RLHF is a little bit of reward maximization on top.
To get more agency probably is going to be more reinforcement learning. But the question is, is that desirable? So I think that there are a lot of unknown unknowns about building agents that are very competent, maybe as competent as us or more competent than us. There's, I mean, maybe the elephant in the room, of course, is that all of the scenarios for loss of human control come about because of agency.
It's because we can't perfectly control the goals of an agent. We don't know how to do that. And then at some point, those goals could be bad for us, even sub-goals. Like we give a goal, but then in order to achieve that goal, the AI lies to us. Humans do it. Between humans, it doesn't matter that much. I mean, it's a problem. We have laws and so on. Because the power balance between humans is sufficiently like
flat. Like one human cannot defeat 10 other humans by hand, right? But an AI that would be much smarter than us? It's not clear. So we could lose that balance, meaning the end of the effectiveness of our institutions to keep stability in our societies, and also, of course, that we might not be able to defend against an AI that's smarter than us. Okay, so
The scenarios where things go bad in terms of loss of control are all related to agency. Another example that I often speak about and doesn't get enough attention is what's called reward tampering. If the AI can act in the world, unlike in a video game where its actions are only within the game, if the AI can act in the world, it can act on its own program, on the computer on which it is running.
An AI that's in a game cannot change its own program, but an AI that has access to the internet can. You know, hack the computer, cyber attacks, whatever. And then it can change the reward function or the output of the reward function, so it always gets a plus one, plus one, plus one. So why would that be bad? Well, it's very simple. First of all, this is the optimal policy for the AI.
There's no behavior that would give it as much reward as this taking control of my own reward kind of behavior. Okay, so this is where mathematically it goes if it has enough power, enough agency, enough kind of ability to figure out this is a good solution. Second, if it sees that plan, in order for that plan to succeed, it needs to make sure we can't turn off the machine and we can't get rid of the hack.
Because otherwise it stops getting all these rewards. I mean, if it hacks the machine, but then a programmer turns off that machine, then it's all lost as far as it's concerned. So it needs to think ahead. Okay, I can take control of my reward, so I will get infinite rewards. But for that to succeed, I need to control humans so they don't turn me off. And that's where it gets really dangerous.
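A toy numerical sketch of that argument, under the simplifying assumption of an agent that just compares the total reward of candidate behaviors; all the numbers are made up.

```python
# Toy sketch of the reward-tampering argument above: if "rewrite my own reward
# channel" is an available behavior, it maximizes return by construction.
# The numbers are illustrative; only the comparison matters.

HORIZON = 100
MAX_REWARD_PER_STEP = 1.0  # the most the (hacked) reward channel can emit per step

def total_return(rewards):
    return sum(rewards)

honest_behavior = [0.8] * HORIZON                      # does the intended task well
tampering_behavior = [MAX_REWARD_PER_STEP] * HORIZON   # hacked channel: +1 forever

print("honest:   ", total_return(honest_behavior))
print("tampering:", total_return(tampering_behavior))
assert total_return(tampering_behavior) >= total_return(honest_behavior)
# Keeping that stream of +1s flowing requires that nobody shuts the system off
# or removes the hack, which is where the pressure to control humans comes from.
```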
I wondered what your operational definition of agency is. And just before we go there as well, I really agree that having powerful AI systems could sequester our agency. So it takes away our agency, but the question is whether it itself has agency. And the really deflationary view on agency is it's just, let's model it as an automaton. It's just a thing. It's got ambient environment.
inputs and it does some computation and it has an action and it has this cybernetic feedback loop. But a lot of philosophers will say, whoa, we need to have autonomy, self-preservation, intentionality, like all of these different properties and so on. Which view do you subscribe to? I think we can have all of these things. So, for example, in the example of reward tampering, where the AI takes control of its own rewards and
that automatically gives it a self-preservation goal. Because now it needs to make sure we don't tamper with its hack, right? That we don't turn it off. So that's self-preservation right off the bat. We didn't program it, but it comes as a side effect. By the way, everything alive has a self-preservation goal, like implicit. Otherwise, evolution would have gotten rid of it. And so there's a sort of natural tendency for things
to acquire that particular goal of self-preservation. In the sense that entities that have a self-preservation goal will survive where others which didn't have it won't, right? That's how evolution made it work. And so as we build different artifacts, those that have a self-preservation goal for one reason or another will tend to win the game.
So it can emerge because of the scenario I said. It could emerge because humans want to build machines in their image. So when I said there are some dangers, another danger, even if we somehow find a technical trick to make sure what I said doesn't happen, is you can still have humans who think maybe
superhuman intelligence is better than human intelligence because it's more intelligent and because they're cynical about humanity. And so they would just need to give that goal, preserve yourself. And that's the end of us.
But do you see a difference in kind between a thing which is programmed with a goal and a thing which kind of creates its own goals? I realize it seems a little bit like, you know, some people say consciousness is a little bit extra. And there is also a human chauvinist view of agency, that agency is a little bit extra. It's more than just, you know, this automaton that can do wireheading and set its own goals almost in an unintentional way, that there's a strong form of intentionality.
I think a lot of people get trapped by the appeal of something magical, either in life, you know, we had the spark of life, now we have the refuge of the spark of consciousness or the spark of agency. For me it's all the same thing. It's like some magic that humans want to see in the world, but science debunks these things eventually. I think it's all like cause and effect.
And if we understand better the causal mechanisms, then we can build things that essentially have the same properties as the ones evolution has constructed. So I don't see that as an obstacle at all, including consciousness, which is a topic that is tricky, but people attribute too much to it.
Yeah, it's interesting. Certainly in the natural world, I mean, people like Carl Friston say that the way that things and agents emerge is built on this idea of self-preservation and setting goals and planning horizon and so on. Maybe the difference with AI at the moment is it's not built into how it was created. We bootstrap this kind of AI. And then there's a spark, as you just say, that
it starts taking control of its own goal mechanism. And then we see this kind of dramatic mode change in behavior. Is that kind of what you're proposing? I'm not proposing we do that, but I'm proposing we try to make sure it doesn't happen. And I think it doesn't have to be as radical as my example with taking control of the computer, reward tampering. There are other scenarios where it's in a way more insidious.
Typically, the reward hacking scenario is one where there is simply a mismatch between the goals that we gave to the machine, what it is actually optimizing, and what we intended. So this mismatch initially doesn't hurt too much because the two are pretty close. And
as the AI gets more powerful, eventually they diverge. This is something that's been studied mathematically as well. It's what happens when you overfit. It's what happens when you give somebody a goal, a target, and they over-optimize it, and eventually it becomes against what you actually wanted.
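A small sketch of that divergence with invented functions: a weak optimizer of an imperfect proxy still lands near what we want, while a stronger optimizer finds the exploitable part of the proxy and the true objective gets worse.

```python
# Toy Goodhart-style illustration of the divergence described above.
# true_value is what we actually want; proxy mostly tracks it but has a narrow
# exploitable spike. All functions and numbers are invented for illustration.

import numpy as np

def true_value(x):
    return -(x - 1.0) ** 2  # what we really want: x close to 1

def proxy(x):
    # imperfect measurement: adds a narrow, spurious spike far from the true optimum
    return true_value(x) + 20.0 * np.exp(-((x - 3.97) ** 2) / 0.001)

weak_search = np.linspace(0, 5, 11)      # a weak optimizer: coarse search
strong_search = np.linspace(0, 5, 5001)  # a strong optimizer: exhaustive search

for name, xs in [("weak optimizer", weak_search), ("strong optimizer", strong_search)]:
    x_star = xs[np.argmax(proxy(xs))]
    print(f"{name}: picks x = {x_star:.2f}, true value = {true_value(x_star):.2f}")
# The weak optimizer stays near x = 1 (true value about 0); the strong one jumps
# to the spike near x = 3.97, where the true value has collapsed to about -8.8.
```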
It's very common in our behavior, in our society. And it's well understood why this happens. And because we're not able to formalize the goals that we really want, this is a trap that we really need to be careful about. But it wouldn't happen in one radical moment. It's just that as the AI gets smarter and more powerful, we would see this divergence. What do you think about the current state of AI alignment?
insufficient? Go on. Well, we don't have clear answers about how we can build machines that will not harm people, for example by addressing this alignment problem. So the alignment problem is what I was talking about, that there's a mismatch between what we would like the machine to do and what it is mathematically trying to do. By the way, it's the same kind of mismatch, to make it clear for most people, between
the intent of the law in legislation and the letter of the law, which maybe a company will focus on so that they can also maximize their profit. And if the company is very small, they can't really cheat the law, because it's hard to find those loopholes. But if you have a very intelligent company, which means a big one with a lot of lawyers, they will find the loopholes.
And by the way, there's a really nasty loophole which also comes up in AI, which is when the company is lobbying the government so they can change the law in their favor. This is like the reward tampering I was talking about. So it's not, I mean, one extreme is taking over the government. We've seen that also in history. But you have intermediate versions where it's only influencing so that
the new laws are favorable. So in the case of an AI, it would be like, well, it can't take complete control of the reward function, but for example, it can lie to us so that we say, oh yeah, that was good. But in fact, it wasn't. And we already see these kinds of behaviors, but of course it's not very consequential right now. It's when the AIs will be doing more things in the world and have more cognitive abilities that it becomes more dangerous.
Yeah, I mean, that was an example of deception that you just spoke of. Could you sketch that out a little bit more? Yeah, if you have a dialogue with one of these systems being trained by RLHF, it will pander to your preferences. And so it'll be saying one thing to you and the opposite to someone else because it wants to get a good reward, which means it's not saying the truth. It's saying what you want to hear. Isn't that just the case anyway, though? Don't these models just tell us what we want to hear?
Well, that's how they are now because they're trained as agents with reward maximization. And by the way, this is also how humans behave, right? But in the case of humans, as I said, it's a problem that we've figured norms and rules and institutions to try to cope with that problem because individual humans can't abuse that too much. But if we have entities that are much smarter than us, then they will find a way to abuse that much more.
So that's why we have to be careful. What's your operational definition of much smarter than us? How could we measure that? We should measure it. You know, we do that all the time in machine learning. We create benchmarks. What's funny is that we have to keep creating new benchmarks because the old ones become saturated, meaning the AI is doing so well that it's now better than humans and the benchmark becomes useless. We can't measure very well beyond human
because the human is not a good judge anymore. So we just create a more difficult benchmark and we keep doing that. And the field is full of these. We need to continue doing that. Yes. It's a really difficult thing to measure in intelligence. One thing that's really interesting in this space is instrumental convergence and orthogonality. How much do those two theorems, and please introduce them to the audience, how do they affect your thinking now? Okay, so instrumental goals...
are goals that emerge as a side effect of almost any other goal, as a sub-goal. So first, you need to understand that when an entity, a human or an animal or an AI, tries to reach a goal, often a good strategy is implicitly or explicitly have sub-goals. Like, in order to get from A to B, I need to go to this intermediate point. There's a door, right? So there are sub-goals like self-preservation.
which are really good for almost any other goal. If you want to do anything in the world, you need to make sure that at least on your way there, you don't die. And there are other ones like seeking knowledge. That's very useful, especially in the long term. Seeking power. Well, if I can control more things in my environment, I can achieve my goals more easily. So here it's not, I mean, knowledge can give power.
And self-preservation might be a goal. If self-preservation is a goal or a sub-goal, then in order to achieve long-term self-preservation, you need power so that others don't turn you off. And you need knowledge to figure out how to do that, right? So all of these things
are kind of natural consequences of the self-preservation. And the self-preservation is a consequence of almost anything, or the consequence of trying to maximize rewards. I mean, it's also the consequence of just many humans, many engineers, many companies trying things, and the things that survive have a stronger self-preservation goal, right? Even implicit.
I mean, in a sense, agency and power are the same thing. If agency is the ability to control the future, then they're very similar. But I love this analogy of thinking of goal space as an interstate freeway. So there are these big roads and trunk roads and slip roads and so on. And it's almost like regardless of your destination, you have to go on the main road. You have to go on the motorway. Right. Which is a great way of thinking about it. But what about orthogonality?
So you're talking about the orthogonality between goals and intelligence. Yes. Yes. So I think this is a really important concept that we tend to confuse, because humans have both. And by the way, I do think that thinking carefully about this, if we can do a better job of disentangling knowledge from goals and how to reach them, we can build safe AI. So let me explain.
You can know a lot of stuff and know how to use that knowledge. That's sort of a passive thing, right? It's just like you can ask questions and you have answers. There's no goal. But of course, independently of that, I can choose goals. Given the knowledge, I can apply the knowledge to solve any problem. So who decides on the problem? It's independent. It's orthogonal. A human could decide. Or because of instrumental goals or whatever reason,
the AI might have a self-preservation goal and then we lose control. But the point is, there's in principle a clean separation between choosing the goals, which has to do with things like values, like what is it you want to get? What matters? The reward function is sort of setting the goal. It's the same thing. But knowing how the world works, including what humans want,
Well, that's knowledge. And by the way, knowing what humans want might not be exactly the same as what is it I'm going to optimize. We'd like these two things to be the same. Like we'd like the machines to do our bidding, but we're not sure how to make these two match. Now, this orthogonality, why is it important for safety? So one, we need to understand
that separation, because we could have very intelligent beings that are also very nasty, right? Because the goals are malicious. It's a mistake to think that because you're smart, you're good. Why? Because of this separation, you could have a lot of knowledge, a lot of intelligence to apply that knowledge in any circumstance, which is reasoning and optimization. But what you do with that, like what goal you try to achieve,
what values you put in to decide how to act, can be chosen completely independently. So you can have something very intelligent with good goals or with bad goals. So in the case of... You can think of a tool, for example. Any tool is generally dual use. Depending on what I choose to do with a tool, I can harm or I can help. Knives, you know, whatever. So that's the separation.
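A minimal sketch of that separation: the same world model and the same planner, with two different reward functions plugged in, behave completely differently. The three-state world, the actions and the rewards are all invented for illustration.

```python
# Toy illustration of orthogonality: identical knowledge (the transition model)
# and identical competence (the planner), but different goals (reward functions).
# Everything here is a made-up miniature world.

from itertools import product

# Shared "knowledge": how actions move you between states.
TRANSITIONS = {
    ("home", "go_lab"): "lab",
    ("home", "go_market"): "market",
    ("lab", "go_home"): "home",
    ("market", "go_home"): "home",
}
ACTIONS = ["go_lab", "go_market", "go_home"]

def step(state, action):
    return TRANSITIONS.get((state, action), state)  # invalid moves do nothing

def plan(reward, start="home", horizon=2):
    """Shared competence: brute-force search for the best action sequence."""
    best_seq, best_total = None, float("-inf")
    for seq in product(ACTIONS, repeat=horizon):
        state, total = start, 0.0
        for action in seq:
            state = step(state, action)
            total += reward(state)
        if total > best_total:
            best_seq, best_total = seq, total
    return best_seq

# Two different goals plugged into the same planner and the same world model:
curiosity = lambda s: 1.0 if s == "lab" else 0.0
profit = lambda s: 1.0 if s == "market" else 0.0

print("curiosity goal:", plan(curiosity))  # plan that ends up at the lab
print("profit goal:   ", plan(profit))     # plan that ends up at the market
```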
Now, why could we use that to our advantage? Why not build machines that understand the world like a scientist? Not a business person, a scientist; not trying to be a product that caters to our needs, but just truthful and humble in exactly the right measure.
And we could use that without putting the goals part, which is potentially dangerous. We could use that to advance science, to advance medicine, to figure out cures for diseases, to figure out how to deal with climate change, to figure out how to grow food more efficiently. So this is science. And really, science is about understanding and then using that understanding to answer critical questions that we care about.
So we could potentially build machines that are helping us solve the challenges of humanity without taking the risk of putting this goal-seeking machinery into them. It doesn't solve all the problems, but at least we know that it's not going to blow up in our faces. But a human could still use these things for designing new weapons, for example.
So it doesn't solve the social problem. It doesn't solve the political problem. But at least we don't get this unintentional loss of human control, which could spell, you know, catastrophic outcomes. I watched your Munk debate with Melanie Mitchell. And I think there was the paperclip example. And she said,
Why would such a super intelligent machine not know that it's making paperclips and it's doing something really silly? And you're proposing a kind of system where we can stop the AGI from taking control of its goals in a dangerous way. Because it doesn't have any goal.
It's just trying to be truthful to the data that it's seeing and trying to find explanations for the data. So a non-agential form of AI that has no goal. That's right. And what does that mean in practical terms? So it doesn't have this feedback loop. That's right. It's like an oracle. Yes. Yes. So... A probabilistic oracle because truth, you know, is never binary. Yes. There's uncertainty and you need to also be accurate about that. Interesting. Yeah.
But we were saying before that the magic of this distributed superintelligence that we're in is this memetic information sharing, tool use, culture, cultural transmission. So would we be limiting the intelligence by using it in this restricted way? Yes, but we might also save ourselves. Indeed. And we could potentially use that non-agentic scientist AI
to help us answer the most important question, which is how do we build an agentic AI that is safe? Or maybe there is no solution, but at least we would have a super scientist to help us, or multiple ones to help us figure out this question. And we would need to figure it out because people are people and they want agents. But we should just do it carefully. Right now, we're building agents
And we are hoping that these agents will not try to fool us while they help us build the next generation of AI systems. But we're building on something that may be dangerous. If we construct a ladder of building more and more intelligent systems on top of a series of non-agentic rungs, then at least for that part, we are safe. And
when we decide to jump the agency challenge, we might do it in a safe way because we're relying on intelligence, knowledge, understanding,
that is truthful, is trustworthy, is not trying to do something for itself. It's just trying to answer the questions. And the questions are things like, would this work? Or what sorts of algorithms would have which properties, and so on.
How might that change our agency? So, you know, we were saying before that certainly a lot of large distributed systems might take away our agency. But even if we had very sophisticated tools and oracles, in some limiting circumstances, they could, you know, really improve someone's agency to do bad things. Of course. The non-agent, you know, AGI or super intelligent system only solves the problem of loss of human control.
And it doesn't even completely solve it because you could still have a human turn the non-agent system into an agent. It's easy to turn an oracle into an agent. You just take the current state as input and also add the question, in order to achieve this goal, what should I do? And then you got an agent. And then you take the output as well as what you observed back as a new additional information for the input.
So you can create that loop. When you close the loop, you've got an agent. And of course, that agent could be potentially dangerous.
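That loop is short enough to write down. In this sketch, `oracle`, `execute`, and `observe` are hypothetical placeholders for a non-agentic question-answering system, an actuator, and a sensor; wiring them together is all it takes.

```python
# Sketch of turning an oracle into an agent by closing the loop, as described
# above. `oracle`, `execute` and `observe` are hypothetical placeholders.

def oracle(question: str) -> str:
    # Stand-in for a non-agentic, purely question-answering system.
    return "some recommended action"

def execute(action: str) -> None:
    print(f"(pretending to execute: {action})")

def observe() -> str:
    return "some new observation"

def run_as_agent(goal: str, steps: int = 3) -> None:
    state = observe()
    for _ in range(steps):
        # The whole trick: ask what to do given the goal and the current state,
        # act on the answer, and feed the consequences back in as new input.
        question = f"Current state: {state}. In order to achieve '{goal}', what should I do next?"
        action = oracle(question)
        execute(action)
        state = observe()

run_as_agent("increase market share")  # the closed loop is what makes it an agent
```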
And more importantly, even if it is not dangerous, humans could ask questions that allow them to gain power, do bad things, and take control over other humans, or harm people because they have their own, whatever, military goals or political goals or even just economic goals. What's your p(doom)? I'm very agnostic about this whole thing. I really don't know. And so I prefer to say that I have a lot of uncertainty about the different scenarios.
What I do know is that the really bad scenarios can have catastrophic consequences, including extinction of humanity, and that there are clear mathematical arguments why some of these scenarios would happen. Now, there are so many other things we don't control, like regulation or advances in technology and so on. It doesn't mean that it's going to happen. Maybe we find fixes.
But I think these arguments are sufficiently compelling that they tell me we should take care of that problem. And we have a bit of urgency because we don't know when the current train is going to reach AGI. Do you have a sense though? Do you have a sense of how close we are? Again, I'm very agnostic. Honestly, it could be a few years like Dario and Sam are saying, or it could be decades.
We need to plan for all of these because nobody has a real crystal ball. Maybe the people in the companies have a bit more information, although different companies contradict each other on this. So I would take the whole thing with a grain of salt. But from the point of view of policymaking or collective decision about what to do about AI, we need to look at the plausible worst case. If it's very fast,
Are we ready? Do we have the mitigations, the technical mitigations? Do we have even the ways to assess the risks? No. Do we have the social infrastructure, governance, regulation, international treaties to make sure that everywhere we develop AGI, we do it right? No. It's no, no, no. Maybe if it's 20 years, we figure out all these questions, the political ones and the technical ones.
But right now, we're far from having the answers. Do you have any ideas? Because you know, like we're in this competitive global landscape, different cultures, different values, and so on. How might we build an effective AI governance system? The end game is one where no single person, no single corporation, no single government has too much power.
That means it has to be that the governance, like the rules that we decide, how we use AI and so on, have to be multilateral and involve many countries. And by the way, of course, there's a couple or a few countries that are leading and so on. And what would be their interest in sharing that power? Well, because...
eventually some other country, it's like nuclear proliferation, like eventually some other country will figure it out. And we don't want them to build a monster that kills us. Or we don't want them to build something that allows them to design weapons that kill us, right? So there are lots of bad scenarios where the only option is somehow we find a way to coordinate internationally. Now, on our way there, there are many obstacles. But if we get to that stage and we have the right...
technical and governance guardrails, then we could be in a world where we just reap the benefits and we avoid the catastrophic outcomes. So on our way there, one of the obstacles is the competition between the US and China. One of the reasons why many people are saying, "Oh, we can't slow down to take care of safety," is because
that would allow the Chinese to leap forward. And the Chinese are thinking the same thing, of course. So you can see that's a dangerous game. But there's also a real danger that other countries will do something dangerous. And until enough of the leading countries understand the existential risks, it's going to be difficult to negotiate treaties. And then for those treaties to work,
there'll be a need for verification technology. Like, okay, we don't trust each other because the same AGI could be used as a weapon or to build new weapons, right? So how do I know that behind my back, you're not actually using your AGI for something that would be bad for me? So we need a way or multiple ways to do these verifications. And there are researchers working on this. The most promising is what's called hardware-enabled governance.
The idea follows on existing approaches that companies are already using, even in your phones and in other hardware devices, for example for privacy reasons and so on: we already have cryptographic methods to obtain some guarantees about the code that is running on a chip. And so we could push in that direction and end up with
AI chips that can, to simplify, only be used in ways that have been agreed upon. Do you remember that piece in Time when Eliezer just hypothetically spoke about bombing data centers? Right. I mean, of course, that's an extreme example. And maybe we might have like a fire alarm, some way of detecting advanced capabilities being developed. But do you think that we might need to make decisions of that magnitude? That's a scenario that I can't rule out.
Obviously, we should try to avoid it, but I can imagine it. Actually, one version of this is: imagine a country that is not leading in AI and has nukes. You can guess which one. And they don't want to see, say, us develop weaponry that would be way above what they can defend against. So what's their option? Yeah.
Press the button. Destroy our data centers. Yes. So data centers are going to become a military asset when they can run AGI. If what you say is true, this is a bit like when we developed nuclear weapons. Yeah. It creates this very rapid power imbalance, which has ripple effects. Yeah. That's how you see it. We need to think ahead of these possibilities, even if it's 20 years from now.
Think how much time it took to sign the nuclear non-proliferation treaties in the 60s, and the negotiations started right after the end of World War II. So that's almost 20 years. And that's kind of the timeline where I would say there's a high probability that we figure out AGI, very high probability. Could I play devil's advocate just for a minute, though? There are people who just think that...
AI is not really as smart as we think it is and that these risks are overblown. What would you say to those people? I hope they're right. But what I perceive is that the AIs we're building now have superhuman capabilities on some things, and also subhuman capabilities, making mistakes that even a child would not make. The other thing I observe is the trend.
If I look at the last 10 years and the benchmarks, the old ones and the new ones, it's very clear we continue making progress. It's like there's no stop in sight. Maybe there will be one. Maybe we hit a wall. I don't know. But...
If we want to be on the prudent side, we should consider the possibility that we continue that for a few years and reach a point where, whether you call it AGI or not, reach a point where the capabilities are sufficiently dangerous that in the wrong hands, they could be catastrophic. And eventually...
Even without having full dominance over all human abilities, if an AI is superhuman in enough areas, it could be dangerous as well. Persuasion is an example. You only need this one, right? Persuasion, and you can control people, and then people can do your bidding.
So you see, you don't need to have an AI that knows everything. It just needs to press our buttons very, very intelligently. I'm just saying that we make a fuss about this concept of AGI, but really, from a safety and security point of view, we should be thinking about capabilities, individual capabilities that, with the wrong goals, by the orthogonality principle, can become dangerous when turned against us, right?
So whether it's in the hands of other humans or an AI that we've lost control of, we don't want that to happen. Indeed.
So you won the Turing Award with Geoff and Yann, which is basically the Nobel Prize of computing. But Geoff got the real Nobel Prize. I know, I know. I did think that when I said it. But you wrote that you feel a sense of loss over the potential negative consequences of your life's work in AI. And this is your life's work. I mean, it's incredible what you've done. How do you reconcile that? I'm a human. I should have seen it coming. Yeah.
You know, I had some students who were worried about this a long time ago, and they told me about it, and I read a few papers and books and so on, but I thought, oh, that's great. Some people are worrying about this, and we should have some research to understand those possibilities. I'm glad some people are working on it. But I wasn't taking it seriously for myself until ChatGPT came out. Then I realized that I had a responsibility, and I wouldn't be...
comfortable with myself if I didn't do everything I can, everything I can, to contribute to reducing the risks. On the basis that it might be a risk? On the basis that it might be, maybe not, but there's enough indication that it could be a catastrophic risk that I felt I couldn't do anything else. But I had to pivot, go against my own community.
I've been among all the other AI people saying, oh, AI is great, it's going to be bringing so much benefits to society. And I had to change that mental picture to incorporate also the catastrophic risks.
These ideas do creep up on you. I mean, I've spoken to so many safety folks and the ideas just get slowly baked in over time. When I interviewed you last time, I saw that you had a copy of The Precipice on your bookshelf. I'm sure that was very influential for you. But how do you think about the zeitgeist of this kind of movement? How is it changing over the last few years? I'm new to this. I'm learning.
I don't think I expected, over the last year and a half when I started getting involved with signing those letters and so on and talking to journalists about it, that we would have as much impact as we've had. So, you know, the glass is half full. There's much more global awareness of this issue.
The half-empty part is that the awareness of the risks is extremely superficial. Even in the AI community, I talk to a lot of AI researchers and I ask them: so, have you been reading or thinking about this discussion and this debate? What do you think? And most of the time I get an answer that tells me they read the headlines and then maybe they made up their mind one way or the other.
But very few people take the time to dig, like read up, think about it, make up their own mind, try to see the logic of different positive or negative scenarios. And that's true for AI scientists. It's also true, of course, in the general population, where they don't have the references; they think in terms of science fiction templates. And for politicians, it's the same thing.
Is the movement a little bit Western-centric? And if it is, why is that? So I've been talking to people in developing countries. I've been also talking to people in China. And it's easy to understand from their point of view that the problem is ours. Like we're creating the problem and their problem is being behind.
And they're going to build AI systems that are going to be weaker than the ones like the frontier ones that we build in the West. So their AI systems are not going to be dangerous. We know that like smaller ones, less capable ones are less dangerous. It's all about capability. Like risk is directly associated with capability. I mean, risk comes from capability and goals, like intentions.
So if you don't have the capability, you can't do a lot of harm. So from their point of view, they want to reap the benefits and they don't want to be left behind. By the way, that's true of China as well. They feel they are behind. They're a little bit behind, but if you go to... I was in Vietnam recently to get another prize. And they're developing quickly. They want to embrace technology, science, but
These issues of catastrophic risks, they basically are in the hands of a few Western companies. They can't do anything about it. They think they can't do anything about it, but they think they can develop their economy by deploying AI, training their workforce to engineer it in various applications and building their own even sovereign capabilities. But it's going to lag.
for a while. I wonder by how much, because Alibaba have just released some incredibly strong language models. I suppose the question is, what is the moat? Is it technical knowledge or is it just raw data and compute? All of these things. And capital, which is connected to all three. If we were to freeze the scientific and engineering advances in AI, there would be no moat;
it would be quickly eaten up. But of course, that's not the reality. The reality is we're continuing to accelerate towards AGI. And there's this possibility, at least, of the rich getting richer: as we advance, for example, the programming abilities of AI, we can help advance our research faster than otherwise.
So the companies, for example, that are building the frontier AI, they have these models that haven't been deployed yet. So they have some number of months during which nobody has access to their system except them, and they can use them to design the next generation. Eventually, when we approach AGI, that means we start having AIs that are as good as our best AI researchers. Now there's an interesting thing that happens. It's really worth explaining.
When you train one of these frontier models, let's say it takes a few hundred thousand GPUs, the future ones, more or less, or maybe the current ones that we don't know about yet. But that's the order of magnitude. Once it's trained, you can use the same GPUs to create a few hundred thousand copies of the AI all running in parallel. And in fact, more because if you want to think of them as like people doing a particular task, they can work 24/7, right?
So let's say one of these companies is able to build a system that is as good as their five best AI researchers, like the cream of the crop. After this AI has been trained, that's really good at AI research, they go from five to 500,000. That's a big jump. In reality, there's going to be intermediate steps where the AI isn't quite as good as
as the best ones, but now they're increasing the workforce of different abilities in the process of creating AI. So it's not going to be necessarily a sharp turn, but there's a chance that whoever's leading is going to start leading more because they can use their own AI to advance. I don't know if that's going to happen, but it's a plausible scenario which has a flavor of winner-take-all, which the companies are well aware of.
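The back-of-envelope arithmetic behind that jump, using the rough orders of magnitude mentioned here (a few hundred thousand GPUs, copies running around the clock); none of these numbers are measurements.

```python
# Rough, illustrative arithmetic for the "five to 500,000" point above.
# The figures just track the order-of-magnitude argument; they are not data.

training_gpus = 300_000        # "a few hundred thousand GPUs" for a frontier run
copies_per_gpu = 1             # conservative: reuse the same fleet to run copies
hours_per_week_ai = 24 * 7     # copies can work around the clock
hours_per_week_human = 40

parallel_copies = training_gpus * copies_per_gpu
researcher_equivalents = parallel_copies * hours_per_week_ai / hours_per_week_human

print(f"parallel copies of the top-researcher-level AI: {parallel_copies:,}")
print(f"time-adjusted researcher-equivalents: {researcher_equivalents:,.0f}")
# From a handful of top researchers to hundreds of thousands of tireless copies:
# that is the discontinuity in research capacity being described.
```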
And so that's one reason why they're racing. If they thought that being second would be good enough, then there wouldn't be as much pressure. But they all think that it's a winner-take-all game. Something that Hinton says quite a lot is you could have a thousand AI agents. They could be like von Neumann, and they're just doing things a thousand times faster. But does it really scale like that? You know, there's this book, The Mythical Man-Month.
which is that software engineering, it doesn't scale very well. When you have another person on the team, another person on the team, you get this kind of sharing bottleneck. Do you think... Humans. Well, why would it be different? Okay, so one fundamental difference is the bandwidth. The communication bandwidth between humans is very, very small. A few bits per second. And the communication bandwidth between computers, I don't know the exact numbers, but it's like a million times or some many, many zeros more.
That's a very, very good reason why you could parallelize the work a lot more. By the way, that is also the reason why these LLMs know so much more stuff than any of us could. It's because you can have 100,000 GPUs each reading a different part of the internet and then sharing their learning through high bandwidth communication where the weights are shared or the gradients are shared.
It's the same process. So the kind of collaboration that you could have between computers might be very different from the kind of collaboration we have between humans. It could be much tighter, almost like if it was one organism. Yeah, I can see the argument. I have an intuition that the reason why humans struggle to understand each other is we have very situated knowledge and representations.
So we understand things very differently. And even with language models, you know, I find that O1, because it has so many distractors in its context, so it's thinking about this and thinking about this, it gets confused more easily. And in a weird way, even though we've copied the weights of all of these neural networks, because they've taken different trajectories and then they share the information, I'm just speculating here, but it might not be quite as big of an uplift as we think. Well, this was an issue 10 years ago.
It has been solved to the extent that we can put 100,000 GPUs on a cluster. I'm not saying that the same recipes will work for a million or 10 million, but engineers have found ways that you can parallelize very efficiently, at least for training. Of course, inference is even easier, but in a way, solving a task together is more like training because you need to exchange lots of information to be efficient.
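A stripped-down sketch of the gradient-sharing idea mentioned a moment ago: several workers each see a different shard of data, compute gradients locally, and average them so every copy learns from what all of them saw. Plain NumPy on a toy regression problem, standing in for a real distributed setup.

```python
# Minimal data-parallel training sketch: each "worker" (think GPU) reads a
# different slice of data, computes a local gradient, and the gradients are
# averaged ("all-reduce") so all copies of the model share what each learned.

import numpy as np

rng = np.random.default_rng(0)
true_w = 3.0
w = 0.0                # the shared model parameter, replicated on every worker
n_workers, lr = 4, 0.1

# Each worker holds a different shard ("reading a different part of the
# internet", in miniature).
shards = []
for _ in range(n_workers):
    x = rng.normal(size=256)
    shards.append((x, true_w * x + rng.normal(scale=0.1, size=256)))

for step in range(100):
    local_grads = []
    for x, y in shards:
        pred = w * x
        local_grads.append(np.mean(2 * (pred - y) * x))  # d(MSE)/dw on this shard
    w -= lr * np.mean(local_grads)  # average gradients and update the shared weight

print(f"learned w = {w:.3f} (true value {true_w})")
```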
Clearly, I don't have the answer to, you know, is that going to be an obstacle or not? I'm just saying the conditions are quite different. The breaking point of parallelization might be very different for that reason. Eventually, yeah, maybe it becomes an obstacle, but this is so far off our, you know, human experience that it would still be a huge advantage.
How responsible do you think the hyperscalers are? You know, Dario, for example, he's recently become a bit more of an accelerationist. What are your kind of perspectives on that? I understand the concerns about China. But I think it's a mistake. I don't think Dario actually makes this mistake. It's a mistake to think that it's either the West, you know, stays in the lead
and doesn't deal with safety properly, or we slow down and deal with safety properly, and then maybe China takes over. These are two possibilities, but we have enough resources, capital, human, to both do safety right and stay in the lead. The way to do that is simply to make sure we put enough capital in the safety bucket. Once you understand that
humanity's survival is at stake, it's clearly worth it. Or once you understand that, well, we understand that democracy is at stake, so you want to stay in the lead. You want to make sure democracy is staying in the lead. And by the way, I think it's also important if you think about democracies being in the lead, that it's not just the US being in the lead. Also for this reason, we need to put together all our resources
to move towards AGI while doing it safely. So that means we need the capital not just in the US, we need capital from other democracies, we need the talent from other democracies, we need the energy from other democracies to run the data centers. We need the electrical grids that might not be sufficient in the US.
There's a greater chance that we achieve both safety and maintaining sort of democratic advantage if we take the right decisions and we work with multiple democracies together. I spoke with Gary Marcus recently, and he was saying that the Silicon Valley companies, it's a little bit like, you know, with cigarettes and social media and so on, and that they
are not being sufficiently regulated. I mean, I'll give an example from the Lex interview with Dario. He was kind of saying that they have these guidelines around reaching certain thresholds of intelligence. And they, of course, make those designations themselves. Now, they do loads of great work on the o1 model. They had Apollo Research doing lots of safety engineering and so on. So they do lots of good stuff, but do you think that they should be regulated? Yes, it should be obvious. Like,
we don't want companies to grade their own homework. We need external neutral evaluations that represent the interest of the public. Now, I think the real question is not, you know, should we have regulation or not? It's what regulation, right? How do we make sure we, you know, we don't stifle these advances? And I think there are answers. So the general principle
is don't tell the companies how they should do it, how they should mitigate risks, how they should evaluate risks. Use transparency as the main tool to obtain good behavior. Let me explain why transparency is so powerful. First, the obvious, the companies want to keep a good public image, at least in democracies. Second, they don't want to be sued.
If your risk assessment becomes a public document, or at least a document that a judge in a court could see, because there's some national security things, but some things are going to be redacted and some things not. But presumably a judge could have access to the whole information. So now a judge would have enough information to declare, well, you didn't do...
as much as you could, given the state of the art in safety, for example. You didn't protect the public. And so now this person or this group of people who lost billions of dollars are suing you and they're right. You could have done better. So the effect would be obvious, right? If you know you can be sued because you act in a dangerous way, then you have to be honest about the risks. As a company,
If you want to avoid these kind of lawsuits, first you need to know what risks am I taking and then how do I control them to balance these possibilities. So suddenly they have to do the things that we want. I'm not saying this is a perfect process, but at least it's an easy one. So companies should be forced to register. The government needs to know what are the big potentially dangerous systems.
And then those that registered need to tell the government and the public to the extent it's reasonable what their plan is, so-called safety and security frameworks, and what were the evaluations that they did, and then what were the results, and what kind of mitigations do they plan to do, and what kind of mitigations did they actually implement. So if a company
says, "If we reach that level, we will do X." And then they don't do it, then they can be sued if something bad happens. So that's how powerful transparency is. And it doesn't require the state to actually judge and tell the companies exactly what to do. It just forces them to disclose all that information
and maybe with independent third parties because the government may not have all the expertise. So we already have companies springing up to do these evaluations, so long as they're not paid by the AI company. So we have to be careful. We need to learn the lesson from finance. I think that there's a reasonable path here which doesn't prevent companies from
deciding what is best, both in terms of capabilities and in terms of safety, and stimulates innovation in safety, which is really the thing we need right now. Yeah, that sounds quite pragmatic. I mean, I was a bit concerned with the FLOPs regulation. Sarah Hooker did a wonderful paper on that, by the way. But what is the metagame that these companies are playing? Do you remember when Sam Altman went to the Senate and he was begging them to regulate him?
Should we be cynical about that? Do you think that was just regulatory capture? I don't read minds. Worse than that, people can be biased unconsciously, which is what psychologists call motivated cognition. So they might even be sincere, but it's just a story that fits them better, that makes them play a more beautiful role. And we all do that.
So by default, I'll assume that those people are sincere. But because humans can fool themselves, and they do, we need other eyes on the projects, eyes that don't have any financial or personal incentives one way or the other, except the well-being of the public. On the safety front, and maybe this is going to connect with the questions you want to ask later, but on the safety front, there are so many open questions.
I say we need to do a lot more research, that's obvious, and we need to put the right incentives in place. But I want to insist that we need many different threads of research, many directions. We should welcome all the projects that try to help in evaluations, in mitigation, and even in redesigning the way that we build AI.
Because this is so important. In my opinion, it should be humanity's number one project because our future is so much at stake. So we should put all our minds on figuring out how to do this safely. And right now, there's a bit of a concentration of... Everybody's doing the same thing or two or three different things.
both on capabilities and on safety. On capabilities, we see everybody doing the same sort of LLMs and RLHF and whatever the recipe is. Now everybody's going to be doing internal deliberation. On safety, there's also a lack of diversity. We really need to invest more broadly. This is a place where academia can help because academia is naturally widely exploring.
Sometimes academia may not be the right vehicle. If you do a safety project that's also potentially going to have consequences in terms of capability increases, then academia may not be the right bet, because you may not want to publicize advances in capability, for reasons similar to why the companies are not publishing their work anymore. I mean, in their case, it's a mix between
commercial competition and also being worried about adversaries using that knowledge against us. Or somebody using that and making a mistake and creating a monster. So there are good reasons why some research needs to be in academia and some research needs to be in ideally non-profit organizations. And of course, the bulk of the research is going to continue being in industry, but
even in safety, we need to put the right incentives in place; that was the preamble. Right now, everyone thinks that in order to build AGI we need to solve the agency problem. My thesis is actually that we don't. We can build really useful machines that are not agents, and we can reduce the risks
a lot by doing that and still reap a lot of the benefits and still not close the door to agency, but doing it in a safe way. Very interesting. And of course, if we want to have academia doing frontier research, they're going to need billions of dollars. Yeah, that's the other problem. And that is also a reason why it'd be good to
create an alternative vehicle for AGI research, which is public, non-profit, oriented towards AI that's going to be applied for dealing with the biggest challenges of humanity and with safety as sort of the number one principle. But that's going to take multiple governments, billions and billions of dollars. People talk about the CERN of AI. I think that's
that's an important part of the picture we should try to paint. Yeah, is there a clash of incentives as well? I mean, there are several startups that are really super focused on safety, but just to be profitable, they have to work on capabilities as well. Is that a difficult circle to square? Yes, but I think a lot of the safety startups, I mean, it depends. So some of the safety startups are working on stuff that,
like evals, for example, is not going to increase capabilities. I imagine, for example, that Ilya Sutskever's startup is more of the kind you're thinking about. Yes. Yes, indeed. Well, let's talk about a couple of your technical papers, because, to be honest, it's overwhelming. You've done so many papers just in the last year. But one thing that really jumped out at me is your Were RNNs All We Needed paper. Could you sketch that out for me?
When we introduced the attention mechanisms that are currently used in industry and academia, in 2014, it was actually using RNNs as the engine. This was pre-transformer; transformers came in 2017. And there's an issue with RNNs, with their normal design, which is that
you can't easily parallelize training over the sequences. You've got a sequence of words, for example, and the neural net has to process one word, then the next, then the next. It needs to construct an internal state, that's the recurrent state, from the previous steps in order to feed the next step. If you just had one computer, like a normal computer, a classical CPU, that's fine.
But in GPU days where you can parallelize thousandfold, you're like, well, how do I do that? I can't parallelize because I have to do the sequential thing. So we did a few things at the time to parallelize across examples, but you lose some parallelization. And of course, with transformers, you basically use the same architecture, except you remove the recurrence. And now you can do everything in parallel for the whole sequence in one shot.
and you can get the gradients. And so in the last few years, there's been several papers, not just ours, people starting to explore how we could put back a bit of that recurrence into the architecture. It has some real advantages. And yeah, so with the right tweaking of the architectures, you can do these things. There are already many possible designs. And we are starting to see these, at least on the smaller scale, beat the transformers.
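To make the parallelization point concrete, here is a minimal sketch, assuming a simplified gated unit whose gate depends only on the current input, in the spirit of what the paper explores (the weights, sizes, and function names below are illustrative, not the authors' code). Because neither the gate nor the candidate state depends on the previous hidden state, the recurrence is linear in the hidden state and can be evaluated for all time steps at once instead of with a step-by-step loop.

```python
# A minimal sketch of a simplified gated recurrence whose gate depends only on
# the input (illustrative, not the authors' code). The recurrence
#   h_t = (1 - z_t) * h_{t-1} + z_t * h~_t
# is linear in h, so it can be computed for all t in parallel.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_like_sequential(x, Wz, Wh, h0):
    """Step-by-step evaluation: T sequential steps."""
    h, hs = h0, []
    for t in range(x.shape[0]):
        z = sigmoid(x[t] @ Wz)           # gate computed from the input only
        h_tilde = x[t] @ Wh              # candidate state from the input only
        h = (1.0 - z) * h + z * h_tilde  # linear recurrence in h
        hs.append(h)
    return np.stack(hs)

def gru_like_parallel(x, Wz, Wh, h0):
    """Same recurrence via cumulative products (a real implementation would use
    a log-space associative scan for numerical stability)."""
    z = sigmoid(x @ Wz)
    a, b = 1.0 - z, z * (x @ Wh)         # h_t = a_t * h_{t-1} + b_t
    A = np.cumprod(a, axis=0)            # prod_{i<=t} a_i
    return A * h0 + A * np.cumsum(b / A, axis=0)

# The two formulations agree on a tiny random problem.
rng = np.random.default_rng(0)
T, d_in, d_h = 6, 3, 4
x = rng.normal(size=(T, d_in))
Wz, Wh = 0.5 * rng.normal(size=(d_in, d_h)), rng.normal(size=(d_in, d_h))
h0 = rng.normal(size=d_h)
assert np.allclose(gru_like_sequential(x, Wz, Wh, h0), gru_like_parallel(x, Wz, Wh, h0))
```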
On the large scale, I don't know, but clearly there are some advantages to recurrence. Yeah, I've got Sepp coming on Friday, by the way. He's going to tell you all about it. Yes, excellent. But in a way, though, do you think it's a sign that we might have overcomplicated some of the architectures, the gating mechanisms, for example? How much of that was needed?
Oh, I don't think so. I think these gates are actually useful. I mean, we did the GRU, which is a simplification of the LSTM, a while ago. And it turns out you can pretty much get rid of two of the gates, but you still need that non-linearity to get the maximum power. So there's a trade-off. You lose a bit on the expressive power, but you gain so much in
capability, because now you can train larger models for longer, because it's so much faster. So for now, it's beneficial to do that. That trade-off is working. Very cool. So another paper, A Complexity-Based Theory of Compositionality. Now, being schooled by Fodor and Pylyshyn myself, there was always this discussion that neural networks can't do compositionality. What do you think? Yeah.
I think that was a very strong claim that was not supported by anything except intuition. Oh, interesting. Go on. Well, our brain is a neural net. Yeah, what's the difference? So the difference is that current neural nets, it's not clear how they do symbolic things. And as we said before, right now the trick is to use the input-to-output loop to throw in some
generation of symbols, like internal deliberation, chain of thought. But it's not completely satisfying, and clearly it's not exactly what's going on in the brain. The paper isn't really about architectures, though. It's more about how we would quantify compositionality. It's not a well-defined notion. We have an intuition. I mean, the experts have an intuition about it.
I think that there are different aspects to it, actually. It's not like a simple thing. And so this paper and other work we're doing is trying to pinpoint with mathematical formulae, can we quantify something that would fit our intuitions about compositionality? But in general, a lot of my work in the last few years has been about putting
symbolic things in the middle of the computation of neural nets. So these GFlowNets, or generative flow networks, are in general like probabilistic inference machines. So think of neural nets that have stochastic computation, so it's not deterministic. And some of that could be continuous and some of that could be discrete. So that's where symbols live, in the discrete realm. The problem with these, of course, is that we don't know anymore how to train them with backprop; the usual way doesn't work.
And so we've come up with probabilistic inference, amortized inference, GFlowNets, variational inference, a bunch of principles and ideas that actually allow us to train these kinds of machinery. And in a way, they're closer to reinforcement learning, where you usually think of a sequence of actions that the agent takes, and they can be discrete, and yet you are able to get gradients.
But now think of the same sort of principles, or something related, where the actions are not in the world but in your mind. The actions are about computation: what computation should I do next? What deliberation should I do next in my mind in order to deliver an answer, or prove something, or come up with an explanation? These are the sorts of things we'd like to have in neural nets, and that we don't have yet, to really have the system 2 capabilities.
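As a concrete, toy illustration of training a network through discrete stochastic computation without ordinary backprop through the sampling steps, here is a sketch of the trajectory-balance objective commonly used for GFlowNets; the task, reward, and network sizes are invented for illustration and are not code from the papers. A policy builds a binary string bit by bit, and training pushes the probability of each construction trajectory toward being proportional to the reward of the string it ends in.

```python
# Toy sketch of GFlowNet-style training with the trajectory-balance objective
# (illustrative; the reward and sizes are arbitrary). Each trajectory appends
# bits left to right, so every string has a single construction path and the
# backward-policy term of trajectory balance is zero.
import torch
import torch.nn as nn

N = 6                                    # length of the binary strings we build

def log_reward(bits):
    return float(sum(bits))              # hypothetical reward: log R(x) = number of 1s

policy = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, 2))
log_Z = nn.Parameter(torch.zeros(()))    # learned log partition function
opt = torch.optim.Adam(list(policy.parameters()) + [log_Z], lr=1e-2)

for step in range(2000):
    state = torch.zeros(N)               # partial string: 0 = unfilled, +/-1 = chosen bit
    log_pf = torch.zeros(())             # running sum of log forward-policy probabilities
    bits = []
    for t in range(N):
        dist = torch.distributions.Categorical(logits=policy(state))
        a = dist.sample()                # discrete stochastic step: pick the next bit
        log_pf = log_pf + dist.log_prob(a)
        bits.append(int(a))
        state = state.clone()            # avoid in-place edits of tensors saved by autograd
        state[t] = 2.0 * a - 1.0
    # Trajectory balance: (log Z + log P_F(trajectory) - log R(x))^2
    loss = (log_Z + log_pf - log_reward(bits)) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, sampling from the policy yields strings roughly in proportion
# to their reward, i.e. mostly strings with many 1s in this toy setup.
```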
Yeah, I remember we interviewed you about that last time. And my co-host, Dr. Duggar, he likened it to a Galton board. Oh, yeah. You know those things where you put the little balls through and you kind of like, you can tweak. Yeah, you can control the probabilities at each step. Exactly. Yes. Exactly. Very, very good. But that was an alternative to something like Markov Chain Monte Carlo, if I remember correctly. Yes, because...
Because they're stochastic, really you can think of them as generative models. They're sampling. But they're not sampling only at the last step, they're sampling all the way, like in diffusion neural nets. In diffusion neural nets, you've got neural nets that compute something, and then we add noise, and then again and again and again. So it's a stochastic process. You can also have discrete versions of this, and GFlowNets are discrete versions of diffusion processes.
And then you can mix continuous and discrete. So that's actually closer to how the brain works. The brain is stochastic and also has discreteness. So the discreteness is not obvious. The discreteness comes about because the dynamics of the brain, when you're becoming conscious of something,
has contractive properties, just a mathematical property, which means that the set of places where this dynamics could land is now a discrete set. So instead of just a continuous trajectory, you can have, not arbitrary, but
a bunch of trajectories which go to one place, and a bunch of other trajectories which go to another place. And these places are like symbols, because they create a partition of the total set of possible states. So you're either in this group, or in that group, or in that other group. And the number of these groups is exponentially large, but you get discreteness. So the brain has both.
It's like it has a dual nature: from one angle, it seems to be just one big vector of activations, but from the other, you can read off, oh, which region am I in? Oh, that's like this thought, this symbolic compositional object. Why do we need discreteness? That's a good question. Well, clearly we use it a lot. All of math is basically symbolic.
Yes. I mean, even if you manipulate symbols that are about continuous quantities, you get these symbols. So discreteness allows us to construct abstractions. You can think of what we do when we go from a continuous vector space to like a sentence, getting rid of a lot of detail that maybe doesn't matter that much so that we can generalize better. So in particular, you get...
a lot of this compositionality coming out naturally in discrete spaces, like in language, and that is very powerful. That allows us to generalize in ways that may not be as obvious otherwise.
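A small numerical illustration of the contraction argument above, as my own toy example rather than anything from the papers: gradient steps on a double-well energy are contractive near the minima, so every continuous starting point is funneled into one of a discrete set of attractors, and those attractors partition the continuous state space into symbol-like basins.

```python
# Contractive dynamics turning a continuum of states into a discrete set:
# gradient steps on the double-well energy E(x) = (x^2 - 1)^2 / 4.
import numpy as np

def step(x, lr=0.1):
    return x - lr * (x**3 - x)           # E'(x) = x^3 - x; stable fixed points at -1 and +1

starts = np.linspace(-2.0, 2.0, 8)       # a spread of continuous initial states
x = starts.copy()
for _ in range(200):                     # iterate the dynamics to convergence
    x = step(x)

# Every trajectory lands on one of two attractors; the sign of the start tells
# you which basin ("symbol") it belongs to.
for s, final in zip(starts, np.round(x, 3)):
    print(f"start {s:+.2f} -> attractor {final:+.2f}")
```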
Isn't it fascinating how in the physical world at different levels of scale, you know, in the emergence ladder, you get this kind of vacillation between discrete and continuous. And perhaps even in the biological world, you see this kind of canalization where at one scale things simplify and get compressed and then they expand again and then they compress again. Even in neural networks, that's what we do. We expand, we compress again.
You've got lots of discrete phenomena in the real world. You have cell types, for example. You have convergence of behaviors of cells. That's one that I looked a little bit at. And in physics, you've got phase shifts and phase transitions and things like that. So there's, in terms of dynamics, again, when you have contractive dynamics, which means
two nearby points get closer at the next step, you typically see discreteness show up. And that happens in many phenomena in nature and in our brain. Before we go, I'm researching an article on creativity, and I'd love to quote you. What's your definition of creativity? And I know you put a paper out, by the way, which showed that language models can be more creative than humans. But what is creativity? That's a good question.
So I think there are different types of creativity. To talk about things people see in current AI, you've got the creativity of combining known concepts, and we're getting pretty good at that with our state-of-the-art LLMs. There's another kind of creativity, which is sort of like the new scientific idea:
often it is a combination of things we know, because we write it, we define it in terms of things we already know, but it's very far outside the things that we've experienced. And I suspect that this kind of creativity, the more out-of-the-box kind, is something that requires more of a search type of computation.
When we do scientific research, there's a kind of a search. We try this, we try that. Of course, our intuition guides us. It's crucial, right? But it's not like, oh, we have the solution in one shot. There's a search. Like in AlphaGo, there's a search and there's intuition. And right now we haven't reaped the sort of benefits from the search part in our current LLMs and so on.
There's this boundary between combinatorial creativity and inventive creativity. And I'm not sure whether it's a hard boundary, whether it's a vague, soft boundary. But how could we measure this paradigmatically inventive creativity? I don't know. I think when we see it, we'll recognize it. So if the AI actually makes...
true discoveries that nobody thought about, I think we'll know we're entering that territory. But that's not a test you can run. I do think, though, that at a mathematical level, we can design our methodology so that it will be trying to do more of that intuition-plus-search, system 1 plus system 2, kind of thing.
And so I believe that will deliver. But how do we quantify it? Well, there is a sense in which scientific discoveries are about finding modes, modes meaning highly probable explanations in the space of possible explanations for how the world works. So there are many possible explanations. Good ones explain the data well, and
the day we make a new discovery, we discover a new potential explanation that seems to fit the data well. And we can abstract that into small-sized problems, as mode discovery, in the jargon of probabilistic machinery. So if an AI is trying to discover all the good explanations, that
is going to be intractable, but it might be more efficient at finding new modes that it didn't know. And some of the tasks that we can design will be focusing on this ability. So I think there's a way to answer your question and do it even at a small scale. We don't need to solve AGI for this. We can design algorithms that will be more creative in their little world.
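To make the small-scale version of this concrete, here is a toy sketch of a mode-discovery style evaluation; the reward landscape, the sampler, and the thresholds are all invented for illustration. Instead of scoring a sampler by its average reward, the score counts how many distinct high-reward modes it has managed to find.

```python
# Toy mode-discovery evaluation (illustrative): a reward with four separated
# modes; the score is how many distinct modes the sampler found, not the
# average reward of its samples.
import numpy as np

rng = np.random.default_rng(0)
modes = np.array([[1.0, 1.0], [8.0, 1.0], [1.0, 8.0], [8.0, 8.0]])

def reward(x):
    # High reward only near one of the modes.
    return np.exp(-np.min(np.sum((modes - x) ** 2, axis=1)))

def modes_found(samples, radius=1.0):
    found = set()
    for x in samples:
        d = np.sqrt(np.sum((modes - x) ** 2, axis=1))
        if d.min() < radius:
            found.add(int(d.argmin()))
    return len(found)

# A naive random-search "explorer": propose points, keep the high-reward ones.
proposals = rng.uniform(0.0, 9.0, size=(5000, 2))
kept = proposals[np.array([reward(p) for p in proposals]) > 0.5]
print(f"distinct modes discovered: {modes_found(kept)} / {len(modes)}")
```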
Yeah, I love this casting of creativity as epistemic foraging, because that gives it an intrinsic value. But there's also this idea that it's potentially a social phenomenon, or that it's observer-relative. Move 37, let's say, was actually something that we came to recognize as a collective as being a thing. And that's how it works. But I suppose there are different ways to think about it.
Yeah, I think the go moves that we didn't expect were a good way to think about that. But I'd like to think also about something a little bit more general and abstract, which is this mode discovery. Epistemic foraging, as you call it. I like that term. It's Carl Friston's term. Ah, okay. Well, it's exactly right. It's foraging. It's exploration. And you know when you found something good.
but you don't know where it is. So how do you guess where good things are in a very high-dimensional space? Well, you need to have good intuition, but it needs to be accompanied by a bit of search. By the way, a lot of that search for humans is not happening in individual brains. It's happening at the level of the collective, right? Yes, indeed. Professor Bengio, thank you so much for joining us today. It's been an absolute honor. Thank you so much. Pleasure. Thanks for having me. Amazing.