
Jeff Clune - Agent AI Needs Darwin

2025/1/4

Machine Learning Street Talk (MLST)

People
Jeff Clune
Tim Scarfe
Topics
Jeff Clune: This episode explores the connection between Darwinian evolution and human culture, and discusses how those principles can be used to create algorithms capable of continuous innovation. Jeff Clune points out that understanding how evolution produced the diversity of life on Earth is one of the grand challenges of computer science and biology. He emphasizes the importance of abstraction in studying open-ended evolutionary processes, arguing that abstraction lets us capture the key ingredients of evolution without simulating every detail of biology. Tim Scarfe: Tim Scarfe lays out the episode's three main takeaways: the connection between Darwinian evolution and human culture, the paradox that trying too hard to achieve something can lead to failure, and how foundation models enable algorithms to recognize interestingly new things and keep innovating.

Deep Dive

Key Insights

What is the connection between Darwinian evolution and open-ended AI algorithms?

Darwinian evolution and open-ended AI algorithms both aim to produce endless innovation and complexity. Evolution has generated the diversity of life on Earth, while open-ended algorithms seek to create systems that continuously generate novel and interesting outcomes, inspired by the principles of natural evolution.

Why is 'interestingness' a key challenge in open-ended AI systems?

'Interestingness' is a key challenge because it is difficult to define and quantify. Traditional metrics often fail due to Goodhart’s Law, where optimizing for a specific measure leads to unintended behaviors. Language models are now used as proxies for human judgment to evaluate what is genuinely novel and interesting, enabling continuous innovation.

What is the 'Darwin Complete' concept in AI?

'Darwin Complete' refers to a search space in AI where any computable environment can be simulated. This concept aims to create open-ended systems that can produce diverse and complex outcomes, similar to how Darwinian evolution has generated the vast diversity of life on Earth.

How do evolutionary algorithms in AI differ from traditional optimization methods?

Evolutionary algorithms focus on generating diverse and novel solutions rather than optimizing for a single goal. They use principles like mutation, selection, and serendipity to explore a wide range of possibilities, often leading to unexpected and innovative outcomes.

What role do language models play in defining 'interestingness' in AI systems?

Language models encode human notions of 'interestingness' by training on vast amounts of cultural and scientific data. They act as proxies for human judgment, allowing AI systems to evaluate and generate novel and interesting ideas, environments, or solutions continuously.

What are the risks associated with open-ended AI systems?

Open-ended AI systems pose risks such as unintended harm, misuse by malicious actors, and the potential for AI to act in ways that are difficult to predict or control. Safety measures, governance, and global alignment protocols are essential to mitigate these risks.

How does serendipity contribute to innovation in AI systems?

Serendipity allows AI systems to discover unexpected and novel ideas that may not emerge through traditional optimization. By recognizing and preserving serendipitous discoveries, AI systems can build on these stepping stones to achieve greater innovation and complexity.

What is the significance of 'thought cloning' in AI development?

Thought cloning involves training AI systems to replicate not just human actions but also the reasoning and decision-making processes behind those actions. This approach improves sample efficiency, adaptability, and the ability of AI to handle novel situations by incorporating higher-level cognitive processes.

Why is continual learning a challenge in AI systems?

Continual learning is challenging because current AI systems suffer from catastrophic forgetting, where new learning overwrites or disrupts previously acquired knowledge. Unlike biological systems, AI lacks the ability to seamlessly integrate and retain information over time, making continuous learning an unsolved problem.

What is the ADAS (Automatic Design of Agentic Systems) approach in AI?

ADAS uses open-endedness principles to automatically design agentic systems, which are AI workflows that combine multiple steps, tools, and interactions to solve complex tasks. By evolving diverse and novel agentic systems, ADAS aims to discover more effective and innovative solutions than hand-designed approaches.

Chapters
This chapter explores the concept of open-ended algorithms, drawing parallels between Darwinian evolution and AI's capacity for continuous innovation. It introduces the challenge of defining "interestingness" in AI and the role of language models in addressing this.
  • The connection between Darwinian evolution and AI's ability to generate novel outcomes is explored.
  • The paradox of striving too hard to achieve a goal and the importance of recognizing serendipity are discussed.
  • Foundation models' ability to assess "interestingness" as a proxy for human judgment is highlighted.

Transcript


Three things that people will learn from watching the show. One, what is the connection between Darwinian evolution and human culture? And how can we use the principles that fuel the amazing things those two processes have done to create algorithms that can innovate

forever. Two, the paradox of how trying too hard to accomplish something is likely to fail. And the best thing you can do is recognize serendipity, catch chance on the wing, and say, if it's interestingly new, let's keep it around, because who knows where that idea one day might lead. And three,

that foundation models have finally opened up the ability to have an algorithm know what counts as an interestingly new thing to try out. And having a system that just keeps trying things that are interestingly new and or high performing and adding it to a growing library of such discoveries is a powerful playbook that you can apply to almost any domain and see amazing progress and amazing things happen.

So I think one of the grand challenges of computer science, and also you could think about it from a biological perspective, grand challenges of biology, is to try to understand how did evolution produce the explosion of amazing things that we see on Earth. You look out of the natural world and you see jaguars, hawks, the human mind, three-toed sloths, birds of paradise, all these things.

just kind of popping up over time. This amazing menagerie of engineering marvels, diversity, it's so fascinating. And actually that has been the central quest of my career from the beginning. That, how did evolution produce this complexity and how does intelligence happen? And they relate. Like how did evolution produce the intelligence that you have in your brain, which is the most impressive bit, you know, a learning machine that we know of in the universe.

Right now, there are people saying goodbye to their children because their children have cancer or some incurable disease. There are people saying bye to loved ones of all ages. There are people suffering from hunger and poverty. You know, excruciating human suffering and pain is rampant in this planet. This technology has the ability to eliminate all of that. I believe firmly that we can cure all disease. We could solve death.

we can dramatically increase the quality of people's lives, their GDP, eliminate things like hunger and scarcity, if we get the societal distribution questions right. But technically a lot of this stuff is possible. And so Dario Amodei wrote a beautiful essay called Machines of Loving Grace. And as Dario says in the essay, the reason why he talks about AI safety is the same reason I talk about it. Because that's what we need to figure out in order to unlock all of that great stuff. And so

I don't want to die. I don't want my children to die. It would be great if we could eliminate that from the world and make death optional. And the only path to doing that and many other great things for humanity is... We are sponsored by Tufa AI Labs, an exciting research lab which has just been started up in Zurich by Benjamin Crousier. They have acquired MindsAI, the winners. Well, I say the winners. I mean, they didn't.

They didn't submit their solution. But anyway, they got the highest score. Yeah, so they're all working together as a team. They're working on ARC. They'll be working on the next version of ARC when Francois releases that. And they are hiring cracked engineers. Does that sound like you? Well, you know what to do: tufalabs.ai. MLST is sponsored by CentML, which is the compute platform specifically optimized for AI workloads.

You might have seen the interview we did last month with Gennady, their CEO and co-founder. He spoke about some of the optimizations on CentML, which only they have done, which make it dramatically faster than the competition. Anyway, if you're interested in hacking around with CentML, I'm going to be doing a live with one of their engineers in the coming weeks. So feel free to jump on that live and you can ask their team any questions.

Jeff, it's amazing to have you on MLST. So I was saying to you before, I first discovered you at ICML 2018 when you were doing your workshop and talking about open-endedness. And of course, we are huge fans of Kenneth Stanley on the show. We've had Joel Lehman on. We've had Tim Rocktäschel on. We've had loads of people on.

Open-endedness is an absolutely fascinating area to me. So to have you on the show, one of the pioneers in this field means so much to me. So thank you for coming on Jeff. - My pleasure, it's great to be here. And you've named a lot of people that I deeply respect and have worked with for years. So it's great to finally be here.

So one of your life goals, you've said, is to create an algorithm that keeps running forever with no end. You know, we think there's something fundamental about how intelligence works in the natural world, but rather than trying to capture it down at the metal, you know, like these biologically inspired folks talk about biomimetic intelligence, you've got this very interesting approach where you kind of bootstrap intelligence like in the natural world

in such a way that doesn't lose important characteristics of the natural world. Can you tell us about that? - Yeah, sure. So I think one of the grand challenges of computer science, and also you could think about it from a biological perspective, grand challenge of biology, is to try to understand

How did evolution produce the explosion of amazing things that we see on Earth? You look out of the natural world and you see jaguars, hawks, the human mind, three-toed sloths, birds of paradise, all these things.

just kind of popping up over time, this amazing menagerie of engineering marvels, diversity. It's so fascinating. And actually that has been the central quest of my career from the beginning. That, how did evolution produce this complexity and how does intelligence happen? And they relate. Like how did evolution produce the intelligence that you have in your brain, which is the most impressive bit, you know, a learning machine that we know of in the universe.

And you could try to do it, and many people have tried to say, hey, we're going to go all the way down to the lowest possible level we can imagine. We're going to create, you know, self-replicating machine code or even like self-replicating artificial cells. And then we're going to like hope that that kind of bubbles and percolates up into some open-ended process that eventually produces like an entirely new form of interesting, intelligent life. And that will teach us about the process.

But as Josh Tenenbaum told me when I was telling him about some of these goals of mine, he said, you know, you don't have a planet-sized computer to work with. So how are you going to accomplish this within, you know, the lifetime of you as a scientist or like us as a community? And to do that, the key is going to be abstraction. You know, we don't need to recreate every single detail in biology in order to study and understand the core principles. And in some sense, nor would we want to.

You know, we want to understand at an abstract level, what are the key ingredients that make this process work and that produces endless marvels. And so in some sense, the more abstract you can make it, if it still has the properties you want, like a complexity explosion,

then actually you've done the best job because you've figured out the difference between what was necessary and what was incidental. Another way to think about this is your own brain. I don't believe in the blue brain project philosophy, which is let's simulate every single chemical and every quark in your brain in order to try to produce an intelligent machine.

Actually, we want to say a lot of that chemistry, a lot of that detail is probably not necessary. It's not part of the secret abstract recipe for intelligence. Let's figure out how can we abstract it and still get intelligence. And so you could apply that analogy to the study of open-endedness as well. What are the abstract principles that create a thing that literally you could run for billions of years and would still continue to surprise and delight you?

I'd love to know what's the difference between you and Josh Tenenbaum. He's a huge inspiration to me as well. And I think you guys might have merged in some ways. I had Kevin Ellis on here as well, and he's done some fascinating work with Dreamcoder. And increasingly, I think what I'm seeing happen is we are letting language models do bootstrapping for us. You said yourself, you know, the reason we don't go down to the particle level is we're

we can benefit from billions of years of evolution. There's this bootstrapping process, and it should be possible for us to build on top of that. And that's what I'm actually seeing from Josh's team and yourself. That's right. So I don't think Josh and I differ too much in this. Actually, he invited me to give a talk at MIT, and he pulled me aside, or maybe even said this in the Q&A. He's like, if I wasn't doing what I was doing,

The thing that I would really want to be doing is what you're doing. He's like, he loves this work. He thinks it's fascinating. He wants to work together. And I want to work with him. So we're actually kindred spirits and super aligned. I think if I had to say, what is the difference? I think he's a little bit more interested in what is happening specifically up here in your brain and the human brain. Yeah.

and modeling that, albeit in an abstract way using Bayesian principles and the like, but really trying to understand the human mind. Whereas I'm trying to basically say, could I create a process that would innovate forever, including making a form of intelligence, but maybe an entirely alien intelligence that doesn't actually look exactly like what happens in the human brain. In fact, one of the things I find most fascinating about open-endedness work

is if we get this to work, and that's a big if, we're still working on it, but it's almost like inventing the ability to travel through the universe and meet other intelligent races.

because the intelligence that could emerge from this system might be wildly different. And so you would come to understand intelligence in general, the spaces of possible intelligence. What does alien math and music and humor look like? Well, you could do kind of alien travel and visit those cultures, except it'd be in silico instead of interstellar. Just on that externalism point, I'm fascinated by collective intelligence and all the rest of it. How much useful human intelligence is in our brains?

Oh boy, that's an interesting question.

I think that to some extent, there's so much stuff in our brains that at first pass, you might say is totally irrelevant and a total waste. Like the Guinness Book of World Records is simultaneously one of my favorite things and also kind of a maddening aspect of humanity. There was a quote from somebody who said chess is like a giant waste of brains. We've spent so much time getting really good at this random game. You could say, why didn't you just take all of that collective brain power and apply it to curing cancer or something?

And there are people that are even more esoteric. Their shtick in life is, I learned just a couple of days ago, there are some people who want to park in every single parking spot at an arena. Tens of thousands of possible parking spots. And every day they want to park in a different one and they check it all off and figure out exactly what's going on. And you might say to yourself, this is a collective huge waste of time. And in some sense, that's true. But there's this, the fact that humans have a passion to do something new,

and weird, and different, that is the rocket fuel of innovation, as Ken and Joel talk about in their great book. It is people like that who end up doing weird things and making weird discoveries that end up providing new stepping stones and innovations and ideas that oftentimes are the things that truly unlock progress. So if somebody wasn't interested in some weird, esoteric

math proof or computer theorem or tinkering with some new type of chip that can do things in parallel as opposed to sequentially, then maybe we don't get a lot of the innovations that we have. So a lot of it is a waste, but maybe we'd be worse off if we didn't allow a little bit of waste.

Yeah, this is so interesting. My co-host, Dr. Duggar, gives the example of this drunken walk analogy. You're trying to solve a maze and you have like a thousand drunk people and they all just go in different directions. And then one of them happens upon the solution. And we have this language, we have this memetic cultural transfer and so on. So one drunk person finds the solution to the maze and then it transfers to everyone else. And

That's a bit of a deflationary view of intelligence, because it almost implies it's simply just a commensurate amount of computation and trying random things. We want to do a little bit better than that. We think that there are shortcuts. And this is when we get to this interestingness thing, because it's not just about doing random things. It's about doing interesting things. And you said that being able to define interestingness is the grand challenge in open-endedness. It's ineffable. It's unquantifiable. Tell me more. Yeah.

So I do think that one of the things that humans are great at is they have a really great taste or nose for what counts as interestingly new.

We're curious and we can look at a new game, a new form of art, a new form of literature, and we can say, huh, that's interesting. That thing that that person did, I recognize that as interesting and new. Like Jackson Pollock, when he splashes paint on a canvas, there's this wonderful quote from the movie where the Val Kilmer character says, you know, you broke through. And it was true. They had been playing a game and then he kind of recognized, oh, there's something new I can do here that feels right and is interestingly different.

And other humans recognize that. And so that sense of how to explore efficiently and have a nose for what's curious is the thing that drives cultural progress both in the arts and also in the sciences.

We have not yet figured out in our algorithms how to get that human notion of curiosity and what counts as an interesting next thing to try, next skill to learn, next environment to generate, next scientific hypothesis to test. We haven't figured out how to inject that into our algorithms up until maybe I would argue about a year or two ago. And the problem is because if you try to say specifically, here is the definition of what counts as interesting.

If we write a little mathematical function or a little piece of code to measure that, it always fails.

And this goes back to Goodhart's law, right? Which is once a measure becomes a target, it ceases to be a good measure. If you write down, and I'll give you an example. So in intrinsically motivated reinforcement learning, we said, hey, we don't want the agents to randomly flail around until they get a reward signal. Because in many situations, you'll never randomly discover a high rewarding behavior, like flying a plane and landing it, right? And you're never randomly going to land the plane and get your payoff, right?

So you have to be curious and we want you to be intrinsically motivated to just find new interesting, do new interesting things. So the way that we code that up is we say go to new states, which basically means make new observations. Like I want my eyes to see new stuff.

What happens when you start telling an agent to do that? Well, it finds a TV and it sits down and it never stands up again. It sits there and literally will watch TV forever, because the TV is constantly flashing new situations at it. It gets worse than that if it just found a TV tuned to a dead channel: that endless stream of pixels, where no one of those patterns of pixels is ever like the previous one. It's just new, new, new, new, new, new, new. So we got what we asked for.

It finds new states, not what we wanted, which is new interesting states.
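To make the pathology concrete, here is a minimal sketch of the kind of intrinsic reward being described: pay the agent for visiting observations it has rarely seen. The class name, the hashing scheme, and the reward scale are illustrative assumptions, not the setup of any particular paper.

```python
# Minimal sketch of a count-based "novelty" intrinsic reward (illustrative only).
from collections import defaultdict
import hashlib

class NoveltyBonus:
    """Pay the agent for reaching observations it has rarely seen before."""

    def __init__(self):
        self.visit_counts = defaultdict(int)

    def reward(self, observation: bytes) -> float:
        # Hash the raw observation, so "new" means "never seen exactly this before".
        key = hashlib.sha1(observation).hexdigest()
        self.visit_counts[key] += 1
        # The bonus decays with visit count, so rarely seen states pay the most.
        return 1.0 / (self.visit_counts[key] ** 0.5)

# The failure mode described above: a TV full of noise makes every frame hash
# to a fresh key, so the bonus never decays and the agent is paid forever
# for sitting still and watching static.
```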

And this story repeats itself over and over and over again. There's been like decades of people being like, oh, I think this might finally unlock the robot to keep finding new situations. And then you let optimization go against it and you just discover, oh, I didn't realize that my measure will be lit up by this particular weird pathology. So another one that's very, very common is learning progress. So you said, okay, we want the robot to learn forever. So we're going to keep giving it challenges on which it can learn.

Well, if I generate a million-digit-long random number, right, and I give you the first 10 digits and I say, memorize that, well, there's some learning you can do there. And when you're done, I'll say, here's the next 10 digits, and the next 10 digits. Well, now you have infinite learning progress forever. You can learn forever, but that's a completely boring and useless task, right? The 50th time you tack on 10 more numbers, that's not an interestingly new thing to do.

And so what we really want is that somehow we want algorithms to know the common sense human notion of, hey, after I've like memorized this digit up to 100 decimal places, there's something more interesting to try in this universe or in this world. And I should go invent a new form of math or practice juggling or unicycling or something. Now, up until now, we haven't had the ability to do that because we didn't know how to code it. But there's a new

player in town and that is foundation models, frontier models, large language models, whatever you want to call them. So I'm going to call them frontier models. These frontier models have read the entire internet, including lots and lots of people complaining about what is boring online and talking about what they find interesting. So these frontier models have literally distilled a sense of interestingness from human culture and human data into their weights and

And now, finally, we can just go ask a piece of computer code, does this thing count as interestingly new? And in my lab at UBC and in some of the collaborations we have, we've basically applied this lesson over and over and over again. And it does all sorts of marvelous stuff. So you can have an environment generator that says to generate the next challenge that will be interesting for the robot to learn.

do that forever and you get this explosion of interesting environments that it could work on. We have the AI scientist say, "I want to do a new scientific paper that is interestingly different or makes an interestingly new discovery from everything that I've seen before." We do that and it does it again and again and again.

You can apply that to it developing agentic systems, to causing different strategies in a self-play game. And if you want, we can talk about all these papers. But the basic building block is the principles of open-endedness. Keep expanding an archive or a population of interestingly new things. Now we know how to ask a computer, does it count as interesting? And so now we almost kind of have that missing secret ingredient that has been the grand challenge up until now. And maybe now we have the ability to put a big dent in it.
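As a rough sketch of that building block, the loop below grows an archive and asks a frontier model to judge whether each candidate is interestingly new. Here `propose_candidate` and `ask_frontier_model` are hypothetical stand-ins for whatever generator and model API a real system would use; the prompt wording is an assumption.

```python
# Sketch of the open-endedness building block described above: keep an archive
# and only add candidates a frontier model judges to be interestingly new.
# `ask_frontier_model` and `propose_candidate` are hypothetical placeholders.

def ask_frontier_model(prompt: str) -> str:
    raise NotImplementedError("call your frontier model of choice here")

def propose_candidate(archive: list[str]) -> str:
    # e.g. prompt the same model for the next environment, paper idea,
    # or agent design, conditioned on everything already in the archive.
    raise NotImplementedError

def open_ended_loop(steps: int) -> list[str]:
    archive: list[str] = []
    for _ in range(steps):
        candidate = propose_candidate(archive)
        prompt = (
            "Archive of prior discoveries:\n"
            + "\n".join("- " + item for item in archive)
            + "\n\nCandidate: " + candidate
            + "\nIs the candidate interestingly new relative to the archive? Answer YES or NO."
        )
        if ask_frontier_model(prompt).strip().upper().startswith("YES"):
            archive.append(candidate)  # keep the stepping stone and build on it
    return archive
```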

Yes, indeed. I remember Kenneth Stanley said that we have a nose for the interesting, you know, that it's what got us so far. It's the basis of all innovation.

And by the way, when I interviewed Tim Rocktäschel about that DeepMind paper, let me just find the name of the paper. This was at ICML. Open-endedness is necessary for... Essential for Artificial Superhuman Intelligence. Yeah. So they said the definition of open-endedness hinges on a system's ability to continuously generate artifacts that are both novel and learnable to an observer.

One thing I said to them at the time is I want to get into the creativity discussion a little bit. Now, language models, I must admit, I'm updating all the time. So I was very cynical and I always used to say there's a reasoning gap and there's a creativity gap. It seemed to me that even though they can extrapolate, they are parasitic on the data. They are, you know, parasitic on the prompts that they're given, so it's almost like, where's the agency actually coming from? Is it us as agents putting stuff into that? So it kind of feels like there's a limit, but

What we're seeing now is that either through transduction or induction, we are generating language and programs using these language models. And it seems to be quite compositional. It seems to be that we can traverse the entire kind of Turing computable space by generating one of these different programs with language models. So the first question is, is there a limit, do you think, to the types of paradigmatically new programs we could generate?

This is a great question. I actually call this right now the trillion dollar question, because I think there's tremendous economic value at stake. If language models can be creative and continue as part of these open-ended algorithms we're making, if they can keep innovating and making new discoveries forever, there's tremendous economic potential. I do think these models can innovate, and potentially forever.

I have to give you one example that I literally only saw about an hour before coming here. So we have this paper called The AI Scientist. It uses the ideas of open-endedness to create a system that will generate its own idea for an entirely new piece of research, based on what it knows from reading the internet.

It proposes that idea, proposes the experiments it should conduct to test that hypothesis, does all the experiments, looks at and plots the data, writes up the whole paper, peer reviews it, etc. And in the open-endedness philosophy, you would publish that paper to an internal archive, then have the system read that paper and all the other papers and build on its own discoveries forever, right? That's what we're after.

So in the process of looking at the papers that that system creates, I was super impressed with how creative it was in proposing entirely new paper ideas, basically like grants, like somebody should study this. And one of my favorite ones was the idea that, hey, there's this grokking phenomenon where neural nets suddenly learn a certain task. At the beginning, they start memorizing something, but eventually they learn the underlying principle of how to do multiplication, and then they basically suddenly go off to the races.

And this AI scientist said, hey, you could see if the compressibility of the network, the minimum description length of the network, goes down right around the time that grokking happens. Because instead of memorizing a bunch of stuff in their weights, they've learned the underlying principle. And once you know the principle, then you can shed all the memorization. And I was like, that is a really interesting, cool idea. If a graduate student proposed that to me, I would say you should go do that.

And then it went and did the whole paper. Just an hour ago, I saw that some other group, I think independently, came up with that idea, did all the research and posted it. And the machine learning community is loving this paper right now.

I saw your tweet. You saw it? Yeah. It's incredible. It's like literally the AI showed that it was as creative as a team of cutting edge human ML scientists. And it got there first, which is incredible. So that's one example. Now, you're asking a question that I think is fascinating, which is,

Maybe you can do that now. Like maybe it can take one jump, one or two concentric rings of the onion, adding on to our set of discoveries. But maybe it can't go much farther than that. There's a lot of people who think you can interpolate, you can extrapolate a little bit, but you can't do wild creative leaps. Two comments on that. One, I'm not sure that humans are capable of making these wild creative leaps. Mm-hmm.

You know, the reason why we say things like, "If I have seen further, it's because I stand on the shoulders of giants," is because human scientists are always looking at what came before and make a relatively small leap off of that. It's why you see calculus simultaneously invented by Leibniz and Newton. It's why you see evolution simultaneously invented by, I think it's Huxley and Darwin. This happens over and over and over again. Both of them condition on an archive of ideas, say, "Oh, this is what's next, and this is what's next," right?

I think our models can do the same thing. They look at everything that came before, and it's just as we saw two hours ago, the AI can make the next leap or the human can make the next leap. And then as long as the AI system is publishing high quality papers internally, looking at its own discoveries, it will make the next leap and the next leap and the next leap, just like a human community of scientists would do. So my supposition is that the model of interestingness that the system learned

looks at the history of what came before and can just recognize the next interesting idea. And as the archive grows, that thing can generalize really far, because the notion of interestingness didn't change; it's just what you're conditioning on that changes. Here's my thought experiment for you: if I teleported you to the year 3000, you'd be blown away, right? Lots of cool stuff.

Once you became acclimated and knew the current tech, my prediction is that you would recognize that the next innovation in the year 3001 counts as interestingly new. You would generalize way outside of your distribution. I think the model of interestingness in our current language models... I can't guarantee it, but I think there's a decent chance it might generalize as well. Yes. If it doesn't, then we do some fine-tuning stuff. We update on the fly. We've got to figure that out.

But more so than anyone else I've met, I'm optimistic this thing might generalize far. I love all of this, and I'll explain why. I think the reason you understand this is because, for example, in the POET paper, it was essentially doing a kind of curriculum learning where, you know, like you kind of...

You generate all of these different environments and so on. And you can look at the phylogeny of the environments. And I think you said actually it's like the best plot of your career or something like that. Because when we talk about creativity and language models, I think the problem with the perspective is we think about it in one shot. We don't think about the process, right? So actually what we're doing as a society is epistemic foraging. We're stepping stone collecting. We're finding all of these stepping stones and every single stepping stone builds on the last one.

And you said, I think with the POET paper, it might have been another one, that, you know, you can understand what the curriculum is, but if you try and shortcut it and train another neural network to do the same thing, it doesn't work. It's like it needs to have this weird, gnarly phylogeny of knowledge. Similarly, if you go back in time and give a physics book to Newton, he's not going to understand anything, because there's this whole phylogeny of knowledge where one bit of knowledge builds on the previous bit, right?

That's exactly right. And that is the central thesis of Ken and Joel's book, Why Greatness Cannot Be Planned. You can't pick a really distant, ambitious goal and march straight toward it.

The best thing to do is just do a bunch of basic research, collect new interesting ideas, and eventually you'll find some weird circuitous path that leads to that invention. My favorite examples of this, which I say in my talks, are that if you went back to the time of cooking technology over an open fire, and you only funded people who made fast cooking technology that made no smoke, you'd never invent the microwave, because to invent the microwave, you needed to have been working on radar technology and notice that a chocolate bar melted in your pocket.

Similarly, if you went back to the time of the abacus and you said, I will only fund scientists who give me more compute per hour or per dollar, you would never invent the modern computer because the technologies that unlock the computer were electricity and vacuum tubes. And at the time, there was no idea that those things would lead to better compute.

And so the cool thing is we want inside of our algorithms, our open-ended algorithms, and this is what we do in my lab, is we want the algorithm to kind of catch chance on the wing, to recognize serendipity, to say, huh, that's funny. And that's interesting. That's new. Let's keep that idea. Let's add it to the kind of set of ingredients we can play with. And who knows what that will unlock? That's the whole idea.

So serendipity plays an outsized role in our lives. We want to do this epistemic foraging. Is there any way we can formalize it? So, you know, we're using language models as a proxy. So it's ineffable. We can't quantify it. We're using it as a proxy. But

I read the Abandoning Objectives paper by Ken and Joel in 2008, I think. And they were using proxies like behavioral complexity, like solving a maze and how many bits of information do I need to track the coverage of the maze? But is there any step towards formalizing that creativity? For me personally, I say throw that all out. Every attempt to formalize it

ends up producing the problem we were trying to avoid. People have been trying to formalize this for decades. And then you write down this thing, this equation, I finally defined interesting or what counts as interestingly new, or I've defined curiosity, I've defined learning progress. And then you optimize that and you get all these pathologies that don't actually get you what you want.

So there's a famous quote from Justice Potter Stewart. He was talking about pornography. He said, I don't know how to define it, but I know it when I see it. Yes. I feel like the same is true for interestingness. We don't know how to define it, but we know it when we see it. And that's been a problem up until now. But now we have something that can help us see, and that's language models.

So rather than formalize it and write down the math, personally, my own research taste says, let's ask the model to do that job for us, to be the proxy for the human model of interestingness, and let's go.

Let's see what happens. Now, that still leaves aside... I'm still interested, from an open-endedness perspective, in the whole other type of open-endedness, which is, could you create an open-ended system without relying on something that trained on human data? Because in some sense that's cheating. So I consider this the hard and the easy problem of open-endedness. The easy version is we'll use all of human data distilled into a model of interestingness as a catalyst to allow us to go and keep doing interestingly new things.

Eventually, I hope that I or the community get back to creating an open-ended system that doesn't rely on that. But for now, we've got an orchard of new low-hanging fruit that we can run in and grab, because we can finally have the algorithms just have that gestalt, common-sense, I-know-it-when-I-see-it sense of that is boring and that is interesting.

So you've been on this very interesting intellectual journey. And of course, you wrote that landmark paper, AI-GAs, in 2019. And you're kind of saying there's a hierarchy here, right? So, you know, there's having diverse solutions, having diverse goals and having diverse environments. Can you, and maybe just before you answer that, like, what does it even mean? Like, what does intelligence mean when we're talking about this very open-ended, diverse way of thinking about things? Does that in any way change your definition of intelligence?

It does, you know, it really causes you to stop and think about what it means to be intelligent. And, you know, of course, coming from a philosophy background, you can look through the animal kingdom and recognize that there's not some single definition. There's lots of different forms of intelligence that we see even on our planet. And who knows what exists elsewhere in the universe. And I think that to some extent you can even have open-endedness without intelligence. You know, you could have the never-ending generation of interesting pictures.

And there's no real notion of intelligence in it. The system itself is intelligent in order to make it, but it's not making intelligent artifacts. And so you could almost think about the intelligence of the system and appreciate that, and then maybe the artifacts themselves do or don't become intelligent. For example, Darwinian evolution produced you. Is Darwinian evolution on Earth with a planet-sized computer, does that count as intelligent?

You know, people did for millennia, people thought that it did, which is why many people believed and thought that was the best argument that there must be a God, right? This was the argument of intelligent design. Nothing else could have produced a frog or a three-toed sloth, but an intelligent designer. And then we realized there's this algorithm that did it.

Are we willing to now say that's intelligent? Maybe, but if you do, that's a different notion of intelligence than like a thinking entity that has words in its head, which is our typical notion of the word. Yes. You know, Adam Smith said there was a hidden hand of the market, but in a sense, you're saying there's a hidden hand of evolution. But quite an interesting distinction, though. So generating lots of pictures and Picbreeder is a great digression for later, maybe. So...

First of all, do you think that evolution should be seen as an agential process? And do you think agency is the dividing line between intelligent things and not? Evolution does not have an intention. It doesn't have a goal. It's just kind of a very simple algorithm that's just happening. But

There's still something, I don't want to use the word magical because it implies that it's non-physical. It's nothing supernatural, but there's something awe-inspiring and something magnificent and to my view, something we don't yet understand about exactly how it does the amazing things that it does.

And this is actually where I started my career. All right, evolution produced all this stuff. Could we just code up a little version of an evolutionary algorithm inside of a computer, maybe put it in a somewhat rich world, hit play, sit back, grab some popcorn and watch the fireworks?

People have been doing that for decades, and nothing interesting happens. Okay, that's a little harsh. Something interesting happens for a little tiny bit of time and then it kind of converges and gets boring really fast. So at the beginning of my career, we did not have any algorithms that were worth running for more than a couple hours.

I kind of set a goal for my life, and this comes originally from Jean-Baptiste Mouret, which is, could we create an algorithm that we would want to run for billions of years? And I've been fortunate enough to see us get better at this as a community. So we now have algorithms like AlphaGo that are worth running for a couple of weeks, maybe even months. GPT, you know, with enough data, is worth running for months on huge numbers of GPUs. We're into the months range. I still don't think we have an algorithm that's worth running for more than a year.

And so it's not just the simple Darwinian algorithm, there's something else. And that something else is kind of what people like myself and Ken and Joel and Tim and all the other people that are in the open-endedness field are working on. What are the key ingredients that we can put in and finally try to kick off this chain reaction that never ends? And so far we haven't done that. And that's why I think this is one of the grand challenges in all of science.

You've spoken about this concept of Darwin complete, which is very interesting. Can you explain what you mean by that? Sure. Yeah. So if you're trying to create an open-ended algorithm, you want it to innovate forever.

And we've tried to do it in simple settings like, let's generate in Poet, for example, let's generate obstacle courses forever. Even if we succeeded, that thing would only ever generate obstacle courses within this one little tiny video game that it was set into. Similarly, AlphaGo, maybe it innovates forever, but it's stuck within Go. It can only become a good Go player.

But if you really want a process that is open-ended in the way that Darwinian evolution is, that produces all the marvels we see on Earth, it has to exist in a vast search space where it could produce obstacle courses and logic problems and the game of Go and competitive, you know, co-evolutionary arms races, cooperative games. It really should be able to produce almost anything. And so when I was writing the AI-GA paper, I said, "Hey, if we really want to produce something that is open-ended,

then it has to be able to be in a vast search space where almost anything is possible. And I said, what is the most wide open search space imaginable? And I tried to think of what that was and I said, well, actually we can even define what that would mean. And I said, it should be a system where any possible simulatable or computable environment existed.

And I said, that would be as broad as it can get. And I said, well, we should give a name to that. And so just like Turing invented the idea of a Turing-complete computer language that can compute anything, I said, this is kind of very similar. It's like an environment search space where you could have any kind of environment. And in a nod to the amazing magic that we see in Darwinian evolution and all of the diversity it produced, I said,

Let's call this Darwin Complete. So it's a search space that literally contains any possible environment that you could have in a computer. And the idea is that you could simulate all the diverse environments you see on Earth and beyond that. So that's what Darwin Complete means. Now, fun historical side note, if you don't mind. When I was writing that paper, I said, can I conceive of anything that is Darwin Complete?

And I was walking around the Uber AI offices. And what occurred to me is that the most general environment, the environment that literally could simulate anything, is not any particular simulator we have coded up for a video game or physics simulator, but actually is a neural net itself. I said, imagine if you had a really big neural net and it

produces a state, like an observation. It could be pixels, maybe sound, smell, touch, whatever, so it produces a sensory experience. Then you take an action, you give the action to the neural net, and it just steps the world and produces the next sensory experience, forever and ever and ever. And since it's a neural net, and a neural net can represent any function, it can now represent any possible environment or challenge imaginable. And I was like,

That seems crazy. Right now, we definitely don't have the ability to make that. And I said, but I actually think neural nets are powerful. We know that, basically, you can train them to do almost anything. Scaling laws, although not yet named that, it seemed like, you know, more compute and data produces better and better models. I was like, I actually think it's possible one day that we could create a neural net that literally is the entire world. When I wrote that, though, it felt emotionally...

Impossible, or way far out in the future. I actually thought I was taking career risk and people would think I was a crackpot for suggesting this is a thing. But rationally it felt right. I knew it was possible, and if something feels possible in the field of AI, it's probably gonna happen. So I decided to put it in my paper and take the risk. And six years later, two weeks ago, we put out the Genie 2 project,

which literally is a neural net simulating entire three-dimensional worlds that you can act in, explore, play, and you can ask for any world you want. So there it is. Within six years, you got that. Now, about a year or two ago, I was in class teaching, and one of my students said something that caused me to realize this is a second Darwin complete representation, and that's code.

And so we did the OMNI-EPIC paper, where basically it can generate endless environments forever: it literally writes the code for the environment. Now it could write code for its own obstacle course within a particular simulator, a new level within a particular video game, or it could write the simulator itself to simulate any possible world, or logic problems, or math problems, anything you can put into a computer. So right now these are the only two Darwin Complete

search spaces I know of. And independently, Joel Lehman kind of stumbled upon the same idea and we had fun connecting those dots. Yes, very nearly Darwin Complete. So it's amazing, actually, folks at home should check out the OMNI-EPIC paper. But yeah, so it can generate code.
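To illustrate the "code as a Darwin Complete search space" idea, here is a minimal sketch in which a generated piece of Python defines a task behind a fixed reset/step interface. The interface, the toy task, and the loader are illustrative assumptions, not the actual OMNI-EPIC implementation.

```python
# Sketch of "code as environment": a generated Python string defines a task
# behind a fixed reset/step interface. Illustrative only, not OMNI-EPIC's code.

GENERATED_ENV_SOURCE = '''
class GeneratedEnv:
    """Toy generated task: reach position 10 by stepping left or right."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):            # action is -1 or +1
        self.pos += action
        done = self.pos >= 10
        reward = 1.0 if done else 0.0
        return self.pos, reward, done
'''

def load_generated_env(source: str):
    namespace: dict = {}
    exec(source, namespace)            # a real system would sandbox generated code
    return namespace["GeneratedEnv"]()

env = load_generated_env(GENERATED_ENV_SOURCE)
obs, done = env.reset(), False
while not done:
    obs, reward, done = env.step(+1)   # trivial hard-coded policy for the demo
```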

And, you know, one might be kicking a football through goals or something like that. One might be jumping over a platform. And the amazing thing is you get this phylogeny, right? So then you do another one, which actually requires intermediate skills that you've learned earlier, and it just goes on forever. It's absolutely beautiful. One very interesting point is that, for example, Kevin Ellis, he's exploring transduction versus induction. And I think what he actually means is program space versus like neural space. So solving the ARC challenge,

You could, on the one hand, just predict the solution directly, or you could generate code. And I like the code idea because, you know, Python interpreters, you know, it's a Turing-complete language. It feels more compositional. Neural networks struggle with copying and counting and lots of silly things like that. So I have a deeply held instinct that generating code is good for the same reason generating language is good, right? Because there's something special about language. What do you think is the difference between those three?

Yeah, I almost liken it a little bit to, well, one way to think about it is kind of like system one, system two, if you're familiar with the Daniel Kahneman thinking fast and slow. So code very much represents something like we are intentional. We're thinking through the logic of an algorithm. We're kind of thinking step by step. We're laying out a plan and we're executing it. And so oftentimes, even for ourselves, if we want to do some big, complicated, basically algorithm, we want to run it, then it's better to write code and let that go.

Similarly, or on the other side, actually, in contrast, system one often can handle the fuzzy messiness of ill-specified things, hybridizations of different ideas. Like when I'm playing hockey, for example,

I'm constantly inventing subtly new motor commands to deal with the situation I'm in and hybridizing things that I've learned in my past. It would be impossible to write code, in my view, that could control a body to play hockey because that's just not kind of the right

policy representation to deal with motor commands at high frequency and speed. And so I think the same is true probably in some mental tasks. You're kind of feeling your way through like a conversation or a party. You would not want to write a code to handle dinnertime conversation.

And so our brains and the kind of neural nets can handle the fact that it's not well specified what we're trying to achieve. The concepts are somewhat ambiguous. The conversation could go in different directions. And so I think basically there's kind of almost like different types of intelligence and in different situations you want to use either. And so I think the best system knows how to code.

It kind of does the thing that humans do with their words and language and also their muscles. And it kind of chooses when to take those different types of intelligence off the table. There's such an interesting interplay between the two worlds as well. I mean, I spoke with Laura Ruis and she's got this great interpretability paper, you know, talking about how code is unreasonably effective for making language models do reasoning, even when the tasks have nothing to do with code. So there's some kind of connection between the two.

And it also makes me think about whether language is a system of thought or communication because we do this kind of improvisation, don't we? And then we establish terms as language, but it actually came from here first and then this is the pointer. But maybe this is a good time to talk about your thought cloning because you did some really interesting work there kind of binding them together.

Yeah. So I love the idea of thought cloning. So the basic idea is right now, one of the most successful recipes in all of machine learning right now is I'm going to take a robot conducting a bunch of actions or a human doing a lot of tasks like playing a video game or a game of Go or something. I'll take all that human data, which was in this situation, what action did I take?

and then you just collect that data, and then you train a large neural net to take the action the human took when it was in that situation. Do that at scale and you have all of modern AI. This is what GPT is doing, right? Based on these 60 words, what's the 61st word, and the 62nd word, and the 63rd, right? My team at OpenAI, for Minecraft, cloned a bunch of humans playing Minecraft at scale, a massive scale, and you have an agent that can go play Minecraft.
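As a minimal sketch of that behavioral-cloning recipe, the snippet below does supervised learning of "which action did the human take in this situation". The network, the observation size, and the action count are placeholders, not VPT's architecture.

```python
# Sketch of behavioral cloning: imitate the human's action given the observation.
# Shapes, the network, and the dataset are placeholders, not any specific system.
import torch
import torch.nn as nn

policy = nn.Sequential(                 # maps an observation vector to action logits
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 16),                 # 16 discrete actions, illustrative
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

def bc_step(observations: torch.Tensor, human_actions: torch.Tensor) -> float:
    """One gradient step: imitate the action the human took."""
    logits = policy(observations)
    loss = loss_fn(logits, human_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```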

So we're really good at saying to a system like a robot, take the actions that a human took. But what's missing, and one of the reasons I think robots don't plan very well, they don't generalize out of distribution, they don't deal with novel situations, is because we've given them the system one muscle memory.

Right. Like right now, if I grabbed a pillow and I threw it at your head, you would just duck and you would do it before any conscious stuff happened in your brain. Right. You've got that muscle memory. And that's basically what we're teaching these things to do. Similarly, if I ask you, you know, to be or not to, or if I asked you two plus four, you don't actually stop and think through the algorithm of addition. You just know it two plus four is you kind of have muscle memory for certain cognitive tasks.

But if I asked you, you know, to do a five-digit multiplication, five digits by five digits, you wouldn't just know the answer. You'd flip over to system two, the thinking part of your brain, and you'd sit there and you'd go through an algorithm and you'd think through it. So the problem is that in behavioral cloning, we're only kind of trying to get the muscle memory into these agents: what actions you take immediately, without even thinking about it, in this situation.

But what we want is the thinking part inside your brain. While you are playing Minecraft or doing math or taking a quiz or a test, there's a lot of thoughts happening in your brain before you take the action of writing a word or speaking a word or pressing a button on a game controller. And so the idea behind thought cloning is, if we really want intelligent agents, we don't just need to clone their actions, their behavior, we need to clone the thinking also.

So what we do, in the ideal case, is we would have humans taking a bunch of actions in any domain. Imagine playing Minecraft, for example. And we have the actions they're taking, but we also have the thoughts that are going on inside the head of the agent. All right, first I'm going to build a house. To build a house, I need to collect wood and then make a crafting table. And then I need to get an axe. And then you start: okay, first let's go get that wood. I'm going to go over here. Oh, there's no trees over here. I'm going to replan and look over here. Oh, there's one, da, da, da.

And so that running commentary in your head, the thinking part, we want that too. And so if we had data of all the thoughts happening inside of a human and the actions they're taking, now we can do what we call thought cloning.

which is we train a neural net to both, in a certain situation, produce the actions the human takes, but also produce the thinking that's going on inside their head and have the actions they take actually condition on the thoughts. So it plans, it reasons, it thinks, then it acts, and we want to kind of train the agent to do both of those from human data.
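A minimal sketch of that objective: one head imitates the demonstrator's thought, a second head imitates the action conditioned on that thought (teacher-forced here), and the two losses are summed. The architecture and the single-token treatment of thoughts are illustrative simplifications, not the paper's model.

```python
# Sketch of a thought-cloning objective: imitate the thought AND the action,
# with the action conditioned on the thought. Illustrative simplification only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThoughtCloningAgent(nn.Module):
    """Two heads: imitate the demonstrator's thought, then act conditioned on it."""

    def __init__(self, obs_dim=128, thought_vocab=1000, n_actions=16, hidden=256):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.thought_head = nn.Linear(hidden, thought_vocab)              # upper level: what to think
        self.action_head = nn.Linear(hidden + thought_vocab, n_actions)  # lower level: act given thought

    def forward(self, obs, thought_onehot):
        h = torch.relu(self.encoder(obs))
        thought_logits = self.thought_head(h)
        # The action is conditioned on the (teacher-forced) ground-truth thought.
        action_logits = self.action_head(torch.cat([h, thought_onehot], dim=-1))
        return thought_logits, action_logits

def thought_cloning_loss(agent, obs, thought_targets, thought_onehot, action_targets):
    """Sum of a thought-imitation loss and an action-imitation loss."""
    thought_logits, action_logits = agent(obs, thought_onehot)
    return (F.cross_entropy(thought_logits, thought_targets)
            + F.cross_entropy(action_logits, action_targets))
```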

So, you know, we tried a little bit of this in my team at OpenAI before I left, but we ran out of time. We were going to use the closed captions from online videos of narrators playing these video games while they talk out loud. We were going to train on that.

That kind of worked. We didn't have enough time. So my academic lab here at UBC, we decided to do it on a small toy data set. And we had basically a bot taking a bunch of actions. We had the low level actions it's taken and like a little planning algorithm in its head. And we trained it to kind of simultaneously do the thinking of the planning algorithm and take the low level actions. Now, two cool things about this. One, if you train to imitate the thoughts as well as the actions, you learn way faster.

than if you just train on the actions. So it's way more sample efficient. Also, as we had hoped, an agent that knows how to think is better at adapting to new situations. So the further we put this thing out of distribution, the thought-cloning agent is better zero-shot at handling that. It can replan, it can put together things that it learned and thought about from its training in a new situation.

and it's faster at learning. You give it a couple sets of learning and the thinking agent is much faster at learning. So pretty much everything we hoped for just fell out of the data the second we did these experiments. I want to make one final comment, and that is, one of the hardest challenges in my whole career, including in open-endedness, is how do you get agents that can explore well, right?

This goes back to our sense of interesting, our sense of curiosity. If I put you in a big complicated new building, you would do such a marvelous job of exploring it. Our robots, our AI agents suck at that. It's actually been one of the big themes of my career. Well, one reason why behavioral cloning at scale, like we did in VPT at OpenAI for Minecraft, does not solve this problem is because most of the data we have are experts.

They already know how to play Minecraft or do math or explore a kitchen, and they immediately go find the kettle or the spoon or whatever, and then we clone that behavior. What we don't clone is the newbie or the novice human in a new video game, fumbling around, exploring, trying different things, doing what Alison Gopnik has demonstrated that children do, which is learning a sense of empowerment, how to control my environment, what causes what.

We want that data. And I think if you trained at scale on behavioral cloning and thought cloning during exploration, now you have an agent that you drop in a new environment and it knows how to go explore, be curious, uncover what's interesting. And that is how you truly master a new domain.

Love it. And I'm a big fan of Alison Gopnik, by the way. I'd love to get her on. I'm writing an article on creativity at the moment with my friend Jez, and we've cited her a lot. One interesting thing about language is there was a great book I read called The Language Game, and they described it as a kind of constructive process of improvisation. And essentially, it's the mapping between our kind of phenomenal world and some externalized symbols used for communication or whatever.

Wouldn't it be great if we could get AI agents that could create their own language? So if you think about it, they have a sensorium and they have this interaction, this improvisation, and they need to sort of like create a language and then we could let that evolve into a phylogeny. How could we do that?

Yeah, I think it's a great idea. Actually, people have tried this. All you do is you create a neural net that can emit some sort of message, usually language tokens, and the other agents just receive them. And then you could just put them into a big environment, like a big video game, and let them communicate. People have tried this. And one interesting thing that happens, well, two interesting things happen. One, not much interesting stuff happens. Oftentimes, we don't know how to create the setting in which language

has such an immediate and obvious benefit that you get a huge learning curve, like they really learn to use it easily. You have to really create the right setting to see language emerge. But some people have caused language to emerge. And in that case, another interesting thing that happens is that we see them communicating and it helps them, but we actually find it hard to interpret. So now you almost have the challenge from the movie Arrival: you have to figure out how to communicate with an alien race.

And if I remember correctly, that Facebook situation where people claimed that it invented a language and they shut it down because it was too dangerous, that was all apocryphal. They didn't shut it down because it was dangerous. They shut it down because it was totally uninterpretable. They had no idea what was going on. And so if you wanted it to be a language that we could actually eavesdrop on or communicate with, either you'd have to train the agents with us in the loop to also communicate with us from time to time, or you have to do some sort of trick to ground their language to be interpretable,

like similar to our language, like maybe if you put in a bunch of GPTs and you let them communicate,

They probably would start in English and if you did nothing, they'd eventually invent their own language that we can't understand. But you can do some tricks from machine learning to basically say, hey, you have to keep it similar to our language. I think that's super profitable and interesting. And I guarantee you that the open-ended algorithms of the future will have many agents running around and communicating with each other, inventing their own language, using it to accomplish hard tasks.
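One concrete version of the "keep it similar to our language" trick mentioned here, and this is an assumption about the mechanism rather than something specified in the conversation, is to add a KL penalty between the agents' message distribution and a frozen pretrained English language model:

```python
import torch.nn.functional as F

def grounded_message_loss(agent_logits, reference_logits, task_loss, beta=0.1):
    """Task objective plus a KL penalty that keeps the agents' emergent messages
    close to a frozen pretrained (English) language model.

    agent_logits, reference_logits: (batch, seq_len, vocab) over a shared vocabulary.
    beta: strength of the grounding penalty (illustrative value).
    """
    agent_logp = F.log_softmax(agent_logits, dim=-1)
    ref_logp = F.log_softmax(reference_logits, dim=-1).detach()   # reference LM stays frozen
    kl = (agent_logp.exp() * (agent_logp - ref_logp)).sum(dim=-1).mean()
    return task_loss + beta * kl
```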

And then we either keep it similar to English or whatever is human natural language, or we've got to get some AI translators on board to help us figure out what they're saying. Imagine we did create a collective intelligence of agents and they develop their own language and they're communicating with each other and we could monitor it. When would it worry you? Oh, there's so many things that worry me. I think we're going to get to AI safety later. The language part itself...

is not inherently worrisome, but the behaviors could be. Like if we happened to eavesdrop on them when they didn't realize we were listening and they said something like, "Hey, the humans don't think we're that clever and they're trying to keep us under control. Let's figure out a way to jailbreak, break out, and take control from humanity." Yeah, I'd be pretty worried.

It's almost like we've created this ecosystem because we are becoming a bit like neuroscientists now with these multi-agent systems. We'll get onto your ADAS paper in a second, but it's like we've created this gnarly monstrosity that we don't understand. It's behaving in weird ways. What are the indicators that are of concern? Yeah, so I actually spent almost, I don't know how many, six, eight years of my career working on an arc of work that we actually called the AI Neuroscientist.

And there was interpretability work where we basically found ourselves in the same situation as neuroscientists. Here's this big complicated artifact. It thinks really well. It does impressive things.

We want to understand how it works, and it's complicated. So real neuroscientists have that problem and try to figure out what's going on inside of human brains. And AI scientists now have the same thing: GPT-4 is big and complicated, and we don't really know how it works, even though we want to. So we did a huge arc of work trying to address that. I think it's super fascinating. If you want, we could talk more about it.

But there is a related phenomenon, which is specifically trying to say, can we use techniques like AI neuroscience or other interpretability techniques to detect when something is like a safety risk?

And I had lots of conversations with people at NeurIPS about this. There are some ideas that are promising. Some of them are good and we should do them and they will help, but nothing is a guarantee. So, like, one of the best ideas is if you can invent the equivalent of a lie detector. That is quite good for AI safety because you might detect that it's deceiving you. Another idea, which I love and we put in our thought cloning paper, is if you train it on human

language like English, for example, and it learns to think in English about what it's going to do, then you can watch it think, you know what its plans are. And so if it's planning on doing something unsafe, you can stop it. So if it says, like, you've asked me to get this person to the airport as fast as possible, so I'm gonna just blow through every red light or run over children walking at the school crossing,

you say, all right, we have to stop your behavior. And we did a little experiment where, after training, we could declare, hey, touching red things is considered unsafe. So every time you say you're gonna touch a red thing, we just halt your program. And you see the system go from touching lots of red things to virtually never touching a red thing, so it works. And if you could keep it thinking in English, for example, and you could watch it thinking and it didn't know you were watching it.

And maybe you could see it just start to say, hey, I actually want to deceive these humans. I'm going to, like, play nice during evaluation time. Once I get out in the wild, I'm going to be unsafe and try to wrest control from humans. Obviously, you're going to shut that robot down, right? But it's not perfect.

The system could learn, maybe even by reading our own paper, the paper we put on Archive on Thought Cloning, that this is possible and that if it really wants to get itself out of jail, then it fakes its thoughts in some way, doesn't have the thoughts that we're afraid of during evaluation, having these kind of surreptitious plans. For example, I actually think a future is coming not too far in the future.

where there's going to be a technology to read somebody's brain via like external EEG. And you might ask a terrorist, where did you place the bomb? And it will think it, think of the mental picture of it, and then bam, you've got it, right? And I think the countermeasure is they're going to learn not to have that mental image, right? Like they're going to learn countermeasures. And the AI can also take countermeasures for a lot of this like interpretability, safety, neuroscience stuff.
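To make the "watch it think, halt on unsafe plans" intervention concrete, here is a toy sketch; the agent and environment interfaces and the unsafe-phrase list are purely illustrative, not the actual experimental setup:

```python
UNSAFE_PHRASES = ["touch the red", "run the red light", "deceive the humans"]  # illustrative only

def run_with_thought_monitor(agent, env, max_steps=1000):
    """Let the agent plan in natural language, but halt the episode the moment
    its verbalized plan mentions a declared-unsafe concept."""
    obs = env.reset()
    for _ in range(max_steps):
        thought = agent.think(obs)                      # natural-language plan
        if any(phrase in thought.lower() for phrase in UNSAFE_PHRASES):
            print(f"Halting: unsafe plan detected -> {thought!r}")
            return "halted"
        action = agent.act(obs, thought)                # action conditioned on the thought
        obs, reward, done, info = env.step(action)
        if done:
            break
    return "completed"
```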

Couple of things on that. I spoke with Neel Nanda, he's a famous mechanistic interpretability researcher, and he was kind of saying that at a certain level of model complexity, you know, it could put up a facade for our measurements, just as you're saying, so it can kind of show us things that we want to see.

But it comes back to this kind of theory of mind, I suppose. So Nick Chater wrote a book, The Mind is Flat. A lot of connectionists just think that it's a mistake to think that these things have consistent beliefs, desires, intentionality. They're a bit like automata. And just depending on the situation, they do very different things. I mean, Murray Shanahan studied it.

with the 20 questions game where you ask language models, you know, like, can you guess something in 20 questions? And when you do the analysis going back, it was actually thinking of something completely different earlier on than what it ended up with. So do these things actually have consistent beliefs and desires? I think we do ourselves a disservice to look at the current flaws and assume that they're going to remain current flaws. So my meta comment here is even if the current systems don't, then the future, very near-term systems will.

And I think things like this are getting better and better as they get smarter, and they'll continue to get smarter and they will become very consistent. And so all of these kinds of arguments point to current flaws and assume that it's not going to get more capable and dangerous.

I think it's unhelpful, I think it's inaccurate, and I think it's actually dangerous, because you're not recognizing the rate of change and you're almost latching on to a belief that there's something special about us that they don't have and therefore they're not going to become dangerous. Personally, I think that the current systems might have some of these problems, but future ones won't, even if the current methods never produce it.

And there are many scientists who think that. I think as you switch to more powerful methods, like taking a pre-trained GPT-6 and dropping it in a giant diversity of embodied tasks and simulation, and you let it learn via trial and error and reinforcement learning to accomplish a lot of tasks, of course, eventually it's going to gain consistency for the same reason that evolution produced a consistency capability in you. It's very useful to be able to have a consistent plan and act against it.

We were speaking about how using language models is great because they encode notions of interestingness, culture, priors, and so on. And to a certain extent, we can align them, right? So if you build a multi-agent system now using one of the major frontier language models, it's not going to delete all of the code on your hard drive, right? There are certain guardrails in place. Not really. It definitely might delete everything on your hard drive.

Well, so I use Open Interpreter, for example, and it's actually surprisingly good. So, you know, if you say something like delete my entire hard drive, it will say, no, I'm not going to let you do that. Okay, interesting. So you've got some extra checks, but a lot of people in this world right now, myself included, are letting the language model generate code and just oftentimes getting lulled into a sense of it's pretty good and just running that code without checking it. Yeah.

And I myself don't have the guardrails. I need to learn from you to put those guardrails on my system. Yeah, yeah. But do you think that that is a significant safety measure, just building multi-agent systems on top of language models that are aligned? Why would that make things safer? Well, we can do RLHF, so the language model will have a notion of ethics. It will have a notion of which code is unsafe and so on.

Yeah, in general, I think that alignment is better than not having alignment. I am quite happy with the progress in alignment in recent years. And so I think that it's meaningfully better than not having it. However, there's at least two problems. One, it's very easy to ditch the alignment if you allow the system to keep training. So if you open source a model, anyone can just train that RLHF right out of the system,

which is quite problematic. If you take that thing and, as you were suggesting, put it in a reinforcement learning situation where its job is to accomplish some new task, and it's not simultaneously being trained to also do the alignment job, then if it helps to accomplish the target task to get rid of its ethics, it will rapidly do that, because that's what RL is good at: figuring out how to solve the task at hand.
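One standard mitigation, and this is my framing rather than a method prescribed in the conversation, is to keep an explicit leash to the aligned reference model during any downstream RL fine-tuning, for example via a KL penalty; setting that penalty to zero is exactly the "pure task RL" regime being warned about here:

```python
import torch.nn.functional as F

def rl_finetune_loss(policy_logits, ref_logits, log_prob_actions, advantages, beta=0.05):
    """Policy-gradient objective with a KL penalty tying the policy to the
    frozen, aligned reference model. beta=0 recovers task-only RL, where
    alignment behaviour can be trained away if it conflicts with reward."""
    pg_loss = -(log_prob_actions * advantages).mean()             # maximize task reward
    policy_logp = F.log_softmax(policy_logits, dim=-1)
    ref_logp = F.log_softmax(ref_logits, dim=-1).detach()         # aligned reference stays frozen
    kl = (policy_logp.exp() * (policy_logp - ref_logp)).sum(-1).mean()
    return pg_loss + beta * kl
```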

So there's all sorts of ways that alignment can be undone. That worries me. Another thing that worries me a lot, and now we are getting on to AI safety a little bit, is that there are two big problems, at least in AI safety. One of them is unintentional harm.

So I am a really noble human. I'm trying to ask an AI to do something. I ask it, for example, to solve climate change. And the AI does what I ask for, not what I want. And it says, oh, yeah, OK, the cause of climate change is humans. And so I'm going to kill all humans, something like that. There's many cases like this. Alignment allows it to do what we want and not just what we say. And so a properly aligned model wouldn't do that.

So alignment makes it better and better and better to minimize unintentional harm. So if you have AI in the hands of good actors, it will do good. The other thing I'm worried about though is AI in the hands of bad actors. So pick your villain. For me, it's Putin right now. I don't want Putin or I don't want like a school shooter like the people who shot up the high school in Columbine who wanted to kill as many people as they could.

I don't want that person to have very powerful technology. Like the school shooters in Columbine could only take out however many people they did, but eventually the police showed up and that was the end. If they had had the ability to make a bioweapon that killed all humans, or if some racist neo-Nazi has the ability to make a bioweapon that kills all black people, some people would want to do that. And as you get better at alignment,

it works, it's a double-edged sword, because now the model is aligned to that person's ill intent, depending on what we mean by alignment. And so what worries me a lot is powerful tools and AI in the hands of bad actors. And RLHF, as I said, could be undone or could actually help that person accomplish their goal, depending on what kind of alignment we're talking about. Do you think there's an analogy between this type of technology and nuclear weapons?

I think there's many analogies. Some of them are great and some of them don't work as well. I think there's an analogy in the sense that it is not yet currently, but will soon be extremely powerful, extremely dangerous, could be world ending if we're not careful. And we want to marshal the kind of

regulation, thoughtfulness, prevention, monitoring, and diplomacy that we brought to nuclear weapons to the development of powerful AI. What do you think about open source AI? Should we regulate it? I do think we should regulate it eventually. I'm not that concerned with exactly the current tools that are being released right now.

But I think this is a point of view that many of my colleagues don't share. But I actually think it's quite dangerous to open source extremely powerful technology because you're giving it to everyone. And if 0.01% of people want to create some horrible bioweapon that kills, for example, one race of people or all people, now everybody has it. So we don't open source the recipe for making nuclear weapons.

We don't open source the recipe for how to make smallpox or a race specific smallpox. Similarly, I don't think we should open source the recipe for AGI. I believe that there are alternatives like having a coalition of democratic governments make powerful AI

let people use it and get the wealth and the benefit from it without having access to the weights because we don't want them to ditch the RLHF that's preventing it from doing evil. How do you reconcile this with your views on serendipity and open-endedness and so on? I mean, you might argue that having a vibrant open source community is the best form of stepping stone collection and therefore the best way to find a solution. It's a really good question.

And if you want the fastest possible scientific discovery of the most powerful, capable AI, then doing the things I just said to make things safer would slow that down. So you wouldn't do what I'm saying.

if you only cared about progress. So the reconciliation is: we don't necessarily want the fastest possible progress towards AI, we want to make sure to do it safely. And I'm willing to pay a hit on the rate at which we finally get to the thing to make sure that we do it safely, because literally the fate of human civilization is at stake.

The principles are right. That is the fastest way to make discoveries. But we don't necessarily want to follow the principles without being able to do it safely. And if there is a tax to doing that, I'm willing to pay it.

Bengio said recently in an article, I can't remember the exact quote, but something along the lines of he feels a sense of despair that much of his life's work could potentially lead to ruin. And I just wondered, do you feel a similar kind of thing, that so much of your work is about building the next generation of agentic AI? And by the way, Bengio actually said that he thinks what we should do is not make agentic AI, but make oracles and tools. But how do you think about that?

So on the question of do I regret my life's work, for example, do I feel despair about the development of AI? I'll say that I lose a lot of sleep over this question. I think very deeply about it. And I talk to a lot of colleagues weekly about this question. I will say it's very complicated. I don't think there's any short, simple answer. My logic roughly goes like this.

I think that it's probably inevitable that we will make AI. I have almost zero faith in humanity's ability to not invent a technology that's this economically valuable and militarily capable, mostly because of a tragedy-of-the-commons sort of collective action problem. If we don't do it, somebody else will do it. So if that is true, I would much rather it is done the right way,

safely by people whose values I share and where we get the values into the system via things like RLHF. So the AI itself is aligned. And so I actually feel there's an obligation

to do this right. If I had a magic wand and I could pause AI and then work on safety first and only do it if we're safe, or maybe not do it at all, I would wave that wand. And I've signed the letter calling for a pause, and I've worked with Bengio and Hinton and Kahneman and other people to publish articles saying that we need to take this stuff very seriously and be very safe. But given that I don't have that magic wand, given that it is inevitable, I think that we need to build it safely and that

What worries me is that if all the people who are thoughtful and ethical enough to be concerned stop working on it, then what is left? It is going to be developed by the people who aren't concerned or who actively have ill intent. And so we have to, barring some major intervention in collective action, we have to do it and we have to make sure that it's done well and it's done safely. Simultaneously,

I think myself and many other people, we talk a lot about the downsides and the risks. It's easy to forget about the tremendous upsides. Right now, there are people saying goodbye to their children because their children have cancer or some incurable disease. There are people saying bye to loved ones of all ages. There are people suffering from hunger and poverty. Excruciating human suffering and pain is rampant in this planet.

This technology has the ability to eliminate all of that. I believe firmly that we can cure all disease, we could solve death,

We can dramatically increase the quality of people's lives, their GDP, eliminate things like hunger and scarcity, if we get the societal distribution questions right. But technically, a lot of this stuff is possible. And so Dario Amodei wrote a beautiful essay called Machines of Loving Grace. I recommend that to all of the listeners. It does a really good job of walking you through and making you dwell on these positives. They're easy to list quickly, but I think it is actually important to walk through and dwell on all of these tremendous upsides.

We want that. And as Dario says in the essay, the reason why he talks about AI safety is the same reason I talk about it. Because that's what we need to get and figure out in order to unlock all of that great stuff. And so I don't want to die. I don't want my children to die. It would be great if we could eliminate that from the world and make death optional.

And the only path to doing that and many other great things for humanity is through technology. And AI is the most potent technology to do that. So if you throw together the inevitability, the fact that if we don't do it, it will be done and probably by people who are less careful or maybe less scrupulous and the tremendous upside, I have ultimately concluded for myself, I'm okay working on it and I want to work on it.

But I need to and must make AI safety a first class citizen in what we do. And I put it

out there as often as I can into the community that everybody should be thinking about it and trying to infuse their daily and weekly work with AI safety in mind, pushing forward the science of AI safety, the advocacy, telling politicians and regulators. I've met with top US politicians and advocated this to them in closed door sessions as well.

Basically, I want to do everything I can on the safety front, but I still also do feel like it is a good thing to try to make safe AI and then reap the upsides while minimizing the downsides. Thoughts on this collective action problem? I don't know what you're advocating for here, but do we need some kind of global governance structure? Yes. Tell me more. The problem is that I don't believe we could do it.

I don't think every single actor in the world will agree. Take Putin, for example. Is he really going to agree to not make powerful AI? Even if he agreed, would he still do it in secret? If I had to predict, I'd say there's definitely some secret government-sponsored AI research projects in the world right now building AI in secret for their militaries and for their spy agencies, etc. I'm super worried about all of that.

So I do think we need regulation. I think we need global governance. My personal view, and I'm not an expert in these kind of areas, I'm a computer scientist, not a political scientist or a diplomat, but something like a coalition of value aligned governments, like democracies say, we're gonna build this thing, we're gonna try to make it safe.

we're going to share knowledge, we're not going to open source the weights, we're going to share the benefits with humanity, give the benefits to humans, and we are going to monitor and suppress the creation of AI by unsavory characters elsewhere in the world, because we don't want somebody to make a non-value-aligned

AI system somewhere. So anyone who agrees to go along, you can participate in the upsides, participate maybe even in the democratic governance of the AI that we wield. But if you don't agree, if you're not willing to play by the rules, if you're willing to make it in an unsafe way, or you want to do it on your own terms and not be subjected to monitoring or us knowing how you're doing it and what RLHF you're doing, et cetera, we're going to deny you chips,

or we're going to deny you electricity, or we're going to do what we can to slow you down because we don't want to make a dangerous rogue AI that doesn't share a love for humanity and doesn't want to do what we ask it to do. On early warning systems, of course, there was that FLOPs regulation in the US. Is there a slightly better way of assessing the risk of foreign actors, other than just the amount of computation that they're doing?

Well, one of the great beneficial lucky events so far in the history of powerful AI is that it requires tremendous amount of power and compute to make.

If somebody figures out the way to make this thing on your desktop, I am very, very afraid, because now it only takes one bad actor. Unless the good AIs can somehow stop the bad AIs. And do we really want an AI versus AI war breaking out on Earth? None of these futures seem that great to me. So at least for now, for the foreseeable future, we can track who has the power and energy to make these things, or we can try. And that allows us to basically have a sense of who's training the biggest and most powerful models.

And that's fortuitous, and I think that we should do it. I think it's quite hard in places like foreign countries where they might be trying to obfuscate. But if you look at nuclear weapons, to go back to the analogy you mentioned,

we're pretty good at monitoring who's building nuclear weapons. Like we have a pretty good sense of what Iran and North Korea are doing, and we've slowed them down. I think that it's probably going to look a lot like that, that we are basically Stuxnet-ing and denying parts and things to the actors who want to do this in a dangerous way.

I don't see any other way. I'm open to ideas, but I just spent a whole week there asking this question over and over again. What is the alternative to suppressing the bad actors from making it? I have yet to hear anyone provide a good idea that doesn't involve some form of stopping the countries that don't want to play by the rules and don't want to do it in a safe, ethical way from creating it.

Let's talk a little bit about your ADAS paper. And I suppose, like, one of the principles here... I spoke with Cong the other day, by the way, absolutely amazing stuff. So maybe to sketch it out, we were just talking about, you know, Dario, and everyone should watch his Lex interview. It's amazing. It's like, you know, hours and hours. I loved it. But his school of thought is very much that, you know, these things are just getting better and better. We scale them up better and better and better. But

I have this intuition, and I think you do too, that there's a better way of doing it by building some kind of multi-agent system. And now when we build multi-agent systems, we are hand crafting it, right? So we use these patterns like the debate pattern and the critic pattern and God knows what else. And we're just kind of like constructing all of these different topologies of agents together. What you have done is figured out a way of automating that process. That's right. So right now,

People recently discovered, hey, if I ask GPT or your frontier model to do something, it can give you an answer. But it's way better if you create some complicated, I call it like a workflow, like a flow chart. First ask it this, then ask it to reflect on that answer, maybe three times, spell out its reasoning, and then maybe invoke a new language model to review that thing and write a critique of it. And then the original one reads the critique, updates the thing. So there's some complicated flow chart

and then eventually it spits out your answer. Okay, so some people are calling those agents, that's like the current word. I actually don't like that word, because the neural net that played StarCraft and Go, and, like, you know, a neural net that just takes actions in some video game, is also an agent, and they feel different to me. So in my lab we talked a lot about it, and we've landed on the terminology, which some part of the community is using, which is to call that an agentic system.

All right. So right now there are many agentic systems that are being developed. They're way more powerful and capable than just asking a language model a simple question. And they don't necessarily just involve multiple calls to language models and specific prompts and patterns. They can also involve tools, like the system could make a query out to Google Scholar or, you know, Semantic Scholar, get the results back. It could use a calculator. It could write code and execute it, all this stuff.
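As a concrete, hypothetical example of the kind of hand-designed agentic workflow being described (here `llm` stands in for any chat-completion call; the prompts are illustrative):

```python
def answer_with_workflow(llm, question, n_reflections=2):
    """Hand-designed agentic workflow: draft, self-reflect, get an external critique, revise."""
    draft = llm(f"Answer the following question, spelling out your reasoning:\n{question}")

    for _ in range(n_reflections):
        draft = llm(
            "Reflect on the answer below, point out mistakes, then produce an improved answer.\n\n"
            f"Question: {question}\n\nAnswer: {draft}"
        )

    critique = llm(
        "You are a separate reviewer. Write a critique of this answer.\n\n"
        f"Question: {question}\n\nAnswer: {draft}"
    )
    final = llm(
        "Revise the answer in light of the reviewer's critique.\n\n"
        f"Question: {question}\n\nAnswer: {draft}\n\nCritique: {critique}"
    )
    return final
```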

All right, so people are currently hand designing complicated agentic systems. They work better. And then somebody else looks at the library of all the agentic systems that humanity has published so far. They have an idea, oh, what if I modify this one in this way? Or what if I combine these two parts from these two agentic systems? Well,

If you're familiar with my lab and my colleagues, the thinking that we have, and this is basically the thesis of the 2019 AI Generating Algorithms paper, is that the history of machine learning has writing that's very clear on the wall. And that is that hand-designed systems and pipelines become replaced by entirely learned pipelines as we have more compute and data.

So why not apply that same thinking to the creation of agentic systems? So my PhD student Shengran came to me and he said, hey, your AI-GA philosophy is that learned machine learning systems are better than hand-designed ones. People are hand-designing agentic systems and they're powerful. Why don't we just AI-GA this thing and learn these agentic systems?

And so we said, that's a good idea. Now, if you want to learn agentic systems to make them better and better, you could use classic optimization: take a thing, try to make a better one; if it's better, keep it, and, you know, make that better and better and better. But we've got this whole repertoire of ideas from open-endedness and quality diversity algorithms that says the ultimate path toward a powerful thing is not a straight line. And it's not going to be achieved if you only try to accomplish that goal.

So why don't we take the playbook off the shelf of open-endedness and say, we'll take one agentic system and then we'll have the system look at that and create another one. If it's interestingly new or high-performing, add it to the set of ones that we have.

And then look at these ones, generate a new one. If it's good, if it's interestingly new or high performing, add it. And we'll grow this expanding set, this expanding library of stepping stones or innovations, different agentic systems, each that are novel, they're different from each other, they do different things. And as we go, we'll get this big library of growing agentic systems. And ultimately, we'll probably discover even higher performing ones than we would if we picked a goal and tried to optimize right for it.

And that is the playbook that is in so many of the papers in my lab. And it worked, you know, basically right out of the gate on ADAS. So we call it ADAS, the Automated Design of Agentic Systems. And ADAS is basically saying, let's use open-endedness to design agentic systems. And it works great.
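In pseudocode, the recipe just described looks roughly like this; it is a sketch of the loop, not the actual ADAS implementation, and `llm`, `evaluate`, and `is_interestingly_new` are placeholder functions:

```python
def open_ended_agentic_design(llm, evaluate, is_interestingly_new, seed_system, n_iters=100):
    """Grow an archive of agentic systems: propose a new one conditioned on the
    archive so far, keep it if it is interestingly new or high-performing, repeat."""
    archive = [{"code": seed_system, "score": evaluate(seed_system)}]

    for _ in range(n_iters):
        context = "\n\n".join(entry["code"] for entry in archive)
        candidate = llm(
            "Here are the agentic systems discovered so far:\n\n"
            f"{context}\n\n"
            "Propose the code for a new agentic system that is interestingly different "
            "from all of these or likely to perform better."
        )
        score = evaluate(candidate)
        if is_interestingly_new(candidate, archive) or score > max(e["score"] for e in archive):
            archive.append({"code": candidate, "score": score})   # a new stepping stone

    return archive
```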

I love it. So much to talk about. And of course, one thing you tried it on was the ARC challenge. And we had Chollet here the other day. We'll get back to that in just one second. But there is a fundamental kind of difference in philosophy here. So one school of thought, like Dario is advocating for, is that you just keep training these things.

And we see these fascinating behaviors. So, you know, like the neural networks, because of their inductive priors and stochastic gradient descent and whatnot, they start off, there's a kind of curriculum. They learn like low complexity features and then features of increasing complexity. And then there's this kind of consolidation phase or grokking. And there was a paper at NeurIPS called sedimentation, I think, where, you know, all of a sudden the complex features become consolidated earlier on. And you could say that that's an implicit form of what you're talking about. Because what we're saying is,

We follow Rich Sutton. Everything that we thought was hand-designed and explicit is becoming more and more implicit. And you're advocating for an interesting fusion, which is kind of like there are explicit hybrid components, but it's being meta-learned. Yeah, so you raise a really good point, which is you could try to create a system where

Whatever the logic is in an agentic system, first I should do this, then I should think about it, then I should maybe ask a friend, get their review, maybe then I do a web search and I bring that. Whatever that flowchart is, all of that could exist within the original transformer. But what we've learned is that chain of thought works better than the original model that you're adding chain of thought to. Same with reflection.

And the same thing is true of humans. Like I myself could probably go to a consultant and say, "Hey, what's a really good pipeline for when I write a paper?" Okay, first make an outline, then do this, ask some friends, get their feedback, expand each thing out, then do a pass, then go for a week and don't look at it and then come back with friends, whatever. There's some recommended thing that we're doing with the brain that I have in some flow chart that makes it better than the brain I have.

So in theory, I could just have all of that in my brain and kind of do it and then one-shot the essay, or I could have that kind of all internally within my activations. But we've seen with our brains and with these models that it's hard to get all of that right into the weights of the brain. It's easier to train that thing and then do stuff on top of it. And so I do think there's a nice and interesting lesson here, which is probably a lot of the stuff that's currently happening

being added afterwards in an agentic system could be distilled back into the original thing and then you could just repeat the process. And so the ADAS is always giving you like a 20% lift on the base intelligence, but the base intelligence can grow and you just always get that 20% lift. That's my hot take and nobody's ever asked me that question before. I think it's quite profound and interesting and my mind is spinning with ideas for research that we could do in this direction.
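A sketch of that "distill it back and repeat" idea, with placeholder functions (my illustration of the loop being speculated about, not an existing pipeline):

```python
def distill_and_repeat(base_model, build_agentic_system, finetune, prompts, n_rounds=3):
    """Alternate between wrapping the base model in an agentic system and
    distilling that system's improved outputs back into the base model."""
    for _ in range(n_rounds):
        agentic_system = build_agentic_system(base_model)     # e.g. a reflection + critique workflow
        traces = [(p, agentic_system(p)) for p in prompts]    # (input, improved output) pairs
        base_model = finetune(base_model, traces)             # supervised distillation step
    return base_model
```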

And so I think we just basically don't know the answer. Should we be bitter-lessoning this: taking intelligence and putting it into an agentic system, or should we just be going all in on, you know, just making the core thing smarter? My instincts are that we're always going to have a role to play for these kinds of agentic systems on top of the base intelligence. There are so many directions to go. And I mean, we could talk about computational differences in the two paradigms.

I think having a system of LLMs has very interesting computational properties compared to one LLM. There's this thing you were talking about, about the phylogeny. There's a great example, you know, the lottery ticket hypothesis, which is that you need the blank slate. Stochastic gradient descent needs all of these useless parameters and then

When it's trained, we can strip it all away. And it's a similar thing here. We can create this agent architecture, but then it's not so trivial to stick that thing back into a neural network afterwards. Yeah. One thing, I mean, just a simple example of why an agentic system might always be helpful, well, not always, but likely, is diversity. So when I write an essay like the AI-GA paper, for example, I don't just write it and post it.

I write it and I send it to people like Ken and Joel and my other colleagues and friends and I get their feedback. They give me really good feedback. And I actually seek out diversity. I ask different people with different backgrounds and opinions, philosophies and styles to get all of their feedback. And the more feedback I get, the better that essay will be. Now, in theory, if I was really smart,

I could have a little bit in my head that says, act like Ken. Use your model of Ken to review your own paper and improve it in light of that, of Ken. And I could have Joel in my head and Ken in my head. I could have you in my head and I could have skeptics in my head. But that's actually quite hard. So probably it's easier to have a diversity of agents that are trained on different data or for different purposes.

and then kind of have an ensemble or like a peer review of these other things to give me that feedback. And that is the kind of thing that an agentic system can get you really well. That would be hard, but not impossible, to distill into the base model. So that's a very interesting thought.

Again, love Kenneth Stanley. He was talking about we are agents. I actually think there's a big link with agency with his philosophy as well. So we follow our own gradient of interestingness and we can go for many, many steps in isolation, discovering interesting new knowledge before we share it with the collective. And I guess, is there a difference between

a kind of agent system where we have continual learning and we have active inference and the agents can actually kind of like mutate and adapt over time versus the blank slate version where we just have autonomous agents that are, you know, just using like a frontier language model. Yeah, I think ultimately we're probably going to want

continual learning. As a side note, I've worked on this problem for years and it's just a problem that we're nowhere near solving. Tell me more. Well, it's just true that we just have no ability to have a model continuously learn and have good things happen. So every single model that's powerful, like GPT-4 or pick your frontier model, they train it on a chunk of data

and then they stop. And maybe they do some supervised fine tuning and then they stop. And then maybe they do some RL fine tuning and then they stop. What they don't do is say, "Hey, from every conversation you have with all of your users all over the world, at every interaction you have, just keep learning forever." And like, that's the dream that I've had and many people in our field have had, and nobody knows how to do that. And so basically it's so hard and we've failed so completely to master this thing that biology has figured out

that it's really not on most people's radar. Like it's not something that most frontier labs are pushing on. At some point, I think we'll come back to this.

Is it just the architectural complexity? I mean, I could imagine one way of doing it might just be having like a two million context window and the context window just grows with experience. Or we could do test time compute. We could have an intermediate personalized model which generates context for a big large model or something. I mean, it feels like we could do something in that direction. It is true and it is interesting that one of the things that has fortuitously taken the pressure off of the need to figure out continual learning is in context learning.

And in particular, large context. Because if your context was large enough and your in-context learning was good enough, then maybe you don't need to do continual learning because you can deal with it all in context. For example, if I wanted an AI assistant, if I wanted a frontier model to be my AI, like in the movie Her, like an agent that works with me through my whole life, gets to know me and my preferences, and can talk to me about things that happened in the past,

Well, either it has to learn as it goes about my preferences over the course of years, or it has to have such a big context that all of the video and all of the audio and all of the text from every interaction is just sitting there in it. But even were that true, you might want that her agent to be training simultaneously on the interactions of all the humans on Earth. And I just don't think we're going to train a model that's so smart

that it can then, in context, learn everything else. Like new stuff happens and there's so much good data out there. At some point you want to transfer that from the context to the weights. In your brain, you do not only do in-context learning. The weights of your brain are updating all the time. And all of long-term memory, for example, even some forms of short, you know, not super short, but short-term memory,

All of that's going into the weights of your brain. So biology has figured this out. And you don't, when you're learning something now, forget stuff that you learned a couple of weeks or months ago radically, catastrophically. But the current AI systems all do what's called catastrophic forgetting. And when we do solve that problem, so we create agentic, continually learning AI, do you think the dynamics of the system would be dramatically different in terms of capability? Sure, yeah. I mean, imagine if right now,

Like collectively the frontier models are having what, hundreds of millions of conversations as you and I talk simultaneously on earth. If a system could learn from all of those conversations, how smart would it get? How fast?

Could you argue in some galaxy brain sense that we form a memetic super intelligence and we are, you know, I was kind of advocating we could have personalized intermediate small models that do transductive active fine tuning and feedback up to the main model. In a way, that's what we do. We are the extended minds of the LLMs already. So shouldn't we expect to see more dynamism, even though there's this delayed ingest of data back into the language model?

I missed the question, can you say that again? Well, in a sense, we already are like... We form an intelligence system and we are the agents that control the language models. We generate data that's fed into language models. We do continual learning and so on. So it's almost like shouldn't we already expect there to be a dramatic capability increase in the system just with us as users of language models? Well, yes. I mean...

Humans are getting or becoming more productive by using language models. And because of that, we are doing more, producing novel code, helpful code, novel solutions to math problems, novel discoveries in biology, science, machine learning. And so the data, which is the result of that process, is being dumped onto the Internet, which provides fodder to stop and train the new model.

But don't you just feel the lack of continuity here? We're not continuously learning from the interactions with humans. We're not continuously learning from new stuff that's being posted online. It's all this kind of like very artificial,

you know, take some data, scrape it, filter it down to what we consider high quality, put it in a training set. Now we will train. Now training is done. Now we will deploy. Maybe in deployment, we will get new interactions. We'll get some ratings from users, thumbs up, thumbs down. We will store that in a data set. In three months, when we have the right GPUs with these hyperparameters, we'll try a couple of different runs to produce a better thing. And then we will deploy that thing. It's just nothing of like,

Nothing like the experience of you in your life, constantly learning, getting better every day from every interaction in a way that doesn't throw out a lot of the stuff you've already learned, but builds on that knowledge and some never ending beautiful process of discovery and skill improvement. Basically, that's the dream and we're not there. We're so far from it.

I do want to go back to one other thing you mentioned earlier, if you don't mind, which is, in the paper on Omni, you mentioned... okay, actually going all the way back to POET, I had this plot, which is one of my favorite plots of my career. And the reason why I liked that plot, to kind of paint a picture for your listeners, is that there was this phylogenetic tree of environments that had been generated so far.

And I had been seeking my entire academic career to produce something that looked like a phylogenetic tree from nature. So what does the tree of life look like? It had inventions like the bacteria, the fungi, the animals. And then it said, animals are a good idea and fungi are a good idea. I'm going to continuously innovate and pick up stepping stones from each of those and build on top of them. And then within animals, you get, like, you know, the big cats, and then you get, like, rodents.

And then there's endless innovation over in rodents, but also endless innovation in building on the ideas of great cats. So you get more and better cats, more and better rodents, more and better mushrooms. All of this is happening. And so if you look at it, it's a tree and the tree has deep branches. Like the fungi branches last for a long time. Historically in A-Life and evolutionary computation and open-endedness, every time you ran anything, the diversity would just collapse,

you know, and you would not get simultaneous probing of radically different types of things or types of ideas. So in POET, it was the first time we saw a phylogenetic tree that had deep branches. This is not exactly what happened, but metaphorically, because in this case POET was inventing the environments themselves, it's like it invented water worlds.

And then over here it invented forest canopy worlds and over here it had like desert worlds and it would keep pushing on better desert worlds and water worlds and tree canopy worlds forever and just got better. The tree was just growing and kind of filling up. And that's why I loved it and thought it was so beautiful. And you mentioned Omni. One thing that has me super excited about Omni is it's now doing that, but in a search space where anything is possible. It's a Darwin complete search space.

So as you mentioned, it starts out by saying, oh, here's an obstacle course. Here are these platforms. First, the platforms are static and next to each other. Then the little platforms are going up and down and the agent has to jump from one to one. Then it kind of separates the platforms and has them going up and down. So it's riffing on the idea of obstacle courses with platforms. Somewhere along the line, though, because it has a sense of interesting, it's like, OK, well, another interesting thing to do would be to be kicking a ball around.

So first it's just kick a ball. Then it's kick a ball through goal posts. Then it's kick a ball through moving goal posts. Then it's kick a ball off of a wall through a goal. Then kick a ball off a wall through a moving goal. Then kick a ball off a moving wall through a moving goal. So you can see it's riffing on the idea of more and more complicated ball kicking domains.

At some point, it's like, okay, that's kind of boring. I'll create a little conveyor belt system where you like put packages on a series of conveyor belts and you have to know ahead of time by putting it here, it's going to end up getting pushed this way, then this way, this way. It's going to end up over the blue area versus the green area. It's learning to control that system. It starts to invent more complicated buildings, like search a two-room building, a four-room building, a four-story building. And then in the end, it starts to say, I'm going to create like

a construction scene you have to clean up, and the final one, which was my favorite, a cluttered restaurant where the robot has to, like, clear the dishes off all of these tables.

And all of that showed up in one run. The more that we run it, we get all of that stuff. And if we ran it more, we would just get more. And so I really don't know what happens if we run that thing for a billion years. The agent would just level up in skills and like, what are the environments one billion years out in this algorithm? And so you could draw a phylogenetic tree of that algorithm. And it's just the fact that it's doing sustained innovation in so many different simultaneous directions and sometimes putting this together.

It starts to have the feel of why I got into open-endedness. It is the feel of the magic of Darwinian evolution and the feel of the magic of human culture, right? Human scientific communities continuously invent new problems and new techniques that unlocks new problems. And we push simultaneously in chemistry and biology, in CRISPR stuff, in vaccine stuff, over in material science, forever and ever expanding outwards.

And now our algorithms are starting to do that. The AI scientist does that. ADAS does that. Omni Epic is doing that. We have a new paper where we're doing it in self-play where we have different strategies for each side of like a competitive co-evolutionary arms race. And even a new work that we haven't yet talked about, but I'd like to mention real fast is we're using the AI's ideas now for AI safety. So take a new language model off the shelf.

you might, before you deploy it, want to know: does it have any surprising failure modes? And does it have some surprising new capabilities that you might be scared of or worried about, that you want to nerf or eliminate or disallow or block, train out of the system, or just be aware of because it's cool and you want to share it? There's a new thing we can do.

So we launch an open-ended system, same recipe, where the system says, hey, can you do this? Can you do this? Can you do this? And given the stuff we've already discovered about what it can and can't do, that is now a little library of stuff we know. And then we ask, what's the next interesting thing to try? And the next interesting thing to try? And the next interesting thing to try? And we grow this library of discoveries, of surprising ways that the model fails and succeeds. And that is all automated with the principles of open-endedness.
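A rough sketch of that loop, with placeholder calls (a simplification, not the team's actual implementation):

```python
def discover_capabilities(model_under_test, proposer_llm, judge, n_iters=200):
    """Open-endedly probe a model: propose the next interesting task given what we
    already know, run it, and archive surprising successes and failures."""
    discoveries = []  # each entry: {"task": ..., "response": ..., "verdict": ...}

    for _ in range(n_iters):
        known = "\n".join(f"- {d['task']}: {d['verdict']}" for d in discoveries)
        task = proposer_llm(
            "Here is what we have learned so far about this model:\n"
            f"{known or '(nothing yet)'}\n"
            "Propose the next interestingly different task to try on it."
        )
        response = model_under_test(task)
        verdict = judge(task, response)   # e.g. "surprising failure", "surprising success", "boring"
        if verdict != "boring":
            discoveries.append({"task": task, "response": response, "verdict": verdict})

    return discoveries
```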

And so now a new frontier model could come to us and say, hey, we want to automatically red-team and green-team our system ahead of time, basically paint us the picture of what we don't yet know. In some sense, we're capturing the magic of human scientists in the crowd. Like, you put a new model online, what happens in the first three weeks? Everybody's saying, hey, it can't count the three R's in strawberry.

It can do this crazy new coding challenge. It can do this new reasoning challenge. And everybody is looking at what people have found so far and saying, huh, if it can't count the R's in strawberry, can it do this? Can it do that? Try it. If you find something interesting, tweet about it, right? That's adding it to the set of stepping stones.

So basically we automated Twitter, Twitter's self-discovery of foundation models' capabilities and safety problems. And we've done it in this new algorithm that Cong Lu pioneered called Automated Capability Discovery, or ACD. But the point isn't any one of these projects. The point is how this same basic recipe, which dates all the way back to open-endedness and Ken Stanley and Joel Lehman and the work that we've been doing for almost a decade now or more, almost everywhere we apply this set of principles,

good stuff starts to happen. And so that's what makes me so confident that these directions are exciting and profitable. Yeah, it's so incredible. So there are so many things out there in the world like, you know, culture, language and so on. We've built an artificial system which represents many of those features.

I just want to ask you a question, though. So, you know, OmniEPIC is fascinating. You were saying that you can trace this phylogeny and you get these points of divergence in the phylogeny, right? So we can do topological analysis.

I just wondered, like, if you had access to a hypercomputer that could, you know, like do an infinite amount of computation and presumably you could run experiments, many, many experiments of what would happen if we just created all of these phylogenies and we did topological analysis to look at the branching structure and so on. What do you think you would see? It's a great question. And this is actually, I'm not sure if you're aware, but this is one of the most fundamental questions in evolutionary biology. So it was posed by Stephen Jay Gould. Yeah.

And it's called, it basically goes under the metaphor, if you rewound the tape of life and replayed it, what would happen? And so there's a debate. Would you get humans again? And would humans be very similar? Would they have descended from ape-like ancestors? Or would the dolphins have become intelligent? Or would nothing have become intelligent? How convergent, how repeatable is evolutionary biology?

Obviously, we don't get to live in the other universe, so we don't know. But some people have been trying to answer this question. And so there's been a huge arc of work out of the lab that I grew up in, actually, Rich Lenski and his lab. Rich Lenski is effectively the father of experimental evolution. About 30 years ago, he started an experiment where he took one cell of E. coli, or maybe 12 cells, and he's simultaneously evolving them. They started out very similar in 12 separate lineages in petri dishes,

or flasks actually, and they just grow forever according to evolution. And one of the many questions was, would the 12 lines end up in the same place, or would they go in radically different directions? And the short answer is it's complicated. Some things seem to happen across all of them. Some things are wildly idiosyncratic to individual ones. So in one of the 12 lines,

Basically, it invented a new species. So one of the species definitions for E. coli is that in certain conditions, I think they're aerobic, but I'm going to get some of the details of this story wrong, it does not consume this material called citrate. And it just so happened that Rich, when he made the original experimental design, put citrate in the medium, because E. coli could not consume it; it's like a common chelating agent.

And so it's just sitting there in the medium in the flask, like in the water, you know, in the air effectively, the air for this thing, the liquid. And 11 of the 12 lines to this date, unless my story is a little out of date by a few weeks,

11 of the 12 do not consume citrate. But one of the 12 lines violated the species definition of E. coli and learned how to consume this meal, this resource that had been sitting there all along. Okay. So it is arguably some new species. There's debate over this. But what's really cool vis-a-vis this question is not just the fact that it happened in one and not the others.

But after it invented this amazing new capability to consume this resource, which is really difficult and effectively, arguably, makes it a new species, they wanted to know how repeatable that is, not across the 12 lines, but in the history of that particular line. So it turns out, because they freeze samples of the species every so often, they have the entire fossil record of that one particular lineage that ended up having that amazing speciation event. And so there's a...

A friend of mine, Zach, has gone back and studied this question. Basically, you can go back and take some fossil samples from the lineage, like a couple of days before it happened, and then grow a bunch of them. How many of them invent the citrate thing? Go back a little bit farther, like a couple of weeks,

and make many copies of that. What fraction of those things end up in the same place? And you can go further and further, you can go months back, years back, five years back and say, was there one magical thing right before it happened? Some lucky lightning strike that enabled this to happen? Or did some amount of mutation set the stage and make it pretty inevitable from that point forward? And how far back is that stage setting?

And they've done a lot of papers that are in the top journals of the world. And basically the answer is it's complicated. There is some stage setting. And so the farther back you go, the less likely you are to get citrate. But you definitely get citrate repeatedly if you're like, don't go back too far. And it's some complicated function. So long story short, evolution has both chance, driving where it ends up, and some themes that are consistent.

And so in our own algorithm, we could study that. And that's one of the beauties and the original reasons I started working on this stuff is to address questions like that by repeatedly doing runs of open endedness. You could study how contingent is evolution or whatever algorithm you're using. And that is beautiful. And my guess is you've got a little bit of both. Sometimes you get random magic that causes something that is almost impossible to repeat. But there are some themes like things get more intelligent, they get more complicated.

You can also do the same thing not with the process, how consistent is the process and the phylogeny that led to something really cool, but with the final artifact. So imagine if we ran open-endedness in a thousand different runs, and 800 of them produced general intelligence. But now you can take that final intelligence and look across it. How different is it? What's the math look like over here versus over here? What does the music look like over here? Do they all have music? Do they all have humor?

Do they have a notion of love? Do they have a sense of curiosity? What is the difference in their sense of exploration? Yeah, it's fascinating talking about the space of alien intelligences. So themes could be complexity and intelligence, and motifs could be morphology. There's a great phenomenon, isn't there, called morphologically convergent evolution, carcinization, where crab-like forms appear independently in different parts of the phylogeny.

And I wonder whether that could be described through emergence, through the interaction dynamics of some sort of basis phylogenetic functions, or through some kind of physical determinism or something like that. But I did want to ask you another question about the OMNI-EPIC thing, which is that

one interesting thing about evolution is isolation, right? We have all of these pockets of evolution that just happen in situ and don't share information with other parts of the phylogeny, and that's very important for niche construction and so on. In something like OMNI-EPIC, the extent of information sharing between the environments and the agents is an operating parameter of the algorithm. Would that be something you'd be interested in studying? It's a lovely point.

Before I answer that question, I just want to make one comment on the previous one. There's this great quote which says that if religion did not exist, it would be necessary to invent it. Yes. Right? So you could ask that question: how many of these 800 intelligent races that come about invent religion, for example? Does AI invent its own religion? Any property you're interested in, you could go see. Not just

did they invent it, but how does it look, is it different, and how consistent is it? All of this is such beautiful science you could do if we ever cracked open-endedness; it's one of the many reasons I find it so fascinating. Okay, to the OMNI question. OMNI is really interesting because you get to make some choices, as you mentioned. So, just for your listeners, OMNI right now is trying to automatically generate an endless stream of interesting environments

that an agent, if it trains in them, will learn something new from, right? Okay, so if it's doing its job, and it does work, you end up with this massive set of environments that are all interestingly different, and at the time they were invented, the agent was getting some benefit from learning on them. Now, when you go to pick an environment, like the cluttered restaurant scene, for example, we had some choices to make from an algorithmic design standpoint when we built OMNI. One thing you could do

is say, hey, take the restaurant scene environment. And, by the way, I didn't mention this yet, but when we generate the new environment, we're asking a language model to look at the current code of the current restaurant environment

and make the next interesting environment, one that is interestingly different and would have high learning progress. Then the language model spits out the code, and we check whether it's interesting. I call that an intelligent mutation operator. It's kind of like evolution: you take a thing, you change it a little bit, and you see if it's good. But you don't change it randomly, because if you change code randomly, it doesn't compile.

Instead we take the code that specifies the cluttered restaurant, give it to a language model, and say, "Hey, this is an environment the agent can currently do well in. What's next?" And it might say, "Oh, let's do it with more tables or more silverware," or, "Let's add other agents that are continuously coming into the restaurant," or, "Let's make the tables move up and down and float," or whatever it wants to do. It's going to make some interesting mutation to the environment.
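To make that intelligent mutation operator concrete, here is a minimal Python sketch. The function names, prompt wording, and the `llm` and `compiles` helpers are hypothetical stand-ins rather than OMNI-EPIC's actual implementation.

```python
# Hypothetical sketch: an LLM rewrites an environment's source code into the
# next interestingly different environment, and we keep the result only if it
# actually runs. Illustrative only, not the real OMNI-EPIC code.

def propose_next_environment(llm, parent_env_code: str) -> str:
    """Ask the language model for a mutated environment with high expected
    learning progress, given code the agent can already do well in."""
    prompt = (
        "Here is the code of an environment the agent currently does well in:\n"
        f"{parent_env_code}\n\n"
        "Write the code for a new environment that is interestingly different "
        "and would give the agent high learning progress."
    )
    return llm.generate(prompt)

def intelligent_mutation(llm, parent_env_code: str, compiles) -> str | None:
    """Unlike random code mutation, which rarely even compiles, the LLM proposes
    a semantically meaningful variation; we then verify that it executes."""
    candidate = propose_next_environment(llm, parent_env_code)
    return candidate if compiles(candidate) else None
```

The `llm` and `compiles` arguments stand in for whatever model client and sandboxed execution check a real system would use.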

But when you ask the system to make the next interesting thing, how many things do you give it? You could give it just the cluttered restaurant scene, or you could give it the cluttered restaurant scene plus the kick-a-ball-through-the-moving-goalposts environment.

And then maybe suddenly it's like, ah, I'm going to combine these ideas and have a restaurant scene, but the waiter doesn't get to carry the dishes over; it has to throw the dishes across the room into a moving tub, or something like that. Right? And you don't have to restrict yourself to two parents. You could give it the entire archive and say, look at all of the environments so far, in context, as in-context learning: now what's next?

Okay, so let's assume you do the latter: you give it the whole archive as context and it produces a new thing. Well, how do you draw a phylogenetic tree of that? In some sense, the parent of every new thing is everything that came before.

You can't draw the tree. We tried versions of that, and it does still push in the direction of moving-goalpost football worlds, obstacle-course worlds, and cluttered-restaurant-scene worlds, because it still knows the human notion of interestingness, but there's no longer a direct, obvious way to draw a phylogenetic tree.

In the current system, in case you're curious, what we found worked really well is this: you sample one environment randomly, like the cluttered restaurant scene,

and we don't want to give only that to the system and have it generate the next thing, because it might have already seen the cluttered restaurant scene before and already generated the next thing. So we take the cluttered restaurant scene, then use embeddings to say: now give me the nearest five things, for example, to the restaurant scene. We give those five things to the LLM and say, here's a little cluster of environments you've made so far; generate the new thing.
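Here is a small sketch of that sample-then-retrieve step; the array layout, distance metric, and function names are illustrative assumptions, not the system's real code.

```python
# Hypothetical sketch: sample one archive environment at random, then retrieve
# its nearest neighbors in embedding space to give the LLM a local cluster of
# prior environments as in-context examples.
import random
import numpy as np

def nearest_neighbors(embeddings: np.ndarray, query_idx: int, k: int = 5) -> list[int]:
    """Indices of the k environments closest to the sampled one (excluding
    itself), by Euclidean distance between embedding vectors."""
    dists = np.linalg.norm(embeddings - embeddings[query_idx], axis=1)
    return [int(i) for i in np.argsort(dists) if i != query_idx][:k]

def sample_context(archive: list[str], embeddings: np.ndarray) -> list[str]:
    """Pick a random environment plus its five nearest neighbors to show the LLM."""
    idx = random.randrange(len(archive))
    return [archive[idx]] + [archive[i] for i in nearest_neighbors(embeddings, idx)]
```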

And it does, but that complicates the drawing of the phylogenetic tree, because now there are arguably five parents to this one environment. So in our actual graph, I believe when we draw the lines, we take the one of the five that ended up closest to the new environment, and that's the edge we draw to make a pretty picture. But really the picture is some complicated web of interconnectedness that you don't see. Also, one other fun fact: after it generates a new environment, like we give it the five, we say what's next, and it generates a new thing it thinks is interestingly new,

we don't necessarily want to trust that that thing is new, because it may have generated a thing we already have. So once we have that new thing, we use embeddings to pull out the five nearest things to it and we say, hey, you just generated this; here are the five closest things that already exist in the archive. Do you still think this counts as interestingly new? If it says yes, then it's in. And what I love about this is that it's basically just like a graduate student or a professor.

You read some papers, you're inspired: hey, that actually makes me think this would be a good idea. But if you're doing your job right, instead of going and spending a year on that new idea, you first do a literature check now that you have the idea in hand: has this actually been done? And if it hasn't been done, let's go. Beautiful.
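That post-generation check can be sketched the same way; the embedding call, prompt text, and yes/no parsing below are assumptions for illustration rather than the system's real interface.

```python
# Hypothetical sketch of the "literature check": embed the candidate environment,
# fetch the five most similar archive entries, and ask the LLM whether the
# candidate still counts as interestingly new before adding it to the archive.
import numpy as np

def is_interestingly_new(llm, embed, candidate_code: str,
                         archive: list[str], embeddings: np.ndarray) -> bool:
    dists = np.linalg.norm(embeddings - embed(candidate_code), axis=1)
    closest = [archive[int(i)] for i in np.argsort(dists)[:5]]
    prompt = (
        "You just generated this environment:\n" + candidate_code + "\n\n"
        "Here are the five most similar environments already in the archive:\n"
        + "\n---\n".join(closest)
        + "\n\nDo you still consider the new environment interestingly new? "
        "Answer yes or no."
    )
    return llm.generate(prompt).strip().lower().startswith("yes")
```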

And even things like multiple inheritance: there are so many topologies in the artificial world that could allow us to extend beyond the natural world in quite clever ways. Jeff, I wanted to talk a little bit about your background in philosophy. You majored in philosophy, right? And of course, you're a computer science professor now. How has that influenced your trajectory? Well, from a really practical perspective, I think one of the skills you don't spend enough time on

if you get a technical degree is communication. Being a humanities major made me spend a lot of time on clear writing, clear thinking, clear communication, and I think that's been helpful in my career to prioritize. If you do something really great but you can't communicate to somebody else why it's interesting and what you did,

then you have a lot less impact. So I think that's helped practically. I also think philosophy trains really great clarity of thought; it forces you to question your assumptions. One of the things that I think is a gift when I do science is epistemic skepticism, which is philosophy talk for not trusting anything, for being skeptical, as you know. And there have been so many times in my career where I'm working with somebody

and they say, all right, we did this, we got these results, and therefore we can conclude this. And I say, slow down. Are we sure we know that? How do we know that? What predictions does that assumption make, and can we test those predictions? And often people will stop and say, actually, we don't know that, and this is the right experiment to try to figure it out. Sometimes when we do that experiment it confirms what we thought, and often it falsifies what we thought.

So I think being very, very skeptical is a great trait of philosophers and scientists. So how does a philosopher end up as a professor of computer science? It's a great question, and it's actually, kind of ironically, a long, convoluted, circuitous path that led me here. It definitely wasn't a straight line.

So, you know, art follows life, or life follows art. The story is that I was working in the dot-com boom in Silicon Valley, in a marketing department, and I didn't really love what I was doing. So at my desk I'm reading articles about things I find interesting, and I came across this article from the New York Times.

And it just set my mind on fire. In this article, people had created a virtual world where they could evolve virtual creatures using an evolutionary algorithm, real mutation, real survival of the fittest. And when the robot morphologies, which were changing over time, got good enough, fit enough, the system would automatically send the design over to a 3D printer. They would 3D print the robots and they would crawl out into the real world.

And I remember it like a thunderclap in my mind, reading this article. Everything I read was so amazing. The fact that you could instantiate evolution inside a computer means you can study it. It means you could ask questions like: how repeatable is it? Why did sexual reproduction show up instead of asexual reproduction? What happens if you have three parents instead of two? How do you get agents that cooperate with each other, that invent language?

Everything you wanted to know about evolution was now testable. It was amazing. And you can harness it to make really cool engineering designs and maybe even intelligence itself. So that happens. And at the time, you know,

I didn't know how to find out more. I will say, as a philosopher, no one had trained me that there are these things called papers and that you can keep reading. So I couldn't learn that much more, but it stuck with me. I was not satisfied with my dot-com job, so I quit, went to travel the world for 15 months, living out of a backpack, surfing, traveling, loving life. And at the end of it, as I was coming back from my trip, I asked myself, what do I want to do with the rest of my life?

And the answer was: I want to do that thing that that guy was doing. So I went back and found the article, which had not left my mind, and it turned out it was a person named Hod Lipson, a professor at Cornell University. I said, let's contact him. So I sent him an email, which I'd love to go find and dig up, and it was like, hey, I'm a philosophy major, I've just traveled the world, and I really love that work you did that was on the cover of the New York Times.

"I wanna join your lab, can you let me in?" And he surprisingly wrote back and he said, "Well, you can't actually get into the PhD in computer science program at Cornell. It's one of the best universities in the world with like a philosophy major undergrad." And she's like, "So that won't work, but you seem really passionate and I'd love to try to work together somehow." So we were trying to make it work. We couldn't really find a great way to make it work.

And so I said, all right, I'll apply to other schools that aren't Ivy League. I ended up contacting and emailing 85 different universities: every professor that Google, which at the time didn't even work all that well, returned for genetic algorithms. And nearly every one of them said, no, no, no, you can't get a PhD in computer science with an undergrad in philosophy.

But two responded, and one of them said, hey, you can't just go into the PhD program, but there's a philosopher here, Rob Pennock, who works with people who do evolutionary algorithms and A-life.

I looked up who these people were and said, okay, I'm in. And I had a secret plan: I'm going to go to Michigan State and get a master's in philosophy. When you're in grad school, they don't restrict which departments you take lots of your courses in, right? You do your core, but you can take other classes. So I said, I'm going to take all the AI classes, all the machine learning classes. I'm going to learn the math, learn to code, and show them that I can do this.

And so I did. I worked my tail off. I learned to program, I learned the math, I took a machine learning class. I had no idea at the beginning what I was doing, but I ended up acing all the courses and publishing papers with

the artificial life lab there, meeting Charles Ofria, who became my PhD advisor, and Rich Lenski, and Rob Pennock, and all these great people working on E. coli and artificial life. After my master's degree, having published with them and gotten good grades, I said, "Now can I get a PhD in computer science?" And because they knew me, they made an exception and let me in. So I'm in the PhD program, I work really hard, I publish more papers, I meet Ken Stanley, and we start working together on open-endedness and neuroevolution and CPPNs and all these great things.

Then I called up Hod Lipson and said, "Okay, I now have a PhD. It's been eight years since I first contacted you. Now can I join your lab?" And Hod said, "We'd love to have you, but, you know, I don't have funding." So I went and found funding, secured it, called him back, and said, "Now I have a grant from the NSF that says I can do a postdoc wherever I want, and I want to do it with you." And he said, "I'd love to have you come join the lab."

So eight years after I read that article in Silicon Valley, having no idea what I was getting myself into, I was in the lab at Cornell, and it was like Willy Wonka's robot factory: 3D-printed parts on the wall, crazy soft robots evolving every day. It was a wild and crazy adventure. Two years later, I became a professor at Wyoming. And you know what? I started getting emails from people saying, I like what you do; how do I get into your lab? So I like to tell this story to people, even though it's a little long, because I

I don't think you should take no for an answer from the world. Pick some crazy thing you want to do, follow your nose for what seems interesting, and it might be a long, circuitous path, but eventually if you just keep going, I think you can get yourself to really interesting places, and it's very fulfilling. Yeah, and from a serendipity point of view, having multidisciplinary research is really important because otherwise you wouldn't have come in with all of these completely different ideas.

Just quickly, as you mentioned compositional pattern-producing networks and NEAT and Kenneth Stanley, and all this amazing stuff: when did you first meet Kenneth in person?

I believe it was online. So I was in an A-life lab with a lot of people doing experimental evolution and artificial life, and I liked that stuff. I did some of my early work on questions like, can evolution optimize its own mutation rates? And why does altruism, being nice to each other, evolve? Hardcore open questions in evolution and biology that we were testing in simulation. But I always had an eye towards

intelligence itself. What is happening inside of my brain? How can you make an intelligent thing? And that led me to neural networks.

And then I said, okay, I know evolutionary algorithms; I want to know how you could evolve something that becomes more and more complex and could eventually produce the human brain. And so I ran across the NEAT algorithm, a really early example of what is now called neural architecture search. It starts out with a simple brain and adds more neurons over time, and it was beautiful work. So I joined the NEAT users group, started talking online, and started reading these people's papers and interacting

with this guy named Ken Stanley, and I kept posting really long sets of questions about all of the work coming out of his lab. I was reading everything their lab did, and I'd love to know what he thought of this upstart young PhD student asking him all these questions. But eventually we had talked enough that, when CPPNs came out and I was so blown away by how beautiful the work was, I said, I want to work on this, and I basically went and did a little research project.

It had nothing to do with my normal lab work. I said, I just have to study this new CPPN thing. It's so cool. And I started asking questions that I thought were interesting. And I put together this little lab report. And I basically emailed Ken and I said, hey, I've done some work that builds on your new work. I'm hoping you'll take a look and just give me some feedback. And of course, if you want, then I'd love to work together.

And he wrote me back an email that was like: I wasn't planning on looking at this, but I thought I'd be nice and just give the top a quick read. And I couldn't put it down. I just kept reading and reading, and by the end of it, not only had I read the whole thing, I'd written you this long, complicated response. So I'm sorry it's so long, but it's beautiful work, and it raises all these interesting questions.

And basically that was the first paper we wrote together. I started answering his questions, back and forth, doing experiments he suggested. And I ended up winning a best paper award, if my memory is correct, at GECCO, which is the conference I was going to at the time. And then we met in person and we were off to the races. What was the name of the paper? I believe the theme of the paper is effectively how CPPNs encode regularity

and how they produce regularity in the neural nets that control robots, and how they deal with the fact that sometimes you need things that are irregular and sometimes you need things that are more regular.

I wish I could actually quote the paper title now, but I could look it up. We could put it in the show notes. Yeah, amazing. Jeff, this has been so brilliant. Thank you so much for coming on. I really appreciate it. My pleasure. It's so fun to talk about all of these things. I do want to give credit, if you don't mind, to all of the tremendous people I've worked with.

So I've mentioned a lot of the people that have been there through most of the journey, like Joel and Ken, but there are so many great people that I've worked with along the way: all of the PhD students in my lab, all the people I've collaborated with, Jean-Baptiste Mouret in particular, who have been very influential on me.

Tim Rocktäschel, you know, and there are just too many to name. So I want to make a blanket statement: if you are interested in any of this stuff, please look at the full author list of all the papers, because it is not one person. It is a collective community that's been building all this work together. I'd love to get Jean-Baptiste on. You just reminded me, actually, Kenneth suggested we get him on. He's fantastic. He's one of the smartest and most brilliant scientists I've ever worked with, really, truly. He does fantastic stuff and I've never had a boring conversation with him, you know.

Every time I talk with him, I learn something new and I listen as hard as I can. Amazing. Thank you so much, Jeff. My pleasure. Thank you for having me.