At this point, maybe a good chunk of the internet is LLM generated. As AI capabilities increased, we saw them being used in ideation; people from Stanford have been thinking about this kind of thing. LLMs can generate ideas just as novel as those of human researchers. In essence, can we use AI scientists as a data generation tool? Can we get much deeper knowledge than just the tip of the iceberg that's published in a paper? The scale of evolution that we have in
nature is not necessarily something we can do with GPT-sized models, but we can always try to do evolution on a much smaller scale. I do think that entropy is really important for what we're doing, where we need these really new ideas, some creativity. An alternative approach is to look for alternative sources of entropy.
Chris, welcome to MLST. Great to be here. Can you introduce yourself? Yeah, I'm Chris. Chris Lu. I'm a PhD student at the University of Oxford with Jakob Foerster. And I also do quite a bit of work with Sakana AI. Also joining us today is Robert Lange. And Robert and Chris have written a really cool paper together that we're about to talk about. Robert, can you introduce yourself? And of course, this is what, the second or third time that you've been on the show. Thank you for having us. It's great to have you back.
It's awesome. Yeah, I'm Rob. I'm a final year PhD student at TU Berlin and a founding research scientist at Sakana AI. And I had the pleasure of working with Chris for a long time. We started out getting to know each other at an internship and then basically have been collaborating ever since. And
Yeah. Amazing. And we've also got Cong. Cong, can you introduce yourself? Yeah. So I'm Cong. I'm a postdoc at the University of British Columbia and mostly interested in open-ended learning. And I'm supervised by Jeff Clune. I'm a huge fan of Jeff. He's awesome. Yeah, Jeff is amazing. Let him know that we'd love to get him on the show. Oh, for sure. I think he'd love to come. Yeah. Amazing. Amazing. So Tufa Labs is a new AI research lab I'm starting in Zurich. It is funded from Paz Ventures, involving...
AI as well, and so we are a Swiss version of DeepSeek, so a small group of people, very, very motivated, very hardworking, and we try to do some AI research starting with LLMs and o1-style models. What we're looking for now is chief scientists and also research engineers. You can check out positions at tufalabs.ai. Chris and Rob, you guys wrote this paper, "Discovering Preference Optimization Algorithms with and for Large Language Models."
Can you give me the elevator pitch? Yeah. So we write algorithms to train language models to follow preferences, right? To align language model behavior with human preferences.
And we have been handcrafting a lot of algorithms to see which ones seem to work best in order to be more sample efficient or faster or more efficient when it comes to getting these language models to be optimized. A lot of work in the community comes down to figuring out which algorithms are best.
And if you actually look at how we do this, a lot of it is just a lot of trial and error with some intuition behind some math. And language models are quite good at this as well. They do have the same mathematical intuitions through their pre-training, and they're also quite good at writing code. So the question is, why don't we just have language models try to optimize these algorithms that we use to train language models? And that's basically the whole paper.
How does it work now? So we've got things like RLHF, for example, right? That can, you know, shape the behavior of these language models. And the algorithm is designed by hand by a bunch of experts in the field that have strong intuitions about this kind of thing. Are you suggesting that we could potentially automate that? Yes, exactly. So basically these experts in the field, they have a lot of intuition, but also just a lot of trial and error. And language models also have
maybe not as good intuition, but reasonable enough intuition, but can do way more trial and error than a human. And so this allows us to search much broader spaces of algorithms in order to actually find the proper best one. So Rob, how does the approach work?
So basically, we have already been working on sort of using evolutionary black box optimization style approaches for discovering algorithms before. And back then, we usually used a neural network to, for example, parametrize a loss function, right? And then we optimized the parameters of this neural network using some type of meta evolution, essentially.
So the way this works would be to sample candidate solutions and to evaluate them on a problem, and then to feed back the end performance into the optimizer. And here, we're basically taking a different approach. Namely, we're taking a language model to propose code snippets. So think of your favorite Torch objective function, and the language model basically writes the code. And it does not only write the code, but it also gives a name to the function and gives a thought for how it came up with this idea.
And then we use the written code in an evaluation, sort of MT-Bench style, after doing this preference optimization. And we feed back the result into the context of the language model and basically see whether or not the language model can discover objective functions which improve upon things like DPO, KTO, and so on.
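To make that loop concrete, here is a minimal sketch of the kind of propose-evaluate-feedback cycle described above. The `llm` and `evaluate` callables, the prompt wording, and the JSON format are all illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of the discovery loop: the LLM proposes a named
# objective function plus a short "thought", we evaluate it, and the
# result is fed back into the context for the next proposal.
import json

def propose_objective(llm, history):
    """Ask the LLM for a named objective function and a short rationale."""
    prompt = (
        "You are designing preference-optimization objectives in PyTorch.\n"
        "Previous candidates and their benchmark scores:\n"
        f"{json.dumps(history, indent=2)}\n"
        "Reply with JSON: {\"thought\": ..., \"name\": ..., \"code\": ...}"
    )
    return json.loads(llm(prompt))  # assumed to return valid JSON text

def discovery_loop(llm, evaluate, generations=20):
    history = []  # acts as the in-context archive of (name, score) pairs
    for _ in range(generations):
        candidate = propose_objective(llm, history)
        score = evaluate(candidate["code"])      # e.g. an MT-Bench-style eval
        history.append({"name": candidate["name"], "score": score})
    return max(history, key=lambda h: h["score"])
```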
And interestingly, what we sort of found during this process is that LLMs are really good at sort of mixing different concepts. So if you think about it, for me personally, I've read a probably tiny subfield of machine learning papers. While for an LLM, it has not only read machine learning, it has read physics, it has read chemistry, and can basically mix and combine in a complementary fashion all these concepts.
And in the paper, we saw that several of these concepts, like, for example, smoothing or regularization techniques, are basically combined throughout this evolutionary optimization process. So in summary, we're basically thinking of the LLM as a type of very strong and intelligent mutation operator that can help us discover new algorithms.
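For a flavour of the kind of candidate such a mutation operator might emit, here is an illustrative reconstruction of a DPO-style objective that blends a logistic term with an exponential term, gated by a sigmoid of the log-ratio difference. The published DiscoPOP loss mixes similar ingredients, but this sketch is not the exact formula from the paper, and the temperature values are arbitrary.

```python
# Illustrative reconstruction only: a DPO-style preference loss that blends
# a logistic and an exponential term via a sigmoid gate on the log-ratio
# difference. Not the exact published DiscoPOP objective.
import torch
import torch.nn.functional as F

def blended_preference_loss(chosen_logratios, rejected_logratios,
                            beta=0.1, tau=0.05):
    # rho: scaled difference of policy/reference log-ratios for the
    # preferred vs. dispreferred completion (as in DPO).
    rho = beta * (chosen_logratios - rejected_logratios)
    logistic = -F.logsigmoid(rho)          # standard DPO-style term
    exponential = torch.exp(-rho)          # exponential term
    gate = torch.sigmoid(rho / tau)        # smooth blend between the two
    return (gate * logistic + (1.0 - gate) * exponential).mean()
```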
Yeah, this is so cool. I'm seeing quite a few approaches like this, certainly for the ARC Challenge. Ryan Greenblatt famously used GPT-4o to generate loads of Python programs, he generated 20,000, 30,000 of them, and then we see which ones are good. And Kevin Ellis's group, out of, I think he's at Cornell now, but he was at MIT, did a very similar thing to what you're talking about. So, you know, like generating loads of snippets and then remixing, and then blowing the remixed programs up by an order of magnitude or two.
Why do LLMs work so well for this kind of thing? I mean, they've just seen tons of code, right? And they have a good idea of what works and what doesn't. And so by just leveraging this pre-trained base of code knowledge, it's able to more efficiently explore the space. So Rob just talked about how we used to do this meta evolution technique where we'd randomly perturb some neural network weights that represent some objective function and then sample the fitness of those functions and then use that to update some meta network.
This was really inefficient. This is why we had to use these JAX-style techniques in order to actually get things to run fast enough for this to work. We had to sample millions of parameters and thousands of population members, things like that. And all we could do was random perturbations of these neural network weights. And you can imagine if I just randomly perturb the neural network weights, I'm not going to get anything interesting most of the time.
What's really cool about LLMs is they have much more structured exploration because of the human data that they've been trained on. So they explore more like humans do, more intelligent exploration. And actually in the paper that we wrote, you can see that it does very intelligent exploration, where it will look at what worked and what didn't, and then kind of use this to do this mixing that Rob was talking about.
So it would be like, hey, let's try this. Oh, that didn't work. Let's try this other thing. Oh, this worked, maybe because of this. And then it would just keep kind of building on the things that it previously discovered in order to try to arrive at a final solution. - MLST is sponsored by CentML, which is the compute platform specifically optimized for AI workloads. They support all of the latest open source language models out of the box, like Llama, for example. You can pay on consumption essentially, or you can have a model which is always working, or it can be freeze-dried when you're not using it.
All of the models that they deploy support the OpenAI API specification out of the box, which means it's just a one-line change in your application to switch over to CentML and start saving money and make your application go faster. I mean, Rob, to what extent are these things bounded by the training distribution, right? So, you know, they've got all of the things that we've ever said, but we want creativity, right? We want paradigmatically new things, right?
Can we get there? I think this is a super interesting question. And usually when I talk about this, I kind of take analogies from arts, right? So oftentimes people think of Picasso as this genius, right? Who just woke up and had sort of all the ideas and style for cubism in his mind and he started working, right? But Picasso is a child of his generation, right? So there were many surrealist artists during that time, like Salvador Dali or Joan Miró, right? And basically,
Picasso found his style within sort of this convex hull of different artists, right? And oftentimes when we talk about LLMs and LLMs for discovery, we're speaking about interpolation versus extrapolation. And I think the power of LLMs, even if they might not necessarily be able to do
crazy extrapolation, is that they can still interpolate between all of sort of the pre-training corpus, right? So this was what I was talking about before. There are these fields like chemistry, physics, economics, where many of the discovered concepts can potentially be brought into the field of machine learning. Chris, at the moment, you're measuring success against downstream tasks, if you like, but would it be possible to change the architecture to have more abstract
forms of optimization, like, for example, for fairness or bias or anything else we might think is important? Yes, that's definitely true. I think currently, yeah, we are only optimizing for the ability to predict future preferences. We could try to push it towards favoring things like fairness, or more open-ended ideas than just some single-number metric that we're looking at. In the paper, we kind of try to break down the
different qualities of the model across a spider plot, where you show how good it is at reasoning or math, but you could also add things like fairness and try to do something with multi-objective optimization. What might be even more interesting is trying to expand it from this single-number, well-defined metric to what we do in the future work on the AI Scientist, where we kind of just try to generate interesting papers, right?
And this is kind of a general problem that we have in generative AI. You look at things like Sora or DALL-E or these image models or video models, and the question is, how do you actually judge these, right? There are no great metrics in the field. And a lot of the stuff now kind of comes down to taste or feel. And I think that's how things like the AI Scientist might also play out, or these future meta-optimization techniques, where it'll output a lot of things, but we need to kind of go through them and pick what we like the most.
Yeah, Rob, I mean, how important are vibes when we measure these things? I mean, do you think that the metrics are being Goodharted, being saturated? Are they useful? So I think every ML researcher has their favorite set of vibe tests, right? To see sort of firsthand the capabilities beyond sort of MMLU benchmarks, right? And I think this is super important because, in my experience, many of the models
can overfit them, right? Like the benchmarks that are commonly used. And I guess this is helpful for getting attention on social media when you sort of promote your new trained model. But to me, like actually doing the vibe checks on problems that I care about is usually much more important. Like, for example,
Yeah, Llama scores are really, really great. But for the things that I want to use frontier models for, usually Llama struggles. And this is something we, for example, also saw in the AI Scientist. So nonetheless, I think it's a really, really valuable contribution to the community in order to essentially assess the capabilities and also to work on fine-tuning methods to improve these things.
Very cool. Chris, the loss function that you guys discovered in DiscoPOP, you said that it exhibited convex properties. How much of that do you think was a necessary condition for the performance of the algorithm? What was actually really interesting was that it was non-convex. Almost every loss function is convex because you want a single optimal point.
What's weird about this one was that there was a discontinuity, right? It's non-convex. There's a local optimum and then there's a global optimum. And we hypothesized, and I believe we show some evidence for, the idea that maybe this non-convexity
is useful for noisy data, where you can maybe capture some of the bad data in this local optimum and then actually optimize the rest of it in the global one. This is a hypothesis, we have some evidence in the paper for it, but it's really hard to rigorously prove this. One really interesting thing is that a similar type of non-convexity appears in a prior work we did called Discovered Policy Optimization, where there were a lot of features of the loss function that we didn't quite understand when we looked at it. And one of them was this non-convexity. So
This theme seems to have appeared multiple times, so it'd be really interesting to actually explore this in more depth. Very cool. Rob, when we start meta-learning, you know, loss functions or anything else for that matter, because if you think about it, you can take any part of the prediction architecture, you can stick a model in there and you can learn it. Maybe we lose the interpretability, we lose the why. Is that important?
I think like Chris and I, we've both been doing work where we discover sort of function approximator versions of loss functions and so on, or evolutionary optimizers. And then oftentimes, depending on the parameterization of the discovered objective or system, you can try to reverse engineer it afterwards. You can try to capture the majority of the variance with an analytically expressible
equation, for example. And this usually helps in interpreting afterwards. So I think there are probably limits to discovering systems that are really, really capable while still interpretable. But you can even think of settings where you set up the system to remain interpretable. So it's really a question of what's the substrate that you're trying to meta-optimize. And some of them lend themselves to being more easily interpretable
post hoc than others. So I would hope that the final thing that we might discover is still interpretable, but there's no guarantee of it. And as long as you can show empirically that they are bounded in a certain sense, I think this is probably as much as you can hope for oftentimes. Yeah, one really interesting thing is we spent a lot of effort trying to reverse engineer a lot of the algorithms that evolution discovered when we did this meta-evolution technique.
With DiscoPOP, it was not as hard because it would explain what it was thinking. And so we just used its explanation, and that was the basis for our analysis. A lot of times, even in academia, people will propose algorithms and maybe not fully understand why they work. And it's up to other community members to go and reverse engineer perhaps why certain algorithms work better.
And I think it's something similar here, where the LLM proposed this loss function, and it maybe has a hypothesis for why it works. And our job is to maybe confirm this or maybe come up with extra reasons for why it might be a good loss function. And I also think in the context of DiscoPOP versus the previous meta-evolution projects, the substrate that we're getting out is code. So it's human readable. You don't have to sort of--
fiddle with the neural network. And this can be helpful at the current point, right? Who knows, maybe in the future there are going to be objective functions with thousands of lines of code, which is going to make it harder to interpret. But you can still sort of try to read it, right? Because code is a medium that we humans are sort of capable of working with. Can we focus on that just for a little bit, though? So what does the code look like?
Do you see weird examples where it's just unbelievably complicated or does the language model already intuit or know to give a reasonable, understandable answer? Right. I mean, usually the code is only like a handful of lines, maybe like five to eight lines of code. So it is usually pretty interpretable, but it's also pretty creative. So there's a lot of possible loss functions
in this space. And it's usually some type of combination of these, with maybe some interesting extra loss functions that it found during its search. So for example, I believe in the final DiscoPOP loss, there's this exponential loss that I don't think I've seen used very much before. And I don't think it's very good on its own, but it happens to work really well when combined with other loss functions, apparently. - Very cool. And how are you, you know, like you could do beam search, you could put the temperature up and so on. Have you experimented with all of this kind of stuff?
We didn't do super thorough explorations of this in the paper, but we might get better results if we actually try to force it to be more creative. Yeah, really interesting. I mean, Rob, how do you feel in general, though, about delegating creativity to a machine?
So I think we as humans, we have a bias to our own creativity, right? In the sense that oftentimes, especially when we talk about the AI scientists later on, people ask me, do you think this is just dumber than humans? But I think even if it's a little bit dumber, the throughput that we can get by automating these processes is tremendous, right? So even if the efficiency of such a
automated discovery system is, let's say, 50% worse than what a human would do, we can scale it much more gracefully. And I think with DiscoPOP already, but also with the AI Scientist, we're getting to a point where
basically we can turn money and compute into really useful insights for the next generation of AI. So we have the self-referential nature of it, which would be much more slowed down if the human was supposed to be the only creative engine, basically. Yes, yes. Chris, how do you see the role of humans, though? Do you think there's a middle way where we can have humans in the loop supervising the process? I mean, what would that look like? Yeah, I think that'd be honestly like
better if you wanted to produce better papers faster in some sense. As in, if there's a human in the loop, it's kind of like a supervisor for a project, right? So right now I would say the models are maybe as good as an undergraduate student or a first-year PhD or a very young starting researcher. I think supervisors are really helpful in this scenario where they can help prioritize what problems are important, they can help prioritize what strategies might work better and what might be worse.
I think that humans can play that role in this scenario where maybe in the future PhD students will be more like professors are today where they are advising a big group of AI scientists or something like that. I think this type of taste making is really important when it comes to academia. After some point, I think putting out papers is more like an art than a science in some sense where you're trying to figure out what people would find useful or interesting and there's no objective measure for that.
In this type of setting, a lot of the problems become similar to those for image generation, video generation, where we can generate tons of images, but who cares about most of them? Somebody has to put in some prompt or something that they're interested in and get the thing out. And so I think the AI Scientist will hopefully work in a similar way, where someone really cares about certain topics. So they want to get as much research as they can on that topic, and they'll be the ones bringing the taste and directing it.
And so they will be the ones kind of introducing the problem parameters and what type of outputs they want, and specifying everything for the AI Scientist to explore, in the same way that a professor might do that for a lab.
Very cool. What about this idea of an infinite regress? So let's say you find an amazing loss function and you give it to OpenAI and you tell them to train GPT-4 again from scratch with the loss function, and then you, you know, meta-learn a new loss function, and so on and so forth over many generations. What would happen? You know, would it kind of get better, or do you think it would mode collapse in some way? I think there's
probably a broader question than the specific version that you're stating, right? So I think given that these systems are now everywhere, right, and we're generating a lot of content on the internet with GPT and so on, and it's probably going to be used in future generations, it's hard to tell whether or not there's going to be a mode collapse. But I think even if there is, the current systems are already generating a lot of value to society, at least in my experience of working with them day to day.
Even if this is going to happen in the next five years, by the time we reach that point, I think we're already really well off as a society in terms of using these systems.
I think with this specific case of discovering an objective function with an LLM and using this then downstream, I think there's a separation where the objective function is essentially only an implicit tool to shape the LLM instead of baking into the LLM that it should give this objective function as an output in the next generation. So I think depending on what you discover and feedback into the loop,
there are going to be clearer signs of this than with other things. Maybe your take on that, Chris, as well. I interviewed Ilya from Google DeepMind, actually. He wrote that Nature paper about the model collapse over successive generations. And the elevator pitch is after about the fourth generation, you lose all of the entropy in the distribution. And for creativity, surely we need entropy. So at some point, do we lose access to the source of entropy?
I think it's possible, but at least at the scales we're looking at right now, it seems as though we have enough entropy on the internet where humans are still saying random things and being kind of funny or being unexpected. I do think that entropy is really important for what we're doing, where we need these really new ideas, some creativity. I think
One alternative approach is to look for alternative sources of entropy. For example, you might say, "Hey, I'll just randomly sample two fields and say, 'Hey, come up with a cool idea about these two fields.'" Now we introduce external entropy into the system, and that might be a way we can continue dealing with possible mode collapse.
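A minimal sketch of that external-entropy trick might look like the snippet below; the field list and prompt wording are made up for illustration.

```python
# Illustrative sketch of injecting external entropy: randomly sample two
# fields and ask the model to combine them. Fields and prompt are invented.
import random

FIELDS = ["statistical physics", "economics", "control theory",
          "evolutionary biology", "information theory", "chemistry"]

def seeded_idea_prompt(rng=random):
    a, b = rng.sample(FIELDS, 2)
    return (f"Come up with a new machine learning idea that combines "
            f"concepts from {a} and {b}.")

print(seeded_idea_prompt())
```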
Yeah, it's really hard to tell where the models would go if you keep kind of training on data that itself is outputted on the internet. Like at this point, maybe a good chunk of the internet is LLM generated.
Yeah, we'll have to see. Maybe to add to that, we started this with thinking of LLMs as evolutionary operators, as mutation operators. And I think what Chris just described is basically taking this even one step further and thinking about crossover between concepts, fields, and so on. So I think there is a lot of inspiration that can be taken from the evolutionary community to essentially improve meta-generation of outputs from LLMs. So yeah, I think this is a really young field, but there is work on combining LLMs with QD-style
algorithms, and so on. And I think these are really promising when it comes to squeezing out the most creativity out of these LLM systems.
I guess I usually don't like the comparison between neuroscience and the central nervous system on the one hand and deep learning on the other, but to me at least, evolution is one of these processes that has led to intelligence in many, many different forms. And mixing some of these inspirations from natural biology into synthetic biology might be helpful as well. Yeah, it's almost as if there's an abstract concept we're talking about here, which is this philosophy of combinations.
Right. And we take a language model, we train it, and it learns the statistical distribution. And the first time we do it, there's a nice long tail and we capture a lot of the complexity and that erodes over time. But language models have this property, though, that even after the fourth generation of model collapse,
You can still, as you say, Chris, put diverse inputs in and because you get the combinations of the representations, there's a cone. The output space is still much larger than what went in. And then there's the meta level, as you're talking about, Rob, which is that we have all of this mixing at the cultural level as well, which creates another blow up of complexity.
So I suppose maybe you could make the argument that it doesn't matter to some extent that we lose so much of the complexity in the model, because the way it's used is where much of the complexity comes from. Yeah. I think the input distribution itself has enough entropy that we can keep going for a long time with the current paradigm. I don't know if model collapse will happen in practice, just because it'll be training on
things humans say, ask, do. And this is just, I think, some irreducible entropy.
I haven't read that paper you mentioned in depth, but I'm not sure if they introduced any external entropy in their sampling system. Whereas in practice, they do. Okay. They only take random samples from the distribution. Yeah. So it's all the same distribution, IID samples, exactly. But if you can then use the human distribution of inputs, or the human distribution of entropy,
then we might be able to keep going for much longer than the paper might predict. Very cool. Have you guys thought about a curriculum learning addition to the paper?
In what sense? So you're generating all of these code examples. What you could do, for example, is have a library and do retrieval augmented generation and sort of have it seeded with a bunch of things and learn some optimal curriculum of how all of the knowledge gets combined together. Do you see what I mean? Like there might be some avenue of research there. Yeah, for sure. I mean, like you're right. In the end, we're presenting individual objective functions, right? But there is like an archive of different possibilities that you could use.
I think something that's also really interesting, which goes back to some of the related work Chris and I have also been doing is sort of to look at whether or not you can augment these objective functions with like a temporal dependence, right? And tell it basically, OK, you have a training horizon, which is this many gradient descent steps. And then to see whether or not the LLM can discover essentially something that performs this.
automated curriculum sort of implicitly, right? Or explicitly, if you will. But in some sense, another direction one could take is to use essentially the archive of generated objectives and try to essentially seed new runs of this LLM discovery with a new knowledge bank that is fed in context, right? Or subsets of it, right? Very cool. So Rob, we're going to talk about your paper now, which is Large Language Models as Evolution Strategies,
which you also refer to as the LLM black box optimization paper. Can you give us the elevator pitch? Yeah. So I think this whole project started out with a different paper, which was written, I think, by the robotics group of Google DeepMind, which was called Large Language Models as General Pattern Machines.
So in that paper, they interestingly sort of looked at this phenomenon of in-context learning on fairly abstract input sequences. Like, for example, they optimized a policy for cart-pole control, where essentially the states were just represented as integer sequences. And when I saw that LLMs could do policy improvement, I was wondering, OK, maybe LLMs are also capable of doing more general black box optimization.
And so the starting point of that was basically to look at whether or not we can represent numerical black box optimization problems in an abstract fashion that allows large language models to basically apply in context learning and to optimize these functions. So what we show in the paper is that this is indeed possible and that depending on
the combination of language model, prompting setup, and in-context information that you give, you can see different performance results. And yeah, this can, for now, scale to medium-sized problems. So, as always, evolutionary optimization is not suited to optimize transformer-like architectures. But in settings where there is no gradient accessible, you can apply this to, let's say, up to a 50-dimensional optimization problem. Very cool. Can you sketch out the architecture?
Yeah, so basically the way this works is we're using a set of prompting strategies, including least-to-most sorting, and providing sort of the fitness and the evaluations that were done on the function in context. And by sorting the information in an improving sequence, the LLM can sort of infer which steps were beneficial in previous evaluations,
and then sort of continue this going forward, right? And interestingly, yeah, we saw that this can outperform sort of traditional algorithms used for black box optimization. And I think this sort of comes back to the previous paper that we spoke about, DiscoPOP, where LLMs seem to have a very good sort of inductive bias mechanism
for doing intelligent exploration and exploitation. Yeah, I suppose another interesting thing here is that we're talking about using language models. I mean, these are linguistic beasts and you're now getting them to give intuitive guidance for very abstract things like numbers and so on. I mean, could we think of LLMs as somehow being a universal form of representation for many modalities?
Yes. I was really excited when I read the paper, because I was like, okay, this is not just the stochastic parrot paradigm, because in order to infer these improvement sequences, you need to do abstract reasoning. So to me,
I think as long as you can represent things as strings with some structure, LLMs are capable of identifying patterns, given enough context. And I think for some settings, this can be useful. For others, there might be sort of better suited manual algorithms. But for black box optimization and optimizing code in a closed loop, this is for sure a promising paradigm going forward. You spoke about stochastic parrots.
What's your current philosophy on that? Kind of in the similar vein of what we discussed before, I don't actually think it matters that much. So given that they're stochastic and we're sort of in the driver's seat of choosing how to meta-generate new concepts, and they can interpolate between concepts just using sort of prompting techniques, we can probably generate a lot of new knowledge just by doing this, right? So coming back sort of to the arts analogy that I made before, I think interpolation
at this point, in a super large high dimensional space, is already quite a lot in terms of value that we can bring to society. Very cool. Very cool. In your paper, you said that in certain circumstances, smaller models actually outperformed bigger models. Tell us about that.
In the paper, we sort of compared different language models, as you do, right? So this was sort of the beginning of this year, end of last year. And back then, we were working with GPT-4. We were working with sort of the Llama 2 suite of models. And we found that when comparing them on a sort of vast set of black box optimization tasks, the smaller models tended to perform fairly well besides sort of GPT-4.
And to us, this was maybe very implicit evidence for the mixture-of-experts architecture of GPT-4, potentially. That individual experts might actually be much smaller than 400-plus billion parameters. And yeah, it's interesting to reason about why smaller models might be better at doing this in-context learning on abstract sequences.
there might be something related to overtraining and undertraining of these systems. But at this point, I can only speculate about this. But it was an intriguing finding, because this was something that happened across Llama and PaLM 2 models from Google. You discovered that fine-tuning on these teacher algorithm trajectories made the models perform better, right? And
I suppose it's not really possible to do it any other way. Would there be another way of doing that? I mean, if you actually had access to the entire, you know, model training architecture, could you somehow imbue that knowledge somewhere else in the prediction architecture?
I think there's related work from Johannes von Oswald, who shows that transformer models can basically learn how to do gradient descent. And they explicitly, or somewhat explicitly, train for it. It's not like taking a language model and deploying it and showing that it does gradient descent, but it's sort of training a transformer which then implicitly gives rise to gradient descent. So I think there are for sure ideas of how to shape sort of the training distribution or the training paradigm to make these systems
more capable of doing abstract in-context learning. But I also think that this might sort of take away some of the creativity, right? It's always the question of how much do you actually want to bake in versus how much you think or hope is just naturally going to be
employed or discovered during their own training process. So in the paper, like you said, we used sort of optimization trajectories generated by a teacher algorithm and performed a little fine-tuning with them. And this helped on certain tasks, but not on all, right? So I think there is clearly
a factor of us maybe baking too much inductive bias in by just changing the training distribution. There is this matter of, I mean, when we're dealing with numbers, or discretization in general, even tokenization could be a problem.
Tell me about that. For sure. Like, for example, when you look at, I think, the Llama 2 tokenizer, certain numbers are sort of more represented in the corpus that generates the tokenizer than others, right? So, for example, I think the numbers 1 to 50 are all represented with their own tokens. Same for, like, 1950 to 2020, right? Which makes sense because they appear more frequently in the training corpus.
But especially if you work with floating point numbers, this can lead to sort of artifacts, like certain numbers being assigned more tokens than others. And when your goal is essentially to squeeze as much in-context learning out of the LLM as possible, you kind of want to have standardized sequences of tokens.
So this makes it easier for the system to basically infer patterns than if there is a flexible number of tokens. So as a designer of such a system, the question then is, how do you set up the representation, or the abstract representation, to maximize in-context learning from the LLM? And we found that certain integer discretizations worked really well instead of using floating point numbers. But more recently, when I played around with this, some of the more
recent frontier models are actually also capable of directly working on floating point numbers. So this might again change as we progress in making the systems more capable and more robust.
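As a rough illustration of the representation tricks discussed here, the sketch below maps continuous parameters onto a fixed integer grid and sorts previous evaluations from worst to best before placing them in context. The ranges, resolution, and prompt wording are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch: integer discretization of continuous parameters, plus a
# least-to-most (worst-to-best) sorted prompt of previous evaluations.
def encode(x, lo=-5.0, hi=5.0, bins=100):
    """Continuous value -> integer bucket in [0, bins]."""
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * bins)

def decode(k, lo=-5.0, hi=5.0, bins=100):
    """Integer bucket -> representative continuous value."""
    return lo + (hi - lo) * k / bins

def es_prompt(evaluations):
    """evaluations: list of (float_vector, fitness); lower fitness is better."""
    ranked = sorted(evaluations, key=lambda e: -e[1])   # worst first, best last
    lines = [f"{[encode(x) for x in sol]} -> {fit:.2f}" for sol, fit in ranked]
    lines.append("Propose the next solution as a list of integers, aiming for "
                 "an even lower value.")
    return "\n".join(lines)

print(es_prompt([([1.2, -0.4], 4.2), ([0.9, -0.1], 2.9), ([0.7, 0.0], 2.1)]))
```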
Yeah, these insights might also become outdated later on. There's also this matter of, you know, the language models, they learn all of this rich knowledge during their pre-training. And how much of that is necessary? So what's the balance between its internalized knowledge and, you know, doing chain of thought, for example? And does that imply that we could, in principle, have smaller, dumber models, but really clever prompting? Yeah.
I think this is a super interesting question, given that now there's also hype around small language models and the question of whether or not the big frontier models are going to rule the entire economy, or whether or not small players with small models can play a role there. I think actually when it comes to very specific tasks like BlackBox
optimization, for example, smaller models, as shown in our paper, tend to perform well or even better. So I think there is a chance that we might end up in a world where smaller models can be more specialized and we don't need to spend as much compute on it. But on the other hand,
yeah, you might at the same time cut off certain types of knowledge, things which you as a designer of the LLM might deem to be useless. But the LLM in the end might still kind of be thankful for having them in its training data for doing small perturbations and mutations later on, right? So I think in general, it's like
At least at Sakana, we have this notion that learning always wins. So ideally, in an optimal world, we find a different way to target these larger systems to behave in a more specialized fashion. And I think in the AI Scientist, but also in DiscoPOP, there's some small amount of prompt engineering going on, which can oftentimes go a long way.
In your work, you're talking about discretizing continuous parameters. What does that imply philosophically? Do you think that the discrete world is the best way to understand reality, and that's how we understand the world? In our case, the discretization implies a certain resolution of the search space. So basically, in 2D you have a grid, and each grid cell is represented by a number combination.
And as you scale this up, the volume is increasing exponentially. So there are inherent limitations to this discretization, which make it hard to scale to high dimensions, for example. But if you think about how we train or--
like, large language models, we're basically treating everything as bits, right? And yeah, I think there are limitations to it, in the sense that you can't easily scale to high dimensions, and sort of using floating point numbers and potentially low-dimensional projections might be easier or better. But there are also advantages, in the sense that you constrain the search space in a certain way, right? So
I think the jury is still out whether or not discrete representations are better or continuous. I think the whole deep learning revolution kind of showed that continuous representations might have an advantage, at least if you're doing gradient descent. For other things, it might be different, right? So if you do a black box optimization or evolutionary optimization, discrete structures might be easier to fiddle or to perturb, right?
Yeah. What's your philosophy about just the complexity of this kind of, you know, evolutionary meta-optimization in general? I mean, if you look at a lot of the hyperscalers like OpenAI at the moment, I mean, yeah, they're doing this insane engineering, building a globally distributed system, but in a way it's simpler. Yeah.
Right. You know, they just have one model and they're just doing stochastic gradient descent and so on. What would it look like for a hyperscaler to deploy the kinds of methods you're talking about? I think actually, like at this point, there's also some type of evolutionary optimization going on for GPT style training, right? In the sense that there is a community. Oh, yes, yes, yes. Of PhD students who have sort of discovered certain default hyperparameters and sort of have seeded the search space in a very specific way, right? Yes. So I think...
We can talk about evolutionary optimization on the per-project scale, but we can also talk about evolutionary optimization in the sense of the collective intelligence of people investing time and resources into these systems. So I think if OpenAI had had to start from scratch and the Adam optimizer had not been published open source, they might not be where they are right now. So I think this is one aspect or one answer to that question. The other is--
When you think about our DiscoPOP paper, you don't actually need large populations of candidate solutions in order to discover something new. Chris already said these proposals are using very intelligent exploration. So we can even just use a single candidate solution, evaluate it, and then do this in a loop, basically updating the context and getting a
new candidate. So I think the scale of evolution that we have in nature is not necessarily something we can do with GPT-sized models, of course, because it's too expensive. But we can always try to do evolution on a much smaller scale, and then see whether or not what we discovered on the smaller scale generalizes to the larger scale. So for example, in previous work that Chris and I did, we looked at--
meta-optimizing evolutionary algorithms themselves, sort of in a meta meta loop. And we found that you can do this on very small tasks and essentially optimize the algorithm on these sort of low dimensional tasks and then transfer them to more high dimensional tasks later on. So as long as you sort of choose the meta task distribution on which you find your signal of improvement on in a smart way, you can go on and sort of transfer this to more complex settings. I think
There is especially room for things like evolutionary optimization when it comes to optimizing things like data mixtures for large language models and thinking more about how can you set up data that incentivizes chain of thought reasoning or reasoning more generally. So you could think of settings where you train very small language models on a task distribution that is generated by some type of transformer or algorithm, and then
you optimize that transformer to give the downstream trained transformer basically in context reasoning abilities. So I think especially in the context of synthetic data and the smaller evolution regime, there is a lot to discover that can potentially transfer to the larger scheme.
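A very rough sketch of that bilevel idea could look like the simple hill-climbing outer loop below, assuming placeholder `train_small_model` and `eval_in_context` functions that stand in for whatever training and evaluation pipeline is actually used.

```python
# Rough sketch of the bilevel setup: an outer loop searches over the
# parameters of a task generator, an inner loop trains a small model on the
# generated tasks, and the outer objective is the trained model's
# in-context reasoning score. train_small_model and eval_in_context are
# placeholders, not real APIs.
import random

def outer_loop(train_small_model, eval_in_context, steps=50, dim=8):
    theta = [0.0] * dim                       # task-generator parameters
    best_theta, best_score = list(theta), float("-inf")
    for _ in range(steps):
        candidate = [t + random.gauss(0, 0.1) for t in theta]  # mutate
        model = train_small_model(candidate)  # inner loop: train on generated tasks
        score = eval_in_context(model)        # e.g. held-out ICL benchmark
        if score > best_score:                # keep only improvements
            best_theta, best_score, theta = candidate, score, candidate
    return best_theta, best_score
```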
I'm completely on board that what's really exciting about work like this is we can work on the small scale. We can discover just entirely new methods of doing things. And then we can kind of transform it into a classical method, if you like, and then productionize it at scale. Yeah.
But I'm really excited about sort of biologically plausible intelligence. I want to have this kind of thing running at scale in the first place. I want, you know, the Anthropics of this world to be building this living, breathing system that does, you know, several levels of meta-optimization. From a software engineering perspective, that seems like a nightmare to me.
And I wonder whether it's a threshold we'll ever cross. I don't know. It's hard to say at this point, right? With Moore's Law continuing to a certain degree, at least in the ASICs sort of sector, I think there might be a chance that everyone can have their own personalized AI assistant, right? And we're going to see these forms of collective intelligence. I don't know.
I think I also prefer that future, in the sense that it allows for much more sort of customization and sort of neat adaptation for the user. And seeing sort of evolution not only run on the biological scale, but also on the synthetic scale, is something I'm very, very excited about. Because ultimately, like I already said before, natural evolution is the only process of which we know for sure that it has led to general intelligence, in the form
of us, right? So I think there is a lot to learn and a lot to be transferred, but also probably a lot to be discarded, right? You don't want to copy everything. And I think something that I also learned from Sakana CEO David Ha, like,
They had a paper called The Sensory Neuron, where they basically looked at how transformers can learn or meta-learn generalized policies on permuted pixel representations. So humans in many ways could not do such a task, while computers can, right? So there's a whole set of tasks which we humans, with our sort of inductive biases imprinted by evolution, cannot really do.
Because it would take us a lot of time to adapt, given that we're already in, let's say, a local optimum of our cognitive systems. While machines can do them, and potentially, yeah, there's a different type of evolution that has to happen for these machines to improve basically endlessly. I can't believe we didn't talk about this before, but you're working for David Ha. Yeah.
David Ha, of course, was at Google Brain, right? And he created this startup called Sakana. And it's the kind of AI that we love. And fans of the show will know that we love it. What's it like working for David? And just generally, what are you guys doing? I can tell you I've never been happier in my life. Like, I'm not faking anything. It's really great to be in a place where sort of creativity and unorthodox ideas are being promoted.
And also, there is a lot of freedom to execute them. So Sakana is a Japanese-based startup co-founded by David Ha and Llion Jones, who was a co-author on the Transformer paper. And yeah, it's been quite the adventure, not only learning on the technical level, but also on the business side of--
yeah, this company and in this crazy, crazy era. So I think something that at least for me separates like Sakana from places like Google and so on is that we're trying to do things other people are not doing, right? So we're not in the language model pre-training game, but we're essentially trying to
go down the path of ideas other people are not necessarily super willing to commit a lot of resources to. And I think we're thereby filling in a big gap in the community right now. I'm really excited about this work, though. Why are other people not doing this? I think we as a community are probably, to a certain degree, stuck in a meta-scientific local optimum.
The linear path forward is just to try to do scaling, taking VC money and burning it by pre-training the SOTA model for two weeks and then having a new SOTA model. What we do is hard, and it's also not easy, because you need to be frustration tolerant in some sense, right? So if you think of it, pushing scaling laws is in some sense the easy thing to do. It's a very linear thing, right? It's hard on the engineering level, but conceptually it's straightforward,
I would say. So yeah, maybe that's the reason, but I can only speculate. I think David has had a tremendous effect on my PhD in terms of inspiration, and sort of bringing partially ideas from Jürgen Schmidhuber into life, and sort of doing a lot of exciting stuff himself.
And to me, he's a good person to guide such an effort. And Llion is a co-inventor of the Transformer, right? So there's a lot of technical expertise at the company, and a lot of smart and sort of outside-the-box type of people
working there. Yes, I was about to mention Schmidhuber. He worked on things like Gödel machines and artificial curiosity. I mean, he was just thinking about stuff like this a long time ago. And of course, David's done a bunch of work with Jürgen as well. I suppose maybe the dividing line is
I believe, and I think you believe, that we should look to the natural world for inspiration, right? You know, just collective intelligence is a big one. And, you know, it's kind of like biomimetic intelligence. And the other school of thought is that intelligence is this kind of pure, simple
set of principles that can be scaled, and scale gives you everything. And that seems to demarcate the two views. We can talk sort of about intelligence on the individual scale, and we can talk about intelligence on the collective scale, right? And on Tuesday, Alison Gopnik gave this talk about child development and how societal structures really
give room for causal experimentation and learning about the world. And this would not be possible without other agents available. And I think if we really want to get systems that are capable of sort of adapting over a lifelong horizon, you need some social structure that provides safety, but also sort of information asymmetry. So a baby can't learn about everything in the world
at the same time, but it needs the parents to essentially safeguard it from certain things in the world. If you think about it, I thought the world was a completely beautiful and blissful place until I turned probably 10 or something. I couldn't have imagined that criminality exists. And these safeguards also enable a very focused style of learning, which is probably only
possible if other agents are in the loop. So I think collective intelligence is something fascinating in complex systems as well. And I would think that there is some parts of intelligence that can only be unlocked
sort of going in that direction. - Anyway, Cong, so you wrote this paper, "Automated Design of Agentic Systems." Can you give us the elevator pitch? - Yeah, for sure. So this is a fun piece of work led by Shengran Hu in Jeff Clune's lab, together also with Jeff Clune. And this kind of work fits very neatly into this kind of general Cambrian explosion of LLM-driven discoveries. So super related as well to works like DiscoPOP.
Also stuff like Eureka, Prompt Breeder as well from Fernando et al. And the kind of gist here is that, you know, how far can we take this current paradigm of like LLM driven discovery
And the kind of use case that we wanted to investigate in this paper is that LLM agents are kind of ubiquitous now. So things like the AI Scientist that we're going to discuss later, that's an LLM agent system that we handcrafted painfully over months. And things like Cursor, things like these literature survey tools, all of them are LLM agent systems, and crucially, all of them are pieces of Python code.
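As a hedged illustration of the search substrate being described, an agent here is essentially just a Python function that takes a task and calls an LLM (or other tools) however it likes. The function name and prompts below are illustrative assumptions, not the ADAS codebase.

```python
# Illustrative hand-written baseline agent: chain-of-thought, then
# self-critique and refinement. A meta-agent searching this space would
# propose new variants of a function like this.
def forward(task: str, llm) -> str:
    draft = llm(f"Think step by step, then answer:\n{task}")
    critique = llm(f"Task: {task}\nDraft answer: {draft}\n"
                   "List any mistakes in the draft.")
    return llm(f"Task: {task}\nDraft: {draft}\nCritique: {critique}\n"
               "Write the final, corrected answer.")
```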
And LLMs are now at this point where it's become quite meta. The fun way I like to describe this is that LLM agents are now writing LLM agents. Together with LLMs writing preference loss functions, it's getting super meta. So with the increased capability of things like Sonnet 3.5, GPT-4o, and o1, these systems can now discover pieces of code that are hundreds of lines long.
And this involves all these agentic systems. So, you know, we have traditional structures like chain of thought, debate, RAG-type systems for answering. And the question we want to answer is, you know, can we just design all of this from scratch and evolve these systems with respect to something like, for example, MMLU performance, benchmark performance?
In the limit, we'd like more abstract things, maybe like human preferences as well. But can we just design all these systems from the ground up? And we've got some pretty promising results, including, for example, in the ARC challenge,
and we show that the kind of agents that pop out of this are super non-intuitive. And, you know, we have maybe some intuition about how we design LLM agent systems; we always think "let's think step by step" might be a good one. But there's weird stuff as well, like, you know, "I'm going to be really angry if you don't respond well," that you can also put in. And can LLMs sort of search this
entire space, this Turing complete space of all prompts, all workflows, invent new tools, new ways to use code that perhaps humans haven't come up with. And there's all these kind of themes as well. And like all of the evolutionary type work that we discussed previously, you know, aspects of recombination. I think like one
thing that hasn't been covered so far is this aspect of serendipity. So, you know, by stochastic sampling, say we sample a thousand times, for example, can we randomly get something surprising, really good, that we can archive, and then in later iterations of this evolutionary loop, can we build on that as a stepping stone towards future stuff? Yeah, I'm a huge fan of stepping stone collection. Yes. Yeah, I love that. So the first thing is,
you're using code as the primitive now. Certainly solutions to the ARC challenge seem to be split down the middle: some of them are inductive code generation approaches, and some of them are just transductive approaches where you just do the thing, you don't have an intermediate explicit function. And you said something else which was fascinating: searching Turing space. So these models are not Turing complete.
Right. But they can generate code, so in some sense they can search Turing space. Or can they? I don't know. What do you think? Yeah, for sure. So, going towards, I guess, the transductive stuff, you can imagine that in code, our agent
might not even be LLM calls; it might, you know, load some diffusion models off of Hugging Face, try a little bit there, get LLMs to parse the output of the diffusion model, or whatever. And, you know, in the limit, we talk a little bit about this in the paper, but
you've got this Turing complete space. What if, you know, the code writes code to train another LLM? Obviously it's super unrealistic now for current compute budgets, but theoretically, you know, it could train another LLM to train another model, come up with new stuff to solve the problem,
far beyond just pure LLM calls. I'm, you know, a bit of a neuro-symbolic guy. You know, code has compositionality. Yeah. Right. You know, it's Turing complete. It can do lots of things that neural representations can't do. But we are increasingly seeing that neural networks can do a lot.
I mean, they're limited in ways that we understand. They can't copy, they can't count, you know, they're stupid in lots of ways. Maybe we'll figure out ways to improve that. But lots of people who did have the intuition that we need to generate code are now just saying, let's skip that, guys. Let's just do the thing. I think for stuff like, for example, tool use, like integrating web search into
these agent reasoning workflows, and integrating things like calculators, for example, you know, if some kind of agent computation requires tons of computation, multiplying loads of numbers, doing lots of complex math. Possibly when we get to superintelligence levels, neural networks will be able to do that, but I think these things
are just obviously far more suitable for coding. Integrating these kinds of things, possibly also with neural network-based solutions, I think is likely going to be the future. Yeah, I think that's a fair shout, actually. And another thing I was thinking of is,
symbolic code is more compositional if you want to do multi-step reasoning, right? Because if you think about it, if you just do a whole bunch of transduction operations, they might not gel together quite so well. But in a multi-agent system with tool use and so on, what we want to have is lots of reasonable building blocks that are intelligible and that can be chained together. And I suppose that does seem more amenable to code.
Yeah, for sure. So one of our kind of inspirations actually was things like LangChain, for example. So loads of reusable building blocks, things that can be combined together. And what we find in many of the discovered agents is actually sort of two agents kind of sandwiched
together, maybe you've got one agent that makes an initial prediction, another one that refines it. And these two blocks are very modular and just previous agents actually from the archive. How do you balance exploration and exploitation? In terms of exploration, basically just tons of sampling
and mutation. We explicitly encourage it: think out of the box, use knowledge from adjacent fields of machine learning, try to integrate those kinds of concepts
into your next design. I think there's a lot of work as well, quite related to what Chris was saying about changing the context: you can inject a lot of entropy. For example, there are new stepping-stone accumulation algorithms, also from Jeff's lab, like OMNI, that basically have an unlimited-size archive, and you just keep sampling new things, hoping for new combinations.
And that kind of strategy, we find, scales almost indefinitely. We've run systems that can discover novel artifacts for over 5,000 generations, and, judged by a language model, even by generation 5,000 some of them are still noticeably different from previous ones. So I think injecting that kind of entropy into the context helps a lot.
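A minimal sketch of the archive-plus-mutation loop just described, assuming a placeholder `call_llm` chat call and a stub `evaluate` benchmark rather than the actual ADAS/Sakana code:

```python
"""Sketch: sample parents from an ever-growing archive, ask the model for an
out-of-the-box mutation, score it, and keep it. Everything here is a placeholder."""
import random

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-model call here.
    return "def agent(task): return 'refined answer for ' + task"

def evaluate(agent_code: str) -> float:
    # Stub benchmark score; a real system would execute the agent on held-out tasks.
    return random.random()

MUTATION_PROMPT = (
    "Here are previously discovered agents and their scores:\n{parents}\n"
    "Think out of the box. Draw on ideas from adjacent fields of machine learning.\n"
    "Propose a NEW agent as a Python function `agent(task)` that is noticeably "
    "different from everything above."
)

archive = [{"code": "def agent(task): return task", "score": 0.0}]  # seed agent

for generation in range(20):  # paper-scale runs go for thousands of generations
    parents = random.sample(archive, k=min(3, len(archive)))
    prompt = MUTATION_PROMPT.format(
        parents="\n".join(f"score={p['score']:.2f}\n{p['code']}" for p in parents)
    )
    child_code = call_llm(prompt)
    archive.append({"code": child_code, "score": evaluate(child_code)})
    # No pruning: an unbounded archive keeps old "stepping stones" around so later
    # generations can recombine them, which is where a lot of the entropy comes from.

print(f"archive size: {len(archive)}, best: {max(a['score'] for a in archive):.2f}")
```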
In terms of balancing exploitation, I guess there's this worry: what if we're overfitting to benchmark scores? Interestingly, we find that even these humongous agent systems that we discover transfer exceptionally well. So in our ADAS paper, we try to make the search as efficient as possible. We've got a high-level agent that's the strongest coding agent we can find.
And in the inner loop, we've got something like GPT-3.5: exceptionally cheap, super fast to run all the evals on. And we find that when we optimize an agentic system for GPT-3.5 on some math problems, for example, it generates this very generic, super robust reasoning loop that then transfers to GPT-4, transfers to other math domains, and, shockingly, transfers to literature tasks. So things like chain of thought, debate,
RAG-based tool use: these kinds of things are super generic and applicable to all kinds of tasks. And we find that our agents similarly transfer quite well. So we've got this kind of extreme exploration that we think we can scale super far. I guess at some point you do get diminishing returns from trying to scale this; at some point you're trying to squeeze blood out of a stone and get capabilities out of the foundation model that probably aren't there yet. But
as far as we've seen already, you can go super far and discover super general reasoning structures. You mentioned debate, which is a great little digression. I interviewed Akbir Khan; his debate paper won a best paper award at ICML this year, I think. Amazing paper. That's a really good meta-discourse
for agents, isn't it? Maybe explain what that is. But I mean, are there other similar things like that? I guess debate in general is that, instead of just asking the LLM a question and receiving a response, you might, for example, do a multi-turn kind of thing. So,
you basically have agents set up, perhaps in an adversarial sense: one proposes an answer, another criticizes it, and this loops, each one criticizing, to see if you can reach a consensus towards the end. And we see these kinds of structures being discovered quite a lot in our agent optimization. Another related one that I really like, which our
optimization discovers a lot, is specialized experts. A lot of the time in science we get a lot out of interdisciplinary collaboration, people with different kinds of views. Our agent discovers that you prompt agents,
kind of like in a debate sense, to specifically criticize efficiency, specifically criticize readability, specifically criticize the accuracy of the solution, and then you get this targeted debate rather than just generic criticism. Honestly, it's even hard to describe some of the systems that our system generates:
twenty calls all chained together; in the middle there's an efficiency expert, a readability expert, an accuracy expert; and then after that there's debate as well. I look at these designs and I think, how on earth would I even begin to come up with these structures? And I feel like one of the
main hopes is that, because we're operating in this Turing complete space, with this kind of framework, are we going to be able to discover the new debate, discover the new kind of framework
that's going to lead to, for example, next year's best paper award? So, I'm a huge fan of building agent systems. I use the actor pattern, so I'm always doing stuff like this from my intuition: an actor has a manager, and I quite often have critics, and I have an address actor, and I have Anthropic actors and Google actors and so on. And even though I'm only ever working one actor at a time, I'm building this distributed system of information flow. And, you know,
there are sometimes problems, right? Because if you think about it, it's this living, breathing system, and sometimes you get these loops, or failure modes, or weird things. And I suppose over time you build intuition about topologies, about design patterns, about ways of doing this. And you're saying that your tool can generate these patterns automatically. But I'm just thinking,
because these things are code, they could run forever; you could get these weird, degenerative behaviors. Do you validate for that? Or what happens? One easy solution against that sort of degeneracy issue is setting a time limit. And this is also one of the things that we want to add into systems like ADAS: not only do you return an answer that
maximizes benchmark scores, but you do it within a set cost, maybe a set runtime as well. And I really like that point as well: we're really discovering how to organize computation in an efficient way here. It's kind of like,
if you imagine all these agents as workers in a company, companies evolve structures that best make use of their available agents. And we see hints of that kind of thing as well. In the limit, with things like ADAS, you could have...
you could basically try to work out how to use, say, a thousand different actors, all prompted to do different things, and intelligently combine their outputs in the right way. Exactly like, I guess, how you would organize a company. So we seem to have happened upon certain topologies that work
because we have this knowledge-transfer bottleneck, right? It's really difficult for us to transfer information efficiently, so most companies are quite hierarchical. But I just wonder whether that's a natural thing, because if you think about it, AI agents can transfer information at really high bandwidth,
right? Yeah. So maybe we can have our cake and eat it: in the AI world, we can have every topology running at once and somehow sharing information. So do you think it would resemble the real world, or will it go in a completely different direction? Absolutely. And I think this is the importance of this kind of evolutionary loop: we don't exactly know. If we just import our human intuition about how we work best
into structuring computation, I don't think it's going to transfer well. For example, humans are super limited by working memory; I think we have a working memory of like six items or something.
And you can see this in the structure of our mathematical theorems; you can see it in the structure of reporting lines. But AI agents, I really don't think, are limited in this way. For example, Gemini these days has millions of tokens of context length. Feasibly, that could be like a human manager with a thousand reports.
Integrating outputs from thousands of things: if a human could do that, you could collapse the hierarchy of a company by, I don't know, many, many levels, and that would probably be better for everyone. I think AI agents stand to be able to use these kinds of structures much better. And I think, yeah, we do need to evolve. We need to
basically just rerun the kind of cultural discovery loop that we had in human society to make all these company structures and write all these blog posts about how to build your startup or whatever, and let AI agents discover this for themselves.
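To make the earlier description concrete, here is a rough sketch of the kind of discovered topology described above (initial answer, specialized critics, debate-style refinement), wrapped in the runtime budget mentioned a moment earlier as a guard against degenerate loops. `call_llm` is a placeholder and the expert roles are illustrative, not the exact agents the search produced:

```python
"""Sketch of an expert-critique topology with a wall-clock budget guard."""
import time

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-model call here.
    return f"(model output for: {prompt[:40]}...)"

EXPERT_ROLES = ["efficiency", "readability", "accuracy"]

def solve(task: str, max_rounds: int = 3, budget_seconds: float = 30.0) -> str:
    start = time.monotonic()
    answer = call_llm(f"Propose an initial solution to: {task}")
    for _ in range(max_rounds):
        if time.monotonic() - start > budget_seconds:
            break  # runaway or looping behavior is cut off by the budget
        # Targeted critiques rather than one generic critic.
        critiques = [
            call_llm(f"As a {role} expert, criticize this solution:\n{answer}")
            for role in EXPERT_ROLES
        ]
        # Debate-style refinement: fold the critiques back into a revised answer.
        answer = call_llm(
            "Revise the solution to address these critiques:\n"
            + "\n".join(critiques) + f"\nOriginal solution:\n{answer}"
        )
    return answer

print(solve("write a function that deduplicates a list"))
```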
How can we bring Kenneth Stanley's ideas in here? Because one thing that worries me is that if the bandwidth connection between agents is very high, it will lead towards building more monolithic systems. And if Stanley were here right now, he would say we need agents that are searching for novelty, so they need to follow their own gradient of interest for many, many steps. And, you know,
I suppose, is that something we should code in, or are you seeing that kind of thing emerge? We have elements of that: we do say, you know, think out of the box. But ultimately we are guided by benchmark scores. And one of our dreams, for example, is to really go
down this full open-endedness route: abandon objectives, follow your nose for interestingness. So one thought we have is: we've got this system now for optimizing agents for one particular task; can we have another system that proposes challenging tasks? So really not hard-focusing on one objective, really trying to do the kind of goal switching that Ken Stanley talks about a lot.
Can we have, say, a proposer agent that designs more and more challenging tasks? Maybe that agent is also evolved. Can we evolve systems to meet these new challenges? Can we run this in a loop, with harder and harder challenges, the full dream of open-endedness? Can we co-evolve these kinds of things together? I think that kind of approach
is going to be very fruitful in the future, when we get the compute to make it happen. Yeah, I'm so torn on this. I love reading software engineering books about design patterns. I use them in my code, you know, like the mediator pattern or the observer pattern. And...
maybe this is just wrong. Maybe these models are just smarter than us and they can come up with better topologies. One thing I thought of as well: the meta agent itself is an LLM agent, so could you not create a meta-meta agent? Great question. Maybe I'll answer that question in a few months when we get that result. Interesting. But yes, totally. So, for example,
and this actually really relates to stuff that Chris and Rob were saying earlier, they were doing meta-meta optimization.
I think there's actually a piece of work recently called STOP, I believe, that got an oral at COLM. The idea there is that you've got a really simple task that you design an LLM agent for, and you search for better programs for that task. So there's one level there. And then you have another agent that tries to improve the search process.
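A very loose sketch of that two-level idea, under the assumption that both the task solution and the "improver" are just strings; `call_llm` and `score_solution` are placeholders, not the published STOP implementation:

```python
"""Sketch: an inner improver rewrites the solution, an outer step rewrites the improver."""
import random

def call_llm(prompt: str) -> str:
    return "candidate text"        # placeholder model call

def score_solution(solution: str) -> float:
    return random.random()         # stub task metric

def improve_solution(solution: str, improver_strategy: str) -> str:
    # Inner level: the improver here is just a prompt template (its "strategy").
    return call_llm(improver_strategy.format(solution=solution))

improver_strategy = "Improve this solution:\n{solution}"
solution = "initial solution"

for outer_step in range(5):
    # Inner level: use the current improver to improve the task solution.
    for _ in range(3):
        candidate = improve_solution(solution, improver_strategy)
        if score_solution(candidate) >= score_solution(solution):
            solution = candidate
    # Outer level: ask the model to improve the improver itself.
    improver_strategy = call_llm(
        "Rewrite this improvement prompt so that it produces better solutions:\n"
        + improver_strategy
    )
```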
And that's really going towards Schmidhuber, Gödel Machine-esque ideas: you improve the inner task, you improve the outer loop, go on forever, recursive self-improvement. I tend to think that at some point we are going to be squeezing blood out of a stone. I don't think this is the full recursive self-improvement loop that's going to go
all the way to ASI; we're very much bounded by model capabilities. But could you have this meta-meta optimization go as far as the model can take us, then do something like gradient updates, optimize for the best thing we found, and keep continuing that loop on and on? It does get to the core of intelligence, though. I'm a big believer that it's quite situated and specialized.
And certainly, with the experience of generating these agent systems, you see topological bottlenecks, and you see locales of specialization and so on. So what would it mean to you to have an ASI in this kind of setup? I think my vision is quite similar to what I proposed earlier:
like human culture and society, we continuously find new and interesting challenges for ourselves and work towards those. I think this goal switching, having no predefined objective,
really is critical for this. And I want to see whether we can have some kind of system that can continuously propose new things, just accumulate a tech tree or a skill tree of things that it can do, and then continuously build that out, hopefully forever, just like human society and culture has.
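A hedged sketch of the proposer/solver loop and accumulating skill tree described here and earlier; `call_llm` and the solved-check are placeholders, and a real system would need a genuine judge or ground-truth verification:

```python
"""Sketch: propose a slightly harder task conditioned on current skills, try to solve
it, archive successes, repeat. Everything here is a placeholder."""
def call_llm(prompt: str) -> str:
    return "placeholder"

def attempt(task: str) -> tuple[str, bool]:
    solution = call_llm(f"Solve this task: {task}")
    solved = True  # stub: a real system needs ground-truth or judge-based checking
    return solution, solved

skill_tree: list[str] = ["count to ten"]  # seed capability

for step in range(10):
    new_task = call_llm(
        "Here is what the system can already do:\n"
        + "\n".join(skill_tree)
        + "\nPropose ONE new task that is interestingly different and slightly harder."
    )
    solution, solved = attempt(new_task)
    if solved:
        skill_tree.append(new_task)  # the archive only ever grows, like a tech tree
```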
So perhaps not just... It's really hard to tell even what this kind of thing would find. This is way beyond benchmark optimization. It's just kind of like human culture: really hard to predict what the next wave of innovation will be,
and then the stuff that builds off that, and the stuff that builds off that. Looking ten years into the future, it kind of looks like magic and sorcery, completely unimaginable. So I couldn't predict what it will end up looking like, but I think this is the kind of system we need to realize full ASI-like capabilities. Let's talk about your other paper that we're going to cover today, Intelligent Go-Explore. Can you explain
the paper? Intelligent Go-Explore was also a collaboration with Shengran and Jeff, also wonderful authors to work with. I guess we start off with the Go-Explore algorithm. Often in reinforcement learning the problem is exploration: how do we find good paths through the environment that get us into good states?
Go-Explore is a super influential work, from 2017 I believe, where the idea, actually very similar to these evolutionary algorithms, is that you have an archive of discovered interesting states. You keep selecting states to explore from, for example by taking random actions; you then put the states you find promising back into the archive and loop until you find something good, and then you can robustify those trajectories
afterwards. One of the key snags of Go-Explore is that you have to have a really good interestingness function for your environment. Take Montezuma's Revenge: this used to be one of the grand challenges in reinforcement learning.
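A bare-bones sketch of the Go-Explore loop just described, on a toy chain environment rather than Montezuma; the "interestingness" here is simply state novelty, i.e. the kind of hand-designed heuristic the original method relies on:

```python
"""Sketch of the Go-Explore archive loop on a toy deterministic environment."""
import random

def step(state: int, action: int) -> int:
    # Toy deterministic environment: a chain of positions 0..50.
    return max(0, min(50, state + action))

archive = {0: []}  # state -> action sequence that reaches it ("how to return")

for iteration in range(500):
    # 1. Select a state from the archive to explore from (uniformly here;
    #    Go-Explore uses smarter selection weights).
    state = random.choice(list(archive.keys()))
    trajectory = list(archive[state])
    # 2. Return to it (trivial here, the env is deterministic), then explore
    #    with random actions.
    for _ in range(5):
        action = random.choice([-1, 1])
        state = step(state, action)
        trajectory.append(action)
        # 3. Archive any novel state, together with how to reach it.
        if state not in archive:
            archive[state] = list(trajectory)

print(f"states discovered: {len(archive)}")
# A full implementation would then "robustify" the best trajectories with RL.
```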
And Go-Explore was one of the first algorithms to solve it. But you had to bake in that going down levels was good, finding keys was good, having more agency in the environment was good. With the advent of modern foundation models, we have this insane opportunity:
a lot of these games conform to human intuitions. We have this nose for interestingness. We go into a video game and we know: I want to go forward, I want to get new stuff, I want to collect interesting things that might be useful to me later. Can we use that as the interestingness engine in place of hand-designed heuristics?
So we set out to do that, very much inspired by works like OMNI, which also tried to use a language model's nose for interestingness, in their case for task selection; here it's using its nose for interestingness to discover new states in an environment. We get really nice results on a variety of hard-exploration RL environments just by following a foundation model's nose for interestingness.
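The substitution Intelligent Go-Explore makes could look roughly like this: swap the hand-coded selection in the loop above for a foundation model choosing what is interesting. `call_llm` is a placeholder and the prompt wording is illustrative, not the paper's:

```python
"""Sketch: a foundation model picks which candidate state to explore from."""
def call_llm(prompt: str) -> str:
    return "0"  # placeholder: pretend the model always picks the first option

def pick_interesting(candidates: list[str], goal: str) -> str:
    prompt = (
        f"You are exploring an environment with the goal: {goal}\n"
        "Here are candidate states, one per line:\n"
        + "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
        + "\nWhich state looks most interesting or promising to explore from? "
        "Answer with its number only."
    )
    reply = call_llm(prompt).strip()
    index = int(reply) if reply.isdigit() and int(reply) < len(candidates) else 0
    return candidates[index]

# Usage: replace `random.choice(list(archive.keys()))` in the loop above with
# `pick_interesting([describe(s) for s in archive], goal="find the key")`,
# where `describe` is a hypothetical helper that renders a state as text
# (or as an image, for a VLM).
```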
And I think foundation models for complex exploration environments are going to be quite a big challenge in the future. You can imagine, for example, reframing even things like scientific discovery as exploration through a very large search space, where you need to archive stepping stones. All of this is closely related to LLM discovery works like the AI Scientist.
And IGE, I guess, is the very specific application of that kind of principle to hard-exploration reinforcement learning environments. Yeah, it just blows my mind that LLMs are so good at creativity, because Subbarao Kambhampati is coming on in a little while. When I spoke with him last time, I was saying to him that I think there's a creativity gap in LLMs,
and indeed a reasoning gap. And the zeitgeist is shifting: I've spoken to so many people in Europe and they're saying, no, actually they do do creativity, and they do do reasoning, to some extent. So how do you think this works? How can it generalize the representations it has to something like Montezuma's Revenge? For sure. So Montezuma's Revenge is not something we've tried yet, but we really want to. But
LLMs have so much human prior knowledge about games, about what's important. And in something like Montezuma's, the key is to explore more, to gather useful objects, to descend down the levels, to my recollection.
And I think these kinds of priors for exploration are really baked into the model. We know these things are good, humans have a nose for this kind of thing, and I think foundation models have acquired a lot of it. Because these games, or at least discussion of these games, are really well represented in the training data of LLMs, we stand to do quite well.
Then another question is, for example, say you took current LLMs and you put them on a field of science that's going to pop up in ten years, where no one even knows the terms yet. I think that's a much different question.
Then, I guess, it's more like you've got to retrain and transfer your intuition to this new setting. I think if you took a guy from ancient Greece and plopped him down here, it might take him maybe a year to readjust to the 21st century, but eventually he would also be able to develop the same kind of nose for interestingness.
With a bit of effort, though. I mean, one thing I'm going to ask you guys about the AI Scientist paper in a little while is that if you travelled back in time to Newton or whoever and you gave him a 21st-century physics book, he wouldn't know what to do with it. He wouldn't understand it, right? Because it's using all these terms and so on. But it comes down to abstraction: there are fundamental principles which explain how the universe works.
And even if you look at some of the ARC solutions, the art of it is describing in language what the problem is like,
so maybe there's an element of doing some kind of analogical reasoning in the prompt. But even for protein folding and future scientific discovery and so on, surely there is some kind of map that would describe it in a way that a language model can understand. A hundred percent. Yeah, I think it is all about adopting the right abstractions for the task at hand. And even for
future science, I think these kinds of generic reasoning structures, things like debate, transfer so well throughout the ages. As long as you can make the right abstractions for the current problem at hand, I think we can apply a lot of
these kinds of structures we've learned already. At the moment it's just text, right? What about other modalities? So our initial version was just text, but now we've also got image-based environments. We show that our algorithm can operate in, for example, visual grid worlds. And in many ways, I actually think it might be easier. For example, for a human playing the game,
some of these text representations of grid worlds are super complicated. It's like: you're down here, you're in the center of the map, you see a door one block to the east, a key two steps to the west. That's actually really hard even for humans to reason about. But if you've got an image of the board, you can very clearly see there's something on the left, so you just move to the left.
There's an interesting thing that you haven't seen before in front of you. So I think
there's still a lot to be done for VLM reasoning in video games, but there's some really nice recent work trying to benchmark LLMs on these, and hopefully we'll see future work doing RL on these models to make them better at them. I think there's already a ton of work adapting VLMs, et cetera, for these kinds of RL environments.
And our thing really just sits on top of that. You've got a VLM that can act in an environment, for example; our thing is a higher-level loop that basically says, cache the stuff that you found really interesting. It's composable on top of any agent architecture. I'm just thinking, in the future, when we use algorithms like this in the real world,
I suppose it becomes less about an objective and more about taste, style, alignment, ethics, and so on. And perhaps even in the current setting, it might beat the game, but it might not do it in a very aesthetically
pleasing way. So what are your thoughts on alignment, and on putting style, aesthetics and values into the algorithm? I think, when following a language model's nose for interestingness, we then, in the paper, project that down to achieving high reward
in the environment. So it might be the case that the language model finds a lot of things interesting, loads of trajectories through an environment, but then they maybe need to pass through a filter, for example human understandings of style. We can design reward functions to score the things it creates. And I think this really speaks to the values we impose on these systems, the human supervision.
It's kind of like, you set off a grad student on a science problem or whatever, they find ten different paths, and then a supervisor might, for example, select one for the grad student to pursue further. Another thing is that it's probably not
something that is an issue yet, because we're dealing with quite abstract forms of reasoning. But at some point, with real-world applications, do you think we would have a problem with some of the cultural biases in the language models pulling it in a particular direction? Yes, a hundred percent. So,
a kind of failure mode that we hypothesize, for example, and this is a very contrived example, is: what if the LLM has some bias around the colour green, and all the interesting paths in the environment are green? Then you would expect that something like Intelligent Go-Explore really just amplifies that bias and doesn't find
the right thing. So I think this really relates to work on debiasing language models. Perhaps there's some training element to this, some kind of supervision; we've got to collect corrective data for these kinds of things. But very much we've got to continuously monitor them. And I think there's no right answer as well.
Biases are all around us, and I guess we need to correct for them adaptively as we see them happen. So, you guys wrote "The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery". Now, this did the rounds on Twitter. I mean, this was picked up by newspapers all around the world. It was very, very exciting. There was criticism of it as well. But why don't we just start with the open-ended piece?
Tell us about the paper. In one sentence: we try to use LLMs to write new papers that are hopefully, eventually, helpful to the community. I think something that's important is that it popped out of essentially all of the work we've been discussing. In many ways, DiscoPOP showed that we can write code and do discovery at that level, and Chris had this insight, or this intuition: maybe we can go beyond that. Maybe we can automate the entire loop, starting from idea generation,
to writing code for experiments, executing them, and then finally writing a paper based on the logs, the numerical results. And I think this is one of the key insights that I had on the meta-scientific level: it's all about timing, and realizing what is just reachable with these systems. And I think,
yeah, the impact is still to be seen in some sense, but we're all pretty excited, because we had many moments where we were mesmerized by what we found. I think that was one of the key themes: we were constantly kind of shocked by what we saw. I think Chris's Twitter thread actually had some early examples, and the early stuff is really not that inconceivable. It's like:
propose one experiment, run it, write a few paragraphs. And what do you need to get a reasonable-ish paper? Just chain that a few more times, explore something a bit more deeply, and try to connect it all together with a story. And I guess one of the things about the AI Scientist is, yeah, exactly as Rob said, it's kind of a function of the time. So we already see,
as AI capabilities increased, we saw them being used in ideation. People from Stanford have been thinking about this kind of thing: LLMs can generate ideas just as novel as human researchers'. There are review agents, there are coding agents. Why not just chain that all together?
I think we demonstrated that proof of concept of chaining it all together. But thousands of researchers are now working on every single component. So in the next year, I think, as
all that knowledge comes back to us and we chain it together again, we're going to see wonderful things. Yeah, and Cong also likes to talk about this as the GPT-1 moment of generative AI for science, right? And I feel it's not only the community coming together and improving each
module of the system, but also the increased capabilities of frontier models going forward. So, yeah, we're at a great Cambrian explosion, I think, for these types of systems. One interesting thing is, I looked at a bunch of the papers, and superficially at least they look fantastic. I mean, if you dig into them you can see a few problems. But one interesting thing is that
if you get a language model to generate an entire paper, so you say, generate me an entire scientific paper in LaTeX, zero-shot, it will be
just pure banality. It will superficially resemble a scientific paper, but it will look terrible. And then the next step, and this is something that many of us do because we've learned how to use language models, is we use Cursor and we generate a sketch, we select some text and say, give me more, and we bring some source data in, double-click on that, generate a table, improve it, improve it, improve it. And what you're doing is a little bit like the Google Maps analogy, where you zoom in
and the tiles get smaller and smaller, and we supervise this process and just add more detail. So it's almost as if the implication is that it's not necessarily that the language models can't do it; they just can't do it all at once.
What you need to do is zoom in and zoom in; essentially you're leveraging more effective computation where it needs to happen. And your work demonstrates that potentially we could automate that entire process. Exactly. And what you're saying would also be really cool if there was a human in the loop; we kind of just wanted to push it: what could we do fully autonomously? I think part of the original idea came from the fact that DiscoPOP,
or rather its results, could be its own paper, right? The discovered loss function, if you could describe why it works and all the interesting things, could be a paper in itself. And the only thing that was really missing was the write-up, right? And LLMs are good at writing. So it was a pretty easy thing to put together. So I think, yeah, while there are issues with the generated papers and the details, the fundamental capabilities are
almost, or mostly, there. Well, on to the devil's advocate. I was reading Hacker News, and the most upvoted comment was this: both the community as a whole and the people within it don't learn by simply reading papers. We learn by building things, running our own experiments, figuring out how other context fits in, discussing with our colleagues.
That's why it takes an eighth of a lifetime to go from the world standard of knowledge to a PhD, basically. And this is quite a common argument with generative AI: it's not so much about the output artifact. We were saying I could go back in time and give Newton a physics book, right? It's this kind of cultural, memetic knowledge transfer; it's the physically embodied process of doing science, of doing exploration. So do you think we still need that? A hundred percent.
And what I would really like to see is our next generations of AI scientists being able to incorporate all those months and months of scientific exploration, all the intuition that builds up.
For example, as PhD students, even if we fail at a project for six months, and fail again for another six months, we build up so much intuition about what doesn't work and what does. And a lot of this is hidden from the scientific community; we only publish positive results. And something like the AI Scientist is really valuable in that sense: can we use AI scientists
as just a data generation tool? Wide explorations, intuitions about what might work and what might not. Can we get much deeper knowledge than just the tip of the iceberg that's published in the paper? And can we distill that back into a system? One thing I like thinking about is, for example, we've got systems like o1, or the open-source one like R1, that do RL based on
performance against ground-truth reward, like math and coding. Could we, for example, chase benchmark scores, even with all the problems they have? Could we optimize AI scientists to do better and better at this RL task, this diffusion task, this NLP task?
That would allow them to incorporate all of that knowledge from exploration, all that failed experimentation, into the process, and truly go towards the steps that we go through in our own PhD process.
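One hedged reading of that idea as code: treat full AI-scientist runs as data generation, score them against a ground-truth benchmark, and keep the best trajectories (failures included) to distill back into the model. `run_ai_scientist` and `benchmark_score` are hypothetical placeholders, not an existing API:

```python
"""Sketch: benchmark-scored AI-scientist runs as a data generation loop."""
def run_ai_scientist(task: str, seed: int) -> dict:
    # Placeholder: a real run would produce ideas, code, experiment logs, a write-up.
    return {"task": task, "seed": seed,
            "log": "idea -> experiment -> result", "artifact": "paper.tex"}

def benchmark_score(run: dict) -> float:
    # Placeholder ground-truth reward, e.g. accuracy of the method the run produced.
    return 0.1 * run["seed"]

task = "improve sample efficiency on a small RL benchmark"
runs = [run_ai_scientist(task, seed) for seed in range(8)]
runs.sort(key=benchmark_score, reverse=True)

# Top trajectories (with their full logs of what was tried and rejected) become
# training data for the next iteration of the system.
distillation_data = runs[:2]
```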
I think maybe one point to add to that. In the AI Scientist paper that we put out, and in the system more generally, the experimentation is fairly linear: you have an idea, you implement an ablation or whatever in code, you run that ablation, and you get a result. And if the result is not positive, you still write a paper about it, basically.
But if I think about my own scientific process, I often spend 80% of the time trying hypotheses, rejecting them, coming up with a new plausible hypothesis, doing essentially causal modeling. And then, at the end, I spend the remaining 20% on writing the paper, getting more empirical results, and so on.
So something that we really want to see going forward is a more iterative approach of specific hypothesis testing, integrating the knowledge from one experiment into the next. And I think once we unlock this and go to a more open-ended system that can really reason about the results it collects, using some ground-truth code evaluation, for example, we're going to make major steps forward.
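A sketch of the more iterative loop Rob describes, with `call_llm` and `run_experiment` as placeholders: each round's hypothesis is conditioned on everything tried so far, including the negative results:

```python
"""Sketch: propose a hypothesis, run an experiment, feed the outcome back, repeat."""
def call_llm(prompt: str) -> str:
    return "hypothesis: larger batch size hurts at small scale"  # placeholder

def run_experiment(hypothesis: str) -> str:
    return "observed: no significant effect"  # placeholder ground-truth result

notebook: list[str] = []  # accumulated record of hypotheses and outcomes

for round_ in range(5):
    hypothesis = call_llm(
        "Previous hypotheses and results:\n" + "\n".join(notebook)
        + "\nPropose the next most informative hypothesis to test."
    )
    result = run_experiment(hypothesis)
    notebook.append(f"{hypothesis} -> {result}")

# Only after this exploration would the system spend its "20%" writing the paper,
# grounded in everything in `notebook`, including the negative results.
```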
I suppose it comes down to the philosophy of what science is. Is the purpose of science, well, obviously it's epistemic foraging, basically, so we're discovering new knowledge, but is the purpose to enhance the knowledge of humans, or is it just to enhance knowledge in and of itself? And there's this weird thing I see with generative coding, which is
that it's getting so good now that I can write software completely automatically and I don't even understand what it's written. But when there's a problem, we have this understanding credit-card problem: now I need to go back to first principles and understand the code that has been written. And do we risk that happening with the AI Scientist, that it would just create all of this knowledge which is still weirdly quite foreign and hard for us to understand? Yeah, I mean, I think
that's kind of the point of papers, to be honest, in the academic community. Rob talks about how he spent so much time building intuition, like 80% experimentation and things like that. And if that all just sat in Rob's head, that wouldn't really be science, right? He ultimately has to put it out in a paper to share with everyone, and then hopefully we can get some glimpses of the intuitions that he built there.
It's the reverse of the commentary here. Another set of commentary was: why do we have these things write whole papers? The real nugget is maybe a couple of sentences of information. I think the reason we do papers is that this is the artifact the scientific community has designed to be human-interpretable; it's how scientists communicate with each other.
And so, ideally, a really good paper would explain the idea clearly enough for us to understand it. So, ideally, if the paper reviewer was really good and was similar to human reviewers, then the papers should be pretty good.
If the paper was really complicated and we couldn't understand it, it would be rated as a bad paper, regardless of how good it was theoretically, I suppose. Maybe along those lines: something that we also have as an artifact is the code that generated the results. So there's not only the paper but also, essentially, a reproducible pipeline for generating the results.
And I think, more generally, and maybe this is a bit contrarian, once we get to the point where an AI scientist system can get papers accepted at a conference, we're really at a point where we have
an equation that says money, compute, API calls equals paper. I think we as a scientific community really need to rethink whether the paper is the right medium, or whether we need to think much more about what a scientific contribution ultimately is at its
kernel, right? If you, for example, think about residual networks and residual connections, it's a two-line code change that has essentially diffused into almost everything in machine learning. And I think something like how well an idea diffuses is a much better metric for a scientific contribution than just getting a paper accepted at a conference.
Yeah, that's very interesting. I mean, for some reason, and I'm probably being puritanical and old-fashioned, I feel that there's the memetic plane and there's the output plane: the papers are the output plane, the software is the output plane. And yes, we can actually increase
the cultural transmission between papers and code by doing what you said: we can have, basically, a transaction log. So as well as producing the output artifact, we can also have a log of all of the reasoning, because there's the missing-information problem, the "why": why did we do that? Maybe we explain some of it in the paper, but it's not all in there. But we can have a transaction log, we can share it with all of the other AI scientists, and everything's great, right? But I still feel that there's some kind of, you know, you have an idea,
you don't write it in the paper, you share it with your friends, you speak with people at DeepMind and all over the place. And there's this weird cultural, memetic plane that somehow isn't captured in the system. Yeah, I agree. But, similar to what Cong said, we're basically lacking these types of logs to directly train on, for example. In the sense that
our system is set up so that credit assignment is only done for positive results. But if we had a log of all the negative results and we trained the system on it, it could probably do much better reasoning and hypothesis testing in an iterative loop. So I think one thing that is for sure a contribution of the AI Scientist is essentially
a lot of logs and a lot of data on things which failed. So, yeah, it's going to be interesting to see how these types of synthetic data generating mechanisms can be fed back in to increase the scientific discovery abilities of foundation models.
On what you were saying about the cultural aspect, about Rob only putting parts of it in the paper: in theory, the AI Scientist's logs are far more comprehensive than what Rob could communicate to his friends. So in some ways, an AI would also be able to do that kind of thing. That's true, but then, conversely, there's the Stanley effect, which is that you actually don't want it to be that transmissible, right? You actually want to create islands. Yeah, I think somewhere in between. I mean,
I think there's a lot of different ways you could approach this. But there are different levels of granularity here, where the paper is maybe the most refined version, and then
talking about the things that didn't work is slightly less refined, and then the raw logs are the least refined. Maybe one more thing I wanted to add: a vision that I'm really excited about is fully AI conferences. We're now at a point where you can generate papers in an open or closed loop, basically.
We not only built the AI Scientist, but we also developed an AI reviewer, which scores the paper and was validated on previous ICLR 2022 papers. And you can imagine using the OpenReview API, uploading papers that were generated by the AI Scientist, and doing full rounds of individual reviews driven by LLM
debate, as well as an AC decision, and then having a conference that has only been put together from AI-generated content. And then you have best paper awards and orals, and this filtering basically gives a good subset of these ideas, which can then be used by humans to validate again, and to see how well they diffuse. So I think this vision of open-ended research and filtering, with human top-level control, is one that excites me quite a bit.
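A rough sketch of such a fully automated review pipeline, assuming a placeholder `call_llm`; the paper's actual AI reviewer was calibrated against ICLR 2022 reviews, which this toy version does not attempt:

```python
"""Sketch: several LLM reviewers score a paper, revise after seeing each other's
scores, and an area-chair agent makes the accept/reject call."""
def call_llm(prompt: str) -> str:
    return "6"  # placeholder model call

def review(paper: str, n_reviewers: int = 3) -> bool:
    # First pass: independent scores.
    initial = [
        call_llm(f"Reviewer {i}: score this paper 1-10:\n{paper}")
        for i in range(n_reviewers)
    ]
    # One round of debate: each reviewer sees the others' scores and revises.
    final_scores = []
    for i in range(n_reviewers):
        others = [s for j, s in enumerate(initial) if j != i]
        revised = call_llm(
            f"Reviewer {i}: other reviewers gave {others}. "
            f"Give your final 1-10 score for:\n{paper}"
        )
        final_scores.append(int(revised) if revised.strip().isdigit() else 5)
    # Area-chair decision on top of the reviews.
    decision = call_llm(f"Area chair: final scores were {final_scores}. Accept or reject?")
    return decision.strip().lower().startswith("accept")

print(review("(generated paper text)"))
```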
Yeah, for sure. Can we also capture the human debate between authors and other authors at a conference? Can we capture that conversation, how papers relate to each other? In many ways this is very related to what Chris and Rob have been saying: how do we see these ideas then propagate into the next AI conference? And then, ultimately, perhaps the best score of
the quality of a contribution might be something like an AI test-of-time award: what has influenced the next generation of AI scientists the most in their paper writing? Do you think we might ever get to a place where,
I mean, Tim Rocktäschel spoke about this in one of his papers, about how an ASI might just be so alien and unintelligible. And, just from a deflationary point of view, this AI scientist community could just develop a weird subculture. And, you know,
should humans come along for the ride, or would we just see it as quite strange? Hopefully the grounding for the work would be in the reviewer pipeline that Rob was talking about, where there is some grounding in comparisons to human papers and human feedback and ideas. In theory, we could try to remove that, or maybe try to extract some essence from the reviewer about some core scientific contribution that perhaps surpasses current human culture or language or understanding.
But then there are questions like, why would we do that? There could be some reasons, as in, what if it develops some completely unimaginable technology that is just really hard to explain to humans? But that would require a lot of trust, which would maybe take a very long time to build. I also think, more philosophically speaking, there are things in nature which our cognitive systems might not be capable of understanding, right? So I think
there is a limit to what we can compute and what we can do, while systems like the AI Scientist might be able to understand more, even if we might not understand the output. So I think there are pros and cons to it, but in principle this is something we already have with other instruments in science, like telescopes and so on. We can't
have the resolution that these devices have, but we can still try to make sense of what comes out afterwards. It's very much about finding the right level of abstraction: things we don't understand in themselves, but where we have a nice interface that we can exploit. And I guess there's also a kind of nice
proof of that in the fact that, from ancient civilizations to today, we have become almost superhuman by their standards, and somehow we've still managed to construct the right abstractions and to understand and use things as best we can.
I think one more important point in that direction is that we all believe that the AI Scientist, and publishing the AI Scientist, is something really important at the current point in time, so that the community can come together and discuss many of these philosophical questions: what is science at its core? What is a PhD student going forward? And I think
it's important to have this type of discussion early on, right? Guys, it's been an honor and a pleasure. Thank you. Thank you so much. Thank you so much. My pleasure. This has been great.