
How Machines Learn to Ignore the Noise (Kevin Ellis + Zenna Tavares)

2025/4/8

Machine Learning Street Talk (MLST)

People
Kevin Ellis
Zenna Tavares
Topics
Kevin Ellis: I think the key to building smarter AI lies in combining rule-based symbolic reasoning with pattern-based intuitive learning. Together, these two approaches can effectively tackle hard problems such as the ARC challenge. Compositionality and abstraction are also key to building complex models, but they come with the challenges of combinatorial explosion and information overload. We need to build AI that can explore, experiment, and construct world models, the way humans do when learning something new. In the DreamCoder paper, we used a wake-sleep strategy that lets the model expand its knowledge by "dreaming" and then integrate that knowledge back into the model during the waking phase. This allows the model to adapt to distribution shifts and solve problems better. In the paper with Zenna Tavares, we compared inductive and transductive learning: induction solves problems by generating programs, while transduction outputs the answer directly. Each has its strengths and weaknesses, and they can be combined. We found that on some problems, thinking carefully and verbalizing a solution actually hurts performance, so we need systems that combine both approaches and can choose the right strategy for the nature of the problem. In future work, we will keep exploring how to build more powerful compositional models while addressing combinatorial explosion and information overload, and how to build models that learn abstract knowledge from small amounts of data and apply it to real-world problems.

Zenna Tavares: I think the key to building more human-like AI is learning more abstract knowledge from fewer examples and actively exploring and experimenting, rather than passively receiving large amounts of data. Compositionality is a double-edged sword: it is powerful but can easily overwhelm you. We need models that can effectively steer through the search space and learn the basic atoms of a compositional language. When building world models, we should avoid systems that hard-code large amounts of knowledge representation and heuristics; instead we should build systems that learn to interact with new things the way humans do, rather than through large-scale imitation learning. In our research, we used a composite approach that combines different methods, including an induction model and a transduction model: the induction model outputs a program, while the transduction model outputs the answer directly. Each has its strengths and weaknesses, and they can be combined. In future work, we will continue to explore how to build systems that can do Bayesian inference and how to build intelligent machines from first principles, as well as systems that can learn and use abstract knowledge and apply it to real-world problems.


Transcript


You're building a model on the fly from very small amounts of data, but you are not passively receiving the data. You have to go out there and poke things and push things and try things out.

Compositionality is a kind of double-edged sword. We saw this in the first wave of symbolic AI when people tried to build these kind of compositional production systems to solve problems. So the issue is you immediately encounter this combinatorial explosion in the number of things that you could represent. You don't necessarily have any way of steering yourself through that space toward the kinds of concepts that are probable or which make sense in your current situation.

So you can kind of think infinitely many things, so things might be very out of distribution. You're both immensely powerful but immensely overwhelmed by possibilities. Can we build things from first principles? You know, I think it's hard to say what is thinking and what is not thinking. It does seem that some core part of thinking is, you know, kind of a step-by-step process. Where if you ask people to think carefully and verbalize a solution, they actually get worse.

There's something obviously right about scale, right? And there's something obviously right about learning. I think it's kind of true and like obviously true that programs are, programming languages are compositional, right? We build more and more complex programs by understanding the parts and then combining them together to build more and more complex parts.

Most of the hypothesis development that we do as humans is of this form, both as an individual and of science as a whole. Science as a whole collectively finds hypotheses and then we revise them communally in conferences like this. So it seems natural to me to say that we should build systems that have this kind of refinement process.

One of the hard things to do in trying to mechanize this is like, how do you kind of guide or reinforce good refinement paths? And another axis I think is important, Kevin said this kind of tower of abstractions, that

I think a key thing which is somewhat overlooked in the current discourse on world models is that there isn't a single world model. You can understand things at multiple different levels. There's multiple different models you can build of pretty much anything. There's a camera right here. I've got a model of how this camera works at the level of I press this button and an image is taken. But I can also understand the internal structure of the wires in the circuit. So I can go down to the sensor in the camera.

All of these are different models that are useful for different things that you need to do with that model. We want to avoid building kind of like these Frankenstein systems where we are hard coding a whole bunch of different knowledge representation and heuristics. But in those cases, as you're kind of alluding to, it's required, you know, smart people to look at the world and say, OK, here are the inductive biases. Let me, you know, kind of encode these into the system.

And we think that the principles underlying this everyday science, like how do we form hypotheses, how do we revise those beliefs, how do we take actions to learn about how the world works, are the same principles in everyday science that apply to real science. And so we want systems that can do that, that can learn how to interact with new toys and devices and interfaces in a way which isn't, you know, large-scale imitation learning, right? Which is like thinking.

I'm Benjamin Crousier. I'm starting an AI research lab called Tufa Labs. It is funded from past ventures involving machine learning. So we're a small group of highly motivated and hardworking people. And the main thread we're going to pursue is trying to make models that reason effectively and, long term, trying to do AGI research.

One of the big advantages is that, because we're early, there's going to be high freedom and high impact for someone new at Tufa Labs. You can check out positions at tufalabs.ai. I want to make machines that learn in more human-like ways, especially learning more abstract knowledge from fewer examples. And we've been working a lot recently on world models and on things that try to actively discover symbolic knowledge.

I'm not interested in everything people do, but I'm interested especially in the kinds of things that people are especially good at, but that the kind of AI that we're building today tends to not be so good at.

So learning from a few examples, generalizing to situations that are very different, learning knowledge you can communicate as opposed to being just embedded inside of a weight matrix, and also learning things that can cooperate between weight matrices and symbolic kinds of knowledge. Yeah, so hi, I'm Zenna. I'm a co-founder and co-director of Basis. Kevin and I went to grad school together. We share a lot of the same interests broadly in trying to understand and build intelligence.

Within Basis, we have that as a goal, but we also care about scientific and societal problems. And so I've got many, many interests, and in some ways I've built an organization so I can do all the things that I find enjoyable. So I think there are a lot of good lessons from the scaling world. We do actually want algorithms which can make use of the new hardware that we have.

At the same time, it's not the full story. I think our personal belief is that that's not going to carry us all the way. And also, pragmatically, there are other people that are clearly trying to carry that torch. And if it does get us all the way, then maybe we should kind of hedge and do things that draw on the amazing GPUs and amazing pre-trained models we have, but which also bring to bear ideas from cognitive science, classic AI, and so on. Zenna, how do humans learn from examples?

I think there are, you know, many possible answers to that. You know, I, you know, somewhat

believe in the Bayesian paradigm, right, which is that you have beliefs about how the world works. And when you kind of observe examples, you incorporate that knowledge into your kind of current hypothesis about how the world works. But that's a very abstract theory. And the question is, how was that actually implemented? And, you know, the reality is it's probably approximate. It's probably, you know, we're not doing exact Bayesian inference. So there's a nice kind of foundation to think about how you should, you know, how an ideal agent should incorporate knowledge.

And then there's kind of practical questions about how to build systems that do that, right? And so as Kevin said,

We're very much, like, not anti, you know, large-scale deep machine learning. We're building these tools, we're using these tools. I would say, like, the main kind of distinction between how I think of the work that we do and what we want to do is: can we build things from first principles? Right, so a lot of, you know, current mainstream machine learning is basically, like, large-scale imitation learning, and that's because it's so effective, right? And it's, you know, producing all of these amazing tools.

But both from my scientific perspective and also from an engineering perspective, can we understand some of the principles about how to build intelligent machines kind of from the ground up? And I think there are some of the pieces, right, uncertainty, causality, reasoning in general, but there's many things that we don't know, right? And so part of having a research program is to try and figure out those missing parts. Kevin, what is the importance of compositionality and what is compositionality? So compositionality has a lot of different meanings.

in different situations. The way that I think that we're using it in the work that we're doing right now is that you have atomic knowledge that you maybe learned in one situation, but then you can build bigger structures out of those little pieces of knowledge in order to extrapolate to new situations that might even be out of distribution. So I think that's really important for being able to learn in environments that

are not like totally new. It's not like you could plop yourself in a totally new world and immediately be fully competent. But if you're in a world that is changing, but where the building blocks are the same, the causal mechanisms are being reused, then compositionality and ways of having knowledge that can be broken into pieces and then recombined for new situations is really crucial.

Very cool. I mean, just as an extension to that, Kevin, I watched your amazing YouTube talk, which you published about three months ago, and you quoted Elisabeth Spelke from Harvard. She's a psychologist, and she said, the possession of infinitely many concepts that were expressible in an innate language would lead to a curse of a compositional mind. Can you explain what she meant by that? Yeah, so Spelke is wonderful.

And what she's pointing out is that compositionality is a kind of double-edged sword. And we saw this in the first wave of symbolic AI when people tried to build these kind of compositional production systems to solve problems. So the issue is you immediately encounter this combinatorial explosion in the number of things that you could represent.

And you don't necessarily have any way of steering yourself through that space toward the kinds of concepts that are probable or which make sense in your current situation. So you can kind of think infinitely many things, so things might be very out of distribution. And so you're both immensely powerful but immensely overwhelmed by possibilities.

And so I think what we're seeing now is that you can address this curse of compositionality through ways of learning to guide searches over program spaces.

and also by kind of learning the basic atoms of a compositional language and treating them as neural networks. When we design AI algorithms that kind of navigate compositional spaces, how does that work? And maybe how does that compare to how humans do it? One of the, in my view, most interesting examples of compositional languages are just programming languages, right? And so I think it's kind of

It's true and obviously true that programming languages are compositional. We build more and more complex programs by understanding the parts and then combining them together to build more and more complex parts. One of the differences between programming languages and natural languages is that programming languages are strictly compositional, whereas in natural languages there's all these heuristics, phrases where you have to just learn the thing.

But how do we build systems that build compositional structures? There are many ways. A lot of old-fashioned ways were, like: start with a grammar, expand that grammar, search through it, and try and find a program by kind of searching over the grammar's space. Now people are obviously using language models to generate programs and compositional structures. And so there's this kind of interesting...

I would say like spectrum of approaches where, you know, there's a question of like how much semantic knowledge about the compositional structure you incorporate into your model versus it's kind of purely like data driven from examples.

And we've both worked on both and continue to work on both. But as Kevin said, the challenge is if you've got a compositional structure, it's often of unbounded dimension, a very hard space to search through. And so you need smart methods to try and find the structure or the program or whatever it may be within this vast space.
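To make that concrete, one common shape such a "smart method" takes is a search over grammar expansions guided by a learned scorer. The following is only a minimal sketch under assumed names (the toy GRAMMAR and the score function standing in for a neural model are invented here); it is not the specific system being discussed:

import heapq

# Hypothetical toy grammar over grid-transformation programs.
GRAMMAR = {
    "<prog>": [["<op>"], ["<op>", ";", "<prog>"]],
    "<op>": [["rotate90"], ["mirror_x"], ["recolor", "<color>"]],
    "<color>": [["red"], ["blue"], ["green"]],
}

def expand(partial):
    """Rewrite the leftmost nonterminal once; [] means the program is complete."""
    for i, tok in enumerate(partial):
        if tok in GRAMMAR:
            return [partial[:i] + rhs + partial[i + 1:] for rhs in GRAMMAR[tok]]
    return []

def guided_search(score, beam_width=10, max_steps=50):
    """Beam search over the grammar, keeping candidates the learned scorer likes."""
    beam = [["<prog>"]]
    finished = []
    for _ in range(max_steps):
        children = [c for partial in beam for c in expand(partial)]
        if not children:
            break
        # `score` stands in for a neural network judging how promising a
        # (partial) program looks for the task at hand.
        beam = heapq.nlargest(beam_width, children, key=score)
        finished += [c for c in beam if not expand(c)]
    return finished

The point of the sketch is just the division of labor: the grammar supplies the compositional space, and the learned scorer steers the search through it.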

On that then, Kevin, I mean, is there a principled way to find a set of primitive abstractions which can be composed together to build these high-level abstractions? I think that some kind of basis sets of primitives might be strictly better than others.

So as an easy example, so as Zenna said, one of the best examples of compositionality is programming languages. I think programming languages have gotten better over time. And also, it's really easy to make a language worse, right? I could break Python in a million ways. So obviously, there must be kind of a partial order between compositional systems where some are better than others.

But I think in the fully general case, it does depend on the kinds of problems and environments with which you're going to be confronted. And I think you see this reflected in the fact that we have tons of different programming languages. We don't actually have

a single best programming language; we have a kind of Pareto frontier. A neural network, or a class of neural networks in particular, is an architecture, right, which implements some algorithm. You can implement an algorithm in Python, right? The Bayesian paradigm is a normative model. So it says, like, what should an ideal system do,

given essentially no consideration of compute? You could also say, what would an ideal system do if you had bounds on compute, or if you had to reason about how much compute you should use? And so I don't think of it necessarily as bottom up versus top down, at least on this axis.

But you can think about composite methods which combine different kinds of systems. So the paper that Kevin started, and that we joined forces to work on and complete, used a combination of different methods. And so Kevin will talk more, I'm sure, but we have an induction and a transduction model. And the induction model is outputting a program, in the case of ARC, a program that transforms the input to the output,

like a Python program, and the transduction model is directly outputting the output grid given the input.
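Put in terms of type signatures, the split being described can be sketched roughly like this (a simplified illustration only; the actual models in the paper are fine-tuned language models, and these function names are hypothetical):

from typing import Callable, List, Tuple

Grid = List[List[int]]   # an ARC grid of color indices
Pair = Tuple[Grid, Grid]

# Induction: propose an explicit program that should map every
# training input to its training output.
def induce(train_pairs: List[Pair]) -> Callable[[Grid], Grid]:
    ...

# Transduction: given the training pairs and the test input, emit the
# predicted output grid directly, with no intermediate program.
def transduce(train_pairs: List[Pair], test_input: Grid) -> Grid:
    ...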

And so I think it's important to consider, again, different axes. One is what is the representation of the function? Is it a Python program or is it a neural network? And then the other is, in some sense, what is the type signature of the thing that you're finding? Are you directly outputting the final solution or are you outputting a function which you can then apply to the input? And so we've

explored a little bit of this kind of grid of possibilities, but I think there's more we could think about. Adil, do you want to kind of... So I like how you described it as like the type signature. So in a sense, the inputs and outputs of the symbolic and neural ways of solving problems are often very different. So if you try to solve something which is a pure transformer, I mean, inputs to outputs, like as we did within the transduction model, you're not constructing these intermediate hypotheses.

And when you have these symbolic compositional languages, like when you're doing program synthesis,

the actual type of the thing that you're trying to learn is different. So when we think about comparing neural and symbolic methods, they often kind of confound these different factors. When we did this work, we were partly comparing neural and symbolic, but also partly comparing these two different styles of problem solving. One where you look at a problem and think hard about an explicit way of solving it, a way you could

maybe verbalize in a symbolic language, either in code or in language, and then contrasting that with this more intuitive, implicit, more transductive way of making predictions. It just so happened that it was convenient to map onto this neural symbolic divide. But you can totally imagine inductive methods that kind of search for like a vector that describes how to solve a problem. And in fact, there were ARC teams that did this, and it actually kind of works, which is really cool.

So as Zenna was saying, there's really kind of a grid of possibilities. And we're trying to do a very careful study of these different ways of solving these inductive learning problems. There was an amazing paper, by the way. So as I understand it, there are two Llama 8-billion models. One's an induction model, one's a transduction model. In one case, you produce an explicit function. You do, like, you know, test-time inference. You greenblatt it. That's my verb, after Ryan Greenblatt. I generate loads of

example programs and take the good ones. And then you've got the transduction approach where you directly compute the solution, but still with some transductive active fine-tuning by augmenting the test examples. And the ensemble approach is you try and see if one of the induction functions works, and if not, fail over to the transduction. There's this beautiful Venn diagram in the paper where you show that transduction works for some types of problems, induction works for other types of problems,

Help me understand. So interestingly, this does relate to some classic findings in cognitive science where we know that there's certain kinds of problems where if you ask people to think carefully and verbalize a solution, they actually get worse. So for instance, if you try to have people infer a rule with exceptions, or if you have people try to do more of a statistical learning task, like learn kind of associations between different symbols,

having them think a bit and explicitly verbalize a solution can degrade performance. So we know empirically this is true, and there's a really nice paper from Tom Griffiths showing that LLMs also have similar kinds of splits. And I think we were kind of rediscovering this within the Abstraction and Reasoning Corpus. So we are finding that there are some kinds of problems where if you have the system kind of think hard and churn through many thousands of different possibilities and test them systematically,

that's actually worse than having the system just blurt out an answer. So that's the transduction-induction contrast: systematically sweeping through possibilities versus just blurting out the answer. And you can check the correctness of induction.

This means that you can think systematically for a while and if you fail to find anything that seems like a good explanation to you, you can just fall back on your intuition. So there's a very natural way of ensembling or combining these two methods. We don't really need a way of looking at a problem and deciding if you should think explicitly in symbols or if you should use your intuition because

we can validate the correctness of symbolic hypotheses. Is one thinking and is the other not thinking? So for example, if I do lots of transductive active fine-tuning and augmentation and I give a sub-symbolic intuitive solution versus doing thinking by generating lots of explicit functions, is one thinking and the other not thinking? I guess it depends on what you mean by thinking. What do you mean by thinking? Well, now people say thinking is what O1 does, so maybe that's thinking.

I think it's hard to say what is thinking and what is not thinking. It does seem that some core part of thinking is a step-by-step process where you go through some kind of internal mental computations, revising your beliefs. I think it's hard to put a strong circle around it. Again, I do think there's a distinction between representations of knowledge

In this case, you can have a symbolic program, or you can have a purely neural connectionist system output an answer, versus the kinds of procedure that you use to produce it. And again, you can combine and compose all the different possibilities.

We just talked about the fact that when you have an inductive model in this kind of terminology, we have a system that outputs a Python program. We can then use this Python program to check whether it's consistent with the training examples. And this is a very strong signal whether it's going to be a valid solution for the test example.
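As a rough sketch of that checking, and of the induction-first ensemble described earlier (reusing the hypothetical induce/transduce signatures sketched above; this illustrates the idea rather than reproducing the paper's exact pipeline):

def solve(train_pairs, test_input, n_samples=100):
    # Sample candidate programs from the induction model and keep only
    # those consistent with every training example.
    for _ in range(n_samples):
        program = induce(train_pairs)       # one sampled candidate
        try:
            if all(program(x) == y for x, y in train_pairs):
                return program(test_input)  # verified against the training pairs
        except Exception:
            continue                        # candidate crashed; discard it
    # No sampled program explained the training data, so fall back on the
    # transduction model's direct guess.
    return transduce(train_pairs, test_input)

Only the induced program gets that verification signal before we commit to it.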

that is not currently true for our transduction model, right? But you could imagine a system where that is true, right? Where you could have the messiness of a neural network, but you still get the ability to go through and check whether it's consistent with the examples. But on the question of, like, you know, is that thinking,

I think all of these are different versions of thinking. I think there is a kind of slow, deliberative, hypothesis-forming notion of thinking, which probably isn't quite captured fully within any of the methods that we used in ARC, and probably is something more like what O1 is doing, where you've got a kind of variable-time computation

where you're not blurting out things in one step, but you're understanding how much you have to do to answer a question, which is something like thinking. Maybe a more narrowly constrained version of thinking is reasoning, where you have the purely classical forms of reasoning where you've got some axioms and you go from that to some conclusion. And then you've got the common sense reasoning that we do every day as humans to figure out how to carry out our everyday lives.

And I imagine that there's some way to think about, you know, reasoning in general, which is, you know, broadly construed as forming beliefs given our knowledge, but allows for the messiness of the world as we see it in a way that kind of classical formalisms don't quite allow.

In the ensemble approach, you biased the induction. So you wanted to find an inductive solution first and then failed over to the transduction. Does that hint that you kind of think there's something special about the functional version? I think that the functional version is more regularized. Like it's harder for it to overfit.

It's not impossible, but it's pretty tough to overfit with these modern high-level programming languages. They're just designed and they just want to express general-purpose computations. Neural networks, they can do that. Particularly, as you were saying, in the kind of large data limit, they tend to eventually learn representations that start to really capture the kind of stuff you want.

But that's not always true and often they kind of interpolate around their data points. And so I think if you can come up with a clear, explicitly verbalizable description that actually works for the problem you're trying to solve,

then that's likely to generalize a lot better than just going with an intuitive kind of vector-based interpolation. Zenna, is there a better way that we could, you know, maybe in a principled way, combine transduction and induction? Several different ways. So again, I think one thing you could think about is could you have a kind of a transductive-inductive model where the underlying transformation representation is something like a neural network, but...

you can apply it kind of point-wise to each ARC instance as an input, right? And this would allow you to get perhaps this maybe stronger inductive bias about whether this is the correct example by testing it on the training set. Maybe a more interesting kind of, I don't know if this is more principled, but more interesting space of things to explore

are more on the representation side. Like, what is it that kind of a normal program gives you? Like, you know, fundamentally, like, what is the benefit of using a Python program in some cases over a neural network? And what is, like, the fundamental benefit of a, you know, of a neural network, uh,

in terms of encoding transformations, right? So we talked a little bit about maybe the neural network can capture these messy parts of the transformation. In the case of ARC, there are some things that are hard to describe with Python programs, right? If you sit down and try and write a program, it's actually quite difficult because of just weird, complicated things that happen.

And so, you know, there's the whole paradigm of like neuro-symbolic programming, which is, okay, let's take some neural components and let's take some kind of classical programmatic components and combine them together. But you might try and think like from the bottom up, like, is there a way to restructure programming languages in general, right? If you think of them both as kind of different programming languages in a sea of all possible programming languages, and we've kind of touched on a very small number of languages within the space,

what other things exist, right? And I don't know, but I think we really haven't scratched much of the surface of different representations of computation and programs in general that we could. Part of what we're thinking about now is like, can we find not just deeper integrations, but just like, you know, more fundamental things about like, what in principle do we want from a programmatic representation versus what we don't want, right?

That's fascinating. Maybe you could expand on that, Kevin. In the DreamCoder paper, and we'll go into that shortly, it was using a DSL. And of course, we could use something like Python. And maybe you could argue that there's some kind of computational equivalence between high-level programming languages because they're Turing complete. What is the difference between these different expressions?

Yeah, so in the first generation of AI, there's a saying: you can't learn what you can't represent. And I think that, you know, to tack onto that Spelke quote earlier, if your computational language explodes too badly, maybe you could represent it, but it's not really, in a practical sense, learnable. So in DreamCoder, and in a lot of other systems that use domain-specific languages, often you're just not able to represent all the stuff you could do in Python.

Now there's a simple fix to that. You can introduce a kind of escape hatch, which is you have some primitives that upgrade your DSL, your domain-specific language, to actually be Turing-complete. And then in some formal sense, you actually do cover all the stuff that you could do in Python, but then this curse of compositionality bites you: you couldn't actually learn in any practical sense. So in my opinion, having worked with these DSLs based on lambda calculus,

Python is just a lot more practical for the kinds of problems that we really care about. This is true in Arc, I think it's true when we're building LLM agents, I think it's true when we're having visual question answering systems. There's been interesting kind of convergent evolution between the design of software engineering languages, which has caused them to converge on something that's actually really great for a lot of

of problems we care about in AI. They're not perfect. And I think that in this transduction-induction paper, we do see that Python is not actually covering all the stuff that we care about, but it's miles better than lambda calculus. Yeah, I think it is really important to recognize that there's an evolution of programming languages. Kevin said this at the start, right? The original programming languages

were basically just like, you know, long streams of instructions, right? And then we built in more and more forms of structure. There was a whole structured programming movement. And then we have like, you know, a variety of different families of languages that we use in modern software engineering today, you know, kind of very mainstream languages like Python, you know, maybe slightly less mainstream languages, but popular in other circles like Haskell and functional languages.

And so I guess the key thing for me is that beyond just expressing computation, which is not that hard to do, we can express computation in many different systems, we've started to build more and more kinds of systems to help us build better programs, right? Class structures, type structures, different modular structures. And they allow us to encode more about the world within our programming languages and more, you know, just more useful, right? And so if you think of this as an evolution,

One, as Kevin said, we've got to a point where they're pretty good, but also there's room for development. We probably wouldn't expect that in 100 years, everything would look exactly how things are now. And so I think an interesting question is, how do we expect the evolution of programming languages to develop? And then when you add in just AI in general into the picture, it becomes a lot harder to predict. Now we're seeing these interesting compositions of

of ChatGPT calling out to Python and Python calling out to ChatGPT, it seems to be in this moment of flux in the design space. For me, it's interesting to see how this will evolve over time.

Yeah, I guess, should we have some kind of metaprogramming built into the algorithm? So, you know, start with maybe an iterative process of rewriting rules and improving hypotheses, rather than trying to immediately land on the final answer? The answer is yes, right? I think it's hard to, like,

kind of provide an argument why that has to be true, right? But at least intuitively, to me, it seems quite hard to like write exactly the right program in one shot all the time, right? And so there's some value in, you know, constructing a

model, a hypothesis, or a program depending on the context and revising that with more information. And that more information could be, oh, it doesn't work in some particular use cases or it doesn't support the evidence. And most of the kind of hypothesis development that we do as humans is of this form, both as an individual and then of science as a whole, right? Science as a whole collectively finds hypotheses and then we revise them communally in conferences like this.

So it seems natural to me to say that we should build systems that have this kind of refinement process. One of the hard things to do in trying to mechanize this is how do you guide or reinforce good refinement paths? How do you say, "This mode of reasoning or this mode of refinement or this path of refinement is the good path versus this other path, which is bad."

Right now, in let's say like modern machine learning, people are doing this mostly based on either reinforcement from human examples, or if you've got a well-defined objective function, you can back propagate the signal from the correct solution if you find it through the reasoning paths. But that's not always the case. That's not always possible. And so I think one of the big open questions is how can we build systems that

reason and part of reasoning is refining hypotheses where we don't have kind of a well-defined, easy to compute objective function that we can evaluate at the end. I think if you can solve that, you can, you know,

do pretty well. Very cool. And any thoughts on that, Kevin? Yeah, I just want to reinforce exactly what Zenna said right there, which is a lot of this really hinges on being able to check that you're heading in the right direction. Yeah. So we see this in our own work that we're doing together, when we're trying to learn world models or programs for ARC tasks, because we have some data we're trying to fit and we can check our fit to the data.

And I think you also see this in things like O1 where, to the best of my knowledge, they're mostly training it on like math problems where you can check if you got the answer right.

And it is a big open puzzle, and I'm not sure I have great ideas here on how to do this when you can't really check if you got it right. So I'm not really sure how to do that, but it does seem very important. Interesting fact that when you're trying to learn a model of the world, you actually can check if you're getting it right. You can look at your data, what you've seen earlier, and you can say, OK, does my model correctly predict that data?

Kevin, in the DreamCoder paper, you helped pioneer this wake-sleep fine-tuning strategy, essentially where a model can dream. It can kind of expand what it knows, and then in the waking phase, it can reincorporate those dreams as hypotheses. It's absolutely amazing. And of course, you're using it in a slightly different way in this new transduction paper. But can you just tell us about the philosophy there?

Yeah, so the philosophy there is tightly connected actually to what you see in a lot of machine learning where you train a bottom-up model based on synthetic data produced by a top-down process. So often we're trying to solve in AI some kind of inverse problem. Like in vision, we're trying to look at an image and then infer the three-dimensional structure.

But the forward process of inferring the image from the three-dimensional structure is a lot simpler. That's a rendering function. So in wake/sleep, you're taking some forward process, you're imagining or dreaming possible ways it could run forward, and then you're learning how to go backward.

So for learning a function or a program, this means you're imagining programs, you're running them, and then you're saying retrospectively, well, I just imagined this thing and I did this thing. So when I see this kind of behavior, I should infer this kind of program. What Wake Sleep brings to the table on top of that, on top of just training on top of synthetic data, is that it's a kind of back and forth between learning from your own synthetic data and then learning how to make better synthetic data.

So it's not just this big-batch kind of thing of lots of dreams and then learning from those dreams. It's that you actually wake up, you go into the world, you try to solve problems, and then you adjust your distribution of synthetic data based on the problems that you're solving. And this allows you to adjust to distribution shifts. So you might think the world works a certain way, you have certain kinds of dreams, you learn from those dreams, but then you wake up, see the world is different. And then during the next sleep cycle, your dreams shift to better match the world.
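Schematically, that interleaving might look like the loop below. This is a paraphrase of the wake-sleep idea, not DreamCoder's actual code; the model's methods (sample_program, train_recognition, solve, update_generative) are hypothetical stand-ins:

def wake_sleep(model, real_tasks, n_cycles=10, n_dreams=1000):
    for _ in range(n_cycles):
        # Sleep: dream by running the forward, generative process.
        # Sample programs, execute them, record (behavior, program) pairs.
        dreams = []
        for _ in range(n_dreams):
            program = model.sample_program()
            dreams.append((program.run(), program))
        # Train the backward, recognition model: behavior -> program.
        model.train_recognition(dreams)
        # Wake: attempt the real tasks with the current recognition model.
        solved = [(t, model.solve(t)) for t in real_tasks]
        solved = [(t, p) for t, p in solved if p is not None]
        # Shift the dream distribution toward what the world actually
        # presents, so the next batch of dreams tracks any distribution shift.
        model.update_generative(solved)
    return model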

So there's really a kind of cooperation between the wake and sleep cycles, and they're interleaved, rather than being one big batch of synthetic data. I also think there's an abstract component here in intelligent systems, certainly in AI models: there's this iterative expansion and compression, expansion and compression. We see it even in neural networks. You actually want to be able to cover a little bit more of the distribution. And I think this is where compositionality comes in and why it's good to do wake/sleep with programs.

Because when you have a compositional language, you can take two atomic pieces of knowledge you learned earlier, you can glue them together in a way that you never actually saw, but is kind of plausible, and which you might see, and you should equip your neural network to be prepared for that composition. So I don't think this is really a thing you want to guard against. You don't want to say, oh, I need to

tightly fit my dreams to match my waking experience; you actually want them to go a little bit beyond that. And of course, it's interesting that in DreamCoder, it was an explicit process. And I suppose you could actually think of the induction-transduction paper as being the same thing, but it was using remixing and retrieval-augmented generation. And, you know, because in DreamCoder, there was this notion of a library and kind of, like, expanding knowledge. And now the process is becoming slightly more diffuse. I mean, can you

Sketch that out. Right, right. So we're moving more and more toward ways of implicitly doing the kinds of things DreamCoder was doing, but building them on top of large foundation models. So instead of having an explicit library of symbolic concepts that the system is learning over time,

Instead, it has code that it knows was good in the past and it uses a neural network to produce similar code. So this is effectively what library learning was doing. It was always saying, you know, I've written some code in the past, I'm going to learn some functions and that will allow me to write similar code in the future. Except now it's done in a softer, more probabilistic way. So what you, so I think actually you want to do both.

There's something really valuable about how software engineers write reusable libraries, but it's tough to really write the correct library right now with AI systems. It's tough to automatically debug not just a program with AI, but a full library of programs.

And so as a kind of halfway measure, you can use in-context learning to approximate the kind of abstraction learning you would get from a library. But in the end, I think they are complementary. And we're not ditching libraries. We're just saying there's a kind of middle ground that's easy to implement and works reasonably well.
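One way to picture that middle ground is the sketch below, which assumes a generic retrieval step and a generic LLM call (retrieve_similar, describe, and llm_generate are made-up stand-ins, not a real API):

def solve_with_soft_library(task, past_solutions, k=5):
    # Instead of maintaining an explicit library of named functions,
    # retrieve code that worked on similar past tasks...
    examples = retrieve_similar(task, past_solutions, k=k)
    prompt = "\n\n".join(code for _, code in examples)
    prompt += "\n\n# New task:\n" + describe(task) + "\n# Solution:\n"
    # ...and let the neural model write similar code in context, which plays
    # the role an explicit library abstraction would have played.
    return llm_generate(prompt)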

we want to build a library of knowledge and there's an exploration-exploitation dilemma. How much entropy, for want of a better word, how much of that library do we keep hanging around? Yeah, I think one way to think about it is just as a programmer, why ever build a library function? I think there's a few different reasons. One, by having a function that can allow you to kind of

express your current program that you're trying to write more compactly. So I've got some kind of shared structure and I can reuse that in a variety of different ways within my current kind of tasks that I'm trying to solve. But another related reason is more future looking. So people build libraries not for their current program, but in the expectation that them or other people are going to use that functionality into the future.

So I think you might be able to think about and perhaps even formalize when and why you want a library. And it's something like, in the expectation of current and future uses, it's going to make my life easier. I'm going to be able to use this thing that I'm kind of caching now, right now, but perhaps sometime in the future. And we do this informally, again, as software engineers. We build things that we think other people will eventually use.

And I think the mind probably does, you know,

something like that, in a kind of very hand-wavy way. We're building structures that we expect we're going to be able to use and that we'd like to use sometime in the future. And we have this kind of cached computation that we can reuse at a future point. So that's broadly how I would think of it. And you can perhaps even try and formalize that and cash that out in some kind of rational decision theory. Kevin, how can we test that these abstractions that we infer kind of represent the causal relationships in the real world?

You need to actually go out into the world, you need to have an agent that's in an environment, and it needs to be testing that its model faithfully describes real causal mechanisms. So it needs to have an action space, be able to do interventions, and so on. So when we're just trying to learn functions, it's kind of hard to tell that your abstractions are the right ones, right? In a sense, there might be lots of equivalent ways of describing the same function space.

But as soon as you put your agents in a world where it has to achieve a range of goals, where it has to plan and intervene on things,

then you could actually kind of falsify these hypotheses. In contrast, in program synthesis, you don't falsify a library. You just say, this is not very useful right now. Maybe it'll be useful later, but it's not very useful. In contrast, if your program is making hard claims about how the world works, that's where you can actually say, you know, this is not faithful to the true causal mechanisms.

Very interesting. I mean, we should bring in your Autumn paper, Zenna, but I suppose you could say that something like ARC is non-agential. ARC is not agential in the sense that there's no, you know, it's a regression problem, right? There's no, like, interactivity. You can try and solve it in a kind of internal agential way, right? I intuitively feel when I'm solving ARC, I'm doing little mental experiments to try and, you know, look at that, test that, you know, form this hypothesis. But the actual domain itself doesn't require me to take

any kind of sequence of actions. In contrast, most of the things in the real world are not like that. Most of the things in the real world require you to actually interact with the system through time. The canonical example is just, like, games, right? If I'm playing some video game, I'm taking some sequence of actions to try and explore it. So the Autumn paper, which was, you know, the first author was a graduate student called Ria Das. It was a joint paper with Ria, myself, and Josh.

And the goal there was to build a system that could synthesize essentially the source code of a video game after observing interactions with that video game. And the background idea or the background proposition is doing so is a kind of science, right? If you can observe some dynamics and you can infer the source code of the underlying world behind the dynamics, that's something like understanding, that's something like building a model of how that world works.

And so the paper that we kind of did there was to build a little DSL for a certain class of

interactive games, if you will; the way in which they're not games is that there's no external reward function. It's just, like, an environment which a system can interact with. And from kind of traces provided by humans of these games or these environments, the Autumn, you know, synthesis method infers the underlying code. And one of the key things we wanted to explore there was what you might call latent state, right? So things that are true about the world, but you can't directly observe.
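As a toy illustration of what "latent state" means here (purely made up for this transcript, and much simpler than Autumn's actual environments): imagine a lamp whose button behavior depends on a counter you never see.

class Lamp:
    """Observable: whether the light is on. Latent: a hidden press counter;
    the lamp toggles for the first three presses, then burns out."""
    def __init__(self):
        self.on = False        # observable state
        self._presses = 0      # latent state, never directly observed

    def press(self):
        self._presses += 1
        if self._presses <= 3:
            self.on = not self.on   # behaves like a normal toggle at first
        else:
            self.on = False         # burned out: stays off from now on

    def observe(self):
        return {"on": self.on}

The synthesis problem is the inverse: given only traces of press() actions and observe() readings, recover source code like this, including the counter that never shows up in the observations.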

And the real world is full of things like this, right? Pretty much everything has some hidden state that we can't observe. And this hidden state is often very complicated, right? And dynamic, right? And it's probably best described by some kind of program evolving over time. And so kind of a key contribution there is like, can we infer this hidden latent state in addition to the entire kind of full program? But one thing like maybe going back to the previous point about abstraction is in that work, you know, the programs or the models that we kind of inferred

are not abstract, right? They're kind of ground truth models of the world. And we don't really think this is how, you know, thinking, human thinking works or even could work, right? Like there has to be some abstraction. There has to be some parts of reality that we omit or discard from our models.

And so, you know, kind of a big question is like, how can you infer abstract models? How can you infer models that kind of omit the right parts of, you know, of the world in order to, you know, be a kind of a practical and useful thing? So we didn't really explore that within Autumn, but this is, you know, kind of very high on our minds of things that we want to explore, and not just us, other people in the community too. So we think in abstractions, you know, sitting on top of the world, and...

there must be some kind of a hierarchy of abstractions. And when we're dealing with perceptual input, for example, I mean, how do we kind of navigate that abstraction hierarchy? So one thing that's interesting about how people think about problems at different levels of abstraction is that the abstractions are often defined on the fly for each problem. There's not one ground truth abstraction.

The world gives you data and you could do the kind of Sora or Genie type of world model where you truly model the full data, you capture all the pixels, or you could do the kind of thing that we're advocating for where you deliberately discard a piece of information. And when you do that, I think the problem just becomes underconstrained. In order to introduce those extra constraints that are needed to tell you what abstractions are valuable,

the easiest thing to do, and something that certainly works, is to introduce reward. So if you give a reward signal in the environment, then you can say that a good tower of abstractions, even if it's on multiple levels, is one which allows you to plan to achieve reward.

You see this in MuZero, where they're learning kind of an abstract world model that isn't fully generative; it just predicts reward, policy, value, and so on. In some of the work we've done recently, we had these kinds of simulated robot environments, where a robot is interacting with an environment to use, like, a tool or a mechanism, and it sees pixels, but then it tries to define some abstractions on top of that,

and the abstractions deliberately ignore a bunch of details. So this was work led by Yichao Liang. It's called VisualPredicator. And it was, you know, taking this kind of MuZero perspective that someone gives you a reward signal. What I think is really interesting is that even if you don't tell someone what the reward is,

someone can still play with a new object or a new web app or a new appliance and form an abstract model. We're still thinking about exactly how that would work in a program synthesis context. A lot of the collaboration that we're planning right now are kind of trying to answer that question, but I think it's very open. And if you don't have reward, clearly humans can still figure out

abstract models that, as Zenna was saying, omit a bunch of details. But it's trickier, and it might connect to certain normative theories of, like, intrinsic motivation, or it might be something like you want to be robust to a wide range of possible reward functions. There's a bunch of possibilities here. Is there a principled way, though, of kind of, in the situation, detecting which is the best level of abstraction? Maybe. The framework that I find to be quite compelling is

the framework of resource rationality. And this is basically saying that you should try to do the best you can with the resources that you have. And so maybe that's a slightly convoluted statement, but the idea is that you have some kind of belief, some distribution over possible tasks or uses of a model. And you have kind of computational constraints. You can't run things forever.

And so I think a way in which you can kind of cash out this question of, like, when should you choose the right abstraction, or what kind of abstraction should you construct, is to say: well, again, I've got some beliefs about how I'm going to use this model, the questions it needs to allow me to answer, the tasks it needs to let me do. These incur computational costs, and so I should consider all of those things and do, like, the optimal thing.
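One way to cash that out slightly more formally, as a hedged sketch of the resource-rational framing rather than a formula from any particular paper discussed here:

A^{*} = \arg\max_{A}\; \mathbb{E}_{t \sim p(\text{tasks})}\big[\, U(A, t) \,\big] \;-\; \lambda\, C(A)

where A ranges over candidate abstractions or models, U(A, t) is how well A supports task t, C(A) is the computational cost of reasoning with A, and \lambda trades usefulness off against compute.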

And another axis I think is important, Kevin said this kind of tower of abstractions that

I think a key thing which is somewhat overlooked in the current discourse on world models is that there isn't a single world model. You can understand things at multiple different levels. There's multiple different models you can build of pretty much anything. There's a camera right here. I've got a model of how this camera works at the level of I press this button and an image is taken. But I can also understand the internal structure of the wires in the circuit. So I can go down to the sensor in the camera.

All of these are different models that are useful for different things that you need to do with that model. There's no single correct answer except, you know, physics, right? And so I think a key goal for us is to say, well, let's embrace that plurality, right, and try and find representations of models which incorporate

a plurality of different models within them, right? So I've been using this term polystructural to kind of capture this idea. We'll see if that, you know, if that term, you know, sticks. But whatever you call it, like we need to encode multiple different models of reality and the relationship between those models, right? As a human modeler, you know, let's say like a formal scientific modeler, let's say I'm modeling COVID, I can say, well, um,

I don't know, like hair color doesn't matter, right, in my COVID model, right? But this relationship between the model and reality is encoded in my head as a scientist. We want that relationship to be within kind of the computational formalism itself. And that is, in my sense, a hard,

like, scientific, computer science kind of question, which I think hasn't fully been explored. Maybe it will just emerge from scaling data. I don't know. It's kind of a question of whether we have to build these things in or whether they'll emerge. How can we automate this process of epistemic foraging? So I think what we want to avoid is

building kind of like these Frankenstein systems where we are hard coding a whole bunch of different knowledge representations and heuristics for reasoning with those representations. Instead we want something which looks more like rational analysis from first principles.

And when you do that, you do immediately run into hard computational problems. You get a big search space. It might even be hard in the inner loop to evaluate how good a model is or how good an abstraction is because you need to then retrospectively say, well, would this be good for the kinds of reasoning tasks I expect?

So that computational problem is a place where I think it'd be good to insert learned neural networks that have good intuitions about, you know, everyday common-sense abstractions and so on, which can propose them. It can say, like, you know, this code would probably be valuable in this situation. But we'd be proposing a bunch of different alternatives, as you were saying, in the kind of Greenblatt style,

like more as a heuristic, so that we can still have this first principles way of saying, you know, this would be a good collection of abstractions for the kinds of stuff I expect. But then we can still take advantage of the kind of prior we get from pre-trained neural networks. I suppose the broad question is how much human engineering and seeding is required? Yeah, so I think if you look at kind of the history of, let's say, Bayesian computational models of cognition,

a lot of which was done by psychologists and cognitive scientists like Josh Tenenbaum and others. There's a really compelling and strong history of expressing some prior knowledge and showing that humans do something like approximate posterior Bayesian inference conditional on the data.

But often in those cases, as you're kind of alluding to, it's required smart people to look at the world and say, "Okay, here are the inductive biases. Let me kind of encode these into the system." And I think the bitter lesson of AI is when you can, you shouldn't encode explicit inductive biases. This will lose out relative to learning these from the data if you've got lots and lots of data and lots and lots of compute.

So in my mind, there's something kind of obviously right about the fact that you need priors and you can incorporate data to revise your beliefs. And again, I think Bayesian theory is a good normative theory for that. But that doesn't mean you have to, you know,

adhere to the classical tradition of explicitly encoding these inductive biases in. And so I think there's a potential paradigm of saying, well, let's encode priors, but let's try and learn these as implicitly as possible. Where do you learn these from? Well,

It could be something like the standard paradigm here in modern machine learning, where you're learning these biases from large corpuses of data. They could be kind of richer corpuses, or corpora, of data than just internet data, right? There's all of the things that you do as a human. I can observe you and infer some of your beliefs.

And so you can imagine richer sources of data than just internet data that would allow us to get closer to the inductive biases that humans have. So I think it's tricky, as Kevin said, to actually implement these systems. You kind of face hard computational problems.

But I do think, broadly speaking, doing inductive inference over large corpuses of data to learn implicit inductive biases, as opposed to explicit hand-coded ones, is a promising path to pursue. Kevin, maybe we should have started with this. What is an abstraction? Well, it means different things in different contexts, for sure. But there's always an element of hiding details.

So in programming languages, abstraction is often synonymous with a lambda expression. So it's a function, it has variables,

and it's ignoring what values those variables take on. And that is a sense in which a lambda abstraction is an abstraction. In the kind of stuff that's sometimes called causal abstraction in causality, there's also, as I was saying, a kind of analogy or relation between two different causal models. And the more abstract one is the one that's ignoring details but still preserving some kind of essence of what the underlying causal model is.

So it's a word that means different things in different situations, but the kind of analogy here in all of them is that there's some hiding of details, but some retention of the essence. What if we had a richer ontology to start with? So, you know, we're using, like, these, you know, symmetries, rotations, translations and so on. You know, what if we started doing some galaxy-brain stuff like, you know, causality and time and just put some different basis functions in there? Do you think that could have an uplift?

I guess one thing that's true at the moment about the primitives that we put in is that they correspond directly to transformations of the ARC grid. Conceptually, I think it'd be cool to add, as you said, these galaxy-brain principles, but what do they correspond to in terms of the actual transformation that we're trying to construct?

I think one actual promising area for kind of new Arc approaches is precisely abstraction in the model, let's say the transformation program itself. And so what do I mean by this? So right now, our approach and many approaches synthesize something like a Python program or literally a Python program. And then we apply this Python program to the input to get the output, right?

But that Python program is fully formed, right? It's not abstract; it has all of its details there, right? You can run it, and that's great, because you can run it, right? You can see if it works. Intuitively, when you solve an ARC problem, at least for myself, I can't speak for other people, you first find some abstract part of the

rule, right? You're like, well, I know that this object translates into this other object, but I'm not quite sure like what the actual color transformation is, right? For example. And then I can go from there and say, okay, what could be the actual color transformation? So, you know, conceptually you can think of it almost like you write a sketch of a program in your mind with some holes, and then this gives you a direction to try and fill in these holes, right? You might fill them in one way, evaluate, and then go back and say, that's not quite right.
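A self-contained toy version of that sketch-with-holes idea might look like this (the grid encoding, the fixed "shift right" part, and the candidate color rules are all invented for the example; real ARC sketches would be richer):

def make_program(color_rule):
    """A program sketch: the spatial part is fixed (shift filled cells one
    column right); the color rule is the unfilled hole."""
    def transform(grid):
        height, width = len(grid), len(grid[0])
        out = [[0] * width for _ in range(height)]
        for r in range(height):
            for c in range(width):
                if grid[r][c]:
                    out[r][min(c + 1, width - 1)] = color_rule(grid[r][c])
        return out
    return transform

# Fill the hole by enumerating candidate rules and checking each one
# against the training pairs: the evaluate-and-revise loop described above.
CANDIDATE_RULES = [lambda v: v, lambda v: v + 1, lambda v: 2]

def fill_hole(train_pairs):
    for rule in CANDIDATE_RULES:
        program = make_program(rule)
        if all(program(x) == y for x, y in train_pairs):
            return program
    return None   # no filling worked; go back and revise the sketch itself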

We don't quite have at the moment, just like in terms of the actual methods that we're producing and other people are producing, these abstract program representations. And I think that's something we could actually build,

a representation of ARC transformations which does not have all of its detail filled in, but is still useful as a partial solution on the way to a full solution. And so I think there are a lot of potential approaches of that form, where we've enriched the knowledge representation. And this isn't quite what you're saying of, like, building in new kinds of ontological ideas; it's saying, well, we could actually abstract our current

representations, and that could be a powerful thing to do. Kevin, when you solve ARC puzzles, can you talk through your kind of conscious strategy? Yeah, so it's very intuitive, and I can't quite...

describe in words exactly what I'm doing precisely. It might be something more like, well, I just kind of denoise the input and imagine what it should be in the parts that I can't really see. So there are some things that are definitely just perceptual and difficult to say precisely, except to say, well, you just kind of see it, kind of denoise it. Other times it is a very systematic thought process. I kind of jumble up in my head different ways of seeing it.

I see if it looks like it's on the right track. I have sort of half-formed hypotheses. It's a much more perceptual and much more dynamic process than just the Greenblatt style of spamming out thousands of programs, which, I mean, to be clear, we do that also.

So I do think it's a bit dangerous, maybe a little misleading, to introspect too heavily. But I think even if you look at the kinds of mistakes that people make, they're not exactly like the kinds of mistakes that these AI systems make.

And that means that maybe there's something about the dynamics of how we're constructing the solutions that we're not really capturing with any of these approaches. Well, I suppose just comparing transduction and induction, one thing I think is good about the induction is that it's more compositional. So I could mix and match programs together. It doesn't intuitively make sense to me what would happen if you composed the transduction model. It kind of feels like it wouldn't compose very well.

but we could take this composed program and instead just think of it as functional programming, so like a data-flow graph or something like that. And having those data-flow filters as first-class in the algorithm seems like a good step to me. Yeah, so you could imagine almost iteratively applying the induction model and the transduction model in sequence, all parts of it, ensuring that the types match, essentially.

And you can also imagine almost like a REPL-style approach to trying to solve ARC, right? So suppose that, you know,

you were given an ARC problem, and you had a REPL, like an interpreter environment, and then you could write code, evaluate, check, write more code. So instead of creating one big transformation that you run, you do it in a more step-by-step process where you're continuously analyzing your current solution, writing some more code, checking. So I imagine there are some approaches to ARC, and we're pursuing some of these, which look a little bit more like that.

where you're doing a step-by-step process where each step is producing code, and that code could be normal Python code or it could be a transduction-like transformation, ultimately to get to a solution at the end.
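A minimal sketch of what such a step-by-step loop could look like, assuming a hypothetical `propose_step` model (inductive or transductive) that suggests the next small edit; this is an illustration, not the actual system.

```python
# REPL-style refinement on ARC-like grids: apply small proposed edits,
# check against the targets after every step, stop when they match.
from typing import Callable, List

Grid = List[List[int]]

def repl_solve(inputs: List[Grid], targets: List[Grid],
               propose_step: Callable[[List[Grid], List[Grid]], Callable[[Grid], Grid]],
               max_steps: int = 10) -> List[Grid]:
    """Iteratively refine working grids instead of emitting one big program."""
    state = [list(map(list, g)) for g in inputs]   # working copies
    for _ in range(max_steps):
        if state == targets:                       # check after every step
            break
        step = propose_step(state, targets)        # write a little code...
        state = [step(g) for g in state]           # ...run it, then loop back
    return state

# Toy usage: a proposer that always suggests "increment every non-zero cell".
bump = lambda g: [[v + 1 if v else 0 for v in row] for row in g]
print(repl_solve([[[1, 0]]], [[[3, 0]]], lambda s, t: bump))   # [[[3, 0]]]
```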

Yeah, I love that idea. I mean, I think there's something powerful about iteration, something really magic about sort of like refining a solution over time. What do you think about that, Kevin? I strongly agree. So I think that in some sense,

You don't need it maybe so much in ARC because you're solving just one problem at a time. But if you think about agents in a world that's learning how to interact with many different causal mechanisms, then your agent needs to accumulate knowledge over time. It needs to revise its beliefs.

And so if you had something which was more factored, like the DAG that you were saying, or just anything that breaks up the knowledge even more compositionally. So it's not just one program, it's lots of little programs that are all cooperating.

then I think it would probably be better at ARC and also would be closer to what you need for something that can grow its knowledge over time. So Zenna, some of these solutions to ARC, we had Greenblatt and of course we had other people, with Wen-Ding and Kevin. Essentially there's this expansion where we do loads and loads of test-time computation. And I think in your paper, Kevin, you kind of justified it as amortized dreaming.

I guess the question is, is it in the spirit of ARC, right, to be doing this massive expansion and all of this computation? Is that what Chollet wanted? Well, I think we have to ask Chollet what he wanted. Maybe we should. But what do you think he would say? I think, you know, Chollet has said several times that ARC is an imperfect

benchmark. I think there are ways to try and solve ARC which don't necessarily lead to the fundamental insight that you might want, or that Chollet might want, and there are ways which are more fundamental. And I think the approach that we've taken is a mixture of both, right? We've certainly got some ARC hacks in there to try and make it work, and there are also certainly some fundamental ideas in there that we're trying to pursue. And so, you know, in terms of

this kind of expanding horizon, I think the more that you're trying to specialize to the particularities of ARC and build a DSL by going through and saying, "Okay, this is a useful element of ARC problems. Let's include that. This is another useful element," the more you're diverging a little bit from the essence of the intention of ARC. But there's an open question of how much of that is necessary, right? You need inductive biases to solve ARC.

They have to come from somewhere. And so I asked François Chollet this several months ago: do you think that a kind of tabula rasa approach could solve ARC just from the examples within ARC? Is there enough knowledge, enough information, in the ARC dataset as a whole? He thinks there is. He thinks that if you condition on all of the

ARC problems, that's sufficient to solve ARC. You don't need to pre-train on internet-scale data. But it's empirically true that the best solutions are at least partially pre-trained on internet-scale data. So yeah, I think the ideal solution to ARC would be simple and elegant and wouldn't require lots of ARC-related hacks and tricks.

And I think a robust way to try and get there is to kind of

introduce other problems which are related to ARC, or in the spirit of ARC, that capture some of the same things but are not quite ARC, and force solutions which work on both, to push against the ARC-specific domain hacks that humans might encode. But Kevin, if you could design a better ARC, what would you do? Well, that's what we're trying to do.

Yeah, yeah. So this is part of this new project, MARA, that Zenna and I are doing. So we want to have something which is in a lot of ways in the spirit of ARC, so we're trying to learn something from very few examples and then generalize to new situations, but where you get to interact with something. So it's not quite an MDP, it's not like reinforcement learning; it's more like a model-building exercise.

And I think that this makes it a lot harder in some ways to just generate synthetic problems, which we did, because it's a little trickier to generate lots of synthetic interactive environments. You could do it, and I'm sure that we will try to do it.

But it's at least one way of introducing a forcing function that causes you to not overfit so much to ARC. Zenna, philosophically, transduction and induction seem like duals to me. Certainly from an expressivity point of view, the function space is the same. So why do we see empirical differences between the two classes? A lot of it just comes down to representations of models or transformations.

Again, I think programming languages are a good way to think about it, right? Where a neural network is a program in a class of programs, which is the class of neural network programs, right? And a Python program is obviously a program. And for any particular representation, there are some things that are going to be easier to encode and some things that are going to be harder to encode. If the languages are universal, then everything is possible, right? But some things are easier and some things are harder.

And it turns out to be the case that at least in some class of arc problems, like some are easy to encode as Python programs and some are easier to encode as direct neural transformations.

But again, I think it is important to separate two different distinctions within transduction and induction. One is what I'm just talking about now, the kind of programmatic representation. And the other is: what is the type, the type signature, of the object that you're constructing? Is your system producing a function which takes as input an ARC instance and outputs an ARC instance?

Or is it a function that takes as input the entire ARC problem and then directly produces the solution, right? And I think we have somewhat confounded those two things within the submission, but you could separate them and explore the different combinations. And I think that would lead to different trade-offs in different ways.
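Concretely, the two type signatures being contrasted here can be written down roughly as follows; the names are illustrative, not from the paper.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
TrainPairs = List[Tuple[Grid, Grid]]
ArcProblem = Tuple[TrainPairs, Grid]          # (training pairs, test input)

# Induction: given the training pairs, emit a program -- a Grid -> Grid
# function -- which is then executed on the test input.
InductionModel = Callable[[TrainPairs], Callable[[Grid], Grid]]

# Transduction: consume the whole problem and emit the test output grid
# directly, with no intermediate program to run.
TransductionModel = Callable[[ArcProblem], Grid]
```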

I think maybe to be a little bit more concrete: Python programs are obviously good at expressing loops, deterministic computations, things where you've got maybe even an unbounded or variable-bounded number of computations, whereas a transformer model is a finite model, right? It has one pass, and it produces an output. And there's a lot of interesting work showing that this corresponds to a particular class of computations, and there are things that you can express within that and things that you can't.

So yeah, I think a lot of it comes down to the representation of programs. Yeah, interesting. Because I suppose, Kevin, philosophically they're duals, but as Zenna was just saying, from a computational point of view, one is a transformer, so it's like a finite-state automaton in the class of automata, and a Python program is Turing-complete.

So they are fundamentally different. But there's this weird kicker: this limited form of computation that a neural network performs can express programs that no human knows how to write. Yeah. So that I don't really have a theoretical handle on. I can't

justify theoretically why a neural network should be able to do computations that are really tough to do in Python. It's an empirical fact. Obviously Python can do things that a neural net is going to really struggle with. So, as I was saying, in the paper we found things like counting, or where we need to systematically do the same thing to every single object; Python's great for that.

And then for the other stuff, it's hard to really theoretically justify it. I can just say it's empirically, definitely true. We ran the induction and transduction models with different random seeds; we wanted to make sure this was not just an artifact of randomness. It really is just empirically the case that there are certain kinds of ARC problems, and I think many other kinds of problems, where Python in principle could do it, but a neural network is just much better for the job.

I think it was both of you, I'm not sure, but there's this program induction by example paper which was at NeurIPS. I spoke to Wen-Ding about it. Are you both on that, or was that just you, Kevin? It was me and Wen-Ding. Oh, amazing. Tell us about that. So that paper in some sense was our first attempt at trying to do something that was like wake-sleep in the DreamCoder style, but in a modern LLM setting. So what it does is it starts with some human-written programs,

on the order of 10 to 100. And this defines implicitly a generative model over code because you can prompt an LLM with these example programs and then it will make up similar programs. So this means we have essentially a forward model. We can imagine programs and this gives us the dreaming phase of sleep because we can imagine these programs, run them, see what they do, and then train a program synthesizer based on these programs that we generate.

And we found that just doing this was pretty good at program synthesis problems. It was frankly a lot more effective than most of the kind of lambda calculus stuff that I did in my PhD. And it substituted a lot of symbolic machinery with neural machinery. And in doing so, it was able to take advantage of a lot of the advances in scale that we've had in recent years.

So that thing that I've described so far, where we make synthetic data and then try to synthesize on it, that was what gave us the induction model for the ARC paper. In the paper with Wen-Ding, we also introduced a full wake-sleep cycle, where it then tried to solve problems, and then it remembers those solutions, and then it dreams about variations of those solutions. And this means that if your prior is mismatched to what you really care about,

so imagine you only think you need to write short programs, but the real world has long programs, then you can pre-train on short dream data, go out and encounter the harder problems, solve a few of them, and then during the next sleep cycle, you'll fine-tune your model to this new data distribution. It doesn't have the library-learning component of DreamCoder; it has this kind of softer, neural, in-context-learning type of generative model.
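As a loose sketch of the control flow just described (the component functions `sample_programs`, `train_synthesizer`, and `attempt` are hypothetical placeholders, not the paper's implementation):

```python
import random

def wake_sleep(seed_programs, tasks, sample_programs, train_synthesizer,
               attempt, cycles=3, dreams_per_cycle=100):
    """Sketch of an LLM-era wake-sleep loop: dream synthetic programs from the
    current library, train a synthesizer on them, then solve real tasks and
    fold the solutions back into the library for the next round of dreams."""
    library = list(seed_programs)          # e.g. 10-100 human-written programs
    synthesizer = None
    for _ in range(cycles):
        # Sleep / dreaming: imagine programs similar to the library, run them,
        # and use the results as training data for the synthesizer.
        dreams = sample_programs(library, n=dreams_per_cycle)
        synthesizer = train_synthesizer(dreams)
        # Wake: attempt real tasks; remembering solutions shifts the next round
        # of dreams toward the distribution of problems actually encountered.
        solved = [attempt(synthesizer, task) for task in tasks]
        library += [program for program in solved if program is not None]
        random.shuffle(library)
    return synthesizer
```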

But we're trying to bring this wake-sleep cycle into program synthesis, but in a more kind of modern, more scalable setting. So one thing I'm seeing here is pragmatism prevailing. So the connectionists are embracing hybrid models and neuro-symbolic and so on. And maybe even like folks from your camp, you're kind of embracing connectionism as well. What are your reflections on that? So I think there's, as Kevin said at the start, there's

There's something obviously right about scale, right? And there's something obviously right about learning, right? Like a connectionist architecture allows you to learn from large amounts of data. And there's something obviously also right about symbolic architectures, in the sense that most of our modern society is built on top of them, right?

And I think there's also something right about general normative principles of intelligence. So given that, then the question is, what should you do?

And can you take the things that we know to be right and kind of compose them in a way that makes sense? And I think you're seeing that from a variety of different ways, right? So from, you know, let's say, you know, the large labs, right? Like, you know, OpenAI, they have, you know, large models that they've trained on huge amounts of data.

But again, they have these kinds of scaffolds around them, where these models call out to Python to do some computation. And why is that? Why is that a smart thing to do? Because Python's an effective language for doing a certain class of computations, much more effective than a neural network for a large class of things, right? And so they've come to a kind of

business decision, right, that it's useful to have this kind of hybrid system, right? From the other perspective, there's a kind of history of symbolic systems, and, coming from a different direction, we've realized that you can't model the entire world through a list of propositional formulas, right? The real world is a lot more messy and complex than that. And so we want to do nice reasoning, but we have to handle the complexity of reality. And so this kind of immediately leads to

architectures which can do both of these two things. And so I think just both the complexity of the world and what we want from, say, reasoning and learning systems is leading to kind of a convergence of ideas. Right now, I think that convergence is kind of composition, right? Taking these systems,

kind of plugging them together. And I think the real question is, can we do better than that? Can we go from the ground up and re-engineer systems that have the functional things that we want, but may not look architecturally like the systems that we have today? Yeah, I'm very excited about that. Building systems with LLMs, with the best components of everything. Zenna, you're building BASIS; can you tell me about the Everyday Science project?

Yeah, so Project MARA is a project within BASIS. We have a few projects within BASIS, but Project MARA is a new project, and it's led by myself and Kevin. And so this is a three-year program where we're trying to build upon many of the ideas that we've been talking about today. Really focused on...

I guess four components, right? Which is what MARA stands for: modeling, abstraction, reasoning, and agency or acting, right? And so, you know, as a kind of a first approximation, you could think of it as like

active ARC, right? Like, can we build systems that model the world, abstract the world, reason with these models or reason to find these models, but do it in an interactive way, right? They have to be part of the environment and take actions to learn about how that environment works. And so the way that we've structured it is to do two major classes of things. One is to develop new benchmarks and new kinds of problems to solve,

and the other is to develop new algorithms to solve those problems. And the first thing we've done is to take an existing benchmark, which is ARC, and make our best effort to try and solve that, right? And we'll continue to do that, most likely, but we're also developing new benchmarks. And so within this kind of high-level goal of building what you might call a general MARA system, right, a system where you can plug it into the world and it takes actions to learn about the world, build an internal model,

we're focused on a narrower subset, which is what we call everyday science. And again, we've discussed this today in various contexts, but the intuition is: there's real science, right? That's what we do as chemists and biologists and computer scientists, learning about the physical and the artificial world. But then there's also everyday science, what we do just as normal humans, adults and children, right? We learn about a new AC system in my hotel, or a new microwave, or a new toy.

And we think that the principles underlying this everyday science, like how do we form hypotheses, how do we revise those beliefs, how do we take actions to learn about how the world works, are the same principles that apply to real science. And so we want systems that can do that, that can learn how to interact with new toys and devices and interfaces in a way which isn't large-scale imitation learning, right? Which is like thinking, going back to your earlier question, right,

as part of, well, I would say, a real approach to AI for science, even if it's not actually discovering new useful science. And so, yeah, it's a joint project, and ARC was our original, our initial, output: this ARC solution, this transduction-induction model.

And we're just at the start of planning a whole program and building a whole team. And maybe Kevin can give us his flavor and interpretation of what we're doing. This is very ARC-like in the sense that you're building a model on the fly from very small amounts of data,

but you are not passively receiving the data. You have to go out there and poke things and push things and try things out. So Zenna and I both did grad school in cognitive science departments, and I think as a cognitive scientist, well, cognitive-science adjacent, it's very exciting from that perspective. This is the kind of thing that is science-like that humans do in everyday life.

But I think it's also practically important because increasingly we're building these AI agents, both in digital and physical worlds. And they work well as long as

their prior is well aligned with the kind of environment they're already in. But when they're faced with a new kind of webpage, or if you imagine a robot that has to figure out how to use a new kind of dishwasher and has to experiment with different buttons, that's actually quite hard. The more abstract kind of knowledge you need to learn, that I think is a really exciting challenge. Before we go, I mean, are you looking for investors? Are you looking for researchers?

We're certainly looking for researchers. If you're a research scientist or a research engineer, and you want to work on hard and interesting problems that are a little bit outside of the mainstream of what some of the larger labs are doing, then do get in touch. Investors?

That's a little bit more complex. Certainly, BASIS is a non-profit, and so if you want to donate, feel free. But the project is funded, and we have a kind of ambitious three-year program. We can obviously make it more ambitious, but I think that would already be quite hard.

But yeah, we're just excited to get things started and move quickly and find the best possible people to work with. So also, other collaborators, if you're doing adjacent things, that'll be a cool area to connect with. It was such an honor to meet you guys. Thank you so much and keep doing the great work. This was fun. Thank you for having us.