
How Do AI Models Actually Think? - Laura Ruis

2025/1/20

Machine Learning Street Talk (MLST)

People
Laura Ruis
Topics
I studied how large language models perform on reasoning tasks and found that their improvements come not only from scale (being able to memorize more similar content), but, more importantly, from the model learning something more interesting, a qualitative change tied to the amount of data or parameters. My research uses influence functions to analyze how pre-training data affects model reasoning. The results show that influence scores for fact-retrieval tasks are concentrated, while those for reasoning tasks are spread out, suggesting that reasoning draws on more and broader data. Moreover, for reasoning tasks, the same kinds of documents influence the answers to different questions in similar ways, which supports the procedural-knowledge view: the model is not simply retrieving information but synthesizing many kinds of knowledge to solve problems. Code has a striking influence on LLM reasoning, possibly because it contains a great deal of descriptive information about procedures and steps. Its influence is both positive and negative, and the exact mechanism is still unclear. My results suggest that LLMs can learn step-by-step reasoning from code, which points to new ideas for data synthesis. LLM reasoning is not uniform: there are multiple modes, sometimes retrieval-based and sometimes reasoning-based. A model's reasoning is limited by its own characteristics and by its inputs, but that does not mean it lacks reasoning ability entirely. Mathematical reasoning may transfer to other kinds of reasoning, but it is only one facet of reasoning; other kinds, such as inductive reasoning, are harder to evaluate.


Transcript


If I understand correctly, you created queries that resemble reasoning and ones which resemble some kind of like, you know, fact retrieval. Yeah. And after that paper, I was left with this question, like, to what extent

Is the scale that makes these models get better at these tasks? To what extent is that driving the performance or how is it driving the performance? Is it just that the model is seeing more similar stuff and therefore can memorize more? Or is it really doing something more interesting and learning something sort of qualitatively different from more data or with more parameters? If language models are doing something which is akin to approximate reasoning,

What's the difference between that and formal reasoning? So I believe that in very controlled setups, we have already shown that connectionist models can do formal reasoning. So I think empirically and theoretically we have shown that they can do a form of systematicity or symbolic computation, although it's still limited. But the question

with my most recent paper was: can it also learn to do something in that direction approximately from data in the wild? And I think it can, and my paper doesn't show that exactly. It just shows that it's doing something generalizable that it can apply to many different questions, but intuitively I think it would be possible.

I guess you would agree that agency could emerge even if we're not explicitly trying to make it. Yeah, I think that's the interesting case. So there's this definition from Zach Kenton from DeepMind. They also have a safety interest in agency and a couple years ago they made this definition of agency that an agent is something that changes its policy when its actions affect the environment in a different way.

You can kind of trivially make a system of LLMs in an environment or something where the environment is also an LLM such that it adheres to this definition. So I think the important thing is like when does something like that emerge from something as simple as next token prediction? And that's kind of what I'm interested in. So Tufa Labs is a new AI research lab I'm starting in Zurich. In a way, it is a Swiss version of DeepSeek. And first we want to investigate

So LLM systems and search methods applied to them, similar to O1. And so we want to investigate, reverse engineer, and explore the techniques ourselves. MLST is sponsored by CentML, which is the compute platform specifically optimized for AI workloads.

They support all of the latest open source language models out of the box, like Llama, for example. You can just choose the pricing points, choose the model that you want. It spins up, it's elastic, it auto-scales. You can pay on consumption essentially, or you can have a model which is always working, or it can be freeze-dried when you're not using it. So what are you waiting for? Go to centml.ai and sign up now.

Laura, it's amazing to have you on MLST. Welcome. Thank you. Amazing to be here. Can you tell us about yourself? Yeah, sure. So I'm Laura. I'm a PhD student at University College London, supervised by Tim Rocktäschel and Ed Grefenstette. And I'm also part-time at Cohere.

And I'm broadly interested in understanding language and its relation to human cognition and how we can evaluate that in artificial intelligence. To what extent can like pillars of human intelligence also show up in artificial intelligence? Things like reasoning, both mathematical reasoning, social reasoning, that kind of stuff, and also physics.

specifically trying to understand how state-of-the-art models are doing what they're doing. Very cool. I'm a huge fan of Cohere, Ed and Tim. Nice. So this is very cool. Me too. Okay, so you've just written a paper. Now there's this huge controversy. I've been speaking with Subbarao Kambhampati, for example. He calls LLMs...

approximate retrieval engines and O1 approximate reasoning engines. So he is saying they're doing a little bit of reasoning, whatever that means. But you've written this paper. It has generated loads of interest on socials. Procedural knowledge in pre-training drives reasoning in large language models. Give us the elevator pitch. Yeah, so I was doing evaluation in language models, trying to understand how they were doing social reasoning before.

And we designed a benchmark and evaluated models on their social reasoning skills.

And after that paper, I was left with this question, like, to what extent is the scale that makes these models get better at these tasks? To what extent is that driving the performance or how is it driving the performance? Is it just that the model is seeing more similar stuff and therefore can memorize more and seems to have more capabilities? Or is it really doing something more interesting and learning something sort of qualitatively different from more data or with more parameters?

And of course, the way we evaluate machine learning methods in the past is we separate test from train, but that's not possible anymore these days because models are just trained on everything. Test is in train now. So we wanted to understand when language models are producing zero-shot reasoning traces. So for example, for simple arithmetic, it can produce the steps to reach an answer.

Is it kind of relying on having seen those exact steps before in training? Or is it doing something generalizable? Is it sort of taking the steps itself and getting to the answer?

And that was like the motivation for this paper. Very cool. So you used influence functions to do this analysis. Can you explain what they are? Yeah. Yeah. So I was very happy when I stumbled across that tool because it's this method from robust statistics that tries to answer a counterfactual question about the model.

So the question it tries to approximate is: what if I were to take this pre-training document out of the data set and retrain the entire model? How does the behavior change? How do the model parameters change? And, with that, the log likelihood of completions? That's what influence functions estimate, and that's the tool we use to determine how pre-training data determines reasoning steps by models.
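For reference, the counterfactual Laura describes ("remove this document and retrain") is what influence functions approximate. As a rough sketch of the standard first-order form, glossing over the approximations needed at LLM scale, the effect of up-weighting a pre-training document $z$ on the loss of a query completion $z_q$ is

$$\mathcal{I}(z, z_q) \;\approx\; -\,\nabla_\theta \mathcal{L}(z_q, \hat{\theta})^{\top}\, H_{\hat{\theta}}^{-1}\, \nabla_\theta \mathcal{L}(z, \hat{\theta}),$$

where $\hat{\theta}$ are the trained parameters, $\mathcal{L}$ is the loss (here the negative log-likelihood of the completion), and $H_{\hat{\theta}}$ is the Hessian of the training loss, which in practice is replaced by a cheaper curvature approximation such as EK-FAC.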

Very cool. So if I understand correctly, you created queries that resemble reasoning and ones which resemble some kind of like, you know, fact retrieval. And you kind of compared what the influence functions did on those queries. Yeah, exactly. So we used this factual task as a sort of grounding because influence functions are very approximate. We don't actually retrain the model for every data point because that's going to be too expensive.

So you want to have some kind of idea that what you're finding actually makes sense intuitively. And factual retrieval is a natural task for that because for those factual questions, the only way to answer them is to retrieve the relevant documents. And we compare this to the influence scores for the reasoning traces. And those tasks are simply like the zero-shot reasoning kind of prompts and the model generates the reasoning steps itself.

So if the model were to be doing retrieval for those types of reasoning, it would really have to retrieve each reasoning step from the pre-training data because it outputs zero-shot itself, the reasoning traces. I don't give it any examples.

So if I understand correctly, like the way the intuition behind your work is that when we are doing, you know, fact retrieval, it seems quite focused. So it's just going to a document and it's retrieving the fact. And when it's doing reasoning, it seems very diffused. It's looking at loads and loads of documents that have reasoning like processes in them. Yeah. Yeah. That's sort of the abstraction you can take from it. But of course, in reality, even when it's doing factual retrieval, there's much more going on.

because it needs to adhere to syntax. There's all kinds of stylistic elements that are going on. But...

Still importantly, I think the most striking finding from this paper to me was that for factual retrieval, whether a document is influential for a factual question is not predictive of its influence for another factual question. So they rely on very distinct sets of documents. Whereas for a reasoning question, if it underlies the same task, if it's both, for example, calculating the slope between numbers, but for completely different numbers,

the influence over the documents is very similar. So the same documents can influence these questions in the same way. And that we didn't see for factual retrieval. And that is really the basis for why we call this procedural knowledge. Very interesting. So for the folks at home, like an example of a reasoning task, it'd be like two-step arithmetic, calculating slopes, solving linear equations. And factual retrieval might be something like, what is the tallest mountain? Yeah.

Yeah, so what is the tallest mountain? What is the largest ocean? In which year did the Beinecke Library open, which is the Yale library? Those are examples of factual questions. And then we have three different reasoning tasks. One is simple two-step arithmetic. So you can imagine seven minus four, times eight. That's like...

two-step arithmetic: you first have to calculate seven minus four and then do three times eight. Calculating the slope requires, I think, more steps, three steps: you have two different points in a 2D space and you have to calculate the difference between the y-coordinates and the x-coordinates and divide them by each other to get the slope between the two points. And then the linear equations task is: you have a linear equation and you have to solve it for x, which also requires three simple arithmetic steps.
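For concreteness, here is a minimal Python sketch of the three task types Laura describes; the numbers, function names, and step counts are illustrative, not taken from the paper's actual prompts.

```python
# Illustrative versions of the three reasoning task types discussed here:
# two-step arithmetic, slope between two points, solving a linear equation.
# All numbers and function names are made up for illustration.

def two_step_arithmetic() -> int:
    # e.g. "(7 - 4) * 8": step 1 is the subtraction, step 2 the multiplication,
    # matching the left-to-right reading described above
    step1 = 7 - 4        # 3
    step2 = step1 * 8    # 24
    return step2

def slope(x1: float, y1: float, x2: float, y2: float) -> float:
    # step 1: difference of the y-coordinates; step 2: difference of the
    # x-coordinates; step 3: divide them
    dy = y2 - y1
    dx = x2 - x1
    return dy / dx

def solve_linear(a: float, b: float, c: float) -> float:
    # solve a*x + b = c for x: subtract b, then divide by a
    return (c - b) / a

print(two_step_arithmetic())    # 24
print(slope(1, 2, 3, 8))        # 3.0
print(solve_linear(2, 4, 10))   # 3.0
```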

So what we are observing then is that when we're doing reasoning tasks, the models are synthesizing knowledge in some kind of abstract way from all of these documents. Yeah. Is that reasoning?

I would say yes, first of all. But I'm not so restrictive in what I call reasoning. As in, I don't think only formal, step-by-step logical reasoning is reasoning. I think deep neural networks can do that kind of reasoning, but our paper doesn't show that that is what's going on here. But I think the important point is that it is seemingly...

taking knowledge from many different documents and applying it to the same task. So that's a generalizable strategy, and it's using that to generate step-by-step knowledge that solves some kind of problem, and that is reasoning to me. But that doesn't mean that it has much of any bearing on other forms of reasoning, like inductive reasoning, for example. Yeah, I mean, it's...

I suppose we can go into what knowledge is. Duggar and I had this discussion. It's said that it's a justified true belief, but he said it's a justified useful belief. So you could say in some sense that the templatization of the information in these documents is kind of like creating useful knowledge. Yeah. And what is this distinction between useful and true?

Well, I think it's about whether we can know something is true just based on a bunch of data in a corpus. There's always this epistemic gap, isn't there? Like, can we have models that actually give us facts? Yeah, yeah, true. The really interesting thing

from reading your paper is that you found that when doing reasoning, the documents that were influential, you know, things like Stack Overflow and code and stuff like that, really had a lot of influence on the reasoning process. And that's weird, isn't it? Because it's code. It seems different. How do you think about that?

That's a good question and I spent a lot of time looking into those results. I really spent days trying to understand what was going on there because I think importantly we find a lot of evidence for documents influencing similar reasoning questions like one document influencing many slopes questions and another document influencing many linear equation questions. But the only documents that seem to be influential both positively and negatively for all types of reasoning is code.

And I tried to look into what it is about code that makes it so influential.

And I couldn't find any patterns. And importantly, we don't only find that it's good for reasoning; it's also bad in certain cases. So it was, of course, conventional wisdom that code helps downstream capabilities. OpenAI knows that. Anthropic knows that. They initialize their models with purely code-trained models. But we don't really know what's going on there. And that's essentially what I'm working on now. I'm trying to understand that better because...

Yeah, I couldn't clearly find patterns in the data that we found in this paper. It's weird, isn't it? Because code feels like the perfect materialization of human cognitive processes, right? You know, we're solving problems and then we manifest that in code. Does that have implications for how we design datasets for training these models? Yeah, yeah, I think it does.

The trend is adding more and more code into the pre-training corpus for models to be trained on. So I think it definitely has implications for that. I think importantly, a thing that we find in this paper is that it seems like the model can learn to do these step-by-step reasoning traces to output them from descriptions of procedures in code that are purely descriptive.

So a piece of Python code that calculates the slope between two points is highly influential for actual questions, prompts, asking the model to do that in text. And if that is something that

generalizes, if you can train a model on procedures and it can learn from that to execute those procedures, I think that can be pretty influential for how we should, for example, synthetically generate data. It could be helpful to generate lots of procedures instead of, like, step-by-step applications of those procedures, or, you know, focus a bit more on both.
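To make that concrete, here is a toy sketch, purely illustrative and not the paper's actual pipeline, of turning a descriptive procedure into the kind of step-by-step trace being discussed:

```python
# Toy sketch: a descriptive procedure (code) used to generate a natural-language,
# step-by-step trace, the kind of synthetic data the discussion suggests.
# Entirely illustrative; this is not the paper's method or data.

def slope_trace(x1: float, y1: float, x2: float, y2: float) -> str:
    dy = y2 - y1
    dx = x2 - x1
    m = dy / dx
    return (
        f"To find the slope between ({x1}, {y1}) and ({x2}, {y2}):\n"
        f"Step 1: the difference in y is {y2} - {y1} = {dy}.\n"
        f"Step 2: the difference in x is {x2} - {x1} = {dx}.\n"
        f"Step 3: divide them: {dy} / {dx} = {m}. The slope is {m}."
    )

print(slope_trace(1, 2, 3, 8))
```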

I suppose because of this diffused nature, so there are many, many examples in code of solving slopes and stuff like that. In a way, that's a form of robustness.

Do you see what I mean? So in a way, that gives us many, many ways of doing that type of problem. Yeah, yeah, I see what you mean: if you don't just see, like, the application, the step-by-step reasoning, but you also see the procedure, that gives you more robustness to different ways of expressing it. Or, yeah. Or even just in terms of having more redundancy, having many, many expressions of the same thing, it would make it

robust to potentially different selections of the dataset, it would still work. Whereas perhaps fact retrieval, if the fact isn't in the dataset, it's just not going to work. Yeah, that's definitely true. I mean, in that sense, it's a form of abstraction that can generalize better. So on this abstraction thing,

I was speaking with the guy who wrote the GSM symbolic paper earlier, and Douglas Hofstadter says that an abstraction is a bag of analogies.

Right. So, you know, we have these concepts in our mind, like the concept of a chair. It's really difficult for us to describe what a chair is, because I can give you like a million different descriptions, or even the letter A. There was a book he wrote called Surfaces and Essences where he was talking about all the different ways that A's could be written. So it might be the case that our brains

don't really have these high level abstractions the way we think they do. Like actually all of these neurocircuitry activation pathways are firing and we sort of like know an abstraction via a million different perspectives. And do you think that could actually be analogous in some way to the way a neural network works?

Yeah, and I think that's also how language works. I think... I mean, I didn't come up with this, like Wittgenstein did, but he wrote this whole book where he was just like page after page trying to show that you can't define a thing. There will always be a situation where it doesn't exactly apply like that, right? It's all fuzzy and meaning is use, essentially, and it can change based on context and stuff like that, and

I think that's the strength of language, this kind of abstraction that is not...

formal or, like, purely symbolic, but very fuzzy, in that there are no clear boundaries of meaning or concepts or abstractions. Yeah, we were speaking about Montague. So he argued that, you know, we should model language like it's a formal language. Of course, it's gnarly and it's very, very kind of constructive and whatnot.

Do you think LLMs are actually an appropriate tool given that natural language is not a formal language?

Yeah, I think that is what we have seen in the past couple of years because Montague tried to formalize language and that famously didn't lead to the most simple formalization. It's very hard to formalize language. I think Montague came up with this very strict form of compositionality and that has been very useful because there is definitely something in language where the meaning is...

composed from the parts. That's definitely true. But the very strict way in which Montague defined this is probably also not right. Probably there is... If you want to make this strict form of compositionality work in language, you have to come up with really roundabout functions where the meaning of a word is a function of the whole sentence or something and goes back into the word itself. Whereas if you take a more

lenient form of compositionality or systematicity, like Fodor came up with, that roughly just says there's something predictable about the way we use language: if you teach someone a new word, like flips, and you say, I had really good flips last night, then you can immediately

sort of estimate that this word is probably food or something, and it was at night, so maybe it was a dessert, and you can use it in many different sentences. And that is a form of compositionality and systematicity that almost seems like it's formal and predictive, like you could describe it formally, but actually we've tried and that hasn't really worked. That's probably precisely why language models work better: because they can approximate such systematicity, but they are not

pure formal systems. So you've said that language models could develop a causal understanding of the world and this is really interesting. I suppose it comes back to semantics in general. So you know John Searle said that the reason why humans have semantics is basically because we're physically and causally embedded in the world, right? And

Lots of linguists like Piantadosi are talking about things like concept-role semantics and you know, there's a whole intellectual school of thought now around how we could build semantics just in language models. What say you? Yeah, I love Piantadosi's work and it has inspired me in many ways and I agree with him but

I would say he probably also, or I don't know, I shouldn't speak for him, but there is of course a role for reference in the world. When children learn language, they start off by, and this is again something I've heard from Piantadosi, there's

phases of language learning where initially a princess is just like a nice woman that has nice dresses and is always kind to you and the child can point to it in the world and it has a clear reference. But as language evolves and as the child becomes an adult language speaker, this reference becomes less and less important and it becomes more and more abstract.

And now as an adult language speaker, I can talk to you about the COVID vaccine, but I would not be able to pick it out. If you give me a bunch of substances and you ask me which of these is the COVID vaccine or what is it made of? I have no idea. And there's many examples, of course, of

of things we discuss that don't have any reference in the world. But the COVID vaccine is just one where it does have a reference, but I don't know how to pick it out. And I still think I understand what a COVID vaccine is and have some sense of its meaning. But my meaning could also be further developed if I would know how to pick it out in the world, right? That means I understand it better and I have a better world model. How much of a sharp boundary do you think there is between these kind of facts that you're talking about and reasoning?

Probably not really, because when I was thinking about factual retrieval and I was building these tasks, I often struggled to come up with pure factual questions. So you can imagine if you ask someone like, what is the largest ocean in the world?

maybe this person is retrieving all the oceans in the world and retrieving their sizes and comparing them and then saying, like, oh, it's the Pacific Ocean, and then they did some reasoning. So there's no clear boundary, and I really tried to make these questions very factual, like: in what year did the Beinecke Library open? Again, you could come up with a way in which you could reason about the answer, but you really need to have some atomic knowledge to answer this question. But

Yeah, it's all fuzzy. So coming back to Searle, you know there were so many replies to the Chinese room experiment, like the robot reply, the systems reply, you know, this kind of stuff. And I guess at some point, when does mimicry, functional mimicry, just become so good that it's a distinction without a difference? Yeah, that's a good question. And I think then you're... That's why it's important that people like François Chollet come up with things like ARC, right? Yeah.

His definition of intelligence is really about acting in novel ways and using your knowledge in novel situations. And a system that's just mimicking could never do that. Could we design a way of measuring the depth of understanding, whatever that would mean?

We're trying and I think evaluation is one of the hardest parts of the field. This week I heard a funny characterization of moving the goalposts. Someone characterized it in a positive way and I totally agree with that. They said people are constantly moving the goalposts and people are saying that as something that's bad. But actually what we're doing is collectively refining our definitions.

So first we're saying like, oh, if a system can do chess, it must be intelligent, right? But then it could do chess and then we're like, oh, that's not what we meant actually. Wait, let me move the goalpost. And that's not a problem. It helps us refine our definitions and it helps us... No one knows what intelligence exactly is, but...

designing more and more complex benchmarks and keep on moving the goalposts makes for a clearer view of what it actually is and what we're all talking about.

Yeah, I think experience is helping us carve the space up a little bit better in our minds. Like for example, I think we used to have quite a puritanical view about understanding and reasoning that you either, you're reasoning or you're not reasoning. And I think what we're starting to see with these models is, there's this Swiss cheese problem that sometimes you're in a hole in the Swiss cheese and it goes bananas and sometimes it's retrieval and sometimes it's reasoning. It's almost like there are these different modalities of function.

and sometimes it's doing more reasoning and sometimes it's doing less. Yeah, exactly. I mean, that's also how I think about them. There's this view that if you can show that a model trips up, it necessarily means it cannot reason, but I don't think that's true. I think it's such a complex system and

And if you prompt it in a certain way, it might use a completely different sort of function or program or whatever, how you want to conceptualize what it's doing than if you prompt it in another way. And if you give it tokens that are so foreign to it that it fails to reason over them, that doesn't mean that it cannot do those actual reasoning patterns and the rules that underlies that kind of reasoning. But it's just a limitation of the system. And it is a statistical model.

So you focused on specific types of mathematical reasoning. Do you think that they would transfer to other forms of reasoning like, you know, solving ethical dilemmas or something like that? Yeah, that's a good question. I think they do.

But of course, reasoning is such a multifaceted concept that mathematical reasoning cannot nearly cover it all. So mathematical reasoning is very formal, it has rules, that's why we chose it. The type of reasoning we look at is so simple that you can actually find the answers in the pre-training corpus.

But there are forms of reasoning, like inductive reasoning, for which you can't find the answers. If you only observe white swans, can you deduce or induce from that that black swans don't exist? I don't know. That's a form of reasoning that actually underlies most of science, and that is...

more difficult to see if a language model can do that. But I think fundamentally, it probably can. It becomes just in such cases much more important to do some kind of verification of what's going on, why is it making this induction possible.

And can we like do experiments to verify it? If language models are doing something which is akin to approximate reasoning, what's the difference between that and formal reasoning? And do you believe in principle that connectionism on its own could scale up to formal reasoning? Yeah.

I think it can, and I think my recent paper gave... So I believe that in very controlled setups, we have already shown that connectionist models can do formal reasoning, that they literally can learn to apply systematic rules in a way such that they achieve 100% accuracy on novel problems. There is a good paper by Lake and Baroni in Nature that does this,

There are other papers that show that, for example by Andrew Lampinen, Passive Learning of Active Causal Strategies, that show if you set up the problem in such a way that the model can learn to do the task as opposed to latch on to sort of...

unimportant things in the data, it can probably learn to apply tasks in novel situations. So I think empirically and theoretically we have shown that they can do a form of systematicity or symbolic computation, although it's still limited, for sure, like it can't handle completely novel tokens. But the question with my most recent paper was: can it also learn to do something

in that direction approximately from data in the wild? Because language models are not trained on data that is so carefully curated that the only way you can make the loss go down is to learn the underlying rules, because that's what these papers often do. And the question is, can it also then learn to do something that's like formal reasoning or symbolic reasoning? And I think it can.

And my paper doesn't show that exactly. It just shows that it's doing something generalizable, that it can apply to many different questions. But intuitively, I think it will be possible. So there's always been this notion of a gap that people talk about, especially in respect of creativity, adaptability, dealing with novelty and so on. In fact, many people think that the definition of intelligence is, you know, dealing with novelty. Yeah.

So there's always this thing that we can do combinatorial creativity, right? So we can do reasoning by like recomposing bits we already have. But people say that this inventive creativity, like being able to, you know, train on all the data up to 1945 and then invent some new theorem that came after that. People feel intuitively that the models wouldn't be able to do that. Yeah. What do you think? That's really like the goal, right? Like that kind of stuff, it would be really cool. And I

I don't feel like current language models can do that, but I don't feel it's technically impossible. Even in the current regime, if we were to find so much data that the model can learn the causal underlying data generating process that is relevant to come up with novel information, then

it can do that. But of course, we have used most of the data that was created over the past couple of thousand years, or at least we're trying to, and it's probably not feasible to scale up to such intelligence in this way. But yeah, I think it's not theoretically impossible. It kind of gets at whether Einstein came up with some, like, stroke of genius that doesn't

compose stuff he has seen before, or whether he actually also stands on the shoulders of other scientists and reasoned about things for a long time and used that to come up with new knowledge. And I think it's probably the latter, and that that's not so special that we cannot... I mean, okay, I don't want to say that Einstein is not special, he is, but probably we can in some form recreate that process.

So Tim Rocktäschel, he's got some great work on open-endedness and creativity and whatnot.

And it's interesting because Ilya Sutskever, he gave a talk at this conference and he said that we are hitting a data wall. And to me, that doesn't pass the sanity test, right? Because if you think about it, there are an infinite number of ways you can make more data. You can transform the data we already have and you can generate lots of data. But this is where it gets into Tim Rocktäschel's domain, which is that it's not just about generating more data. It's about generating interesting data.

I find Genie and stuff incredibly interesting, and I agree that the intelligence of the system is purely limited by the complexity of its environment. So I think that's an interesting approach. And I also think that, yeah, so I think scaling up data helps because it makes it less and less possible for a model to latch on to spurious correlations.

it's going to be more and more useful to learn the causal world model that generates this data, the more data you get, because it's likely going to be less semantically similar than what you've seen before. But if you were to somehow be able to select from all this data that we have, data in a way that's sufficiently diverse,

for you to learn this causal mechanism quicker without seeing, I don't know, trillions of tokens, I think that that should maybe also be possible. And I think that is informed by these controlled studies that show that you can train a model to do something systematic in one task

But how do we train a model to do something systematic in as many tasks as we want language models to do? What's your philosophy on scaling in general? So do you think that if we just scale current approaches up that we will get dramatically better results? Or do you think we're missing something significant? I am not going to bet against scaling because that seems scary. It has worked pretty well.

But yeah, so I think scaling is cool. I think there is issues with it and there's probably more data efficient ways to do it. Just because theoretically you can train a model to do many different complex tasks with next token prediction doesn't mean it's the best way to do it. And maybe there's something about intervening on an environment and generating your own data that can help there and that can make models more data efficient and

I could see how that could be important in the future. And I think Ilya also mentioned, not specifically that, but agency or agents. And maybe that's getting at the distinction between sort of passive learning and active, interventional learning.

We should save the agency discussion for a bit later because we've got lots of things to say on that. Why don't we just talk a little bit about that Fodor and Pylyshyn paper from 1988. This was their famous critique of connectionism, and they said that the way that humans think is very formal. You know, we have these rules and we have this compositionality, so we can

We can generalize Mary loves John to Mary loves Jane. And we can also take a sentence and we can kind of invert it. We can decompose it back to all of the constituent parts and we can figure out what things mean. And neural networks on their face, they don't do that explicitly, but perhaps they might do it implicitly. What are your reflections on that?

Yeah, I think Fodor and Pylyshyn's argument has definitely stood the test of time, although there's been some theoretical work showing that it's not impossible in the connectionist regime to learn symbolic functions, like Smolensky's work in the 90s on tensor product representations. That was theoretical work showing that you can do some symbolic

computation in the sub-symbolic regime that connectionist networks represent. But the argument nonetheless stood the test of time, because this systematicity that we also spoke about earlier is definitely something that's present in language, and it is necessary to explain if you want to understand how humans can produce something that's so varied with such little

examples or, you know, memory. And it was a challenge for, I don't know, 30 years or something like that, and it probably still is. It probably still relates to this concept of intelligence as being able to process novel information. But I think there has been a lot of empirical work showing now that actually sub-symbolic models like neural networks

can do symbolic computation albeit not explicitly. I mean, yeah, explicitly in the sense that they can maybe output some symbolic computation in the form of language and explicitly reason over that. That's probably a good idea. But probably they can also do it implicitly

Yeah, I suppose that there was a bit of a theme of just having strong theoretical tools, especially around that time. So, you know, this idea of productivity, being able to generate an infinite number of sentences. I mean, Chomsky said the probability of a sentence is an oxymoron. It just doesn't make sense to say that. And yeah.

It certainly feels, as you say, that our language is compositional. And again, Chomsky said it's a language of thought. So if our language is compositional, then surely our mind must be compositional. So it might have just been like almost an intuition pump to reason about how our brains work. Yeah, I do think, though, that... So this gets at this question of whether language is thought, right?

And I think that has been pretty rigorously debunked at this point. And I think maybe language is as useful to us precisely because our thought is not compositional, because we can use it as a compositional tool that is maybe a bit harder for us to do systematically in our brain. And I mean, there's been work by Ev Fedorenko in 2020, for example, showing that people with aphasia

can still be chess grandmasters. So when your language system is completely messed up, you can still reason perfectly fine, which kind of, in my view, debunks the theory that language is thought.

You said to me earlier, like, well, what's the big deal? Why do we need to have invertibility? And when I say invertibility, I think I'm kind of saying decomposition. So, you know, they were talking about compositionality, but I think decomposition is really important. That's being able to go back to the constituents. And it's not only about being able to explain things

what I'm thinking, it's also about parsimony and reuse. So we see in mech interp, for example, in that Scaling Monosemanticity paper, that the representations for the Golden Gate Bridge were scattered throughout all of these circuits throughout the neural network. And it feels like, certainly at a psychology level, it feels like our brain doesn't work that way, but maybe that's just a bit of an illusion.

Yeah, it's hard for me to say as I mean, I don't want to comment on neuroscience because I have no idea about that. But what I could say about this is that it seems to me that it's pretty useful that the model is representing it in this way. And maybe that's also as in

It's doing a very distributed representation, right? And that's essentially the core reason why people in the 90s believed in connectionist models, this distributed representation where all neurons can essentially light up for all different tasks as long as there's some shared structure. And that makes them so flexible and that makes them so good in novel situations, actually. I suppose there's a bit of a broad theme as well. So certainly 20 years ago,

We used to design AI systems with explicit strategies. So planning was an explicit thing. Reasoning was an explicit thing. Even certain architectures like DreamCoder, Kevin Ellis's DreamCoder, it had an explicit wake phase and dream phase. So when you dream, you kind of expand your hypothesis space. And then like, you know, when you're awake,

you kind of select the ones that work and neural networks do this expansion and collapse all of the time. But what we're seeing though with the newer architectures is that they kind of do the same thing, but they do it more and more implicitly. Like we don't hard code it in. Exactly. And that's what we've learned the past couple of years that

That's the way to go, probably, because that's what we've learned going from the LSTM to Transformers, which is quite funny, actually. One of my first papers is on compositionality. We designed this benchmark together with Brenden Lake and others where we held out systematic experience in the data, and we showed that a human can easily do this, but an LSTM couldn't. And this was all pre-

transformers, LLMs and ChatGPT and that kind of stuff. And someone this week told me that actually a transformer gets almost 100% performance on most of the tests we designed in that paper. Not all, but most.

And that's just one example of the transformer being a much better fit for compositional tasks than LSTMs. And the lesson maybe we can take from that is that LSTMs have this explicit recurrence, which seems very useful, right?

Because there is clearly a recency bias. Clearly what we have just talked about is more relevant than what we, I don't know, you and I talked about last time we saw each other. But if this recency bias is so obvious, why would you build it in? Because the model can easily learn it from language. And this is what we've learned in the past couple of years, that if something can be learned, don't build it in.

Or use xLSTMs. Yes. Yes. I spoke to Sepp the other day. Yeah, he was saying their new exponential gating scheme allows them to kind of, like, you know, overwrite their memory. Yeah, very cool. It's kind of weird though, isn't it? Because...

There is this notion, I was saying to him, like, when are we going to see industry adoption of xLSTMs? And I think in industry, the perception is it kind of doesn't matter. Like, it's just about scale. Exactly. But that's the thing also with OpenAI. Like, they don't care about these, like, compositional questions, like, is it out of distribution? Are they...

Are we holding out the right things? Has it seen this before? No, they're just like, we're going to make it in distribution and we're going to scale it up. And that's kind of their genius, essentially. No matter what the architecture is, no matter how much it looks like the brain or why it should theoretically work better than something else, if you can...

you know, use more flops, it's better. Let's quickly talk about Smolensky. I always mispronounce his name, so I'm going to say it very slowly. So around 1990, I guess it was a response to this Fodor and Pylyshyn thing. And he said that with these, you know, connectionist models, you can still implement the essential capacities of symbolic processing, you know, such as representing variable bindings and structured data and compositional operations. What did he propose?

So he proposed a mathematical, it was a mathematical framework for variable value binding. And that's like this very intuitively symbolic computation, right? No matter what the value is, the variable can take it and you can do processing on it and the results will be good.

reliable and the same. And, um, yeah, Fodor and Pylyshyn say connectionist models can't do that, and that produced a decade-long back and forth between connectionists and symbolicists. And Smolensky gave this answer with tensor product representations, to say, like: look, you can actually represent variable-value binding in a purely sub-symbolic, connectionist way.

And that's what he shows in tensor product representations, where you represent the variable and the value both in a distributed sub-symbolic way. And you can do processing on them, and they all become embedded in this continuous space, this distributed space, but then you can still extract the value from the variable after processing. Unbinding is what they call that. So what were the drawbacks with that approach? And also, I think there's a bit of a leap of faith here that...

neural networks could in some way approximate what he was talking about. Well, I didn't read this paper, but Tom McCoy published a paper together with Smolensky, I think, that's titled RNNs Implicitly Implement Tensor Product Representations.

So that seems to indicate that they can, but that's just based on the title. I think that one has been on my reading list since I found out about the tensor product representations. But you're right, like what's the limitations of this method? It's a purely theoretical argument, right? He's saying to failure inflation, look, actually you can do this.

That doesn't mean that it's practical. That doesn't mean that it scales. Tensor products, the way he proposed them in the 90s, don't scale at all, because the representation explodes in the number of variables that you're representing. So let's say variables are positions in sequences and values are the tokens.

then the tensor product representation, I think, is quadratic in, or no, like, explodes in the number of positions that you're trying to represent and in the number of tokens. So that's not feasible. I think Smolensky is working on this at Microsoft, so I'm sure he's working on making it more scalable. But another thing that I took away from reading that paper is that

to actually get this value back from this distributed representation, something needs to be linearly independent, like

the rows in a matrix or something like that need to be linearly independent. And that seems like a very hard restriction to me, one that probably won't naturally arise, or maybe it would, because people also tell me that if you randomly sample, vectors are almost always linearly independent in high dimensions. So maybe actually that's not that big of a limitation. But yeah, the way he proposed it back then wasn't that scalable.
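For concreteness, here is a tiny NumPy sketch of the binding and unbinding idea discussed here; the dimensions and vectors are made up, and the roles are made orthonormal, which is one way to satisfy the linear-independence requirement Laura mentions.

```python
import numpy as np

# Toy tensor-product-representation sketch: bind fillers (values) to roles
# (variables) with outer products, sum the bindings, then unbind with a role
# vector. Dimensions and vectors are illustrative only.

rng = np.random.default_rng(0)

n, d_filler, d_role = 3, 8, 4                      # three role/filler pairs
fillers = rng.normal(size=(n, d_filler))
q, _ = np.linalg.qr(rng.normal(size=(d_role, n)))  # orthonormal columns
roles = q.T                                        # rows are linearly independent

# Bind: sum of outer products filler_i (x) role_i
T = sum(np.outer(fillers[i], roles[i]) for i in range(n))

# Unbind: project with a role vector to recover its filler
recovered = T @ roles[1]
print(np.allclose(recovered, fillers[1]))          # True, since roles are orthonormal
```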

Yeah. So yeah, these tensor outer products are composed of these roles and fillers, and apparently the roles required quite a lot of hand-engineering, and we've got this combinatorial explosion problem. But anyway, it's interesting; it's a potential way forward. Okay, Laura, where does agency fit into all of this? And just to frame the question a little bit:

Some people are really worried about agency. I was speaking with Bengio the other day and he said that agency is really bad. You know, it's going to lead to these things controlling their own goals and it could be very dangerous and whatnot. And we should strip away all agency. Yeah, no, I totally agree with that. It's like, if you think about an intelligent system that's also an agent, or just, like, a random human, it can be very dangerous, right? And probably agency is a large part of that.

So if you have two systems that are otherwise completely identical in capabilities and one is an agent and the other is a tool, I would prefer the tool. The thing is just that I'm not so sure if it's possible to reach an interesting form of intelligence without the notion of agency.

So my interest in this question has just been like, how can we define this concept and how can we detect whether it's present in a system? And that's a pretty difficult question, I think. It certainly is. Do you think LLMs to any meaningful extent have agency?

Yeah, that's a question that I've been thinking about. I think there are many definitions of agency, and to me it's just a kind of goal-directed intentionality, and we can get into what that exactly means. And does an LLM have that? I think you could in some way see it as modeling agents, and maybe it also models their goals.

So, of course, they're trying to model the text and this text is, they're trying to predict the next word efficiently or the next token and decrease the loss there and this text has been generated by agents.

and it probably is useful for decreasing the loss if you also understand what overarching goal this agent has. So if this agent is trying to persuade you, maybe that informs... or not you, the LLM... but if the agent behind the text is trying to persuade something or someone, then maybe it's useful to model that goal, to sort of decrease the number of possible tokens that can show up in that text.

When we say the LLM is trying to persuade someone,

There's this weird thing, isn't it? Because to a certain extent, agency is observer relative. It's a thing that we say that another thing has. So it feels like at the bottom of the spectrum, it could be as if the thing has this goal because the LLM probably isn't thinking, oh, Laura's an agent and Laura's got this goal. And in order for me to control Laura, I need to do this. And it's in service of that. It feels like there's an unwitting form of agency at first. Yeah.

Which might be even more dangerous, right? Like if it's accidentally persuading you and it doesn't understand the things that can happen when you do that, then that might be even more dangerous. So it kind of gets at the distinction between simulating something and actually coming up with it yourself. And I don't know how you can find the distinction between the two. I guess you would agree that agency could emerge

Even if we're not explicitly trying to make it emerge. Yeah, I think that's the interesting case. I've been thinking about this a lot recently and I think the interesting case is when it's emerged. So there's this definition from Zach Kenton from DeepMind. They also have a safety interest in agency. And a couple of years ago, they made this definition of agency that's about how...

If you... an agent is something that changes its policy when its actions affect the environment in a different way. And that's a nice definition and I think that definitely captures something that I also find important about agency. But you can kind of trivially make a system of LLMs in an environment or something where the environment is also an LLM, such that it adheres to this definition.

So I think the important thing is like when does something like that emerge from something as simple as next token prediction? And that's kind of what I'm interested in. How might we measure that? Yeah, that's a good question. And I have no answer to that, but I've been thinking a lot about that. I've been even speaking to some psychologists, Ellen Su at NYU actually, who works on intent detection in AI.

So there are methods we can learn from in psychology that can inform us here. But I think the thing I've been thinking about is: what makes agency potentially interesting and complex is planning. So if an agent can't plan, it's probably not super useful or dangerous.

So planning seems like somehow an important aspect of an agent that's able to achieve complex goals. So I've been thinking more about planning and trying to detect when a model can be doing planning and when a next token predictor can actually be said to be doing planning. Yeah, it's so interesting that so many people are converging on the same idea. I mean, there's certainly an active inference angle there.

Carl Friston would say that the planning horizon is basically the measure of the degree of agency that a thing has. Even Eliezer Yudkowsky, he basically said that an intelligent thing is defined by its planning horizon, pretty much. And Joscha Bach told me that agency is the ability to control the future, and the future, of course, implies a planning horizon.

But you definitely think of agency though fundamentally as about this kind of cybernetic information exchange with the environment. Can you tell me about that? Yeah, so you just said that someone called it the ability to control your future that maps onto what I think. I think an agent is something that takes actions in order to control its own future inputs, which is essentially the same thing said differently.

And I also think importantly it is able to do this under uncertainty in uncertain environments because you kind of want to get at this distinction between reflexes and maybe deterministic environments where nothing changes and environments where there is uncertainty and the system can still control the future.

In the kind of, you know, the biological world, we are decomposed into all of these autonomous cells and agency is just something which emerges through the sheer complexity of interaction. Yet we still talk about LLMs as having a type of agency. What's the difference between the two? Oh, that's a difficult question. I think it's just an abstraction that we use to describe a complex behavior.

and we can get at that abstraction in a way that applies both to the, you know, balls of cells that we are, and to the other kind of system that the LLMs are, composed in a very different way. I think what you can't get at with this view is this, you know, it-feels-like-something-to-be-an-agent kind of thing, like there is something that's

you know, less explained by this abstraction that I was describing earlier that maybe doesn't describe what it is like to be an agent or something like that or whether or not it feels like I'm setting my own goals or whether they're induced by the environment and

I don't know how to make a definition that can distinguish between those two things. I suppose the world model comes into it as well, that in order to do planning into the future, you have to have a very good representation of the world. Yeah, definitely. The more causal your world model is, the better you can plan. I mean, you also need other things, like some way to represent the possible futures that you're rolling out. But it's definitely, yeah.

And even that seems to suggest that causally embedded agents, we have this active sense making, continual learning. So we're always doing experiments, right? We're kind of learning about the micro causal patterns in the world, which makes us more, you know, it makes our world model higher fidelity. Language models seem to have a very globalized version of that, but that still works quite well.

What do you mean by globalized? As in, even though they've learned all of these patterns from many, many data sources that have been mixed together, they can learn powerful representations that respond well to their inputs. But we're, like, in the situation, continually learning, actively sensing, like finding out about our environment. So it feels like we understand the world that we're in even better. Yeah, no, definitely. That's true. I think we have...

I mean, we have these sort of core knowledge systems that our intelligence is built upon, right? And that are...

present in all animals in the world to some extent, and that shows that they are just so useful for surviving in the world that they emerge for everything. Whereas language models are trained on language, and they probably have some sense of all these kinds of things, but they're not constrained in the same way that we are. Language is inherently able to describe impossibilities, things that are physically not possible, things you can only

imagine and stuff like that. So it's also not so surprising that they show some different behavior and hallucinate and produce impossibilities. But humans are learning in a very, very different environment. And we have also learned to talk about impossible situations through language and to imagine a future that is possible or not possible and reason about these things. But we're still constrained by the physical reality.

So I often have disagreements with my co-host, Dr. Duggar. So he has a real no-nonsense definition of agency. He thinks it's basically just an automaton, right? So it's, I mean, I can give you the definition. It's a machine that receives input S from an environment E, performs a computation C that depends on a non-empty subset of S, and takes action A that depends on C to modify E. So it's basically like, you know, you have an environment...

Pretty much. And to be honest, you could use this rough definition even to describe active inference and many other things like that. But the thing I don't like about it is, you know, it's basically describing a kind of state machine. And of course, like for him, the environment could mean like any ambient things in the environment. And

For him, computation is very important. So he's a big fan of like the Chomsky hierarchy and he thinks there's something special about Turing machines. So he thinks that we as strong agents, we must be able to do this kind of recursive, nested, iterative form of computing, which is what allows us to do planning and whatnot. But to me, that just seems a little bit like a little bit weird, right? I love this idea.

philosophical notion of agency, and I realize this is a bit wishy-washy because I'm using words like emergent self-organization, autonomy, learning, adaptability, intentionality, you know, degrees of agency and all of this kind of stuff. And it kind of feels like: how can a computer program that maps from an input to an output, how could that be an agent? Yeah.

Yeah, that's really the question, right? And I think that I agree with you that this definition that your co-host gives is, I mean, it's a fair definition. I just think it puts the emphasis on the wrong thing. I think it exactly doesn't explain what I find interesting about agency, which is this like acting under uncertainty kind of idea, right? It doesn't get at that.

And there's something very intuitive about agents to us. And it would be useful if we were able to describe that in a way that's more...

more abstracted away from computation than this definition, something that somehow gets at the difference between a thermometer, which you could also describe as such a system, and an agent. Because that's what we're trying to do, right? Maybe there is no distinction, but humans perceive a distinction. It's actually one of the core knowledge systems, agency. And this is

very nicely shown by this video from the 1940s, where you have a big triangle and a small triangle moving around in a 2D environment. And there's a little box with an opening, and the small triangle is trying to escape from the big triangle. And it's going into the box, and the big triangle is like...

bumping against the box and these are just moving shapes, but we immediately assign agency to them. And we say the big triangle is mean and the small triangle is scared. And maybe this is a failure of our application of the agent core knowledge system because they are not agents, probably. Someone programmed them to be. We intuitively pick out an agent from a thermometer.

And that's the distinction I want to get at, and the definition by your co-host doesn't really get at that. To what extent is agency just the way we think? Yeah...

You mean it's just not real or something? It could be both. It could be that because it's real, it's such an important way of dividing the world up that it's become embedded in us as a core cognitive primitive. But it seems fundamental to the way we recognize things. Yeah, yeah, it definitely seems fundamental. I think it's just important in the sense that an agent can

be of use to us in a different way than a non-agent can, or can be dangerous to us in a different way than a non-agent can. And whether or not that's just, you know, something we perceive that's not sort of fundamentally there doesn't really matter then, I think. Yeah, one thing, I guess, is that sometimes people say, well, we just philosophize everything. And, you know, certainly when we talk about consciousness, you know, there's people like

David Chalmers who says, you know, we might be philosophical zombies, it might just be a little bit extra. And even with free will, which is almost like a stronger form of agency, which is that in the same situation you could have done differently. So we're kind of imagining how things could have been different. And it's a similar thing with intentionality: we think that intentionality is something on top of what a language model might do or what an automaton might do.

Do these philosophical properties, are they useful? I think they are because again, they get at... So it definitely feels like something to be conscious to me and people have talked about that a lot. So it must get at something interesting, I'd say. And therefore it's useful. And I think similarly with intentionality,

I just view it as a useful abstraction of behavior that can guide us towards understanding better maybe how cognition has emerged or how animals, certain animals are different from other animals and how...

can also help us evaluate an artificial intelligence and whether or not it is doing something that can be seen as intentional and goal-directed. So I think the other thing I don't like about the automaton view, or maybe even reinforcement learning as an extension, is that it's a form of behaviorism,

which is that we only look at what the thing does and we don't have rich cognitive models of like what the mental states are. And it feels to me, and maybe this is like an interesting departure for you because like in a way, like with the language model discussion, it felt like you were arguing that we don't really, you know, it doesn't matter if we convolve functions together into this big soup, but with agency, it feels like you are saying that we need to have an explicit structure of how an agent thinks.

Yeah, so it's not that representations don't matter, right? I think they matter a lot. I think there is a distinction between pragmatic representations that are purely goal-directed and representations that are somewhat divorced from the current situation you're in or the current goal you have. And both are important, and I think both exist in the real world, in animals and also in language models. And so...

I don't think it matters whether or not we convolve or how we do it, but I do think it's important to reason about what kind of representations have been learned and whether they are nicely reflective of a causal world model that we want the models to have learned.

So this behaviorism point you raised, that this definition is a bit behaviorist, I think that kind of gets at what my problem is with it. Because it's sort of like, sure, yes, this definition applies, but it doesn't explain to me why I care about this system and what is interesting about this system.

Yeah, often behavior can, you know, explain a lot and you can say a lot about behavior. But if you know something about the representations that produce that behavior, you can describe a system in a more useful way. I know you're a big fan of the Simulators article by Janus. And he said, I guess you can interpret this in an agential way, that there's some kind of agential decomposition of a language model into these role players. Yeah. What do you think about that?

Yeah, I think that was like... So first of all, I'm a huge fan of the article, but I...

became a big fan because of Jacob Andreas's Language Models as Agent Models paper, because somehow he describes it in a type of language that I find easier to follow. But the Simulators post has, of course, been hugely influential, also for me, in my conceptualization of language models. This is essentially the reason why I think about them as maybe modeling human intent and the intent of the agents behind the text that they have been

learning from. And this view of them as a sort of superposition of many different agents is just such a rich conceptualization that really explains many things, both their successes and their failures, and I think that's what's cool about it.

One interesting thing about the article is this notion of coherence. So when an agent is, you know, like a role player is selected, then that role player will stick around for a little while. And certainly it feels like our intuitive notion of agency in the real world is that we maintain ourselves and we also stay kind of coherent over time. Yeah, a bit, maybe. Yeah, a little bit. I definitely change my views over time. I think that's also the sign of coherence.

Yeah, and it's important. But no, you're right. I think there's also been a paper here at NeurIPS that shows that language models don't stay in character as long as actual agents or humans do. Tell me more. To be honest, I didn't read it, but I saw it as something I should look into. But I mean, definitely they're probably not as coherent, and they don't stick to their role as clearly as humans do. That's...

probably the nature of being an approximate sort of agent, or a superposition of agents, in that you can't really disentangle one agent from the other agents. What do you think of non-physical agency? So I'll give an example of that:

we as a collective form a kind of agency, you know, like a meme is a type of agent maybe. And I know Dagaard doesn't agree with me about this. Even the COVID virus, I heard that flu rates are dramatically up in the UK and there's a weird kind of symbiotic relationship between flu and COVID. So when flu is up, COVID is down. And it's almost as if they are these virtual agents that are sort of like interacting with each other through the hosts. Yeah.

No, I think that makes total sense. I think probably it will be really hard to say that a sort of collection of agents is not an agent, but an agent itself is. And I think it can be a useful way to represent something. For example, a company can be seen as a group of agents, right? And how it behaves. But at the same time, at the company level, there's something that seems...

sort of extra that you can't exactly explain from the parts, which might be some kind of emergence, I don't know; that's hard to describe, but many people, of course, have thought about it. But I think it makes sense that a collective of agents can also be seen and abstracted as an agent in some sense. There's probably also something to a sort of single agent that understands, you know,

the actions it's taking and that it guides, whereas in a collection of agents maybe that becomes different or more difficult. When we look at a super-agent like a company or a country or a religion or something like that, do you think the purpose bubbles up or down? Ooh, let's see, both?

I think both. Yeah, I think the purpose of a company is probably some combination of the people that work there, and then probably the company as a whole forms some values or something that then inform the agents individually again as well. Yeah. On this subject that I was discussing with Bengio, AI safety: is that something you're concerned about? Yeah. Tell me more.

Yeah, I think if you philosophically think about a system that's intelligent, it just can be dangerous. So as a society, we don't even really know how to control humans, but we have set up a pretty okay system to do it, one that fails at different levels, right? At the individual level, at the between-country level, at all kinds of levels, it fails sometimes.

That's scary and dangerous. And I think intelligence is not so special that we can never build it. Therefore, that can be dangerous, right? But I've struggled talking about my timelines, as in when will this happen? I have no idea.

I don't see it happening in the next three years. I feel like a lot needs to change. I think society moves slowly as well. I think there are massive issues in adoption, like these systems are not reliable. So that's the sort of philosophical point that an intelligent agent is dangerous. And a whole separate strand of AI safety that I find even more compelling

is that if we slowly give over control to dumb agents or dumb AI, that can also be dangerous in a society like ours. And that's also something I worry about. So I think understanding how everything works and how the system works is important. Because I also think it can... I'm not purely a pessimist. I think it could bring a lot of...

great things to the world. There are a lot of things that should probably be automated, or at least it would be great if we could get certain professions some help, because, you know, we're all getting older and a lot of people are working in care. And if we don't do something... I'm not saying AI is going to help there, but it would be great if AI could alleviate some of the things that are going to become more difficult in the future,

like healthcare, if they could make doctors more productive there, for example. But it's really non-trivial to think about how it can have a positive impact, I think. And it's good that lots of people are thinking about it. I love agency as a kind of mental model to think about this, because if it is the ability to control the future, then to me, it's analogous to power. And certainly talking about power dynamics is the language of

talking about, you know, how we should govern this. And I can see many arguments. I can see how this kind of technology actually takes away our agency; it can also dramatically give us agency, because all of a sudden people can build chemical weapons and bombs and stuff like that. But the other concern, of course, is that it itself will, you know, adopt a form of agency

through instrumental goals or whatever. Yeah. So of those three, where do you see the most significant risk? I think all are risky. I think the thing I am most worried about is skewed access. So if AI becomes very useful and makes us more productive, it would be great if we can distribute that in society in a way that helps people. Like,

I think technological improvements have maybe not, in the right proportion, helped the right people. And that's, you know, a result of the system we live in and the result of our politics. So I think it's really important in the future to think more about that and to think more about how we can give access to the right people. And I think

the way to go about that is policy, and thinking about how our system works, how our economy works, and being prepared for massive improvements in AI capabilities. If this starts to go bad, what do you think? Because obviously there should be some kind of harbinger. What would be the early warning signal for you? Sorry, what's a harbinger? A harbinger is a signal that something bad is about to happen. Okay. Yeah.

Yeah, I think probably that's not what's going to happen. I think we are going to slowly build something and then at some point we're going to be like, oh wait, remember back then when there were elections and Facebook apparently may have influenced them and we built this tool and we didn't realize how it would affect us. And I think that will probably also work like this with AI. We don't understand what intelligence is and probably we won't recognize it immediately if we see it.

That's fascinating. Yeah, I love this notion of it kind of undermining our weaknesses. So it's actually a very sort of alien, diffuse thing, and we might not even be fully cognizant that it's happening. Yeah, exactly. Yeah, I think that's more likely to happen than that we're all of a sudden going to be like, oh, wait a minute, this is dangerous. Though, I mean, there are also examples of that happening, right? Like ChatGPT

at NeurIPS 2020? 2021? 2019? 2022. 2022. A couple more years off. Yeah, so I remember ChatGPT dropping, and that was for me the first time that I was like, oh my God, language models are crazy. But ChatGPT was a very...

So OpenAI, like, they made much more than incremental progress, but ChatGPT itself, which dropped that day, was maybe just somewhat incremental in the sense that it was just a usable interface to a model that was already pretty powerful. And it helped us understand that GPT-3 was actually really powerful with some instruction tuning on top and an actual chat interface. So that was like a sort of slow process

of change that immediately made people aware of, like, wow, this is pretty crazy. So maybe there can be something similar, where an AI does something that we all didn't expect it to do, and that makes us collectively think we now have to pay attention and, you know, change things. But... Yeah.

I think the locus of AI is something that we're hinting at as well, because certainly no one at Meta intended for these issues of social media to happen. They built these algorithms, they just kept going one step at a time, you know: let's build an advertising system, let's do collaborative filtering. And all of this stuff is just externalities that no one intended, you know, so it's like this unwitting agency.

But then what is the agent? So the whole system, including us, we're the agent. And it's the same thing with AI: in a way, we're looking for agency inside the large language model, but we as a system, we're actually a weird form of new collective intelligence that no one really even understands. And that's pretty scary, isn't it? Like, yeah, it's...

Yeah, if Facebook can be seen as an agent in and of itself, and we have built legal structures around who to blame for what, right? But that doesn't mean that these people that we blame intended that to happen. So that's pretty scary. And that can become even scarier when you build a bunch of intelligent artificial agents that you cannot subject to the same level of societal control that we do to ourselves. Yeah.

So we last spoke at NeurIPS 2022, and I feel that you've moved your position a little bit since then. Can you talk me through that? Yeah. At the time, well, it took me a while to accept that ChatGPT was cool.

Like many others, I was skeptical at first, especially about the amount of data it had been shown. And that's also... My recent paper has again moved my opinion. For a while, I also thought they were doing a less generalizable kind of retrieval than the approximate generalization I think they are doing now. And over time, I've just changed my view of

how promising this approach is. And I can pinpoint it to a specific thing that happened: I put out this paper, LLMs Are Not Zero-Shot Communicators. And at the time, I thought zero-shot communication is pretty important, right? All of us can do it, we don't need five examples. So I thought, okay, we need to make sure these models can respond zero-shot to these questions. But later I developed this view of

them being multitask learners and general learners, where you do need to find the right way to interact with them, right? And one very salient memory for me was Andrew Lampinen describing it, I think even on this show. He said zero-shot prompting a language model is like walking down the street and shouting at someone, what is 15 times 32? And they're going to be like, you know,

Who are you? Far off. And that was sort of his analogy for zero-shot reasoning, and that makes total sense to me. Just because they can't do something with your specific zero-shot prompt doesn't mean that they can't do it at all. And it's important for them to be able to do it zero-shot, definitely; it's a limitation if they can't. But if they can't do it, you need to try a few shots, and you need to find the right prompt. You also don't need to go overboard; you don't want to do prompt engineering on the test set, essentially. But there's the middle ground.
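As a concrete aside on what "trying a few shots" looks like in practice, here is a minimal sketch contrasting a zero-shot and a few-shot prompt. The implicature-style task and the question-reply pairs are hypothetical, loosely in the spirit of the zero-shot communicators evaluation rather than taken from it, and no particular model API is assumed: the snippet only builds the two prompt strings.

```python
# Hypothetical sketch: zero-shot vs. few-shot prompt construction.
# The task description and demonstrations below are illustrative, not from the paper.

TASK = "Answer with the implicature: does the reply mean yes or no?"

EXAMPLES = [
    ("Did you leave fingerprints?", "I wore gloves.", "no"),
    ("Are you coming to the party?", "I have to work late.", "no"),
]

def zero_shot_prompt(question: str, reply: str) -> str:
    # One bare instruction, no demonstrations: the model has to infer
    # the expected answer format and behaviour on its own.
    return f"{TASK}\nQuestion: {question}\nReply: {reply}\nAnswer:"

def few_shot_prompt(question: str, reply: str) -> str:
    # The same instruction preceded by a handful of worked demonstrations,
    # which is often enough to elicit behaviour the zero-shot prompt misses.
    demos = "\n\n".join(
        f"Question: {q}\nReply: {r}\nAnswer: {a}" for q, r, a in EXAMPLES
    )
    return f"{TASK}\n\n{demos}\n\nQuestion: {question}\nReply: {reply}\nAnswer:"

if __name__ == "__main__":
    print(zero_shot_prompt("Do you like my new haircut?", "It's very... brave."))
    print()
    print(few_shot_prompt("Do you like my new haircut?", "It's very... brave."))
```

The point mirrors the analogy above: the zero-shot version gives the model no hint about the expected behaviour, while the few-shot version shows it what a good answer looks like before asking the new question, without going as far as tuning the prompt on the test set.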

Laura, thank you so much for joining us today. It's been amazing. Thank you.