
How AI Could Be A Mathematician's Co-Pilot by 2026 (Prof. Swarat Chaudhuri)

2024/11/25

Machine Learning Street Talk (MLST)

People
Swarat Chaudhuri
Topics
Professor Swarat Chaudhuri discusses breakthrough advances in AI reasoning, theorem proving, and mathematical discovery. He explains neurosymbolic approaches and shares his pioneering work on COPRA (a GPT-based proof agent) along with his insights on large language models. He argues that neurosymbolic programming can represent hybrid models by combining neural networks with classical symbolic code. He believes that using an extremely low-level language (such as an assembly language) and letting the machine figure out the instructions on its own could uncover new things that no human would imagine, enabling new breakthroughs in mathematics, physics, and other disciplines. He also discusses using AI for theorem proving and concept discovery in mathematics, and how to scale and modularize mathematical proofs. In addition, he explores challenges in formal methods and AI mathematics, including formal proofs, empirical predicates, and uncertainty, as well as how to address contamination and concept-learning problems in AI systems. He argues that large language models have made significant progress in mathematical reasoning but still have plenty of room to improve, for example in their lack of modularity and compositionality. He introduces his work on type-directed neural program synthesis, neural relaxations as program-search heuristics, and using large language models for library learning and abstraction. He believes large language models can serve as abstraction tools, identifying common patterns in programs and explaining them in natural language, thereby learning concepts. He also discusses symbolic regression and how large language models can guide program or expression search and perform abstraction. He believes that by 2026 AI could serve as an important assistant to mathematicians, helping them solve complex problems and potentially leading to new forms of mathematics and new modes of collaboration between humans and AI. He also discusses the role of formal proof frameworks such as Lean, and how large language models can be combined with them to improve the efficiency and reliability of mathematical proofs. Finally, he discusses the problem of test-set contamination and how to mitigate that risk.


Chapters
Professor Swarat Chaudhuri discusses the definition of reasoning in AI, emphasizing the importance of performance in mathematical, programming, and planning tasks. He also touches on the challenges of defining reasoning based on human-like behavior and the quantitative measures needed to assess progress.
  • Reasoning in AI is defined by performance in tasks like mathematics, programming, and planning.
  • There is no binary definition of reasoning; it's a quantitative measure of improvement.
  • Different methods for reasoning include symbolic and neural approaches.

Transcript


Neurosymbolic programming is this approach to artificial intelligence where you are representing models using a mix of neural networks and classical symbolic code. You mentioned AlphaProof; that's a great example. But I would say that even these agents that are generating some code using a language model, and then executing the code, and then getting feedback, and then using that to make further decisions, they are really doing something neurosymbolic as well, right? Because you are actually using a Python interpreter, which is a symbolic tool.

Now, there is a universe where you have, again, an extremely low-level language, a sort of assembly language, and then you just let the machine figure out, over time, what kinds of instructions to use. And to me, that could have a lot of value, because you could discover things that no human would ever imagine, and that would enable you potentially to do new kinds of math, new kinds of physics, new kinds of biology. And you are going to discover these higher-level building blocks.

Yes, you know, no human has figured that out yet; this AI has. But then you are using this building block to build parsimonious programs, models that can do amazing things, and then you are just bootstrapping, and you're building up this entirely new universe of mathematics and computation. I can imagine such a world. MLST is sponsored

by CentML, which is the model serving platform for machine learning workloads. Now, you log into their platform, and the first thing you see is the SaaS option. So this is the really simple option.

Just like on OpenAI, you can send requests up, you stick your access token in there, and you can access all of the latest open-source models. And it's faster and cheaper. By the way, when you sign up, you get ten free credits, which means you can do some inference without spending a penny.

Also watch the interview I did with Gennady, the CEO, the other day. MLST

is sponsored by Tufa AI Labs. This is a research lab started by Benjamin Crouzier over in Zurich. They have just acquired MindsAI, the winners of the ARC challenge.

Their sole purpose is trying to figure out AGI as quickly as possible, and they are looking at the ARC challenge in particular. If you're in Zurich, you can join their monthly meetups.

If you're interested in working for them as an ML researcher, reach out to them; go to tufalabs.ai. Swarat, welcome to MLST.

Thank you very much. Thanks for having me.

It's so nice to have you here. Can you tell us about your background?

I'm a professor of computer science at the University of Texas at Austin, and this year I am also a visiting researcher at Google DeepMind in London. My original research background is in symbolic methods for reasoning and programming languages, but over the last ten years I've also gotten very involved in machine learning. And these days I'd say that I have a foot on both sides: both symbolic methods and machine learning.

Wonderful. Now, we had a very interesting conversation in the car on the way over about reasoning. That's right. What is reasoning?

Well, to me, reasoning is to some extent in the eyes of the beholder. You know, it's one of those things where you know it when you see it. But I think that, from the point of view of defining it in the AI context, it makes sense to have some criteria in mind.

So what constitutes reasoning? One of these, in my mind, is doing well at mathematical or programming or planning tasks, the kinds of tasks that people have associated with deeper thinking and reasoning. So I would say that if a system is performing better at those tasks compared to existing methods, then I would say that it is showing improvement on reasoning.

So I hesitate to say that a model either does or does not do reasoning, because to me it's not a boolean. It's really a quantitative measure, and you can see progress on this measure over time. Now, this is more about the what of it: what does the model do in order for you to say it's doing reasoning. There's also the question of how, and I think it's really useful to separate the how consideration from this, because there are many, many different methods for doing reasoning. We have seen some of them gain prominence in the recent past, but there are other kinds of methods as well.

Yeah. So do you think, you know, some people are, like, human chauvinists, and they say that it's not really reasoning, that the way we reason is better. John Searle was a famous example of this; he argued that there was some complicated physical causation that can't easily be replicated in a machine. But we can agree that, certainly in simulation, there's a myriad of different ways we can make something behave as if it's reasoning.

What's the difference, right? So I think, then again, it's one of those things where we have to decide on our criteria. So if you have a system that takes a very large amount of compute in order to solve a very difficult mathematical problem, is that reasoning, right? By your definition, I think that probably wouldn't count. But one could argue that it's the end that matters: that if you're able to solve this hard math problem, then you are doing reasoning well.

I mean, there's also the robustness aspect of it, because, for example, you could simulate something that happened in the physical world and you could reproduce the trace, and you could say, well, it's the same thing, so why is it not reasoning? But the real phenomenon would be able to reason in many, many different situations.

And certainly with language models, when they are trained on specific types of reasoning, then of course they can generalize in that setting. But there are these holes in the Swiss cheese all over the place. So there's always this problem. We have test contamination,

isn't that right? Yes. So I think that you have just named one additional criterion for a model to be doing reasoning, which is that you need to have this form of robustness. I would add to it, going back to the previous point, that maybe computational budget is also a consideration. If it takes, you know, five thousand hours of compute on a massive cluster in order to solve a math problem,

maybe it's not exactly doing reasoning either, right? But to me, again, you have to decide what your definitions are, what your criteria are, and then you can talk about whether or not a system is making progress on those measures. Talking about whether a model is doing reasoning based on, you know, human-like behavior: to me that's a bit too ambiguous for my taste, I would say.

Yeah, this is the challenge with anthropomorphization: the existence proof we have, of course, is humans doing it. But it wouldn't be particularly good from an AI point of view if we have to create a simulacrum of a human.

I do believe, though, that there's some kind of spectrum of strength of reasoning, and it feels to me that the language models create these very blown-up circuits; they're not reusing the same circuits in different situations. We, on the other hand, seem to have this incredible ability to reflect on these very parsimonious, abstract motifs, and some cognitive psychologists argue that there's something fundamental about these motifs. Maybe it's the way the universe works; certainly it's the way our cognition works. And that doesn't seem to be the way that language models work.

I completely agree. We have been arguing for these sorts of more modular, compositional models for a long time, and I think that abstraction is an essential capability of humans that modern language models lack. But I would say that that's going into the how of things.

You know, there is a certain set of goalposts: I want to be able to achieve certain tasks, right? To me, it's useful to separate that definition of what we are trying to do from how we are achieving it, for example using modular models in which there are explicit abstraction mechanisms. And we are going to talk about examples of that from my work and other folks' work, I'm sure, going forward.

But I think it's possible that there may be some class of language models that comes up, maybe with a different training objective, that even though it's not explicitly dealing with modularity, is able to nevertheless achieve robustness, right? I don't know that such a possibility is just not there. However, I would say that, given all we have seen from recent language models, there seems to be something fundamentally challenging in terms of robustness, and to me, mechanisms for abstraction and modularity would be a good bet to achieve those goals.

o1 from OpenAI: is it reasoning?

To me, it's a milestone in reasoning. There's a big headroom ahead of it. But definitely by the measures that we just discussed, you know, performance in programming tasks, in solving math problems, it seems to be substantially better than prior models. So to me, it's a step towards models that

can reason.

Interesting. Are there any situations where... maybe we should first say, in situations where it is doing reasoning, which means it's fitting your criteria, I think it's fair to say that, yes, it's reasoning. And then there's the space of situations where perhaps it doesn't reason, for many reasons: possibly because of computational reasons, it might be trying to solve an NP-hard problem and it just simply runs out of context, or it might be trying to do reasoning in a domain where it hasn't been trained, for example. So there are limitations. But you think in principle those limitations could be overcome?

I think that o1, augmented with a lot of other things, could make substantial steps towards overcoming those limitations. Okay, now there are a few things here that I have left ambiguous, and to me that's where the real questions are: what are those extra pieces that need to be added to something like o1 in order to achieve,

again, I wouldn't say "does reasoning," but better performance at reasoning tasks, right? And to me there are clear candidates, like, for example, having more grounding mechanisms, using more powerful symbolic tools. But this is speculative, and over the next few years, I think we are going to see if those methods actually work.

It's interesting to see the shift towards neurosymbolic. So, a lot of the focus has been in the kind of connectionism space, and Andrej Karpathy, for example, spoke about Software 2.0.

There's this idea, and Hinton as well, that language models can do everything, that they can do this emergent symbolic reasoning, and we can train them end-to-end. And what we're starting to see now, with things like AlphaGeometry and AlphaProof, are these hybrid systems. And generally as well, I think that the way to robustness is building specific architectures to do specific things well; having these completely general blank-slate things seems to work very, very little. But are you seeing that trend as well?

Yes, absolutely. Many of us have been arguing for neurosymbolic methods for a while now. But increasingly I feel that this term is unnecessary, because it seems to me that neurosymbolic methods are really increasingly everywhere. You mentioned AlphaProof; that's a great example.

But I would say that even these agents that are generating some code using a language model, and then executing the code, and then getting feedback, and then using that to make further decisions, they are really doing something neurosymbolic as well, right? Because you are actually using a Python interpreter, which is a symbolic tool. Now, that said, to me the more interesting question is, again, not the word itself, but the precise form that these kinds of combinations are going to take.

And we have seen some examples of this. For example, in AlphaProof, we have a machine learning model, a neural model, that is getting feedback from a Lean interpreter, or a Lean prover, right? And that's providing it a very solid form of grounding. So I think that we are going to see this trend in other forms as well, in broader settings. I don't know that it has to be extremely domain-specific. There is a world where you can imagine code execution and formal proofs basically informing all sorts of decision-making in the world, with purely neural models serving as the layer that goes between human language, and sort of messy signals, and these kinds of reasoning mechanisms.
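
To make that grounding loop concrete, here is a minimal sketch of the generate-and-check pattern being described, not AlphaProof's actual implementation. The `ask_model` callback and the way the Lean toolchain is invoked are assumptions for illustration:

```python
import subprocess
import tempfile

def check_with_lean(proof_script: str) -> tuple[bool, str]:
    """Ask Lean to elaborate a candidate proof; returns (accepted, diagnostics).
    Assumes a `lean` binary on PATH."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(proof_script)
        path = f.name
    result = subprocess.run(["lean", path], capture_output=True, text=True)
    return result.returncode == 0, result.stderr

def prove(theorem: str, ask_model, max_attempts: int = 8):
    """Generate-and-check loop: the neural model proposes, Lean disposes."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = ask_model(theorem, feedback)      # neural proposal
        ok, diagnostics = check_with_lean(candidate)  # symbolic grounding
        if ok:
            return candidate                          # machine-checked proof
        feedback = diagnostics                        # errors steer the next try
    return None
```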

Can reasoning exist without axioms?

That's a great question. I think that, again, it depends on what we mean by reasoning. When we are talking about mathematical reasoning, the entire objective is that you are going from certain axioms to certain kinds of conclusions.

So mathematical reasoning as we understand it doesn't really make sense without axioms. What about something like physical reasoning? Everything that we consider to be physical reasoning depends on us being in a certain place with certain laws of physics. So, to me, you can imagine reasoning happening in a world where those laws are different, but even then you have axioms. It's just that you have different axioms.

Can we combine reasoning systems together?

Absolutely. We in fact do it all the time: we combine mathematical reasoning systems with, for example, the experimental, empirical reasoning methods that people use in the natural sciences, all the time, whenever we are doing any kind of scientific discovery, right? And I can imagine that in the future we could have many different reasoning systems with different kinds of capabilities, and they all come together into something that is bigger than the sum of its parts.

AlphaZero: do you think that's reasoning?

I think, again, it is definitely a powerful system that makes highly non-trivial progress on a task that is traditionally considered to be reasoning. So to me, you know, it's sort of a behaviorist definition, right? If it looks like a reasoning system, if it does well on a reasoning task, then it's doing some kind of reasoning.

Okay, if it looks like a duck, quacks like a duck. What is neurosymbolic programming?

Neurosymbolic programming is this approach to artificial intelligence where you are representing models using a mix of neural networks and classical symbolic code.

So you could imagine, for example, a program in which you have some neural models that are making decisions about perception, let's say, whether or not a certain kind of object appears in a scene, and then you have some vectors coming out of that, and then you have some symbolic methods that are processing those vectors. A lot of the time this makes sense if you have prior knowledge about what a model should look like. So then you can imagine using the symbolic code to basically encode some of those priors.
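
As a concrete, and entirely hypothetical, illustration of that shape of model (neural perception feeding ordinary symbolic code), here is a minimal sketch; the detector, the two object classes, and the rule are all made up for the example:

```python
import torch
import torch.nn as nn

# Hypothetical perception module: maps an RGB image to two object-presence scores.
detector = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 2),  # scores for ["red_block", "blue_block"]
)

def scene_ok(image: torch.Tensor) -> bool:
    """Symbolic layer: classical, inspectable code encodes the prior that a
    scene is acceptable only when both blocks are present."""
    scores = torch.sigmoid(detector(image.unsqueeze(0)))[0]
    red_present, blue_present = (scores > 0.5).tolist()
    return red_present and blue_present  # plain boolean logic over neural outputs
```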

You could also imagine neurosymbolic models being used when interpretability is a concern. So, for example, you are trying to understand a certain biological process. Now, you may get this perfect black-box neural predictor, except the scientists cannot really make sense of it and use it to guide experiments. But you could have a more mechanistic model that is coded up in the sort of notation scientists traditionally use; and to me all of those are just programs: physical simulations, definitions of dynamical systems, and so on, these are all just programs of various sorts. So if you have a model that is represented mechanistically like that, as a sort of program, but maybe with certain neural pieces for parts that just cannot be easily symbolically represented, that would be another potential application of

this.

Yeah, that seems to be, I guess, quite an ambiguous term. For example, François Chollet talks about discrete program search, using a neural network to guide the generation of programs. In your work, for example, you could use it to represent functions, or even some kind of post-hoc validation process to ensure things, and guardrails. And there are many, many more kinds of instantiations. So I guess it's quite an ambiguous thing.

It is. So when we defined the term neurosymbolic programming, we wanted to be very precise, yes. But the word neurosymbolic, again, as we discussed earlier, is being used in lots of different forms, to the extent that I would say it has also kind of lost meaning, yes.

But when we talk about neurosymbolic programming, we mean a very specific thing, which is: I have a programming language in which I can represent these hybrid models, combinations of symbolic code and neural networks, and then we have some body of methods that are being used to discover those programs. And these methods may be partly neural, partly symbolic. Typically they are a mix of the two.

So many fans of the show would be familiar with the DreamCoder paper by Kevin Ellis. If I recall correctly, there was a domain-specific language, and it had, you know, a waking phase, and they had this neurally guided search, and they had an abstraction sleep phase, and so on. So it's related to that

idea? Absolutely, yeah. So I would say that there are a few categories of methods that have come out in the last few years in this space.

One big challenge is library learning, where you are not just working with a fixed programming language and trying to find a program in that language, but you are actually doing explicit abstraction: you are trying to discover new primitives that are then added to the programming language. And DreamCoder falls in this category.

In DreamCoder, you are starting with an extremely low-level language, the lambda calculus, and then progressively you are discovering bigger and bigger modules, which are abstractions, really: you're seeing patterns in these really low-level programs, and then you're abstracting them into these modules, and then you are using them. Now, the programming language that they had over there is purely symbolic, but one can easily generalize it, and there is in fact some more recent work where that's been done.

So you could imagine, then, that these are programs that are mixtures of neural networks and these sorts of lambda calculus terms, right? And the methods that they used for discovering these library functions, and also to just, you know, do program synthesis, search for programs, those were all neural over there. One could imagine modernizing them; it's recent enough, but, you know, a few years back. You could imagine bringing them to sort of 2024, adapting them to 2024 deep learning, by using large language models to do the search over programs and the abstraction.
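
As a toy illustration of that abstraction step, the sketch below treats programs as expression trees and proposes recurring subterms as candidate library modules. Real systems like DreamCoder use anti-unification and a compression objective rather than this exact-match shortcut; everything here is simplified for illustration:

```python
from collections import Counter

# Programs as nested tuples, e.g. ("add", ("mul", "x", "x"), ("mul", "y", "y"))
def subterms(term):
    """Enumerate every compound subterm of a program tree."""
    if isinstance(term, tuple):
        yield term
        for child in term[1:]:
            yield from subterms(child)

def propose_abstractions(corpus, min_uses=2):
    """Library learning in miniature: subterms that recur across solved
    tasks become candidate modules to add to the language."""
    counts = Counter(t for prog in corpus for t in subterms(prog))
    return [t for t, n in counts.items() if n >= min_uses]

corpus = [
    ("add", ("mul", "x", "x"), ("mul", "y", "y")),
    ("sub", ("mul", "x", "x"), "1"),
]
print(propose_abstractions(corpus))  # ("mul", "x", "x") recurs, so abstract it
```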

Interesting. So DreamCoder was not using language models, and it had a hard-coded DSL. And one thing that interests me is what the primitives are. So, for example, we could search the space of Turing machines.

We could use the lambda calculus. I think Josh Tenenbaum, as a big fan of cognitive psychology, would say that we have these native, fundamental priors of cognition, and therefore we should search over those priors. And presumably it was trained to do a particular task.

You could train it on the ARC challenge. So then what it's kind of doing is saying, well, what composition of these seeded priors in my DSL can produce skill programs that, you know, would work for this particular thing? So there's a question of where do you start, and to what extent does the task you're training it on kind of constrain or influence the library that you learn?

Right. So to me, the priors question is open. I think that humans may well have a set of cognitive priors, but we don't necessarily know what they are. What is, however, true is that the presence of priors makes your ability to solve the program synthesis problem, or the library learning problem, a lot better.

And by that I mean that if you already have a reasonable set of primitives in your programming language, you won't have to discover everything from scratch. That said, I can imagine a world where you are starting with an extremely low-level language without any sort of priors; you just have these basic operations like composition, and then you are just discovering the entire universe of programming, of mathematics, of physics, of what have you.

Does it matter if we can understand the programs that are generated? This will be a theme in this discussion, because of course your work in theorem proving as well; we were talking about how, essentially, mathematicians don't do this stuff themselves anymore, and we are leaning on computers to search this huge space. But just with this discrete program search thing, we might be creating programs that still have, you know, lots of abstractions and lots of interesting efficiencies in them, but they seem quite inscrutable.

That's right. So I think that we should distinguish between the interpretability question and the capability question. To me, there is potential value in interpretability, definitely: in certain application domains like scientific discovery, we really care about interpretability, and there is also a potential safety and security angle, and we are going to discuss that later. But I think it is useful to separate that concern from the concern of

just: what would it take to build models that are really robust and capable? And so I think that there is a universe where you have, again, an extremely low-level language, a sort of assembly language, and I consider the lambda calculus to be something like that.

And then you just let the machine figure out, over time, what kinds of abstractions are useful. And to me, that could have a lot of value, because you could discover things that no human would ever imagine, right? And that would enable you potentially to do new kinds of math, new kinds of physics, new kinds of biology.

And so I don't rule out that possibility at all. I think this idea of abstraction would be valuable even in that regime, because what that will mean is that you are going to discover these higher-level building blocks. Yes, no human has figured that out yet, but this AI has.

But then you are using this building block to build parsimonious programs, models that can do amazing things, and then you are just bootstrapping, and you're building up this entirely new universe of mathematics and computation, right? I can imagine such a world.

So can you tell us about your work in the area? You've created this architecture that uses LLMs and does a type-directed search and learned search.

And so, can you tell us about it?

Right. So I have a series of efforts in this space. Back in the day, I did a lot of work on just symbolic methods for program synthesis, and the setup there is that you have a programming language, you have a programming task that is defined (at the time it was either by a logical constraint or a set of examples), and then you are defining a search problem: find me a program that fits the task specification. And the symbolic methods had some scalability challenges.

And then when deep learning came about, you know, we all moved in that direction, and this sort of neurally guided program search became a very powerful approach. Now, some of the methods there combine these older symbolic ideas with the deep learning ideas. So, for example, we had these results on type-directed neural program synthesis, where you are generating programs in a strongly typed language, and the type system of the language serves as a pruning mechanism as well as a guidance mechanism.

So it rules out programs that are obviously not going to be reasonable, and it also helps us focus on certain kinds of subgoals. Now, if you think of what's going on in the world of theorem proving, it's actually very closely related.
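
A toy sketch of that pruning idea: each primitive carries a type signature, and the enumerator never even generates candidates whose types fail to line up. The three primitives are hypothetical stand-ins, not the DSL from that work:

```python
# Primitive -> (argument types, result type)
PRIMITIVES = {
    "length":  (("list",), "int"),
    "reverse": (("list",), "list"),
    "succ":    (("int",),  "int"),
}

def enumerate_terms(goal_type, depth, var="xs", var_type="list"):
    """Yield well-typed terms of goal_type up to a depth bound."""
    if depth == 0:
        if var_type == goal_type:
            yield var
        return
    for name, (arg_types, result_type) in PRIMITIVES.items():
        if result_type != goal_type:
            continue  # the type system prunes this whole branch of the search
        for arg in enumerate_terms(arg_types[0], depth - 1, var, var_type):
            yield f"{name}({arg})"

# Only type-correct programs are ever produced:
print(list(enumerate_terms("int", depth=2)))
# -> ['length(reverse(xs))', 'succ(length(xs))']
```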

So Lean is really a functional programming language with a really powerful type system, and when you are, for example, using feedback from the theorem prover to rule out certain kinds of steps in your math proof search, you are really using a version of this idea, right? But we had been thinking about that for a while; the broader program synthesis community that came out of the programming languages world was thinking about this problem. Okay, so this is still in the context of: you have a fixed programming language and you are looking for programs in that language. Now, there are other approaches that are relevant here.

So we had, for example, these results on using neural relaxations. There is a paper we have, led by Ameesh Shah, with Yisong Yue, who's my long-term collaborator, and Jennifer Sun, and my former student Abhinav Verma. This was on using neural relaxations of programs as heuristics for doing a search over programs. The idea is this: suppose you are doing a search over programs, and you are progressively filling out these incomplete programs.

So you start with nothing, just code with a big hole in it, and then you are progressively adding more and more code. One challenge here was that, because you're always working with these incomplete programs, how do you know whether or not a certain decision is valuable? As in, should I keep a certain incomplete program in the pool, or should I throw it out? And the idea there was that, for that hole, there is a space of possible ways to complete the program.

So there are a lot of symbolic, discrete terms that you could plug into that hole. But instead of that, what you do is stick a neural network in there, and there is this kind of a subset relationship: because these neural networks are more expressive than any symbolic program in your programming language, you are going to have the property that, if you train the neural network enough, you are going to get a loss that is better than whatever you would get with any sort of symbolic program.

And this leads to an argument that you can basically use the loss that you get, after you have plugged this neural network into that hole, as a lower bound on the cost-to-go, that is, the best possible thing you can do from that point on using a symbolic program. And this can in turn be used to guide heuristic search, you know, these sorts of classic search algorithms like A* and branch-and-bound, plus iterative deepening search and so on. So there's this whole suite of algorithms from GOFAI back in the day, and now you're imagining accelerating these sorts of algorithms with these neural relaxations.
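
In code, the admissible-bound idea can be sketched roughly as below. The three callbacks are assumed interfaces standing in for the real machinery of that paper; the point is just how an optimistic neural bound licenses safe pruning in branch-and-bound:

```python
import heapq
from itertools import count

def synthesize(initial_sketch, expansions, train_relaxation, complete_cost):
    """Branch-and-bound over partial programs.
    expansions(p)       -> child partial programs (one hole filled further)
    train_relaxation(p) -> loss of a neural net trained in p's holes: since the
                           net is more expressive than any symbolic completion,
                           this is an optimistic lower bound on the cost-to-go
    complete_cost(p)    -> real loss if p has no holes left, else None
    """
    tie = count()  # tiebreaker so the heap never compares programs directly
    best, best_cost = None, float("inf")
    frontier = [(train_relaxation(initial_sketch), next(tie), initial_sketch)]
    while frontier:
        bound, _, prog = heapq.heappop(frontier)
        if bound >= best_cost:
            continue  # even the optimistic bound loses: prune safely
        cost = complete_cost(prog)
        if cost is not None:          # fully symbolic program
            if cost < best_cost:
                best, best_cost = prog, cost
            continue
        for child in expansions(prog):
            heapq.heappush(frontier, (train_relaxation(child), next(tie), child))
    return best
```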

So this sort of idea has been there. In general, we have looked a lot at this question of: can I use a neural network as a relaxation, a sort of continuous approximation, of a discrete program, and to what uses can I put such a relaxation? And we have looked at a few versions of this in other, more recent efforts. So we have this recent paper on library learning in the context of symbolic regression, with Dilip.

And here we are doing a lot of the same sorts of things, searching for programs, doing abstraction, but this time using an LLM. In particular, an LLM can be a really good way of guiding, even if you're doing this sort of traditional search over programs, for example using a genetic algorithm, or using some sort of type-directed enumeration: you can still imagine an LLM serving as the neural network that guides such a search. But on top of that, you could also imagine using an LLM as a tool for abstraction, of the sort that we were talking about a little bit ago.

So, for example, in DreamCoder, the way abstraction would happen is that you look at a set of programs that did well, and then you find syntactic substructures that show up in those programs again and again, right? And then you would say, okay, this is a module that I can potentially reuse, and that then becomes part of my library. The issue is that this process can be difficult, because identifying these sorts of common substructures in a large pool of programs can be very challenging.

And there are these symbolic methods that people use for this purpose. But you could imagine not doing that, and instead just using an LLM. What could that look like? Well, you should read this paper, which is led by Arya Grayeli and Atharva Sehgal, with Miles Cranmer and Omar Costilla-Reyes, appearing at NeurIPS this year.

But the basic idea is that you are going to use an LLM as the abstractor. So you are going to ask it: what are some of the common patterns that are showing up in these programs? And you're going to use its natural-language reasoning; it understands a lot of things, but in particular its ability to find patterns in the syntax of programs and explain them in natural language. You are going to use that capability to do the abstraction, right? And that's how you're going to learn a concept.
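
A minimal sketch of that abstraction step, assuming an OpenAI-style client and model name purely for illustration (the actual prompting setup is in the paper):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def abstract_concepts(top_programs: list[str]) -> str:
    """Ask the LLM to name the common patterns in high-performing expressions."""
    prompt = (
        "Here are expressions that fit the data well:\n"
        + "\n".join(top_programs)
        + "\nDescribe, in one sentence each, the common patterns they share."
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice of model
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content  # natural-language concepts

concepts = abstract_concepts(["sin(x) / x", "sin(2*x) / x", "cos(x) / x"])
# The returned concepts (e.g. "a trigonometric function divided by x") can then
# condition how the next round of candidate programs is proposed.
```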

I'm a big fan of Miles Cranmer; he wrote a landmark paper on symbolic regression a few years ago. That's right. And I guess that is another form of neurosymbolic programming. And also, I've heard people talk about doing symbolic distillation of language models, this idea that you start with this kind of block of clay, you do the stochastic gradient descent, and then we tease out some kind of symbolic structure from it. It's fascinating. Absolutely.

So, going to your first question first: symbolic regression I see as a form of program synthesis. Really, what you're trying to do there is: you're given a dataset, and you are trying to find an expression that fits the dataset well. But an expression is just a special kind of program.

So some of the methods that we have talked about, for example, this neural admissible heuristics work, were done in the symbolic regression setting: you are trying to find a program that fits a dataset well, right? But now one can imagine going beyond the sort of genetic programming approaches, or these approaches of distillation, where you first train a neural network and then distill it into a symbolic program. You can go beyond these kinds of strategies, and you can use an LLM really heavily in order to solve that symbolic regression problem.

And so what are the ways in which that can be used? One angle is that you could use an LLM to basically do the search over programs, or guide the search over programs, over expressions in particular. These methods that we are talking about use evolutionary search.

So you could imagine using an LLM to do the crossover or the mutation of the programs that are there in your pool. And you can also use an LLM to do the abstraction, which is that you could ask the LLM to come up with some of the concepts that are showing up inside the high-performing programs, right? And then those concepts basically condition the search over programs at that point.
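
A sketch of that loop, with the LLM playing the evolutionary operators. Here `llm_mutate` and `llm_crossover` are assumed wrappers around a language model; fitness is computed on real data, which is what keeps the LLM's creativity grounded:

```python
import random

def evolve(population, fitness, llm_mutate, llm_crossover, generations=50):
    """Evolutionary search over expressions where an LLM, rather than random
    syntax edits, proposes the mutations and crossovers."""
    for _ in range(generations):
        scored = sorted(population, key=fitness)       # lower loss is better
        survivors = scored[: len(scored) // 2]         # selection on the data
        children = []
        while len(survivors) + len(children) < len(population):
            a, b = random.sample(survivors, 2)
            child = llm_crossover(a, b)                # LLM recombines two parents
            children.append(llm_mutate(child))         # LLM proposes an edit
        population = survivors + children
    return min(population, key=fitness)
```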

So I spoke with Subbarao Kambhampati very recently. He's got this LLM-Modulo architecture, you know? So he said: LLMs

are very creative. So now we can kind of traverse this space, and then we stick a verifier on the end, so we can kind of cherry-pick when it actually gets it correct.

I had a bit of a disagreement with him, because certainly when we let the LLMs find creative things, when we are doing, let's say, LLM-guided search, and when we let the LLMs do abstraction, I feel that they are still somewhat parasitic on the data that they are trained on, which means there is a kind of creativity gap. We seem to have access to a source of creativity which is richer. And then there's the thing where we're building up this library, and at some point, does the weight of the library kind of affect our creative ability?

That's a fantastic question. One thing I would add is that in settings like symbolic regression, you have the advantage that you have grounding provided by the data. At the end of the day, what you're trying to do is: you have made some empirical observations, and you are trying to find expressions that fit those observations, right?

So now, when you have a new candidate that is creatively constructed by the LLM, you could imagine actually evaluating that on your dataset and seeing how well it does, right? So that's something that prevents it from going off the rails completely. Now, regarding the prior knowledge question: yes, absolutely, LLMs are drawing on the prior knowledge that they have seen.

And if you're using LLMs in a purely black-box, zero-shot way, then you are only going to be constrained by what the LLM has seen before. Now, I would add that there's still a non-trivial value in that prior knowledge, because when you are doing new science, you are typically building on science that other people have done before. And the LLM

has presumably seen that in the form of various sorts of scientific papers and, you know, math problems and so on. But that said, you don't have to be stuck with the zero-shot, black-box model.

You could imagine a loop where you come up with candidates creatively using an LLM, you empirically evaluate these candidates, and then you use the experience to basically fine-tune the model further. And you could even imagine a universe where you are starting with an LLM that is a completely blank slate. Maybe it has some elementary knowledge of language, but not much science.

And then you are just progressively building it up on all manner of scientific tasks. And it's seeing, you know, how various kinds of hypotheses performed. It's getting feedback from the real world, right, in the form of experiments. Maybe it can even go ahead and do new experiments: reason about causality, determine what are some of the new experiments to try out, and then get feedback in terms of the results of those experiments, and then use that information to fine-tune it further. So a lot more is possible than just using an LLM as a black box and then having a verifier at the end.

I agree with that. There's always this discussion about creativity. I mean, Demis has always spoken about the ladder of creativity, with the inventive rung being the highest. And of course, you know, we can imagine unicorns.

We can imagine things that we haven't seen in the real world, because they are still composed of primitives that are in the convex hull of things that we understand. But in the context of program generation: language models have so many degrees of freedom, and they understand natural languages, made-up languages, and programming languages. And then we can do things like generating code, and we can use DSLs.

And even then, if we execute a bunch of these programs, they might not halt, or they might not be valid. So there seems to be, like, another step: we need to actually check that these are good programs to run.

That's right. And, by the way, you can get more forms of feedback than just whether or not this program ran correctly on a few tests; you can do more serious kinds of analysis of programs as well. But at the end, yes, you are getting all these extra forms of feedback from those external tools.

And then the question is, what do you do with that, right? So you could just use that to decide whether or not to keep this program. But you could do more: maybe you could go back to the LLM and use this information to basically train it further and guide the way it writes programs.

And if you think of a human programmer, that's how it happens, right? So you write code, and then you try out various sorts of things, and then you will see that certain kinds of programs work and certain kinds do not. You get feedback from your compiler, your type checker, your interpreter, right? And then you use that to refine the way you code as well.
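
That compiler-in-the-loop workflow can be sketched as below. Here `generate(feedback)` is an assumed LLM wrapper returning candidate code; the interpreter's stderr is fed back in, just as a human programmer reads the type checker's complaints:

```python
import subprocess
import sys

def refine_until_tests_pass(generate, tests: str, max_rounds: int = 5):
    """Generate -> execute -> refine: tracebacks become the next prompt."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate(feedback)
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code + "\n" + tests],
                capture_output=True, text=True, timeout=30,
            )
        except subprocess.TimeoutExpired:
            feedback = "The program did not terminate within 30 seconds."
            continue
        if proc.returncode == 0:
            return code              # all assertions passed
        feedback = proc.stderr       # interpreter feedback guides the rewrite
    return None
```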

Yes, even with the halting problem. I mean, I have some algorithm in my mind; I don't know, it's a quadratic algorithm and it's got a thousand data points. If it's still running after ninety seconds, probably there's something wrong here: Control-C, go back and debug.

Yeah. And, you know, there is a huge body of work on solving the halting problem, even though in principle this is an undecidable problem. In practice, there's this literature on proving termination of programs. The way it works is that it recognizes that proving termination of a program is really about coming up with an inductive argument.

So what you're showing is that there is some kind of an expression whose value goes down strictly each time you execute one step of the program. So if you think of a loop, right, you have written a loop where it says that while i is less than n, something, something, and then i gets incremented: you know that this program is going to terminate, in spite of Alan Turing.

The reason is that what's going on is something very simple. In each step of the loop, you are incrementing this i, and it can only go up so far, right? And so there is this value, which is n minus i, which is strictly going down in every step.

And it can only go down so far. It cannot go down below zero, because zero is the lower bound you have set by the way your program is structured. And this kind of reasoning that you are doing here, right, this is an example of an inductive proof, except applied to programs. And there is a lot of work on automatically discovering these sorts of inductive proofs, which are really very simple symbolic expressions, right? And you could imagine using an LLM, or other kinds of machine learning techniques, to search for these sorts of arguments.
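
Here is the argument made concrete in a few lines: the loop below terminates because n - i is a ranking function (non-negative and strictly decreasing on every iteration), and the assertions simply check that property at runtime:

```python
def count_up(n: int) -> int:
    """Sums 0..n-1; terminates because n - i is a ranking function."""
    i, total = 0, 0
    while i < n:
        rank_before = n - i
        total += i
        i += 1                       # the only update to i: strictly upward
        assert n - i < rank_before   # ranking function strictly decreases...
        assert n - i >= 0            # ...and is bounded below by zero
    return total
```

Termination provers search for exactly such expressions automatically; a model proposing candidate ranking functions and a checker verifying them is the same generate-and-check pattern again.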

A couple of points on that. Maybe you should bring in Turing's famous proof, the proof by contradiction, you know, about the halting problem. But also, if I understand correctly, you're pointing to specific instances where you can do a proof by induction that the program would halt. Would you expect there is still a space of programs where

you wouldn't be able to do that?

So I didn't say anything about always automatically discovering those inductive proofs, right? My point is that for a vast number of practical programs, in fact most loops that people write, or most recursion that people write, the argument for why these programs are going to terminate is very simple, and you can search for those kinds of proofs using various sorts of methods; even just basic enumeration works quite well.

But you could imagine using more contemporary machine learning techniques to solve this problem. What you cannot possibly have is a method that is going to be automated, and is always going to terminate, and is going to work for all programs. No.

But the question is, you know, for realistic programs, will this work or not? And we have every evidence that for realistic programs that people write, you could just try some of these strategies to prove termination. And if that doesn't work, you just say, okay, no, there is a problem here, just go fix it, right? And maybe there is some reason why the program would still work, but whatever: you can just ignore that program and find a different

way of solving the problem.

So my co-host, Dr. Duggar, is really interested in Turing machines, and he says, you know, language models (and we're not talking about LLM systems, just a language model) are finite state automata, and a Turing machine, of course, has a finite state automaton as a controller. And he said, you know, that there's a space of algorithms beyond the FSA class.

He would actually call these turing machine algorithms, even though they are for a final day autometer that cannot be searched, discovered with the casket grade index because of this holding pom. Now I think you were pointing to earliest of interesting research has been done in this space. I mean, he would be fascinated in that, right? right?

Yes. So there's been a whole lot of work on using gradient descent to solve program synthesis. And to me, searching for Turing machines is just a form of program synthesis; it's just that the programming language is extremely low-level. Every program has, you know, access to this tape, and then also a controller, which is a finite state machine, right? So to me this is a very simple kind of programming language.

It was useful in mathematical proofs in the theory of computation because it was so simple, right? But I don't think that there is anything special about Turing machines from the point of view of programming. Any reasoning that you can do about Turing machines, you can also make the same kinds of arguments about other, broader classes of programs.
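
The point that a Turing machine is "just a programming language" is easy to make literal: the syntax is a transition table for the finite-state controller, and the semantics is a tiny interpreter over an unbounded tape. A toy machine that flips bits until it hits a blank:

```python
from collections import defaultdict

def run(table, tape_input, start="q0", halt="halt", max_steps=10_000):
    """Interpret a Turing machine given as {(state, symbol): (write, move, next)}."""
    tape = defaultdict(lambda: "_", enumerate(tape_input))  # unbounded tape
    state, head = start, 0
    for _ in range(max_steps):       # step budget, since halting is undecidable
        if state == halt:
            return "".join(tape[i] for i in sorted(tape))
        write, move, state = table[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1
    raise RuntimeError("step budget exhausted")

FLIP = {  # the controller: a one-state finite-state machine
    ("q0", "0"): ("1", "R", "q0"),
    ("q0", "1"): ("0", "R", "q0"),
    ("q0", "_"): ("_", "R", "halt"),
}
print(run(FLIP, "10110"))  # -> 01001_
```

Searching over transition tables like FLIP is, in this view, just program synthesis in a very low-level language.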

Now, there has been a lot of prior work on doing gradient descent to find programs, and it has not been very successful, for the reasons that you are alluding to. And that's why people have historically wanted to use neural networks in a sort of guiding role: you are still going to have some kind of discrete search going on, and then you use a neural network to guide that search process in various ways. And I would say that the use of neural networks in DreamCoder is an example of this, but also the use of LLMs to write programs, or to guide searches over programs; those are also examples

of this.

Interesting. So Turing machines have this amazing ability to expand their memory. So we don't need to train the thing again from scratch; we can just say, I need to get some more data, I need to add some more data to my memory.

And people say, well, you take a language model and you can train it to do RAG (or even during its training process, presumably, it's actually retrieving things from a database), or you could train it to do code execution. And then you could say, well, aren't we basically searching Turing space? Because I can say, generate me a program that computes the nth digit of pi, which is something that has an arbitrary n, and then I can run that, and then I can pass it back in. But it feels to me that there is a difference in kind, right? But what is the difference? What's the difference between doing that and actually having a machine that can search Turing space natively?

Okay, I'm trying to understand. So you could imagine having the definition of Turing machine programs as just a DSL, right? You could define a programming language where each program is a Turing machine. So to me, this is just what you would do if you open a theory of computation textbook: if you go to the definition of Turing machines, it's just a programming language, right? There is a syntax, there is a semantics.

So now, what is the difference between training an LLM to generate programs in this language and any other kind of program synthesis that LLMs already do, right? I don't think there is a fundamental difference between Turing machine generation and Python code generation. Now, I think what you are describing, though, is something slightly different, which is: the process that is being used for this generation, is that process a finite state process, or is it something more like a Turing machine? And that, I think, is a very interesting question, right? So what you're really saying is that instead of having a language model, which is trained in this particular way, just predicting the next token (and, yes, maybe it can retrieve a bounded amount of information), instead of having a massively scaled-up LLM, a massively scaled-up autoregressive model, you could have a massively scaled-up Turing machine that is doing all sorts of generation, right? And I think that's a very exciting idea.

Ah, yeah, exactly. And the thing that took me a while to get my head around is: the controller in a Turing machine is a finite automaton, so there exists a set of weights in a language model that would allow it, you know, to expand its memory with some pushdown automaton or something like that. We just can't find it with stochastic gradient descent. And the other thing is, yeah, I completely agree that we could in principle imagine we had a huge, internet-scale dataset of Turing machines and descriptions of them. I think the description is important, because what's the point in having a Turing machine language model that we can't program anyway? There needs to be some kind of intelligible mapping to something that we could use to instruct it.

Right. So, again, though, I would make a distinction between what is generated and how you are generating it. There are two different possibilities. One is that you use a language model, whether it be a Turing machine language model, or an RNN language model, or, you know, a transformer language model; you're using some sort of a language model to generate programs in the Turing machine language. So that is version one.

Version two is one where you're making the specific point that this language model itself is a Turing machine: that it has the ability to access this unbounded memory, and it goes and looks up stuff from this infinite tape, this unbounded memory. And that is a very interesting proposition. My one hesitation there would be that we are finding out that the architecture may not matter that much after all.

What really matters is the scale. So there are all these new results about, you know, state space models and highly scaled-up RNNs, and we are seeing that many of these tasks for which transformers seemed especially capable are also being done by these extremely scaled-up RNNs, right?

So I think that, you know, it is possible that this Turing machine architecture is going to lead to amazing new capabilities. But I would probably wait to see some more evidence on that. Still, that is definitely a very interesting idea

to explore.

And there's always the objection of, well, for all practical purposes, it doesn't matter either way, because we can scale up the models. And I'm very excited about, you know, Mamba and state space models and memory and all of these things, because they can allow us to possibly 10x our model size. And then there's always the question of, well, at some point the models are going to be big enough to do all of the reasonable types of computation that we want to do. But the counter-argument to that is, you know, we sometimes spend a lot of time pondering problems, and, you know, the Linux operating system (my co-host always gives this as an example): we've had to design these really complicated artifacts to reach the level of technological complexity and maturity that we have now. And it seems out of the reach of these models,

even if we 100x them.

That's right. That's right. Yes. So I think that I am a big believer in having algorithms that go beyond just the autoregressive language model paradigm. And that's why we were talking about these methods where you have these external tools, like, you know, a code interpreter, or theorem provers like Lean, and you use feedback from them to do something more than what we could do with autoregressive models.

But to me, the power of these methods comes not from the internal architecture of the language model, but more from the higher-level architecture of how these different kinds of grounding mechanisms work together with neural networks, right? And I think it is possible that a different neural implementation of LLMs is going to give you drastically better results.

I may be a little bit skeptical about that; it's something I would like to see more evidence of. But what to me is very clear is that if you have these sorts of higher-level architectures, where you are combining neural networks of whatever sort with these kinds of symbolic grounding mechanisms, execution in the real world, human feedback, all of these things, you are going to get something better than what we have today. And to me, the challenge then is: how do you pass feedback from these various sorts of grounding mechanisms back to the neural network? And maybe, when you are doing so, certain kinds of neural architectures are going to be a lot better. But that is something I am still kind of on the fence about.

I really agree with what you're saying. I mean, one thing I've noticed right now, certainly building real-world applications, is that the way to overcome this complexity curse is with architecture and process, and it's the only way to robustify these things. Possibly the only trade-off, though, is that the more engineering and architecture you put into a problem, the more you're kind of robustifying it for that problem, but you're also somehow taking away some of its generality on other problems, right?

So this is why I would ideally like to see something quite simple: an architecture that is quite simple and domain-general. And it seems to me that there are a few examples of this. So, you know, let's take mathematics. To me, mathematics is a very useful model for solving a lot of different kinds of problems.

So, for example, if you're doing physics, if you're doing biology, if you're doing programming, mathematics is useful. Even for day-to-day logical reasoning, mathematics is useful. And so I could imagine the use of a tool that can give you rigorous feedback about the mathematical soundness of your models of the world. To me, that makes a lot of sense.

And so I would probably not consider that a restriction on the kinds of domains to which you can apply your methods. Likewise, code execution: to me, code is a very general model, a way of representing models of the world. And programs, to me, always come with semantics.

So being able to execute programs, and also analyze them and reason about sort of worst-case behaviors and so on, this is also something that I don't think imposes a massive restriction. So to me, the idea would be that you would have these sorts of grounding mechanisms that are fairly general, and then you would have a flexible way of composing them together. You wouldn't impose, you know, you wouldn't hard-code a lot of things.

You would maybe hard-code when you are deploying these methods in particular applications, but the framework should be perfectly general, where you can write out, you know, new kinds of complex compositions of, say, Lean proofs and code execution and neural prediction, and maybe some natural language processing and vision as well; why not, right? So you would allow compositions of all of these different kinds of modules into new kinds of models, and then you would maybe have an AI system just go and try to find the right composition.

Can you tell us about the LaSR architecture?

Absolutely. The goal of the LaSR approach is to use LLMs to drive a search over programs, and also to come up with abstractions that explain what is going on in high-performing programs. This was done in the context of symbolic regression, where the setup is that you have a dataset and you have a language of programs, or expressions, and you are trying to find a high-performing expression.

Now, prior approaches here used evolutionary search, and there is also some work that uses LLMs to basically direct some parts of that search process. But what we were really exploring is: can we use LLMs to come up with high-level explanations, or high-level concepts, that show up in the high-performing programs? And so you could imagine a process where, you know, evolutionary search is coming up with a pool of candidates, and you are actually evaluating these candidates using the data that you have.

And so this serves as a grounding mechanism. Now you have a bunch of high-performing candidates left. Now you're using the LLM to explain what are some of the common themes that are showing up in these high-performing programs. And then you are remembering those themes, those concepts, and then using those to drive the search as well, right?

One big appeal of an approach like this is that even if you are not using all of the scientific prior knowledge that the LLM has, just the ability of LLMs to abstract pieces of text into higher-level concepts is still very helpful. So we deployed this approach on the problem of rediscovering a bunch of standard equations, specifically all the equations in the Feynman Lectures on Physics. This is a task that was proposed in the AI Feynman paper by Max Tegmark and others. In addition, we also evaluated the system on some synthetic benchmarks, and we also used it to do a new discovery of an actual LLM scaling law.

Oh, tell us.

Yes, absolutely. So here we wanted to go beyond the Chinchilla law. We wanted to come up with a new law, just using data on LLMs.

So you have, you know, information about, for example, the number of parameters, the size of the dataset, how many shots were used in training, and so on. So you have a bunch of these parameters, and then you also have dataset characteristics, right, and then you have performance. And so now you could ask this question: can I discover a symbolic law that is going to explain the LLM behavior? Now, what we found was that we got a law that was a little bit different from the Chinchilla law, and that was definitely the objective; we wanted to find something that is different but yet as explanatory. And yes, for more details, please see the paper.
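
The grounding step in this kind of law discovery is easy to illustrate: any candidate symbolic form can be scored by fitting its constants to (parameters, tokens, loss) data. The sketch below uses the published Chinchilla functional form L = E + A/N^alpha + B/D^beta as the candidate; LaSR searches over such forms rather than fixing one, and the data points here are made up for the example:

```python
import numpy as np
from scipy.optimize import curve_fit

def candidate_law(X, E, A, alpha, B, beta):
    N, D = X                              # parameters, training tokens
    return E + A / N**alpha + B / D**beta

# Hypothetical observations (N, D, loss); illustrative only.
N    = np.array([1e8, 4e8, 1e9, 7e9, 7e10])
D    = np.array([2e9, 8e9, 2e10, 1.4e11, 1.4e12])
loss = np.array([3.5, 2.9, 2.6, 2.2, 1.95])

coeffs, _ = curve_fit(candidate_law, (N, D), loss,
                      p0=[1.7, 400.0, 0.34, 410.0, 0.28], maxfev=20_000)
residual = np.mean((candidate_law((N, D), *coeffs) - loss) ** 2)
print(coeffs, residual)  # lower residual = better-grounded candidate law
```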

But in addition to that, we also evaluated the system on completely synthetic tasks, and there we also found that this concept learning idea made a difference. So even if you ignore the LLM's prior understanding of the problem domain, which is not something I would argue you should do, because at the end of the day, in real-world applications you rely on preexisting knowledge, but even if you ignore that a bit, you still have another way to use LLMs, which is as a tool for coming up with abstractions.

Yes, I'm also inspired by Melanie Mitchell; she's done some work in this area. There seems to be something really interesting about iteratively abstracting, which is to say that there's almost some fixed outer loop and you're going up and up and up. Tell me about that.

That would make a lot of sense. The way we are doing it right now in LaSR is a little bit less sophisticated: we have our abstractions described in just natural language. But you could imagine a world where you have these abstractions represented in a more symbolic, programmatic form as well. And as time goes by, you are deliberately designing these abstractions to be higher and higher level. That is not something that we have done, but I would say it is present in DreamCoder, because in DreamCoder, once you discover some library modules, you can write new programs using those library modules and then abstract them further, right? So this abstraction process is naturally directed towards discovering higher and higher level modules.

And could you make a quick comment as well? This, to me, is so exciting. No DreamCoder-style example is currently SOTA on the ARC challenge, but I have a deep sense, I can feel it in my bones, that it's the right way to go. And unfortunately, because of the SOTA chasing and so on, there isn't enough attention being paid to this really, really innovative work. Your reflections?

I think that SOTA chasing is a really unfortunate trend. It is also, I think, a misallocation of resources. There are companies that are well suited to chasing SOTA, because you need both exploration and exploitation, and chasing SOTA would be exploitation, right? You have a well-defined approach and now you're scaling it.

You are solving new problems that you couldn't do before. Great. But I think there is also this exploration direction that is very important.

And unfortunately, SOTA chasing is hurting that a bit. But at the same time, I would say that there are a lot of folks coming up with interesting new algorithms. What perhaps needs to happen is more handoff from the exploration side to the exploitation side.

So, in other words, let's say that you have an algorithm like DreamCoder. How do you scale that up? Our LaSR method is one way of scaling it up a bit, by using LLMs. But maybe there are other approaches as well.

And there is also the question of how you can run this on very, very large amounts of compute, and then can we see what happens? Now, this is something that we haven't done yet, but that's a direction we are definitely very interested in. We are speaking with various partners to see if there is any way of scaling this up further, and we'll be very excited to see what happens once we do that.

You've worked extensively on mathematical discovery, and I think you made the comment in one of your papers that you think we might have an AI copilot by 2026. What makes you think that?

That was actually a point made by Terence Tao.

Oh, my apologies.

Whom I greatly respect, and I actually agree with that. So I think that we are in an extremely exciting time in AI for math. And I would imagine that getting an AI as a significant contributor, although not the sole author, of a math paper, or even a computer science paper with significant mathematical components, should be possible in the not-too-distant future.

Okay, so what do we really mean by that? If you think of the way mathematics is done, and even what happens in theoretical computer science, you have these definitions that are pretty rigid, and you have proofs, but ultimately the purpose of all this is to convince human reviewers and human readers, right? And what that means is that you can make mistakes. There are corner cases that you can forget, and your proofs may be broken as a result.

So this is one possible risk in the way mathematics is done today. The other thing is just scaling up. Even if you don't think there is anything fundamentally wrong with the way math is done today, there is an opportunity to scale up mathematics using computational tools that was just not possible until now.

What I mean is that if you think of a very large software system, say the Linux kernel or the Google cloud infrastructure, these are extremely complex systems that no individual human can necessarily fully understand. People understand these systems at a high level of abstraction, and maybe they know all the low-level details of specific components. But I would say there is really no one, or actually very few people, who have a detailed view of the entire system all the way down.

Now, if you think of math proofs, historically math has been very oriented towards individual contributors. When you think of a mathematician, it is somebody who produces a proof and knows all the details of that proof, which they have produced themselves, or at least understood. But imagine a world where you have these extremely complex mathematical problems where maybe no individual human is able to understand all of the details, but you still have the guarantee that the overall proof can be broken down into pieces that are more manageable by humans, right? Just the way we write code.

Could we do that for mathematics? What would that look like? I think that's a tremendous opportunity that systems like Lean provide.

And now AI naturally comes in: just the way AI copilots have made it much easier to write code, you could imagine having a copilot writing math proofs as well. Now, what if all of that is possible?

We'll have maybe new kinds of mathematics, new modalities of collaboration between humans, which is something Terence Tao has extensively talked about, and then AI driving this process, making us ten times more productive. Would I say that would lead to new papers with AI contributors? Yes, though maybe we wouldn't list it as a co-author, because just like we don't list our calculators or computers as authors, we would probably just accept that, okay, the tool was there and it did a bunch of things, but there is still human creativity at the high level, and those humans are going to be the authors. But I would say the AI would definitely still do a lot of the things that today a math grad student has to do.

Interesting. So I guess a lot of people think of mathematical theories as quite parsimonious units that an individual mathematician, or a few mathematicians, have worked out and understood. And I guess not until recently did we think about what it would look like for this to scale massively. So yes, we need to convince the reviewers, and you are saying that even though the proof is increasingly inscrutable because it's so large, the reviewers would be more inclined to trust it if it were found by a computer.

So I would even question why you need a reviewer to understand every single detail of the proof, right? Maybe the reviewer needs to understand the high-level components of the proof,

and then just have assurance that the lower pieces of the proof are done correctly. Just like when you are reading code, you don't necessarily think, okay, here is a system call that this code made; is this system call implemented correctly at the assembly level? You don't reason about that; it is something you assume the Linux kernel implementors have properly figured out, right? So in the same way, one could imagine proofs where the proof is at a high level of abstraction,

and there are these lower-level building blocks that you are taking for granted. One could also imagine, just like when you're building a big system in software, that you break up the system into pieces, you pass these responsibilities onto different folks, and there is an interface

where things are supposed to check out. And so long as they do, and there are tests that pass at the unit level and end to end, you just accept that the system is okay, right? So likewise, maybe for mathematical proofs we should also have the same sort of decomposition. You would have a big, complex proof; let's say we are talking about Fermat's Last Theorem, which took an individual human a very long time to prove, right?

But maybe now you're going to come up with a decomposition: you are going to have a team of a thousand mathematicians that go and look at the pieces. And then there is some way of assuring that, assuming these pieces are done correctly and everybody fulfills their local obligation, the global goal of proving Fermat's Last Theorem is going to be achieved.
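As a toy Lean 4 illustration of that discipline (an invented example, not from the conversation, and assuming a toolchain where the `omega` tactic is available): the top-level proof depends only on the lemma statements, so different contributors, human or AI, could discharge them independently.

```lean
-- Two "interface" lemmas that separate contributors could prove independently.
lemma mul_two' (n : Nat) : n * 2 = n + n := by omega

lemma double_even (n : Nat) : ∃ k, n + n = 2 * k := ⟨n, by omega⟩

-- The top-level proof only composes the interfaces; if each local obligation
-- is met, Lean certifies the global claim no matter who (or what) proved it.
theorem mul_two_even (n : Nat) : ∃ k, n * 2 = 2 * k := by
  rw [mul_two']
  exact double_even n
```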

So I'm fascinated by this idea of decomposing a larger whole into these local obligations. Let's look at the Linux operating system. Do you think there is some kind of bottleneck introduced by having these human-intelligible interface modules at the local level? What I mean by that is, I can imagine a future where language models are so powerful they can just regenerate the entire code all of the time.

But that's going to be a nightmare for checking in code, isn't it? Because, you know, John has just broken Jane's component, and this interface has changed, and the schema over there has changed. So do you think there is always going to be an interesting kind of trade-off where, in order to have an interface that is consistent, we will need to almost increase the complexity of the local module?

So I think this goes back to the discussion of interpretability and capability. I think that, in principle, it is possible to come up with a Linux kernel that is written entirely in assembly code, and it is not modular; it doesn't have these sorts of human-comprehensible interfaces, right? But yet it is provably safe.

And by that I mean that you have these high-level objectives that some set of humans have written down of what the properties are that the Linux kernel needs to satisfy, and then your AI goes and produces a massive blob of assembly language code, and there is also a formal proof that this code is actually doing the right thing, right? So then why would we have any objection?

The issue to me is that, number one, that kind of thing is at this point fantasy. We don't really have any AI that can produce large bodies of highly reliable code without any kind of human intervention, except at the very top level of writing some specification. And also, over time requirements are going to evolve, and you want to have a path through which humans can be involved in the maintenance of this Linux

kernel of yours. And I think the same sorts of arguments apply to mathematical proofs as well. There is a world where all math gets done via: you push one button and then the math gets done. So the mathematician comes up with the theorem, they state the theorem, and then they push a button, and your AI goes and just comes up with the proof at the lowest possible level, without any kind of human intervention. But there is a proof checker, say the Lean proof checker, that has checked the proof for you, right? So I think this is possible in principle.

In practice, we are very far away from that. So at least in the short run, we want to have proofs that are at least interpretable, with module boundaries, and then humans can be involved in designing the high-level process while the machine takes care of the low-level details. But I could imagine a world where humans are less and less involved. Although that would raise some questions about what the point of this mathematics is. Part of the reason for mathematics is that it's fun, and it teaches us something about the world. But if the point of the mathematics is purely instrumental, that you are doing this mathematics so that you can predict whether or not, I don't know, a certain kind of chemical reaction is going to take place, then maybe that's something you can allow the AI to do, just like today we use Mathematica or MATLAB to do a lot of that simulation.

So what are some of the advancements in this area? I know you've done a lot of work here. What should people look at?

Right. So there are a few different directions. The direction that is the most well studied is theorem proving, and here there are a couple of versions.

One version is that you start with a formal statement of the theorem in a language like Lean, and then you search for a proof of this theorem. In a framework like Lean, a proof is just like code: it is a sequence of instructions, which in that setting are called tactics. So you are finding a sequence of tactics that is going to lead to the goal being proven, which is really like finding a program that achieves a certain specification. And there's a lot of work on this.
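As a toy example of what that looks like (standard Lean 4, not from the conversation): each tactic transforms the current goal into zero or more subgoals, and the proof is complete when none remain.

```lean
-- `constructor` splits the conjunction into two obligations;
-- each `exact` discharges one of them, and then no goals remain.
theorem toy (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  constructor
  · exact hp
  · exact hq
```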

The most promising approaches combine the use of a neural network with execution in the Lean environment: you get feedback as to whether or not the tactic you applied led somewhere useful, and then you use that feedback to do more generation. So that's sort of what's going on there. There is also a body of work on informal theorem proving, where you have natural language problem statements and you are using, a lot of the time, purely informal methods, like variants of chain-of-thought, to do math reasoning.

But sometimes you are also combining these two approaches. So there was a nice paper by Sean Welleck and others on using a natural language model to come up with a high-level proof sketch and then using that sketch to drive a formal prover. What was the name of that paper? Draft, Sketch, and Prove. Yeah. So that's one direction.

Beyond formal theorem proving and informal theorem proving, there are also a lot of other interesting questions which are less explored. One direction that is really fascinating is open-ended exploration. If you think of what's really needed in mathematics, what can AI contribute? One big opportunity is in coming up with better conjectures, creatively coming up with better goals that you want to prove; not just the proofs themselves, but also the statements of the theorems.

So that's a direction that I think is extremely important, but unfortunately there hasn't been that much work on it so far. There are several research groups that are very interested in that problem as well, though.

So, for example, there are these efforts where, you know, Noah Goodman had this really nice recent paper on using intrinsic motivation to drive a prover. The idea there is that you have two agents: an agent that is coming up with new claims, and a different agent that is trying to prove them. So really, you have this interplay between the process of conjecturing and the process of proving, which is really what happens in mathematics, if you think about it.
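A toy sketch of that interplay, in the spirit of the intrinsic-motivation setup rather than the cited paper's actual code; `propose` and `attempt` are hypothetical stand-ins for the two learned agents:

```python
# One round of the conjecture/prove loop: the conjecturer proposes a claim,
# the prover attempts it, and a simple intrinsic signal rewards provable
# claims (a real system would also reward difficulty, and update both agents).
import random

def training_round(propose, attempt, history):
    claim = propose(history)
    proof = attempt(claim)          # None means the prover failed
    history.append((claim, proof))
    return 1.0 if proof is not None else 0.0

# toy stand-ins so the loop runs end to end
propose = lambda history: f"claim_{len(history)}"
attempt = lambda claim: f"proof_of_{claim}" if random.random() < 0.3 else None

history = []
for _ in range(5):
    training_round(propose, attempt, history)
print(history)
```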

So last night I read your COPRA paper, which was really, really interesting. I mean, you already spoke about Lean in high-level terms, but maybe you could go into a bit more detail. Essentially it was an algorithm where you have a theorem, and then you have all of these obligations, and the algorithm would resolve these using a language model, and then it would recursively call itself until it ran out of obligations. And I guess the heuristic part was using the language model to find an appropriate lemma from this lemma database. It's a fascinating architecture. Can you tell us about it?

Sure. One thing here that is common to all of the other Lean theorem proving, or formal theorem proving, efforts is that there is this overall structure of a formal proof.

The idea is that you have a goal that you start out with, the top-level theorem, and then you progressively apply these tactics, which simplify this goal into subgoals, right? And after a while, assuming you have selected a good sequence of tactics, you have no more goals left to prove, and then you are done. Your theorem has been proved.

QED, right? So this is just Lean; this has nothing to do with AI. Now, the question is how you can use AI to select tactics in a smart way. And there the COPRA paper made a few contributions. One was that you are just using a large language model without any kind of fine-tuning; you are just prompting it to guess tactics. This is different from prior work, where folks used models that were explicitly fine-tuned on proof data, formal proof data.

Now, what was sort of remarkable to us was that even just a black-box model could predict reasonable tactics for this sort of formal proof, even though the language is fairly niche. An important point, though, is that it wasn't enough to just query the language model. You also had to do this LLM-modulo thing, where you get the tactic predicted by the LLM and then actually execute that tactic in the underlying Lean framework, and you get back a new state, a new set of obligations to prove.

And then you go forward with that, right? The LLM can actually see what the current obligations are, and it also has some memory from the past. And all of this is plugged into a bigger search, right? The search process is doing backtracking; it is recording what worked and what failed.

And all of that information is being used to guide the generation process of the LLM. So that is one idea. The other idea is that you are doing retrieval. Here the intuition is that when you're doing math, typically you are using definitions from other projects and lemmas from other projects. The idea of using these sorts of external knowledge bases was originally introduced in this model called ReProver, led by Kaiyu Yang, who's now at Meta doing AI for math research, and this ReProver model was fine-tuned.

But here what we wanted to explore was whether we could just use a large language model in a few-shot way to do the same sort of thing. And we found that, yes, it was able to use this retrieved knowledge from an external database. And so what that really shows is that in-context learning can go a long way, even in this sort of very esoteric setting of doing formal theorem proving.
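Putting those pieces together, here is a condensed sketch of the kind of loop COPRA runs, not the actual implementation; `llm_propose`, `retrieve_lemmas`, and `lean_apply` are hypothetical stand-ins for the language model, the lemma database, and the Lean interaction layer:

```python
# Hypothetical stubs so the sketch runs; a real agent would call an LLM and Lean.
def retrieve_lemmas(goal): return []
def llm_propose(goal, lemmas, trace): return ["simp", "omega"]
def lean_apply(goal, tactic): return [] if tactic == "simp" else "error"

def prove(goal, depth=60, seen=None, trace=None):
    """Depth-limited backtracking search over LLM-proposed tactics."""
    seen = set() if seen is None else seen
    trace = [] if trace is None else trace
    if depth == 0 or goal in seen:        # budget exhausted, or cycle detected
        return False
    seen.add(goal)
    lemmas = retrieve_lemmas(goal)        # retrieval over external knowledge
    for tactic in llm_propose(goal, lemmas, trace):  # in-context, no fine-tuning
        result = lean_apply(goal, tactic)            # grounding: run the tactic
        if result == "error":
            trace.append((goal, tactic, "failed"))   # feedback for later prompts
            continue
        if all(prove(g, depth - 1, seen, trace) for g in result):
            return True                   # no obligations left under this branch
        trace.append((goal, tactic, "dead end"))     # backtrack, try another
    return False

print(prove("a + b = b + a"))
```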

Yes. And you tried that on GPT-3.5 Turbo and GPT-4, and you noticed that there was a significant improvement with GPT-4.

You ablated the retrieval, and the retrieval helped dramatically. I think you limited it to about sixty steps, but presumably if you went for longer it would have been even better. And the backtracking presumably had some notion of "I'm in a bad state, now I'm going to go back." How did that work?

So the good thing about a formal framework like Lean is that you get information about what kind of state is bad, right? If you are at a certain state of the proof, you have a certain set of obligations to prove. Now, if you take a tactic that's not helping, or in fact is just not applicable in the current state, then you are going to get feedback.

Or, for example, if you find that you are cycling back to a state that you had already seen, so you're basically going to end up in an infinite loop in the state space, then that is also something you should avoid, right? And this kind of information you can get by explicitly embedding the LLM inside a broader search process.

I think, again, the main finding in that paper is that in-context learning can be interesting even in this setting. Now, we limited it to sixty steps because we were working with an academic compute budget, and also GPT-4 was fairly expensive at the time. But going forward, as the models get cheaper, it would be really interesting to see how far this sort of LLM plus search plus external grounding, through Lean or other kinds of frameworks, can be pushed. I think you can actually go a long way, especially if you are willing to do expert iteration, which is that you collect data and then use that to fine-tune the model further, and you carry on like that.

A couple of things. So I am interested in this concept of reachability. You have a whole bunch of tactics in a database, and then there's some notion that we've been kind of pointing to about the convex hull of the tactics giving you some reachable space. So maybe we should talk about that convex hull. But presumably there would be examples of theorems that have a goal state which is not reachable with the tactics? I mean, could you talk through that?

Right. So there are really low-level tactics that would enable you to derive essentially anything. It's just that there may exist a sequence of tactics that leads you to that goal state, but it may be so long and so complicated that your AI may not be able to find it. And that happens all the time: there are lots of statements that we are unable to prove.

Now, I think one interesting question is: would you expect to do better if you were able to abstract these sorts of low-level tactics into higher-level tactics? And I think the answer is yes, right? If you think of how a mathematics student progresses, you start with really basic algebra, or maybe even before that you start with arithmetic.

Then you get to algebra, and then suddenly you have this big leap and you get to calculus, and you learn probability, and then you can get even more abstract: you go to complex numbers. And so there is this hierarchy of abstractions that a human student goes through while learning mathematics.

And the purpose of these abstractions is that you are able to be more robust; you are able to handle more and more kinds of problems. So you are unifying a lot of point solutions into general solutions. And I can imagine that that approach would be valuable even in this context of AI for math: you would start with some low-level set of tactics, but then, maybe in this kind of DreamCoder way, you are going to discover more abstract tactics, and then you would use those to do more successful proofs, and then at some point things would just explode and you would be discovering new kinds of mathematics.
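Lean 4 makes this kind of bootstrapping directly expressible: low-level tactics can be packaged into a named higher-level one. A minimal sketch, with `crush` as a made-up name, assuming a toolchain where `omega` and `simp_all` are available:

```lean
-- Package low-level alternatives into one reusable, named high-level tactic.
macro "crush" : tactic => `(tactic| first | omega | simp_all)

-- The abstracted tactic now discharges a family of goals in one word.
example (a b : Nat) (h : a ≤ b) : a ≤ b + 1 := by crush
```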

How does the resolution play into the leakiness? I'm thinking of the physics example of Newtonian mechanics and relativity. Do you introduce leakiness by relying on high levels of abstraction?

Not in this setting of formal proofs. And the reason is the way a framework like Lean, or other similar ones, is architected: you have to have certainty all the way through, otherwise your proof is not going to be accepted.

So even if you are at a high level of abstraction, you are going to have implementations of that high-level primitive in terms of lower-level primitives that actually are sound. Now, that said, you could certainly imagine versions of this where that is not the case. For instance, you could imagine a version of AI for mathematics where there are some low-level proof obligations that are just sort of empirical, right?

Imagine that I'm making a statement that, I don't know, maybe is not a universal mathematical law, but maybe it's a property of a certain place and time, right? So maybe you have a predicate that you are just evaluating empirically. Now, you could give it mathematical meaning by making it into a probabilistic predicate, but you don't know that the distribution you have selected is the right one, right? So in that case, what would happen is that the coarser your resolution, the more uncertainty there would be. But if you're using formal mathematics above that level, you would still have some amount of assurance about the claims that you are making.
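One way to write down the idea, a sketch of the notation rather than anything from the conversation: an empirical predicate $P$, observed on samples from an assumed distribution $\mathcal{D}$, becomes a probabilistic one, and every theorem built on top of it is conditioned on $\mathcal{D}$ being the right model:

```latex
% Empirical claim promoted to a probabilistic predicate; \varepsilon absorbs
% the measurement noise, and the choice of \mathcal{D} is a modeling assumption.
\Pr_{x \sim \mathcal{D}}\big[\, P(x) \,\big] \;\ge\; 1 - \varepsilon
```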

So it would be not that different from when you're using math in a natural science: there are empirical observations that you are making, and there is noise in them, right? But at some point you are abstracting them into this cleaner mathematical model, and all the inferences you make based on that model are conditioned on your assumptions, on your modeling being correct, right? And it's not that different here: if you have too coarse-grained a model, you are going to have more uncertainty regarding your conclusions. It's not zero-one, where it has to be fully certain or completely uncertain.

I was speaking with David Spivak recently, who is very well known in applied category theory, and he was kind of saying that our entire epistemic framework almost derives from the questions that we ask. I wonder whether it's a similar thing when we talk about theorems and the utility of theorems and the role of humans in creating these theorems. There must be some kind of reason why we're asking these questions. Where does that come from?

I think that a lot of mathematics has been inspired by the goal of solving real-world problems. If you think of why people came up with geometry, why people came up with calculus, there was a very pragmatic goal: you wanted to understand how the world works, and you wanted to then apply that to do new things, to solve real-world engineering challenges. So I think that's often the starting point. But then, of course, there's this motivation of elegance as well, and curiosity: you want to have elegant, unified solutions rather than point solutions that are just hacked together. And you also get curious: okay, so I have made these sorts of assumptions in this definition; what if I change these definitions in this way, what would happen? And I feel like if you combine these ideas, the idea of modeling phenomena that are visible in our day-to-day lives, together with these motivations of curiosity, where you are just asking, okay, if I modified this, what would happen, and the unification of point solutions into something more abstract, I think you would get to a lot of mathematics that way.

Could you extend that a tiny bit? If you could break it down, what makes a really good mathematical theory?

I am not a professional mathematician, so I am probably not the best person to ask. Maybe I can talk about computer science, and I can try to answer this question for, let's say, theoretical computer science papers, because I come from that community. There, I would say one question is interestingness. So what is interesting? I think that the problem you are trying to solve should be clean.

It should be a clean statement. It shouldn't be that I've come up with this bizarre definition with different edge cases and then I want to prove a certain theorem about it. So it should be something that is relatively succinct and elegant. And then it should be novel: if it feels like I have seen versions of this before, and I can take a proof from one of those previous models, tweak it a little bit, and the proof comes out, that's just not worth pursuing. So I think novelty and elegance are big criteria. Then I would say that in computer science specifically, utility is a consideration as well.

The reason why computer scientists often care about certain kinds of theories is that the theories model real-world computational processes. So there was an interest in online algorithms because there were many practical computer systems that needed to process data online. There was an interest in algorithmic game theory because the internet came about.

And then all of a sudden we were departing from the traditional centralized view of computation. Instead, the internet was this massive computer, and you had all these agents that were driven by their own objectives but coordinating using it.

That could be an auction, or any number of other kinds of online coordination mechanisms, right? And that led to all of this interesting work on algorithmic game theory, where you would ask: all right, so I have these k different agents, and they are arriving at an equilibrium in this market, right? They could be players in the market.

But I now want to ask, in contrast to traditional game theory: can I reach the equilibrium in polynomial time? Is this equilibrium not only reachable but computationally tractable to reach? So, as I said, a lot of interesting computer science theory has happened by taking these sorts of long-standing principles in math and in prior computer science and then adapting them to new settings, new models, which are inspired by practical problems on the ground in computer systems.

Game theory is a beautiful example, absolutely beautiful. I mean, I'm interested in agency, and that's a wonderful example of one of these: it's a phenomenon in the natural world, and it beautifully demarcates certain dynamics of behavior and so on.

And then we create this game theory thing, which is also using the concept of an agent, and it has this unreasonable effectiveness in modeling many complex systems, even though we choose to model the agents at different scales, even mixing different scales inside the model.

But coming back to your definition of a really good theory: you said clean, and I think you are talking about parsimony. But, you know, Chomsky said that LLMs aren't a theory of language because they don't have that clean demarcation; an LLM doesn't demarcate what he thinks is language versus anything else, like an impossible language or something. So is there something about this demarcation?

I think that LLMs would produce whatever is there in the training set, right? And some amount of composition thereof. So I think that if an LLM was primarily trained on beautiful theorems and beautiful literature, you would see certain aspects of that in the text it generates as well.

However, I think that that cleanliness, that elegance, is also subtle. It's not just about following certain high-level motifs of beauty; it's also about being consistent. A lot of the time, for example, when I use an LLM for writing, I find that, okay, this paragraph is really well written, nice, but then the next paragraph is somewhat different.

Each may be individually fine, but when I put the two together it just becomes something horrendous, right? I think this idea of elegance is a sort of global notion: it's not just that little pieces of the theory are cleanly stated and therefore the entirety is cleanly stated as well. So to me it's a very open question whether an LLM can come up with elegant statements. I don't know whether the answer is no or yes.

And just to close this up: you did sketch out in your ablation going all the way down to a zero-shot, blank-slate GPT, it was GPT-4, wasn't it, and I think it got something like 2.3 percent, if I remember correctly, and then it ranged all the way up to about thirty percent with the full COPRA algorithm and the retrieval and the backtracking. The thing is, there's always this nagging worry in the back of our minds about test set contamination. Could you talk about that?

So this is a very difficult issue with any LLM-based approach. There is always this worry that the results you are seeing are because of contamination.

Now, we did what we could to address this issue. First, we showed that if you strip out all of the extra pieces that we added and just ask the LLM to go and produce the proof end to end, it wasn't successful; it got something like two percent, as you mentioned. We also looked at a lot of the proofs that were generated, and we looked very hard for similar examples on the internet, and we just didn't find any. So that is another indicator that the proofs we were seeing were not just an artifact of contamination. Now, that said, contamination could have played an indirect role; maybe there are similar kinds of problems that the LLM has seen before that we just couldn't find. That's definitely a possibility. But I feel like at this point in the history of AI research, we just have to be very aware of this risk but still continue using LLMs.

Yes. When I spoke with Subbarao Kambhampati, he said that when he was solving a planning problem in Blocks World, he made a Mystery Blocks World version where he just scrambled all of the tokens, and apparently that did cause a significant deterioration in performance. But it's so slippery, isn't it? It's very, very difficult.

Yes, yes. On standard benchmarks, I think this is a big risk. Now, one could imagine using synthetic benchmarks. The approach that we took in the COPRA paper, I just don't think it's well designed for completely arbitrary synthetic benchmarks, because the whole point was that the prior knowledge inside the LLM helps with finding new tactics.

But there is the other paper that we recently worked on, which involves concept learning using an LLM in the context of symbolic regression. There we also used an LLM inside the search framework, it's evolutionary search in that case, and, as I mentioned earlier in our conversation, you are actually using the LLM to do concept abstraction as well, and you're using these abstract concepts to guide the search process. In that case we did experiment with synthetic problems, and we found that the LLM does help there as well. The reason is that even if you are not using any sort of prior knowledge, the abstraction ability of the LLM, that is, coming up with high-level concepts based on some examples of text, is helpful. But that's not a part of the COPRA approach, and so there is no reason to believe that if you just came up with a completely synthetic problem, COPRA would be able to do well on it.

Amazing. And before we go, where can people find out more about you?

Just Google my name and you'll get to my website; all of my papers are linked from there. I also have a Twitter account, @swarat. But if you really want to discuss something technical, just send me a message and I'd be happy to chat more.

Amazing. We'll be putting the papers up on the screen. This has been a brilliant conversation. Thank you so much for joining us.

Thank you for having me. I loved it as well.