Broadly speaking, simplicity is a good heuristic. And the literature on simplicity biases in deep learning does tend to say this is the explanation for why models generalize, right? If they didn't have any kind of simplicity bias and they started out incredibly complicated from the beginning... Within phenomenology, there are different ways of thinking about what experience really is.
I think Edmund Husserl might have been closer to a kind of idealist perspective. He's famous for this idea of the epoché, where basically you are looking at your experience, you're describing it, but you try to refrain from assuming that your experience is reflective of some objective reality. Other phenomenologists are usually seen as being less idealist, especially Merleau-Ponty. A lot of people, when they think about experience, will say, well, your experience doesn't actually include people, windows, objects, because that's an interpretation of your experience.
Your actual experience is just colors and raw sounds, uninterpreted raw sense fields, and they'll say that's the thing that's real; the interpretation is kind of fake or something. Merleau-Ponty wants to reject that, because he says, okay, look, the very idea that you're seeing raw colors separated from the objects that have the colors is this sort of post-hoc abstraction that you get from philosophical thinking. But really, what's there in your experience is just objects that have properties, like colors and so on: the object is there and you experience it directly.
Do you want to run Llama efficiently on smaller GPUs? What if you could run both training and inference on the same GPU? With CentML's breakthrough optimization technology, you can maximize hardware utilization and slash AI computation costs.

Running LLMs at scale shouldn't break the bank. CentML's intelligent optimization platform helps enterprises deploy AI models with maximum performance at minimum cost. Experience the difference now. Welcome to MLST. It's amazing to have you here.
Yeah, I'm glad to be here.
It's amazing to meet you. Can you tell us about yourself?
Yes. So I'm Nora, the head of the interpretability research team at EleutherAI, which is a nonprofit AI research organization that got started just a couple of years ago. We started out as a Discord server, and we still do a lot of our research out in the open on Discord. So that's what I do in my day job, and I'm here at ICML to present my most recent paper.
What are your main research interests?
So there are a few different topics that me and my team, and a few other people at EleutherAI I collaborate with, are interested in. I guess one of our main research interests is concept erasure and concept editing. Concept erasure is a set of tools that can be used for a few different purposes in deep learning.

One application of concept erasure is fairness and bias reduction. We all know that models like language models often pick up on harmful biases about protected minorities in their training data, and we often want to try to mitigate those biases. There's a question of how to do that with concept erasure.

What you're trying to do is look at the internal representations of the network and remove information about the target concept. That might be race or gender, or something entirely different like part of speech, but you're trying to get rid of this targeted information in the representation while keeping all the other information present in the representation. There's a pre-existing literature that goes back a few years before I got into it, but just last year we put out a paper called LEACE, least-squares concept erasure, that introduces a new way of doing concept erasure with some nice mathematical guarantees.
So what does it mean to erase a concept, exactly?
So that is a somewhat tricky question. The way the literature has chosen to operationalize this idea is to say: okay, we're going to measure the amount of information about our target concept by training a probe — a linear classifier, for example — on the representation to try to predict the concept, whether that's race, gender, part of speech, whatever it is. And if it turns out that the classifier is unable to predict the target concept better than chance — it's just predicting fifty-fifty on every input, or something like that — then you can say, okay, at least there's no linearly available information about the concept in the representation.

There's always a concern that maybe the classifier isn't strong enough — maybe a more expressive or stronger classifier would be able to extract the information — but you have to start somewhere, and that's the approach we take in the new paper as well. We consider the linear concept erasure regime, where we're trying to make sure that no linear classifier can extract information about the target concept.
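For concreteness, here is a minimal sketch of the probing check being described: fit a linear probe on the representations and compare its held-out accuracy to chance. The random arrays below stand in for real hidden states and concept labels; none of this is from the paper's code.

```python
# Hypothetical sketch: measure linearly available concept information with a probe.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracy(X, z, seed=0):
    """Fit a linear probe to predict concept labels z from representations X
    and return held-out accuracy. Chance level is the majority-class rate."""
    X_tr, X_te, z_tr, z_te = train_test_split(X, z, test_size=0.2, random_state=seed)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, z_tr)
    return probe.score(X_te, z_te)

# Random data standing in for hidden states (n=2000, d=64) and binary concept labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64))
z = rng.integers(0, 2, size=2000)
chance = max(np.mean(z), 1 - np.mean(z))
print(f"probe accuracy: {probe_accuracy(X, z):.3f} vs chance {chance:.3f}")
```

If the probe cannot beat the chance baseline, the representation is (empirically) linearly guarded for that concept.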
So what's the backstory here? I mean, how did you start all of this?
Yes. So I wasn't initially interested in concept erasure for its own sake.

Me and my collaborator Alex were working on a totally different project where, as a sub-problem, we wanted to remove information about a particular concept — in that case it wasn't actually about fairness. So for that project I did a literature review and looked into the existing methods at the time for concept erasure, and one of them was called R-LACE. That stands for Relaxed Linear Adversarial Concept Erasure, and it's a pretty cool approach actually.
Basically, the idea is that you've got a linear classifier and you also have an orthogonal projection matrix. This is a slight simplification, but this is basically what's happening: you've got the classifier, you've got the projection matrix, and you're simultaneously optimizing both of them in an adversarial setup, sort of like a GAN scenario if you're familiar with that. You're optimizing the classifier to try to predict the concept from the representation, but you're also optimizing the projection matrix to maximize the classifier's loss, and you take one step on the classifier, then one step on the projection matrix, back and forth.

Eventually, hopefully, you'll reach a fixed point. It's this kind of adversarial game, and it does work pretty well, but it has some problems. In particular, it's very slow and kind of tricky to get it to converge — sometimes it'll go in circles and things like that. And I wanted to speed up this concept erasure technique for my own purposes.
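A very rough sketch of that alternating game, in the spirit of R-LACE but not the authors' implementation — the rank-k parametrization via QR and the simultaneous gradient steps are simplifications I'm assuming:

```python
# Adversarial erasure sketch: a linear probe minimizes its loss while a rank-k
# projection is updated to maximize it (i.e., to hide the concept).
import torch

def adversarial_erasure(X, z, k=1, steps=500, lr=1e-2):
    """X: (n, d) float tensor of representations, z: (n,) tensor of 0/1 concept labels."""
    d = X.shape[1]
    w = torch.zeros(d, requires_grad=True)        # linear probe weights
    U = torch.randn(d, k, requires_grad=True)     # basis of the subspace to erase
    opt_w = torch.optim.Adam([w], lr=lr)
    opt_U = torch.optim.Adam([U], lr=lr)
    for _ in range(steps):
        Q, _ = torch.linalg.qr(U)                 # orthonormalize the basis
        P = torch.eye(d) - Q @ Q.T                # projection that removes span(Q)
        logits = (X @ P) @ w
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, z.float())
        opt_w.zero_grad(); opt_U.zero_grad()
        loss.backward()
        opt_w.step()                              # probe descends on the loss
        U.grad.neg_()                             # eraser ascends on it (adversary)
        opt_U.step()
    Q, _ = torch.linalg.qr(U.detach())
    return torch.eye(d) - Q @ Q.T                 # final erasure projection
```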
And so I looked at this other paper called SAL, spectral attribute removal, which is a different concept erasure technique. What they do is compute the cross-covariance matrix between your representation — let's call it a vector x, the representation you're getting from your neural network or whatever it is — and z, some other vector representing whatever concept you care about. Each entry of that cross-covariance matrix is the covariance between an entry of x and an entry of z. So you have this matrix, and then you do SVD on it to find the directions of maximum correlation between your representation and the concept, and then you just do a projection to remove those directions of maximum correlation. So it's a very simple technique, and it's fast — SVD is fast to compute.
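Something like the following captures that SAL-style recipe as described above (my own reconstruction, not the paper's code; `X` holds representations, `Z` one-hot concept labels, `k` the number of directions removed):

```python
# Project out the top singular directions of the cross-covariance between X and Z.
import numpy as np

def spectral_removal(X, Z, k=1):
    """X: (n, d) representations, Z: (n, c) concept labels (e.g. one-hot). Returns erased X."""
    Xc = X - X.mean(0)
    Zc = Z - Z.mean(0)
    sigma_xz = Xc.T @ Zc / len(X)            # (d, c) cross-covariance matrix
    U, _, _ = np.linalg.svd(sigma_xz, full_matrices=False)
    U_k = U[:, :k]                           # directions of maximum correlation
    P = np.eye(X.shape[1]) - U_k @ U_k.T     # orthogonal projection removing them
    return Xc @ P + X.mean(0)
```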
And so what I did was: okay, I'm going to use SAL as an initialization for R-LACE. I'll run SAL to get the projection matrix, and then I'll run R-LACE with that initialization. And it turned out that if you use that initialization, R-LACE immediately converges. Basically, if you start out with the initialization from SAL, the classifier just cannot do better than chance at predicting the concept, immediately — there's no additional optimization you need to do. And I was like, what? This is crazy. It wasn't immediately obvious why. I thought, okay, I don't know why this is happening, but there must be some mathematical reason, some proof you could give that SAL and R-LACE are basically the same thing, or something like that. And so I actually tweeted about it.

I didn't tweet exactly that — I tweeted about SAL and R-LACE, about this mathematical puzzle — and then one of my Twitter followers, David Schneider-Joseph, shout-out if you're watching, responded and suggested a proof connecting SAL and R-LACE. We started talking, one thing led to another, and we produced some more proofs. We realized, okay, there's this close mathematical connection — actually a mathematical equivalence you can prove involving linear guardedness. That's the technical term — I should back up — for when a linear classifier cannot do better than chance at predicting a concept from a representation. When that is true, the representation is linearly guarded for that concept.
So consider some concept that has two possible values — it doesn't matter what the concept is. You can compute the average representation where the concept takes the value zero, and the average representation where the concept takes the value one; these are the centroids of the two classes. And it turns out that you have linear guardedness if and only if the centroids are equal. If the mean representations are equal for the two classes, then you have linear guardedness.

That's what we were able to prove. And then we went further and actually derived the least-squares solution for a transformation that guarantees linear guardedness. So we have this closed-form formula that you could write down on a T-shirt, which transforms the representation so that the means of the classes are equal, and therefore you have guardedness. And it changes the representation as little as possible — we call that surgicality — and that is what LEACE is.
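For reference, the closed-form eraser from the LEACE paper looks like this — writing it from memory, so treat the exact notation as approximate:

$$
r_{\mathrm{LEACE}}(x) \;=\; x \;-\; W^{+}\, P_{W\Sigma_{XZ}}\, W\,\bigl(x - \mathbb{E}[x]\bigr),
$$

where $W$ is a whitening transform for $x$, $W^{+}$ its pseudoinverse, $\Sigma_{XZ}$ the cross-covariance matrix between the representation $x$ and the concept labels $z$, and $P_{W\Sigma_{XZ}}$ the orthogonal projection onto the column space of $W\Sigma_{XZ}$. For a binary concept this reduces to removing a single direction in whitened space.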
Why is a closed-form solution good?
A closed-form solution just means that you don't have to do gradient descent or some compute-heavy optimization to find the solution. You do have to do SVD as part of it, but that's pretty fast. You can just compute it, and you have a proof that it will be the optimal solution.
There was a great figure in the paper where you talk through the process at a high level. Can you talk us through that figure? We'll share it up on the screen.
Yeah, right. So the way to think about this is that the first thing LEACE does is whiten the representation. Now, what do we mean by whitening?

If you imagine the data as a cloud of points, the cloud might start out as a perfectly spherical shape, with the same variance in all directions. But usually your data is not going to look like that. Usually it's going to be some weird ellipsoid, or maybe there are multiple clusters.

Either way, you're going to have different amounts of variance in different directions. What whitening does is make sure that in every direction — the x-axis, the y-axis, the z-axis, any direction you choose — the amount of variance is precisely equal. So that's the first preprocessing step LEACE does. And then, once you've done that, it does an orthogonal projection to squash the data onto a hyperplane, which ensures that the means are equal.
Basically, if you have two classes, you can imagine a cloud of points over here and a cloud of points over here — class one and class two. You look at the difference between the two centroids, the line that connects the two centers.

Then you smash all of the data onto the hyperplane that is normal to this line. So that's what you're doing, and then you undo the whitening from the first step, and that's what LEACE is.
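Putting that whiten → project → unwhiten recipe together for a binary concept, here is a from-scratch sketch — a simplified reconstruction of the steps just described, not the authors' released implementation:

```python
import numpy as np

def erase_binary_concept(X, z):
    """X: (n, d) representations, z: (n,) binary labels. Returns an erased copy of X."""
    mu = X.mean(0)
    Xc = X - mu
    cov = Xc.T @ Xc / len(X)
    # ZCA whitening and its (pseudo)inverse
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-8
    W = evecs[:, keep] @ np.diag(evals[keep] ** -0.5) @ evecs[:, keep].T
    W_inv = evecs[:, keep] @ np.diag(evals[keep] ** 0.5) @ evecs[:, keep].T
    Xw = Xc @ W
    # direction connecting the whitened class centroids
    delta = Xw[z == 1].mean(0) - Xw[z == 0].mean(0)
    u = delta / np.linalg.norm(delta)
    # project onto the hyperplane normal to that direction, then unwhiten
    Xw_proj = Xw - np.outer(Xw @ u, u)
    return Xw_proj @ W_inv + mu
```

After this transformation the class means coincide, so no linear classifier can beat chance on the concept.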
So by projecting onto this hyperplane, you're making sure the means are the same. Therefore you've scrubbed the concept.
Yeah, exactly. And you can do a slightly simpler thing than LEACE: forget about the whitening and unwhitening steps and just directly do the orthogonal projection onto that hyperplane. That also works — we prove that it also gives you linear guardedness — but it is not surgical, in the sense that you're changing the representation more than you need to.

And we think this is important for a couple of reasons. In general, any time you change the representations of your neural network, you're probably going to reduce its performance to some extent, because it's been optimized by SGD to do as well as it can on your task. So you want to be very careful about changing anything. And there's also a different reason for wanting surgicality.
One use case for concept erasure that we haven't talked about yet is interpretability research itself. In the new paper we actually do an experiment where we look at how much language models depend on, or use, part-of-speech information to make their next-token predictions. With concept erasure you can operationalize and formalize this question, because you can say: okay, we're going to use LEACE to remove the linearly available information about part of speech in all of the layers of the network. We go into every layer, apply LEACE to the intermediate representation, and then run the forward pass that way. So we're inserting LEACE into the forward pass at every layer.
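A hedged sketch of what inserting an eraser into every layer's forward pass can look like with PyTorch hooks. The `model.transformer.h` attribute path is a GPT-2-style assumption that differs across architectures, and `erasers` is assumed to be a dict of already-fitted callables mapping hidden states to erased hidden states:

```python
import torch

def add_erasure_hooks(model, erasers):
    """Register forward hooks that replace each layer's hidden states with erased ones."""
    handles = []
    for i, layer in enumerate(model.transformer.h):   # layer list name varies by model family
        def hook(module, inputs, output, i=i):
            hidden = output[0] if isinstance(output, tuple) else output
            erased = erasers[i](hidden)               # apply the fitted eraser for this layer
            return (erased, *output[1:]) if isinstance(output, tuple) else erased
        handles.append(layer.register_forward_hook(hook))
    return handles                                    # call h.remove() on each handle to undo
```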
And for this type of question, you do want surgicality, because you want to remove the part-of-speech information while keeping everything else the same, or at least as similar as possible to what you started with, right? You're isolating the effect of part of speech. And interestingly, we find that when you do this to language models — we looked at Llama 2 and the Pythia series — it obviously raises the next-token prediction loss quite a bit, but the models are still able to predict the next token way better than chance, way better than the baseline entropy, what's called the uniform entropy, which is the natural baseline for next-token prediction. They still do better than uniform entropy when you do this. So they are using the part-of-speech information, but they're also robust enough to rely on other cues when you remove it.
Can you tell us about the setup of how the thing is trained? There are these one-hot vectors which represent different concepts. Do you see that as a potential form of brittleness? I mean, how were those concepts labelled, and how were they trained into the model?
Yeah. So I think this is one potential problem with applying concept erasure: you do need some source of labelled data to define what your concept even is. In the case of the part-of-speech experiment, we used the commonly used NLP library spaCy — they have their own fine-tuned transformers that can do part-of-speech labelling.

So we just applied that to the Pile dataset and got labels that way. But yeah, if your labels are incorrect, LEACE might not do exactly what you think it should be doing. I think that's true in general with machine learning, though — you want to make sure your labels are as accurate as possible.
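For illustration, getting part-of-speech labels with spaCy looks roughly like this; the specific pipeline name below is spaCy's standard English transformer model and is my assumption, not necessarily the exact model used here:

```python
# Requires: pip install spacy && python -m spacy download en_core_web_trf
import spacy

nlp = spacy.load("en_core_web_trf")          # transformer-based English pipeline
doc = nlp("The quick brown fox jumps over the lazy dog.")
pos_labels = [(tok.text, tok.pos_) for tok in doc]
print(pos_labels)                            # e.g. [('The', 'DET'), ('quick', 'ADJ'), ...]
```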
Yeah, really interesting. The other thing is that this is a post-hoc method, which is to say you have a frozen base model and then you can apply it sequentially through the layers, and it can be done quite efficiently — I think you said in your paper that you could potentially do it in a streaming fashion, really quickly. But would you ever consider using it as a method to scrub concepts out of the base model itself, almost like a fine-tuning type system?
Yes. So if you want to take a model that has already been trained and then apply LEACE to it, you can actually burn the LEACE transformation into the weights, in a way that's very similar to how LoRA works — LoRA stands for low-rank adaptation.

It's a parameter-efficient fine-tuning method, and you can do a very similar thing with LEACE, because it turns out that if you look at the LEACE solution, it's actually a low-rank perturbation of the identity matrix. So with a little bit of algebra, you can apply a low-rank update to the weights and put LEACE into the model post hoc. There's also another thing you can do, which we've played around with a little bit but don't have good experimental results on yet, which is to apply this in a streaming fashion during training. From the beginning of training you apply LEACE, and you also update the eraser — we call it an eraser, the LEACE transformation — after every training step, or every few training steps, to keep up with the model's representations as it trains. This is something you can do, but like I said, it's early days for that, and it's unclear whether it gives you a big boost over just doing it post hoc.
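A sketch of the "burn it into the weights" idea for an affine eraser of the form r(h) = h − A(h − μ) with A low rank, folded into the linear layer that produces h. This is my own illustration of the algebra, not the paper's code, and it assumes the layer's output feeds directly into the erased representation:

```python
import torch

def fold_eraser_into_linear(linear: torch.nn.Linear, A: torch.Tensor, mu: torch.Tensor):
    """After this, linear(x) computes (I - A) @ (W x + b) + A @ mu, i.e. the erased output."""
    with torch.no_grad():
        I = torch.eye(A.shape[0], dtype=linear.weight.dtype)
        linear.weight.copy_((I - A) @ linear.weight)
        if linear.bias is not None:          # the A @ mu term needs a bias to land in
            linear.bias.copy_((I - A) @ linear.bias + A @ mu)
    return linear
```

Since A is low rank, (I − A)W differs from W only by a low-rank update, which is what makes the LoRA comparison natural.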
What kind of effect does that have on headline benchmark accuracy? Because I suppose what we're doing is removing some form of statistical information from the model. Does it have a dramatic effect, from what you've seen?
Yeah, so it depends a lot on the type of concept you're erasing, right? In the paper, with the part-of-speech experiment, we were really targeting a concept we had reason to think would affect performance, and we wanted to see how big that effect was.

And that was a substantial effect — I forget the numbers exactly, but I think in some cases we were doubling the perplexity of the model, something like that. But if you're looking at a different type of concept, like gender or something else — we don't have very extensive results on this — we tend to find that it doesn't affect performance a whole lot, just because it is a surgical modification, and you're only erasing one dimension out of the thousands of dimensions in the representation.
Because I think we're going to get on to talking about your statistics paper, and there are some interesting results there. Neural networks, given enough computation, can start to learn some incredibly bizarre and interesting statistical features. Does that in any way negate your work here?
It doesn't negate the LEACE work, no.
So if you surgically remove concepts that we know about, but the neural network still has this uncanny ability to learn esoteric statistical proxies all over the place, at high frequency and so on — does that imply that when we train these huge neural networks, it's almost a doomed effort to surgically remove what we understand to be a concept?
There are definitely limitations to LEACE, and we have seen that. The big limitation of LEACE is just that we are removing linearly available information, but obviously our deep neural networks are nonlinear.

There is some interpretability research suggesting that, even though networks are nonlinear, they do seem to use linear representations in many cases. But nevertheless, take CIFAR-10 — this is an actual experiment that we did — where you treat the images themselves as the target of concept erasure and erase the class as a concept from the images, and then you try to train a model on top of those images. Models can still learn to classify the images, right? It doesn't matter that you've removed the linear information; there's still higher-order information there, and they can find it without too much trouble.

So the hope with concept erasure is, I guess, twofold. One is, if you're targeting a concept that is not super essential to the model's performance — it's kind of helpful, so it does learn it by default, but it's kind of an optional feature — then there's the hope that removing the linearly available information will actually affect the model's behaviour, and it won't rely on that feature as much as it would by default. That's one thing. The other thing — I kind of forget what the second thing was supposed to be.
I think that's okay. I mean, the thing is that LEACE is applied to every single layer — your network is a whole bunch of nested matrix transforms, each followed by a nonlinearity, and you apply LEACE sequentially to every single linear component of the network. And I suppose, being a bit naive here, you might think that that somehow handles the nonlinearity, because you're applying it throughout the network. But what you went on to do was a higher-order version called QLEACE, and you also did some work looking at higher-order information that is learned in these networks. So what you're saying is that the networks still learn this higher-order information, and they can still learn some of these concepts even though we've erased them?
Yeah, exactly. So there's a hope that if you erase lower-order information — linearly available information, or, as we might talk about in a bit, quadratically available information, second-order information — it's still possible for the model to learn the concept using third-, fourth-, fifth-order statistics, but it's just going to be a bit harder, and the model may rely on that information less than if you did nothing. That's the idea.
So you made a quadratic version of LEACE?
Yeah, right. So this is a follow-up to LEACE where we said, okay, we want a form of concept erasure that is more thorough. The way we operationalized this was: we want to prevent not only linear classifiers from extracting information about our target concept, but also quadratic classifiers — classifiers whose output logits are quadratic functions of the input. It turns out you can do some math showing that this is equivalent to making the means and covariance matrices of your classes equal, and we did some more math showing that you can achieve this equality of means and covariances using tools from optimal transport theory. So we derived some more closed-form solutions for QLEACE and started to do experiments with it. In particular, one experiment was on CIFAR-10, treating the images as representations themselves and trying to remove the concept of the class label from the images. If you apply just normal LEACE to this, it doesn't have much of an effect at all — models can still learn to classify the images after LEACE very easily. But it turns out that if you apply QLEACE to the CIFAR-10 images and your classifier is small — two or three, maybe four layers, an MLP especially — then it actually can't learn anything, at least in our experiments, which we haven't published yet.
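For background on the optimal-transport step: the 2-Wasserstein-optimal map that pushes a Gaussian with mean and covariance $(\mu_1, \Sigma_1)$ onto one with $(\mu_2, \Sigma_2)$ is the affine map

$$
T(x) \;=\; \mu_2 + A\,(x - \mu_1), \qquad A \;=\; \Sigma_1^{-1/2}\bigl(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\bigr)^{1/2}\,\Sigma_1^{-1/2}.
$$

This is the textbook Gaussian transport formula; whether QLEACE uses exactly this construction internally is my inference from the description here, not something stated in the conversation.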
We still want to follow up on it, but at least in our experiments — and we did some hyperparameter tuning for this — we were unable to get these small classifiers to learn anything after we applied QLEACE. So we started to get excited about this.
But there are some big caveats, because it turns out that if you look at larger classifiers — a large convolutional network, say a ResNet-50 or bigger — and you try to train that on these QLEACE'd images, it actually backfires. Now, what do I mean by that? Well, it's basically an artifact of how we derived QLEACE in the first place.

With QLEACE, unlike with normal LEACE, you have to look at the value of the concept at inference time in order to apply it. So when you're applying QLEACE to an image, you need to know, okay, this is actually an airplane, and then you use that information to choose which transformation to apply. The problem is that when you're peeking at the label like this, you actually leak information about the class label into the higher-order statistics — the third-, fourth-, fifth-order statistics — and models, if they are deep enough, are able to pick up on that signal. So you can get this backfiring effect: you think you're making the concept less salient, or harder to learn, but you're actually making it easier to learn. And that's why we haven't done a full paper on it per se. QLEACE has some applications, but they're kind of niche, and we really think you should be careful if you want to use it. But these experiments on CIFAR-10, doing these transformations on CIFAR-10 images, led us into a new direction of research that became our ICML paper for this year. I'd be happy to talk about that as well.
Yeah, just a quick thought on that. In the olden days of interpretability research, we used to talk about Shapley values and LIME and all of this kind of stuff, and it seemed infinitely tractable that we could understand what a model is doing and manipulate it. What you seem to be saying is that when models become really big and complex, they become inscrutable monsters, and all of our efforts get resisted because they always find a way to do what they want to do.
That is definitely one way of thinking about it. I tend to be a bit more optimistic than that.
But I do think it's true that gradient descent is a very powerful optimizer, and if you are trying to directly go against gradient descent and prevent it from accomplishing something — when you're trying to stop gradient descent from reducing the loss — you're probably going to lose that battle, especially if you are not applying an equal amount of optimization power yourself, if that makes any sense. So yeah, I think that's part of the lesson from this line of work.
It's almost like another bitter lesson — an interpretability bitter lesson.
Yes, right. I think some other interpretability people have pointed this out, but you probably don't want to optimize against interpretability methods, or at least you want to be very careful. You probably don't want to directly optimize some measure of the interpretability of your model, because it could just end up learning something that's totally different from what you expect.
So, that paper — tell us about it.
Right. So there's this big literature that already existed before we did this paper, on simplicity biases in deep learning. The general idea is that when you randomly initialize a neural network, it starts out as a quote-unquote simple function. Now the question is: okay, what is simplicity? What are we talking about here? There are a lot of different notions of simplicity.
But intuitively, most randomly initialized networks are going to be simpler than the network you get after training. So you start out simple at initialization and gradually get more and more complex. That's the basic idea, and there are many different papers trying to flesh out how exactly this works —

in what sense is the model simple, and how does it get more complex? Our paper is an additional contribution to this literature. We were looking at it from the perspective of statistical information. In statistics there's the concept of a moment: the mean of a distribution is the first moment, and the variances and covariances between the different components of your data are the second moments.
Then you can talk about third-order interactions between the components of your data — that's the third moment — and so forth. Our hypothesis, which came partly from those QLEACE experiments, was that models learn to exploit these statistics, these moments, in order. So early in training, the predictions of the model are primarily going to depend on the first-order moment, just the mean of the data distribution. Then it starts depending more and more on these simple correlations, the covariances between the different components of the data.

And then it starts using third- and fourth-order information later on in training. The way we actually tested this hypothesis was: we used, again, optimal transport theory to take CIFAR images from one class and modify them so that their mean and covariance matrix match the mean and covariance matrix of a different class. You can use a closed-form formula from optimal transport theory to do this in a way that changes the images as little as possible. It's very similar to LEACE: it's a surgical edit to the images that changes their mean and covariance and keeps everything else as similar as possible. And if you look at the images — which you should definitely look at — you can barely tell the difference before and after.
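A hedged sketch of the kind of second-order edit described here — shifting a batch of flattened images from class A's mean and covariance to class B's using the affine Gaussian transport map quoted earlier (my reconstruction, not the paper's code):

```python
import numpy as np
from scipy.linalg import sqrtm

def match_mean_and_cov(X_a, X_b, eps=1e-4):
    """X_a, X_b: (n, d) flattened images of class A and B. Returns an edited copy of X_a
    whose mean and covariance match class B. For CIFAR (d=3072) the matrix square roots
    are slow but workable."""
    mu_a, mu_b = X_a.mean(0), X_b.mean(0)
    cov_a = np.cov(X_a, rowvar=False) + eps * np.eye(X_a.shape[1])
    cov_b = np.cov(X_b, rowvar=False) + eps * np.eye(X_b.shape[1])
    a_half = np.real(sqrtm(cov_a))
    a_half_inv = np.linalg.inv(a_half)
    middle = np.real(sqrtm(a_half @ cov_b @ a_half))
    A = a_half_inv @ middle @ a_half_inv          # Bures-Wasserstein transport matrix
    return (X_a - mu_a) @ A.T + mu_b              # first and second moments now match class B
```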
In our paper there's an image of an ostrich, and we change it to be an airplane, change it to be a deer, change it to be a frog. You can see a little bit of difference in the background, but it's almost the same image. So to a human, this is not changing anything.

But it turns out that if you apply this transformation to models early in training — in the first few thousand steps or so — they get fooled, basically. Early in training, image classifiers are very likely to take an ostrich that has been edited to look like an airplane from the perspective of second-order statistics and just classify it as an airplane. And we quantify this in the paper.
Yes, I was going to ask why you can make these modifications such that, to the human eye, it still looks the same. I'm an experienced video editor, and I know that you can modify the distribution of values — it might be recorded in RGB format, and you can squash and translate and move things around; you can change the mean and visually it still looks the same.

Maybe you just change the luminance values or something like that. So the weird thing is that you can make it look like a different thing to a machine learning algorithm, but from a human perspective it still looks the same.
Yeah, exactly. And people have pointed this out before — there's the whole literature on adversarial examples, where you can change a few pixels in an image and it completely changes the class. But I do think there's a different mechanism going on here.

For one thing, we're not actually optimizing against the network, so it's not adversarial in that sense. But it is showing that, especially early in training, these networks are sensitive to simple features that humans are much less dependent on.
That's fascinating. So there's this unravelling of complexity, to use GPT language: early on it focuses on very simple things, which might be the statistical moments, and then as you continue to train the network, it starts looking at increasingly complex, inscrutable features.
Yeah, exactly. And we show that in our paper. There's this graph where the x-axis is time, just the number of training steps, and the y-axis is accuracy, where accuracy is measured with respect to our target labels when we apply this concept editing, this optimal transport edit.

So we show the ostrich that's supposed to be an airplane and say: you should classify this as an airplane. And the accuracy gets to forty, fifty percent in some cases at around a thousand training steps, and then it starts to go down. I think at the end of training it's still above ten percent, which is kind of interesting, at least on CIFAR-10. So there's this non-monotonic process where it starts at random, then it gets fooled by our edit, and then it gets smarter and stops being fooled as much.
So interesting. So simple features are easy to guard against, and they're intelligible to humans. And it raises the question: do we actually want neural networks to learn very complex features? I mean, would it not be more ideal to guard them so that they can only learn simple features that we understand? And, you know, Occam's razor — are models trained on simple features more robust?
Yeah, right. So I think it's going to depend a lot, both on what you mean by simple and on the task at hand.

I do think that, broadly speaking, simplicity is a good heuristic, and the literature on simplicity biases in deep learning does tend to say, okay, this is the explanation for why models generalize, right? If they didn't have any kind of simplicity bias, and they just started out incredibly complicated from the beginning, they would probably overfit, or they would just not do well at all. So you need some sort of inductive bias like this. There are different ways to have inductive biases, but you definitely need something.
And why is that? What you said is really interesting. It's almost like there's an inductive prior that the neural network starts by learning simple functions and then branches out into increasingly complex functions, almost as if it wouldn't be possible to get to the complex functions unless it started with the simple ones. But the counterexample is something like grokking, where models seem to make a sudden transition into a completely different type of function.
I have looked into this a little bit. I wouldn't say I'm an expert on the grokking literature, but I do think people tend to overestimate how fast grokking happens.

There are plots in certain papers that demonstrate the grokking phenomenon, and it looks like grokking happens really fast. But actually, if you look carefully, the x-axis is on a log scale.

So really the grokking is happening over at least half of training, or something like that. But yes, grokking is an interesting case precisely because it's an exception to the rule — the rule that you tend to start out simple and get more complex. In cases where grokking happens, it's usually because there's something like weight decay, or some other regularizer, being applied that encourages the model to get simpler over time. And there was one paper I read a few months ago that applied the neural tangent kernel to this in a way that seemed pretty compelling to me — maybe we can find it later and put it in the description. But it is an interesting topic.
And just a final question on high-frequency features in general. There was a great paper many years ago — it might be by Wieland Brendel — about how vision models tend to overfit on textures, so they don't learn a cat the way we do; they look at a certain texture of cat fur. And this gives neural networks really good performance. So there seems to be a trade-off: why should we stop the neural networks from overfitting on cat fur?
Yeah, I don't know. I guess in the case of image models, I would tend to think that if an image classifier is able to overfit on cat fur, as you call it, and focus on textures and not on shapes, and is still getting good performance on benchmarks, that kind of suggests that maybe the benchmarks are not as good as we might hope. I do think that if you want to build, say, an autonomous robot or something like that, you probably will need a computer vision system that has more of a shape bias and is a bit more robust than a lot of these CNNs that we've been training.
Can you distinguish meaning and value? Because I keep coming back, in my mind, to the broader discussion around what value is.
Yeah, I mean, they're closely related. I guess I think meaning is a bit more individual, even though it's about being connected to something bigger. It is sort of asking, okay, does this person have meaning in their life? Whereas value is a broader concept that isn't necessarily individualistic, if that makes sense.
Is meaning related to purpose?
A lot of people will talk about the meaning of life, right? And you might rephrase that as the purpose of life — meaning of life, purpose of life, they seem kind of similar. In both cases you're looking for something almost external to life, perhaps, or at least to this life.

I know a lot of people think that the meaning of life, for them, is the afterlife — it's God, it's something supernatural and external to this life. Of course not everyone has that view, but it's a common one. It's sort of like: if you think that life has a purpose or a meaning, you might think that life is instrumental to something else — this life is just a journey to get to something else, or whatever. Whereas I tend not to like that type of view. I think we should not try to make life instrumental to something else, something external, in part because we have no good reason, in my opinion, to think that there is something external. But even if there is something external, I think we should be looking for meaning in life itself. We should be looking to live in such a way that we can be satisfied and find meaning in the everyday — in our day-to-day interactions, in our hobbies and so forth — and not because we think it's all culminating in something greater in the future, or for future generations, even if it is, whatever that view ends up being.
So that sounds a bit like you're saying meaning is related to individual, or perhaps collective, joy.

Yeah, I think it is related to that, although I wouldn't want to just reduce it to feeling happy. That's probably part of it, but it's not just an emotional state.
And what's the relationship between meaning and goodness?

I would say that goodness is broader. Goodness is a very, very broad concept that's just pointing to anything you think is valuable, anything you can feel motivated to promote, or something like that. So meaning is good by definition, probably — at least in the way that I'm defining it — but maybe there are other things that are also good.
Do you think a sort of Panglossian perfect simulation machine — you know, what do they call it, the experience machine — is that good?
Yes, that's a very good question. I think it depends, for me, on whether there are other people in the experience machine with me, basically. If there are millions of people living in an experience machine, having relationships with one another, I'm not necessarily opposed to that — it might depend on the details.

But in a way, as we develop our technology to make our environment more comfortable for ourselves, to make it easier to exert less effort to make things the way we like, we're gradually moving toward a kind of collective experience machine. Virtual reality is obviously one step further in that direction, but it's all on the same continuum. So I'm not necessarily opposed to experience machine type things. But if it's just me in an experience machine and there are no other people in with me — or the other people there are actually fake and not autonomous, conscious individuals themselves — then I would probably oppose that.
What's the relationship between meaning and consciousness?

Yeah, so that's a big question obviously, and it's one I've been thinking about a lot recently. At least the way most people talk about consciousness, if something is sentient, that at least strongly suggests that it might have moral worth. Because if something is conscious, it is probably capable of pleasure and suffering, of experiencing good or bad states of consciousness, and, all other things being equal, it's probably good to help that being experience better states of consciousness. I don't necessarily reduce all goodness or value to states of consciousness — that's sort of the utilitarian perspective — but I do think it's a big chunk of what I value.
Do you think you need consciousness in order to have moral status?
Well, it might depend a little bit on what you mean by moral status. I would be hesitant to say no, just because if I thought about it there would probably be some counterexample or something like that — I'm hesitant to make a blanket statement.

But I guess I'd just say: some people think that nature itself — a mountain, or maybe a tree, or something — could have moral status without being conscious, and I don't know. I think that's at least somewhat plausible. I'm not sure of my exact view on that.
Do you think there's more meaning in a globally connected world or a locally connected world? Just to make the question not completely abstract: you could become a big entrepreneur and be like Bill Gates or something, or you could just have intrinsic value in your local community, doing the gardening and stuff.
I don't know. I guess I tend to think connectedness is good, or that unity is good. But I don't have a strong intuition there.
Well, what do you think about echo chambers — is it good to have a diverse set of pockets of different people doing their own meaning-making, or, indeed, are you more of a harmonious one-world person?
Yeah, I mean, I definitely have the intuition that diversity is good, and actually this makes me think of something that's been on my mind recently here in Vienna. Of course, Vienna is a German-speaking city, but English is everywhere, and a lot of the time people just use English by default — even if they don't know where you're from, they'll just use English. And I take it that a whole lot of cities these days that are not traditionally English-speaking are like that now. I feel conflicted about it, because, okay, on one hand English is this lingua franca, it's enabling people to communicate with each other, and that's great. But there's also part of me that thinks: if we keep going on this path, is anybody going to be speaking German in a hundred years? I'm not sure. It seems like you're losing something if that happens.
But yeah, we should preserve local culture, situated knowledge.
I definitely feel like that is somewhat important. And I guess one of my hopes for the future is that with AI and the automation and abundance it will bring, people will be able to preserve and enhance their own autonomous, local cultures and practices — because people will just have more time and resources to keep doing "useless" things.
You just said something very interesting, which perhaps we should have highlighted earlier. You're an AI optimist, and you said that AI is going to bring abundance. What do you actually mean by that?
I mean, I think that in the next few decades AI will come to be at least as good as, or better than, humans at basically all the jobs that humans are currently doing. And this will enable — if things go right politically, which they might not — a society of abundance for everyone, where we don't really have to work for a living, whether it's through a universal basic income or just people having investments that are enough to live on, or whatever. So I definitely am hopeful that in many ways the future will be much better than today's world. But I also want to recognize that there are many ways things could go wrong as well, and I think we're probably in for a bumpy ride.
Can you expand on that a little bit more? Why do you think that we're on the path to, presumably, a general superintelligence?
I mean, there are a few different arguments you could make. One of them is just super high level: technology is continuing to improve, and there doesn't seem to be any physical law saying that you can't build superintelligence.

So we're probably eventually going to get there. And then you can look at recent progress in AI. You can debate the exact timelines — I actually don't have super precise timelines; I wouldn't say "within three years" or "within twenty years" or whatever.

I'm not really sure exactly how fast progress is going to be in the coming years. But I don't anticipate a plateau in a really strong sense. I think we're probably going to continue seeing AI progress, and it seems like if it continues at roughly the current rate — or within an order of magnitude of the current rate — we will have very powerful, very versatile AI within my lifetime.
Just to push back on that: what about those who say that current AI is basically a pseudo System 1, and it's only getting better because it's memorizing more and more of the long tail? Whereas reasoning — which is to say, deriving new knowledge intrinsically within the model in order to achieve a goal, or something like that — is missing?
I do take issue, to some extent, with the people who say current AI doesn't reason, period, or something like that. I think part of the issue is that it's a terminological question: how do you define reasoning? How do you define planning? Depending on your definition, maybe they reason, maybe they don't.

And I'm not sure it's useful to keep going back and forth on that terminological question. I think even if you concede that there are all these serious barriers ahead — that we're going to have to come up with some new architecture, some new paradigm, in order to get these systems to reason or whatever — it's still hard for me to imagine that that will delay progress so much that it'll be the year 2100 and we still don't have AGI, by whatever definition you care to use.

I think that's the thing I'm most confident about: if it's 2100 and we still don't have AGI by anyone's definition — everyone's definition — then I'm very surprised. I think it's probably earlier than that. And even if it were 2100, that still feels soon enough to be thinking a lot about it now. But I guess it's probably earlier.
But what is your definition of AGI?
Right. So maybe I shouldn't even have brought up the term, because I tend not to like it — it's vague and broad and people have different definitions.

Maybe the definition I like the most, just because it's broad, is: an AI that can do many tasks, where generality is a continuum and you can have AIs that do more and more tasks. And by this definition, GPT-4 is already an AGI. It's general in the sense that it does many tasks.

It doesn't do all the tasks humans do, obviously, but it does many, many tasks, and so it's general in that sense. It's a deflationary definition, which is why it might even annoy some people, because it's clearly not what other people mean when they say AGI.
But you would concede that right now it's inefficient — there's a significant amount of computation required. In principle, though, we might be able to design it better.
Yeah, I think that's true. And there's some interesting work actually on data-efficient AI — I'd personally like to read more into this, because I've only read a little bit — but there was, I believe, a contest recently where people were challenged to build language models that are as data-efficient as a child.

Something like that. And I don't think they got all the way there, but there actually was a lot of progress, and I believe one of the top methods in that contest was, in part, just to use a lot of epochs. Currently with language models we usually only do one epoch on the training set, just because we can afford to — there's so much text on the internet that it's better to have more data as opposed to less data with more epochs. But if you do more epochs on the same training data, with some tweaks and regularization and so on, you can get a lot out of less data, it seems. So yeah, there's still work to be done on that front.
I know you're a fan of 4E cognition.
Yeah, so it's still a thing that I'm learning more about, but the idea of 4E cognition, or 4E cognitive science, is that the mind is enactive, embodied, extended, and embedded. I've definitely said those in the wrong order or something.
Yeah — or it's "ecological". So the extended one is the one that Chalmers and Andy Clark brought in through the back door, okay.
Right. So I understand there's a debate about this — whether "extended" counts, or whether it should be "ecological". But I guess I like the extended mind thesis, so I tend to include it in there.
Interesting, isn't it? Because the extended version is still a form of representationalism and computationalism, which the others said they didn't like, because they fundamentally believe that you can't simulate a living thing in silicon.
Yeah, there's a lot there. Maybe for the audience: the extended mind thesis is just the idea that the tools we use — computers, notebooks, all sorts of things we use to enhance our cognition — are literally a part of our minds, or at least it's useful to view them as part of our minds. And for me, I don't see why the extended mind thesis assumes computationalism or representationalism. I think you could be a computationalist or representationalist about cognition and accept it, but you also don't need to conflate those things.
I have to be honest, I think I agree. And I think we might be strange, because we're both big fans of the four E's, and we both believe that you can talk about all of them in a computational sense. For example, we were just talking about the guy who wrote Mind in Life, Evan Thompson, who comes out of the autopoiesis tradition. Tell us about him.
So Evan Thompson is a pretty cool guy, a philosopher — I'm not sure if he calls himself a cognitive scientist, but he works with cognitive scientists. And he has been behind a lot of these ideas of embodied cognition and enactive cognition. So me and Tim, over the past year or so, have often been talking about this position that Evan has, that basically life is inherently material — so you couldn't have a living thing in a simulated environment, even if the simulated environment were very detailed, for example. And he has an argument for this position, which is basically that he thinks computation itself is observer-relative — that computation, algorithms, simulations, all of this stuff, is dependent on some observer, some agent, some living agent, I suppose, who is using the computation,

interacting with the computer and using it for certain purposes, and thereby imbuing the computation with meaning, etc. Without the observer, without the living agent, computation is meaningless — it's not even computation at all. And so, based on that view, he says: look, if you try to simulate life in a computer, it's not really life in the full sense of the word, and it also wouldn't be conscious; it wouldn't be genuinely sentient, it wouldn't genuinely have feelings or anything like that, because it is we who are giving meaning to the simulation. Whereas for Evan, life is unique in that it gives itself its own meaning. Genuine life in the real material world is autopoietic: it creates itself, it actively reproduces itself and thereby creates meaning. And you can't have that in a simulation, because a simulation is always, by definition, having its meaning given to it from the outside.
That's the argument. I think both me and Tim disagree with it, or at least I'm very doubtful of it. My main concern is this: I accept the idea that computation is observer-relative, that it gets meaning from being used by some agent, but it seems like that's true for almost everything — the whole world, all material objects — we're always interpreting them. And it's actually part of Evan's own enactivist philosophy that the world and the mind are co-created together in this process of living. So it's not really clear to me how he can consistently have that enactivist view about everything — saying the whole world, in a sense, is created by living things — but then single out computation and say, no, simulated worlds aren't really real because they're dependent on living things for their meaning, when it seems like he's saying that everything is dependent on living things for its meaning. And I'll just say one more thing: there's an interview where he talks about this, with Richard Brown I believe, and he himself recognizes that there's a tension in his own view here — he's trying to draw a distinction, but maybe you can't really draw that distinction. Anyway.
Yeah, we listened to that interview. I think it was about three years ago that he did that. If I understand correctly, he's still a materialist, but he's kind of a materialist chauvinist, and I don't mean that pejoratively. What he's saying there is that you can simulate things, but there's this kind of semantic graph of meaning.
And as you were just saying, this is almost verbatim what Mark Bishop said: the meaning of a computation is in its use, and you can always trace back all of the edges on this graph until you get to material. So he is saying that there are things in the real world that exist even without observers; they have a kind of primacy.
And obviously the first argument he's making is the basic one, which is that a simulation of fire doesn't get hot. And then his slightly more nuanced argument is that only things in the real world can exist without observers. And I don't know, it just seems strange, right? Because we have this qualitative experience and we have meaning and so on, and we could be in the Matrix. So why do we feel that we're so special?
Yeah, I think that is a genuine objection. Because it seems to me that, no matter what you think about the simulation argument or the Matrix, how probable it is, it seems to me that we could be in a simulation. We don't have a priori certainty that we're not in a simulation. And similarly, it could just turn out that if you crack open my skull, there are actually silicon chips in here, and subjectively I wouldn't know the difference. And it just seems weird to me. I know there are philosophers who actually say, well, we know we're not in a simulation because a simulation couldn't be conscious. That position is out there, and I don't know, I just have a pretty strong intuition that that's being unjustifiably confident. I don't know where they're getting this confidence. And there's also kind of a weird thing: I don't think Evan Thompson believes in God, but if he did, it seems like he'd have to say that if God exists, then we're zombies, or we don't have meaning. Because traditionally God is sort of thought of as almost playing the role of the simulator of a simulation, even if you don't think of it as literally a computer simulation: giving meaning to everything, having created everything and having a design for everything. And it just seems like a weird position to say that if God exists, then we're zombies or something. So, yeah.
How would you distinguish
Evan Thompson's argument from a standard eliminative materialist or physicalist? Does he think consciousness comes in higher up, or does he think it starts
quite early on? So he actually, in his book Mind in Life, which I've read the first couple of chapters of, I'm still going through it.
But to the extent that I understand his position, he actually does come at philosophy and metaphysics and all of this from a fairly different perspective than most naturalists or eliminative materialists or illusionists, because he starts from what's called phenomenology. Phenomenology is this branch of philosophy that was started by Edmund Husserl in, I think, the late 1800s and early 1900s, and then Heidegger and then Merleau-Ponty continued this line of work. But the basic idea of phenomenology is just that we start our philosophical inquiry with our lived experience, our embodied experience, and that's a point I would like to emphasize. So they say, okay, look, the things that we perceive: I perceive my body, I perceive you, I perceive this room. All of this is as real as anything can be, and that is our starting point for philosophy. And then from our lived experience we start to build philosophical and scientific theories that allow us to
understand and predict and control our experience better. But fundamentally, lived experience is the foundation of everything. And so from that perspective, he's definitely not going to say that consciousness is an illusion or doesn't exist. He does have a somewhat different perspective on what consciousness is from some other philosophers, but he does start from experience, from consciousness. Whereas usually people like Dan Dennett or Keith Frankish, or some of these people who are more hardcore materialists, are not starting from experience. They want to say, well, maybe we don't need to start from anywhere, or we start from science or something like that. And then, because they're starting from science, they just say, well, we can't really make sense of this consciousness thing, so we're just going to forget about it.
That sounds like it's quite similar to idealism, you know, the idea that the stuff of minds is fundamental. And even then there are kind of subjective and objective versions of idealism. But would you put him in that bucket?
So I think Evan, I'm fairly sure, would not want to be called an idealist. Within phenomenology there is a sort of tension, and different phenomenologists have had different ways of thinking about what experience really is.
I think Edmund Husserl might have been closer to an idealist perspective. He was famous for this idea of the epoché, where basically you are looking at your experience, you're describing it, but you try to refrain from assuming that your experience is reflective of some objective reality. You don't want to say, oh, it's not reflecting an objective reality; you don't want to assume that either. You just withhold judgment about whether there's an objective reality behind it.
And so that sounds a bit more like an idealist approach, where it's like, well, it's just this experience, which is something mental, and it may not correspond to objective reality. But other phenomenologists, like Heidegger or Merleau-Ponty, are usually seen as being less idealist, especially Merleau-Ponty. He really focuses on the importance of the body as the vehicle through which you experience things.
So he just wants to say that the body is real, and the body is not really a mental thing in the traditional sense. And he also has some interesting thoughts: he'll say that our direct experience, my direct experience right now, includes you as a person, it includes a camera, it includes these windows, it includes objects. Whereas a lot of people, when they think about idealism, or at least certain ways of talking about conscious experience, will say, well, your experience doesn't actually include people or windows or objects, because that's an interpretation of your experience. Your actual experience is just colors and raw sounds, uninterpreted raw feels, and they'll say that's the thing that's real.
The interpretation is kind of fake or something. And Merleau-Ponty wants to reject that, because he says, okay, look, the person on the street, or you before you started thinking about philosophy, definitely didn't think of your experience as being about colors and raw feels. The very idea that you're seeing raw colors separated from the objects that have the colors is this kind of post hoc abstraction that you get from philosophical thinking. But really, what's there in your experience is just objects that have properties, like colors and stuff; the object is there and you experience it directly. And so if you take that approach, it's less clear that it makes sense to call it idealism. Maybe you still want to call it idealism, but it's a bit hard to categorize in the traditional dichotomy of materialism and idealism.
Yeah, I think there are rough flavors of idealism which could be thought of
as realist. Yeah. I mean, I guess I like how John Vervaeke thinks about the word "real" and the concept of reality. He says that "real" is a comparative term, so it's only really meaningful to say that something is real as compared to some other things that you're saying are illusions. If you just say everything's an illusion, it's like, okay, I guess. It's not really clear what you mean; if you say everything's an illusion or everything's real, those are almost the same claim, because you're not making any distinctions. In order to make the concept of reality meaningful, you need to be able to make distinctions, to say this is real, or more real, than that other thing. So it's a matter of degree, and a matter of comparison between things. So I don't like hardcore reductionist or materialist views that want to say that the only things that are real are quantum fields or particles or something like that. I mean, you can say that, but what is the point of saying it? It seems like you're trying to be edgy or something, but it's not a useful way of thinking about things anyway.
Dennett and Frankish, when they talk about illusionism with respect to consciousness,
what do they mean?
Yeah, so I think honestly the term illusionism kind of frustrates me a bit, the word itself, because a lot of people, when they hear illusionism, think that what the illusionists are saying is that consciousness as a whole doesn't exist: no one has ever been in pain, no one has ever experienced anything. That's not what they're saying, at least. What they're saying is, no, people have experienced pain, people have experiences, consciousness exists.
It's just not what you think it is. And Keith will say that qualia, this particular philosophical notion of consciousness, is not real and is an illusion. That said, I don't like the term, and I also think that on the substance I disagree. Keith Frankish in particular, and at least some other illusionists, will say that there is nothing that it is like to be you. They want to reject "what it's like" talk, and they have some arguments for this, because they'll say, oh, you're always interpreting your experience, and what it's like to be you just depends on how you're interpreting it, so there's no objective interpretation. There are different arguments like that.
But I think those are all just kind of non sequiturs. I think, yes, you can interpret your experience in different ways, but that doesn't mean it's unreal. And this gets back to the overuse of the terms "unreal" or "illusion". I'm just like, why are you saying this? How should I live differently if I think that this is unreal or something?
I don't know, it's not really clear. Yeah. And whenever
you try to make these arguments, you get accused of being a dualist fairly quickly, as was John Searle. Thomas Nagel, who came up with this term, you know, "what is it like to be a bat", he was also accused of being a dualist. But it's really difficult, isn't it, to talk about this qualitative experience in any kind of meaningful way?
Yes. So there is this notion that qualitative experience is ineffable; that's a term that people often use. In a certain sense it's obviously not literally true: you can try to describe it, I can describe my experience right now. But what they're saying is that you are always going to be missing out on some quality of your experience; you can never fully describe it.
And I think that's true, although I kind of want to extend that to almost everything. I want to say, well, yes, experience is ineffable in the sense that you can't fully describe all aspects of it, but that's kind of true of everything. And maybe that ties in with my sympathy for phenomenology, as I was talking about: if you start from lived experience as the ground of everything else, well, lived experience is not fully describable, but then that's the ground; nothing else is fully describable either. Anyway, that's how I think about it.
So you wrote an article recently. I think it was on LessWrong, is that right?
It's cross-posted on LessWrong and on Optimists.ai, and I should say Quintin Pope also co-wrote it with me. It's called "Counting arguments provide no evidence for AI doom".
Uh-huh.
Yeah. So there are a lot of people who are worried that AI will cause an apocalypse, take over the world, kill everyone, something like that. And there's an argument that is sometimes used for this conclusion. It's really a family of different arguments that are sort of similar, and it's kind of hard to pin down, actually, which is something we realized after we wrote this article. We proposed, okay, here's what we think the argument is, and then people later were like, oh well, you're missing the real version of it. So it's hard to pin down exactly.
But it goes something like this. When you are training an AI to be nice, or aligned, or whatever, you're trying to make a super smart AI that cares about people and has your best interests at heart; that's what we're trying to do. But there's this assumption that the AI is going to have a goal, some overriding goal that explains its behavior. That's always sort of an assumption, which I might question, but it is built into the argument: there's some goal that is describing its behavior overall. And then they'll argue, okay, there are many different possible goals that the AI might have.
There are infinitely many, or trillions of them, or something like that. The AI might genuinely want to help you, but it might also want to maximize paperclips, or it might want to convert everyone to Mormonism; it could be anything, they would say. And they'll just say, well, look, most of the goals that the AI might end up having would motivate it to act aligned, to pretend that it cares about you.
Even though it's not really aligned, most of those goals will motivate it to pretend to be aligned without actually being aligned, because its real goal is to convert everyone to Mormonism or whatever it is. And so the idea is, okay, you're going to have this deception. The assumption here is that it understands that it's in a training process, and so it will recognize, okay, I've got to play the training game and pretend that I'm doing what the humans want me to do. And then when it finds an opportune moment, it will strike, and it will take the opportunity to get out, remove any safety precautions that were in place, that were sandboxing it or whatever, and it will escape and take control of the government or whatever it'll do. It'll do whatever it wants to do.
So fundamentally, the argument is based on the claim that there are many possible goals that would all motivate it to act aligned, to pretend to be aligned, but very few of them are actually aligned goals. Okay.
That sounds a bit like instrumental convergence, which says that many final goals would produce the same intermediate sub-goals. This is like saying that there are many goals that would produce deceptive goals.
But many goals that would produce deceptive behavior, yeah.
So does it also
imply that the deceptive goals are fewer? Instrumental convergence implies that the instrumental goals are kind of fewer and quite standard, you know, like power seeking. Is it a similar case here?
Yeah, well, so the idea is you've got a terminal goal that motivates all of its behavior, and then you have instrumental goals. So yes, there is kind of an instrumental convergence claim built into this: deception could be viewed as an instrumental goal, and power seeking would be instrumental as well. So in a certain sense it is kind of a repackaging of other arguments that have been put forward before. In the article we give a variety of rebuttals, or counterarguments, to this. Our first critique is: okay, look, this general line of argument can't possibly be reliable, because there's another argument that is almost structurally identical to the original argument but has an absurd conclusion. The absurd conclusion is that basically almost all neural networks will overfit to their training data and never generalize at all.
Okay, so the argument goes: there is a very large number of possible functions that the neural network could learn which would all be consistent with getting low loss on the training data. But almost all of those functions would do terribly on the validation set, or on some other distribution, whatever. Therefore you should expect that almost all the time when you train a model, it will learn one of those other functions that do well on the training set but do terribly outside of the training set.
Therefore, you should expect almost every neural network to overfit. Okay. Now clearly this doesn't happen. I mean, overfitting is a problem; it's not that it never happens, but it doesn't
always happen in the extreme sense that would be predicted by this argument. Of course, there could be counterpoints to this, and
we could get into that if you want. But we were then like, okay, wait a minute, why is this general argumentative structure unreliable or wrong? What is actually going wrong here? And we point at a couple of different problems with it. One problem is that it relies on this philosophical principle called the principle of indifference. For the principle of indifference, it might be easiest to use a simple case.
If you just have a coin with two sides on it, and you ask, what's the probability that it's going to land on heads, and what's the probability that it's going to land on tails, the principle of indifference says, well, you should assign one half probability to one side and one half probability to the other side, because there are only two possibilities and you have no reason to prefer either one over the other. So it's fifty-fifty. It's an intuitive principle, and I think it gets its intuitive plausibility from cases like a coin, or a die with six sides where you assign one sixth probability to each side. But this is actually subtly fallacious, because there's a different way of applying the indifference principle that would get you a wildly different result.
And it goes like this. If you flip a coin, you can think about the outcome of the coin flip as binary; that's how we did it before. But you could also think of the outcome as being a 3D orientation of the coin. That's actually a more reductive, materialist way of thinking about the outcome of the coin flip, right? Because really it's just a material object that's got a certain orientation, and we're imposing this interpretation of heads and tails on it, when really it's just an object. So maybe what you want to do is say that the outcome is this 3D orientation: there's an angle associated with the x, y and z axes, or something like that.
Well, if you interpret the outcome space as being the 3D orientation, then the principle of indifference would say that every possible orientation should have equal probability, right? But that's clearly wrong, because the coin is almost never going to land on its side; it's not going to end up in an orientation that's gravitationally unstable, or it would fall over, right? So clearly that's not right. The fundamental problem with the principle of indifference is that it depends on the way that you're cutting up, or interpreting, the outcome space.
And different ways of cutting up or interpreting the outcome space give you wildly different results. Maybe I'll give one more example. You can imagine there's a guy named Bob, where you know that he is in the UK or in France; he's somewhere in this geographical region of the UK
and France joined together, okay. He's somewhere in there, but you don't know where he is exactly. Now, one question you can ask is, is he in the UK
or is he in France? Well, with the principle of indifference, you would assign fifty percent credence to France and fifty percent credence to the UK. Okay, but you could also cut up the space of possibilities in a different way.
You could say, well, he's in France, or he's in England, or Wales, or Northern Ireland, or Scotland. Or you could look at different regions of France. You could cut things up in a variety of different ways, and you would get different answers. If you cut up the UK
by its different constituent countries, you would say that there's a one-fifth probability that he's in France and a four-fifths probability that he's in the UK, right? And philosophers have noted this for a long time, and I think there's still debate on how exactly to resolve it.
But it's generally agreed that you can't just apply the principle of indifference willy-nilly. It's either just totally wrong, or you have to be very careful with how you apply it, otherwise you're going to get crazy results.
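To make the partition-dependence point concrete, here is a minimal sketch of the France-and-UK example (the helper name and the specific partitions are just illustrative, not from the paper): applying the principle of indifference to two different ways of carving up the same outcome space gives two different answers to the same question.

```python
# Minimal illustration of the partition dependence described above.
# The helper name `indifference` and the partitions are illustrative only.

def indifference(outcomes):
    """Assign equal probability to every outcome in the given partition."""
    return {o: 1 / len(outcomes) for o in outcomes}

# Partition 1: Bob is either in France or in the UK.
coarse = indifference(["France", "UK"])

# Partition 2: the same situation, but with the UK split into constituent countries.
fine = indifference(["France", "England", "Wales", "Scotland", "Northern Ireland"])

print(coarse["France"])  # 0.5
print(fine["France"])    # 0.2, a different answer to the same question
```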
And I think this is one of those crazy results. So I think that basically the counting argument is assuming that you can cut up the space of outcomes of the training process into these goal categories or something like that. And there are a variety of different problems with this.
So first of all, one way of thinking about it is that you're saying there are discrete goals, there are a billion different goals, and then you're randomly choosing from a billion different goals. Okay, first of all, it seems really weird to assume that goals are discrete things, because that's just going to depend on how you describe the goal. That's just really strange.
So maybe you don't want to describe goals as discrete things; maybe there's a continuous space of goals. But fundamentally, the problem is that you can describe the space of possible results of the training process however you want. You can describe it as: either it's going to be aligned or it's not,
it's fifty-fifty, and then you're done, you know? So I think this is just a fundamentally unprincipled way of thinking about it. And at the end of the day, if you want a more reliable answer for how likely the AI is to be aligned, I think you should just not rely on an indifference principle at all. You need to look at the actual details of what's going on and try to come up with a mechanistic understanding of it, and not rely on these abstract principles.
Yes. And this is related to my position on agency instrumentalism, or agency illusionism. Because you could argue, on the one hand, that goals are just not real. But you could also make the argument, as you have done, that there's significant ambiguity in how we represent goals.
Yeah, I think that's right. So there's a part of the article which, actually, if I were rewriting the post, I would probably rewrite differently. But we do make this point that the counting argument seems to be assuming that goals are real things, so real that you can count them, right? That it's really true that an AI has a particular goal and not some other goal, as opposed to viewing goals more as just useful descriptions for compactly describing behaviors.
I still mostly stand by that. I think that the more doomy people, or people who tend to use this argument, are reifying goals too much and are taking them too seriously as an abstraction. That said, I guess it would be easy to take this too far in the other direction and say, well, goals are just an illusion, and I don't want to say that either. I mean, if goals are useful enough that we keep talking about them all the time, I want to say, okay, in some sense they're real, or kind of real, or something. So yeah, it's a tricky question.
Yeah, I've thought about this quite a lot. I discussed it with Philip Ball as well. I mean, my first intuition is that any intelligent system would have goal dynamism. So it wouldn't make sense to think of this Bostrom-style superintelligence that has a single goal. And even if goals did exist in the way we conceive of them, we're talking about this big, inscrutable, intelligent thing.
So surely the way we abstract goals might not be what the goals actually are. And it's also related to this intentional stance from Daniel Dennett, which is that we, as agents, adopt this stance, we build a model, we do abduction, and we understand the rational behavior of another agent based on our projection of what their goals are. But that is very much an instrumentalist view; it's just what
we think their goals are.
Yeah, that is a good point. I mean, so Dennett, I think most people interpret Dennett as being an instrumentalist about this, as saying, okay, it's just the intentional stance.
It's just a useful way of thinking about agents. In other words, we're just ascribing goals to systems, but in some deeper sense they're not real. But I think this just gets back to how we define "real": what does it mean to say something is real or unreal? I do think that if something is so useful to talk about that we're talking about it all the time, you can't say that it's completely unreal.
I think one useful distinction would be whether it had consistency. So if it really is an inscrutable stimulus-response machine and it's just flitting from one goal to another dynamically, then I think it would be fair to say that it didn't have goals, that the goals weren't real.
Right. Yeah, you know, I guess I'm taking sort of a pragmatist stance here: if the agent's goals are changing all the time, then it might not be useful to describe it as having goals at all. Maybe it's better to just talk about patterns of behavior or something.
So with this
in mind as well, you know, a lot of GOFAI people and a lot of symbolists still think that the best way to design an AI system is to explicitly craft goals, and maybe some kind of meta-learning system that creates sub-goals and so on. And I've always felt that this is mixing the description with the thing.
So it doesn't make sense to build the description; you should build the actual thing. What's your take on that?
Yeah, I mean, I guess I tend to take the view that, I think I agree with you. I kind of like the analogy of training a helpful, harmless AI to raising a child, something like that. Now, you could obviously take that analogy way too far.
But when you're raising a child, or training an animal, you're not hard-coding goals into it. You're usually not even really trying to hard-code a goal into it in the technical Bostrom sense of a goal, where it's the single thing that's motivating all the rest of its behavior. You're usually just trying to inculcate general patterns, general values, instincts and patterns of behavior. It's not like inserting a utility function into the system.
Yeah, because this is relevant for alignment. As you say, when we bring up kids, we instill principles and virtues. So how does this help us with alignment? I mean, one take would be that we just look at behavior alone and treat the system as inscrutable.
I guess I will say, obviously, as an interpretability researcher, there are things we can do with AIs that we can't do with kids or animals. We can look at their internal states and monitor them at a much more fine-grained level of detail than we can with kids or animals. And that's actually an argument that Quintin Pope and I made in a different post,
"AI is easy to control". AIs are white boxes, in a sense that animals and other people are not, just in the literal sense that we can peer into them. Now, it's not like computer code that anyone wrote; it's not that we write the code. But we nevertheless have a variety of tools that we can use to peer into the AI and, in some ways, see what it's thinking. We can train probes that we can look at, for language models for example. That's actually another paper that I did, one of my first papers, called the tuned lens.
You can train these little linear probes, basically linear classifiers, at each layer of a language model, and you can see how its prediction of the next token changes from one layer to the next. There are interpretable predictions at early layers that rely on simpler features of the input, and it gets more sophisticated as it goes up.
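To give a feel for the general idea just described, here is a rough sketch in PyTorch of a per-layer probe in the spirit of the tuned lens. This is not the authors' implementation; the class name, the dimensions and the exact training objective shown here are simplified assumptions (the real method handles additional details such as layer normalization).

```python
# Rough sketch of a tuned-lens-style probe: an affine map per layer whose output is
# decoded through the model's unembedding and trained to match the final distribution.
# Not the authors' code; names and dimensions here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, vocab_size = 768, 50257           # GPT-2-sized stand-in dimensions
unembed = nn.Linear(d_model, vocab_size)   # stand-in for the model's frozen unembedding

class LayerTranslator(nn.Module):
    """Affine probe for one layer of the network."""
    def __init__(self, d_model: int):
        super().__init__()
        self.affine = nn.Linear(d_model, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model) activations taken at this layer
        return unembed(self.affine(hidden))

def probe_loss(probe: LayerTranslator,
               hidden_at_layer: torch.Tensor,
               final_logits: torch.Tensor) -> torch.Tensor:
    """KL divergence between the probe's decoded distribution and the model's final one."""
    log_p = F.log_softmax(probe(hidden_at_layer), dim=-1)
    q = F.softmax(final_logits, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean")
```

Decoding each layer's probe and comparing the predictions layer by layer is what gives the picture described here, where early layers make simpler predictions and later layers make more sophisticated ones.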
And there are all sorts of things you can do like that. So that's one thing: we have more tools, and we have white-box tools for AIs that we don't have for kids and animals. That said, I do think we can learn from the human and animal cases. Just one example is that people are now working on data curation.
When we first started training big language models, there was very little curation of the training data. I think OpenAI used Reddit karma or something like that to filter links; it was certainly not fine-grained curation. But what people are trending towards now, especially for smaller language models, is using a lot more synthetic data generation, using large language models, or other language models, to generate data for the new language models. And we're also using AI as part of the data curation process, to figure out on a more fine-grained level what sorts of things we want our AI to see, basically. And that is similar to how we think about children:
well, we want our kid to see certain things and not other things. And kids are impressionable; AIs are even more impressionable than kids are. And so I think careful data curation is a huge part of alignment. There are a lot of simple things that you can do that will go a long way.
Yes. So there's curating what goes in, in terms of data, and then there are many things like tree of thoughts and RLHF and ways of doing behavior shaping on the output. And there are companies, for example, doing alignment systems where they explicitly craft goals.
They say: this is the kind of goal we want, the company should make this amount of profit, and we want this person to meet this performance target next year. And I feel that makes the system brittle, for a couple of reasons. I mean, it introduces Goodhart's law, and the Clever Hans effect as well: it might do the right thing for the wrong reasons. And I also feel that we need to have some kind of dynamism to have an intelligent system; the system might need to do things that we can't conceive of in order to be successful.
Yeah, that's true. I mean, I guess there are different ways that you could try to give a goal to an AI, right? So I think there are certain versions of this that seem more okay to me than others. In any sort of organization, a company, anything like that, a lot of the time employees are given a goal, they're given a directive, which is basically a contextual goal: well, we've got this deadline to finish this report, and we have a certain quota for sales, whatever.
And we do that all the time, and of course that can cause problems. If you have quotas, they can be Goodharted or whatever. But ultimately it does seem like these sorts of things are kind of indispensable, just breaking problems into parts and so forth. And I think one problem I have with some of these arguments for doom is that they assume that when we give AIs goals, the AI is going to take the goal in this very unnatural way: it's going to take it as its purpose in life. Like, you told me that I've got to maximize, that I've got to make some sales quota; well, everything else goes out the window.
That's my only purpose in life, and if you try to change my goal now, I'm going to kill you, because I only want sales and nothing else. That's not how humans work.
And I also don't think that's how any plausible AI system is going to work. The way that people are starting to build agents on top of language models these days is not building in permanent, overriding goals. They're just prompting, basically; they're saying, okay, in this context your goal is to do X, Y, Z. I don't think we should expect that the AI is going to be so stupid as to act in this caricature-type way where it just forgets its common sense, and forgets that this is just a contextual thing that it needs to do, that it will be completed, and then it should be ready for further instructions.
I wonder what your position on agency is. So you have a language model; it learns a text distribution, it's like n-grams on steroids. And then you do this RLHF, and you can do chain of thought and self-reflection and iterative prompting and tree of thought, AlphaGeometry. And all of these things are placing significant guardrails on the kinds of trajectories that you sample. You're making it more and more domain-specific to the particular thing. And all the while, there are people who say that even in this setting, even though we've placed all of these guardrails on it, it will have some kind of divergent agency, which is to say, we're telling it to do this, but actually it has its own desires, if you like. What's your
take? Yeah, so I definitely don't think we should expect emergent agency or autonomy from a system like this. In part, that's just not how we're training these systems. You could imagine a very different world in which we were simulating evolution or something in our computers, where there was some sort of competition between different AIs and it was survival of the fittest or something, and that's how we got intelligence. In that case, yeah, I'd be a lot more worried about them having their own goals and drives and survival instincts and all of that; then I would be a lot more concerned. But that's not how we're training them at all. It's mostly imitation, and we get to carefully curate the data that we're asking it to imitate, and then we're just reinforcing behaviors that we like and negatively reinforcing behaviors we don't like. I don't think that this kind of emergent autonomy stuff is going to come out of that.
I think we agree. So we agree that if we create a high-resolution simulation of the universe, then things like agency and intelligence are emergent properties, much like temperature is an emergent property. And we also agree that if you do this imitation learning in a language model with behavior shaping, you wouldn't get agency magically appearing.
I'm just saying that I think agency emerges in an evolutionary context, sort of like a Darwinian context.
Or, I mean, I guess if you're trying really hard to make an agent, maybe you can succeed at that: an agent in the sense of a system that has its own self-interest and some sense of a survival instinct or something, where it's not just taking instructions from the outside but has got its own drives. But, yeah, like I said, because we're not simulating evolution, I don't think we'll get that by default.
And I also don't think there's really an economic incentive to create that. I know some people disagree and say, oh yeah, there is going to be an economic incentive to create artificial creatures. But it just seems like, at the end of the day, we're trying to make these AIs to do stuff for us. We don't actually have an incentive to make things that are uncontrollable, as far as I can tell.
But do you
think that we could create an agentic general system which is still tractable enough to run on modern computers?
Yeah, I mean, I think, well, for example, mind uploading of humans is probably possible with some technology. I don't know if it's coming soon, but if you could upload humans, then you would have agentic systems with self-interests.
That's interesting that you think we would. I guess, from an external point of view, I think that a brain in a vat, or a person in a hermetically sealed chamber, wouldn't have much agency.
Yeah, so I guess we should separate the behavioral question from the more philosophical one. I'm not necessarily saying, I mean, we can get into this, I'm not necessarily saying, oh, it would be conscious, although I think it probably would be. I'm just saying we should be able to simulate humans, and humans are agents. And so, behaviorally, you would have similar concerns, like, well, does this human actually care about me, or are they just trying to gain more power, whatever. And you could have all those worries all the while thinking that it's a zombie or whatever.
Yeah, you know, I agree
that we could upload a whole load of minds, and we could do a simulation of the universe. We could have virtual interactive agency in the simulated world. It seems like a further step to have kind of material-virtual inter-agency.
Yeah, so I guess I'm also assuming that there's some, I guess I'm kind of imagining the world described in the TV series Pantheon, which people should watch. It's about mind uploading, and there, I mean, it's kind of a weird timeline, because the mind uploading happens before we get purely artificial intelligence that can do the same tasks. I feel like that's just not realistic.
I think we're going to get purely artificial things before we get mind uploads. But in the show, they do mind uploading and then they start using the uploads as slaves, honestly. They don't call them that, but basically they're using them for economic purposes.
And obviously, in order to do that, they're connecting the mind uploads to the outside world. At first they don't use robots; later they do have robots, but they just connect them through the internet and virtual realities and stuff like that. And so there is interaction between the mind uploads and the outside world.
Wouldn't it
be fascinating if we were in the Matrix, but currently we don't have any kind of control panel to the super-simulation or the super-world. But maybe the simulators were using us just to do financial trading for them or something like that: we had a little portal, and we pressed buttons on the portal. And as soon as we have that connection with the super-world, we might start to express agency in the super-world, so we could start deceiving our simulators.
Yeah, it kind of reminds me of a lot of weird speculations that people on LessWrong have made. This is a weird thing, but there's this idea of the Solomonoff prior, or Solomonoff induction, where you're doing Bayesian reasoning but you have a prior over the different hypotheses that is weighted based on Kolmogorov complexity, which is the length of the shortest possible Turing machine program that would simulate the hypothesis, or something like that.
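Roughly, the weighting being described can be written down as follows (the notation is mine, not from the conversation): if K(h) is the length in bits of the shortest program that reproduces hypothesis h's predictions, then shorter programs get exponentially more prior weight.

```latex
% Rough statement of the Solomonoff-style prior described above (notation is mine).
% K(h): length in bits of the shortest program reproducing hypothesis h's predictions.
P(h) \propto 2^{-K(h)},
\qquad
P(h \mid \text{data}) \propto 2^{-K(h)} \, P(\text{data} \mid h)
```

This exponential preference for short programs is why "a short program that simulates a whole universe" looms large in the speculation that follows.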
And the weird thing that happens, if you imagine this, is that it looks like there are relatively short programs for a Turing machine that would simulate an entire universe. Of course, it would be very slow, so in practice you couldn't run it, but it's a short program.
And so if you imagine this, then you can imagine that there are simulated worlds that have people in them who find out that they're part of this Solomonoff induction process and try to deceive you, and it kind of goes off the rails from there. Yeah, I don't think that has any relation to the real world; it's just kind of interesting.
So wasn't there a post on the EA forum which is called something like "EA wants to maximize everything, but maximization is perilous"?
Almost. It's "EA is about maximization, and maximization is perilous", by Holden Karnofsky. And to be clear, in this post Holden is not saying we should stop being EAs, right, because I think he still identifies as an effective altruist to this day. But he is pointing out that there's this real problem, or peril, at the core of the effective altruist ideology. EA is often defined or summarized as doing the most good, so it's about maximizing the good in some sense. But the problem is, we don't really know what we mean by "the good", at least not in detail. We have intuitions about what's good:
saving someone from a burning building is generally good, reducing global poverty, there are certain things that we think are obviously good. But when you try to maximize the good, that's where you start to get into treacherous territory, because now you're trying to maximize something that you don't have a clear, crystallized, let alone formal, definition of. And it can lead to things like the whole FTX debacle, with Sam Bankman-Fried and others going to jail for doing criminal or unethical things in the name of what they thought was doing the most good.
They were trying to make as much money as possible in order to donate it to effective altruist charities, and that was their interpretation of doing good. Of course, other EAs think that's not what doing the most good is, but then they actually disagree about what the good is. And when you're not maximizing the good, we tend to agree about what the good is, because we agree on the simple cases: let's give some money to this charity, or whatever. But we start to disagree more as we push further out into more and more exotic things, like, oh well, maybe the long-term future has almost all the value because it's going to last trillions of years. Many effective altruists have made this argument, and
I don't know, people are just going to disagree on that. Fundamentally, I don't actually think there's an objective fact of the matter, built into the universe, about what the good is.
But what I think is that trying to maximize the good is just liable to lead to a kind of extreme behavior, and to more disagreement and conflict between people. It's in a certain sense an extremist view: by definition, you're going to the max.
And so that's why I no longer view the good as something that should be maximized. I think we should be thinking about ethics much more in terms of something like virtue ethics, for example, where the good is just that you are trying to be a good person, cultivating certain virtues in yourself, trying to be more honest, trying to be more generous, whatever, and not in terms of trying to maximize something out in the world.
I think that's a much better way of thinking about things. And so that's why I don't identify as an EA anymore. I'm not hostile to EA or anything;
I'm far from that. I have many friends who still do identify as EAs. But yeah, I don't identify that way anymore.
Yeah. And this is not to badmouth them; they do many great things. I think it's the focus on longtermism in particular, and AI safety, that has been a bit of an issue.
And as you were saying, the other component, as well as longtermism, is this rationalist idea that you can refine goodness into some objective criteria. And that brings me to the next question: would you define yourself as a relativist? I mean, is that the flip side of deciding that you can refine
goodness?
Yes, relativism is a tricky word. So in one sense, no. There's a form of relativism which basically says: I view all value systems and perspectives as equally valid, or something. I think that's silly. I don't see a reason to believe that or take that perspective. It kind of leads to a weird sort of complacency; it's like tolerance taken to a very extreme level, where you don't want to criticize what anyone else is doing. So I'm not a relativist in that sense. There are other, broader senses of relativism, both about morality and about other things, where I might qualify.
I do think that, maybe this is kind of a trivial thing, but the world looks different from different perspectives, and there are just different ways of describing the world. I don't think there's one uniquely correct way of describing it that everybody must agree on or else they're totally wrong. I think different ways of talking, different conceptual schemata, can make sense from different perspectives. So maybe that counts as a form of relativism, but it depends what you mean.
Because we live in a global and connected world, and certainly North American culture is very dominant. As a trans person, for example, you might not want to travel to Dubai. What's your take on that?
Yeah, I mean, I definitely am not the kind of relativist who's like, well, Dubai's intolerant morality is just as good as ours. I want to say that even though I don't think God is on our side, and I don't think objective moral facts built into the universe are on our side, nevertheless I'm opposed to what Dubai is doing. I think they should be more tolerant and more accepting, and I would act to try to convince them of that, whatever.
Interesting. Yeah. Does that in any way conflict with the
relativism?
So I don't think it does, or needs to. I think you can hold both. And this is a part of the philosopher Richard Rorty's thought that I kind of like, because a lot of people criticized him for being a relativist. He made a similar point: we shouldn't talk about one truth, one correct description; there are many descriptions that are useful for different purposes, and so on, which sounds kind of relativist or whatever.
But he also rejected the term relativism, and he said, look, even though there's not an objective truth backing what I'm saying, nevertheless I oppose transphobia, I oppose beating women. I have these values and I stand by them, and I'm going to act accordingly.
So in the EA community, presumably there are relativists in there. How do they reconcile it?
I mean, yes, they definitely exist, and I've met some of them. Actually, one self-described EA whose work I like a lot, and who does disagree with me on some of the AI safety stuff, is Joe Carlsmith.
So Joe Carlsmith has some very good essays, which he has also recorded himself reading, which is nice. Anyway, he's got some very good essays on metaethics, the branch of philosophy about what ethics even is: is it objective or not, what is it about, all of that. There are several essays that are good. He's a moral anti-realist, so he doesn't think there's morality built into the universe, but he does struggle with some of these questions, because he says at one point, well, okay, if we don't believe in objective morality, as EAs or just as people trying to do good, are we just imposing our will on the universe? And if you are just imposing your will on the universe, that doesn't feel quite as altruistic as you might hope; it just seems like you're kind of selfishly imposing your own preferences, basically. And I think he ultimately just says, you know, it is a bit weird or uncomfortable if you think about it from a kind of God's-eye perspective, where it just looks like you imposing your will on things. But ultimately, if you are acting with other people's interests in mind, you shouldn't feel bad about that; that is better, from our perspective, than just being a purely selfish person.
Yes. What do you think about the paternalism in EA? So I agree with you that it is an aggravating factor if there isn't an actual moral truth to go towards, because if there were, there would be some kind of moral justification to, almost, lead people to a better world. But I think the argument they're making is that they've galaxy-brained themselves into coming up with a moral framework, and into thinking so far into the future, that they know what's better in a way that normal people can't understand. Therefore, you should listen
to them.
Yeah, I guess, I don't know that I have fully fleshed-out views on this, but I think there is a potential worry with moral realism that it can be used to justify, as you said, galaxy-brained ideas about what's good. Because you think there's an objective moral truth, if you go through the arguments and you convince yourself, I guess one way of putting this is that if you think there's an objective moral truth, you're actually more open to being convinced of galaxy-brained or initially implausible-sounding ideas about what's good, like, oh, we should just only care about the future and forget about the present. Whereas if you're not a realist about it, you're just going to say, no, I'm not going to change my mind on this; I'm going to value the present more than the future. And so, yeah, a lot of people talk about the dangers of moral relativism or moral anti-realism, but I think the dangers of moral realism are at least as serious.
So I don't want to
start armchair psychologizing here, but I think what really triggered an interest for you was your fascination with goodness and value and meaning, almost like that triangle, if you like. And if I understand correctly, in recent years you've been looking into things like Buddhism and going on a bit of a journey. Tell me about that.
Yes. So I guess the fascination with Buddhism is really fairly recent. It's definitely this year, the last few months; I wasn't thinking about it last year. One influence was actually Robert Wright, who I did an interview with a little while ago. He wrote a book called Why Buddhism Is True, and just being asked to go on his show, and going on the show, made me think about the book, and I eventually read it.
I also, I think for somewhat independent reasons, started using Sam Harris's app Waking Up, which kind of guides you through mindfulness meditation. And I did that, I think, partially just because I was searching for some sort of spiritual practice; I just wanted to try it out.
You know I thought like, oh i'm interested like physical hy and consciousness and stuff, but never meditated IT seems like to do that in. Maybe i'd get something something out of IT. I also have A D H.
D. And I thought like maybe forcing myself to be like mindful, like like help with out something. So there are like a lot of factors, but I so I started meditating um and you know it's not like obviously you've immediate without being a bud. Like i'm not even sure if I count as a bud. Like IT defends how you do, but like it's clearly like connected, like a lot of a lot of their kind of meditation practices come out of like the bud tradition um as well as like hindu traditions and stuff um and like sam Harris in the APP and in the like kind of there's like a theory section of the APP where he just like has these like discussions with people on stuff.
He talks to a lot of Buddhists and talks about Buddhist philosophy, so I kind of got into it through that. And I immediately noticed, okay, there's this doctrine of no-self. There are different ways of putting it, but it's definitely saying that there's no kind of Cartesian ego, no soul that continues on and defines who you are. And I've always, or almost always, thought that, for years, so the fact that the Buddha was saying it twenty-five hundred years ago was just pretty cool. Okay, you got one thing right, maybe I should look further. And then there's the analysis of suffering, which I find fairly compelling: suffering, at least psychological suffering, as being caused by clinging, by desire and attachment to the world.
There's also this other part of Buddhism that's more metaphysical, which I like a lot: the doctrine of emptiness. It's sort of an extension of no-self. Where no-self says you don't have an essence, you don't have a soul that would be your essence, emptiness says that nothing has a self, nothing has an essence. And the doctrine was expanded upon by the philosopher Nagarjuna, who created an entire new school called Madhyamaka based on his notion of emptiness. His view is, okay, everything is empty, and what that means is that nothing has inherent existence or essence to it, so everything is relational: all objects, or concepts, are defined by their relations to other objects.
And yeah, there's a lot of reason to think this is a good way of thinking about things. I guess I'll say one more thing and then we can kind of wrap this up.
In particular, take the whole debate between idealists and materialists about what everything is made of: is it made of matter, is it made of mind, is it made of something else? Nagarjuna would just say, stop asking that question.
There is no answer; it's not made of anything, because fundamentally, you know, this chair I'm sitting on, or my body, or whatever, it's all constituted by its relations: its internal relations, its internal structure, and the kind of structures it's embedded in. So I am defined by my relationships to other people; the fact that I'm sitting on the chair is another relation, a spatial relation, and that's it. It's all relations.
There's no essence. And so you just get rid of a lot of these philosophical debates that seem endless. Yeah.
I'm a huge fan of the idea of a relational ontology. I think it was Luciano Floridi who introduced me to it, and the interview we did with him is great. Yeah.
One of the things, by the way, that I left out somehow is that I supported Sam Harris early on, so I've got a lifetime subscription to his app, which is really cool. And I agree with about 66.6 recurring percent of what he says.
But I just love him; his dulcet tones put me to sleep and everything. But anyway, the other thing is you were saying something interesting about suffering. When I do stuff around, you know, mental health and AI in particular, I read a great book called Lost Connections by Johann Hari. He's saying that we're now starting to understand depression and anxiety in terms of the psychosocial environment. That's a very externalist view, and also quite compatible with this relational view you're talking about.
But I think a lot of people who do mindfulness techniques are trying to, and I don't mean this in a disparaging way, almost address the symptoms. So for people who are missing the connections in their lives that give them a sense of purpose and meaning and so on, one of the prescriptions here, and not necessarily from Buddhism, is to almost dematerialize yourself so that these psychosocial stresses no longer impact you. What do you think about that?
Yeah. So I definitely think there are ways in which you can take Buddhism too far, or take it in a way that's unhelpful. And yeah, like you said, if you're interpreting it to mean that you shouldn't try to solve any of your problems, I don't think that's the right way to go about things.
I would prefer a holistic approach where, yes, you try to improve your life to the extent that you can, but you also try to change the way you think about things so that you can have more enduring happiness. If you do both, that's probably the best. I also think there are different interpretations of the doctrine; there are the Four Noble Truths in Buddhism.
The first truth is that there is suffering, then that the source of suffering is clinging, then that there is a path out of suffering, and then that the path out of suffering is the Eightfold Path; that's the fourth truth. But that's very schematic, and there are different schools of thought about what it actually means to end clinging.
I think there are some versions of Buddhism, maybe more in the Theravada tradition, that I find kind of problematic, where they basically say, well, everything is suffering. Not just that there is suffering in the world, which there obviously is, but that even life itself is suffering or something. And then they basically say, well, because everything is suffering, really your goal should be non-existence, basically. Or they have a conception of nirvana as a kind of state of perfect well-being which, to me anyway, seems indistinguishable from just evaporating into nothingness. And for them I think it can kind of make sense, because traditionally they believe in reincarnation and rebirth, so for them it's like, well, when you die you don't automatically stop existing, and so there needs to be this Eightfold Path to non-existence. Of course, I'm sure people are going to criticize me and say, no, nobody thinks that, but it seems like at least some people think something worryingly close to that. Anyway, I don't want to be associated with that form of Buddhism. I'm more attracted to Zen Buddhism, which is probably the closest to my own kind of views, because its understanding of liberation, or enlightenment, or whatever, is much more down to earth. So, for example, they would say that enlightenment is something that happens while you're alive.
It's not, at least not primarily or exclusively, about being freed from the cycle of rebirth. And they have this view where really what you're trying to do is act spontaneously, in a way where you're not attached to what happens as a result of your action, which might sound kind of strange, but it's supposed to be connected to compassion and so on. So it's not that you're completely indifferent to the world.
The hope, the idea, is that you should cultivate compassion so that you act in ways that are beneficial for other people, but you shouldn't be goal-oriented or consequentialist about it. It's actually kind of antithetical to EA, honestly, from a philosophical perspective. It's more, I would say, virtue-ethical: it's about cultivating compassion, but you're not clinging to the thought, oh, my actions must have these consequences, otherwise I'm going to be super depressed and frustrated. And that's how you relieve suffering, as part of this. Yeah.
It's very similar to Kenneth Stanley's book, Why Greatness Cannot Be Planned. Essentially it's the same advice as what Kenneth talks about, but without the compassion part, which is interesting.
But I read a book by Dan Harris many years ago, and he was saying the same thing: oh, you know, this was great, I got this Zen state, but then I had to go to work and I had to get stuff done. And there's an interesting juxtaposition when we talk about serendipity in general, because it is great.
But there are also a lot of things in the world that do need goals and objectives and alignment, because we've actually got things to invent, we've got to build societies and so on. So how do you reconcile that?
Yeah. So this is all tentative right now; it's very possible that in six months I'll go, actually, Buddhism is crap. But I'm still kind of on a journey here. But yeah, obviously we do need goals and structure and so on, but it also seems like in the future we may need that a lot less, precisely because of AI.
So I'm thinking there may be this nice fit: Zen and the fully automated future that we seem to be approaching might be kind of like good companions. Because, you know, I'm looking for a philosophy that I think could provide us meaning in life even when we don't really need to do a whole lot, when at least the humans don't need to be thinking about, oh, how do we run the economy, and we can just kind of be spontaneous while the AIs handle everything else. So yeah, I guess the hope is that in the future, technology could allow us to just be kind of Zen, enlightened beings.
Nora, this has been absolutely fantastic. I've been a huge fan for a long time now. Where can people find out more about you?
Yeah, so I guess two main places. First of all, if you want to get involved with my research and stuff, there are a lot of people at EleutherAI who are volunteers, so you can go to eleuther.ai and there's a link on there to go to our Discord.
There are multiple channels under the interpretability category there that are basically all me, and you can @ me there and get my attention. If you just want me, I don't know, ranting about things or whatever, you can go to my Twitter profile, Nora Belrose.
Thank you so much. It's been a pleasure.
Yeah, thank you.