I think that the broad situation with AI is that they are currently successfully scaling them to be more and more powerful. We're not quite sure how long that process is going to continue using the current technology. And part of the reason we're not sure is that nobody understands what actually goes on inside the modern AIs.
We can look inside the computer, of course, but we see giant arrays of floating point numbers. And people are barely beginning to understand what's going on in there. And we do not quite understand where the power comes from.
Chess did not stop playing when it got to the human level. Go did not stop playing at the human level. They are past only training the AIs on imitations of human output and have moved on to training them on what works to solve problems.
And you can just continue that process past the point where the problems are of human difficulty. So we are on course to get something smarter than us whose innards we don't understand, which we cannot very well control. And this probably ends quite poorly.
We're making the statement that the one thing we think we know about the AIs is that they have been successfully trained to optimize their achievement of purposes. That's an assumption which is not totally obvious to me. I agree that all you're specifying is sort of: play chess, or whatever else.
So you're specifying some big thing, what we think of as an objective. The details of what's happening inside there will be aspects of that that we are not in any way able to foresee, predict, whatever else. So if we took apart that mechanism and we said, is this mechanism doing what we expect here—it won't be.
There'll be plenty of things where it's doing what it chooses to do, or because that particular training run gave this particular result, whatever else. To me, I'm not convinced yet. There are interesting questions to try to answer, which people should try and answer. I mean, you make it sound like it's urgent, the people are coming off the ships—and you might be right.
MLST is proudly sponsored by Tufa AI Labs. They're based in Zurich. They're doing monthly meetups and they are hiring ML research engineers to work on ARC and to make progress towards AGI.
Go and apply now.
A strong desire to solve the challenge, and the technical capability.
Usually the type of people we tend to find have computer science backgrounds—generalists.
It's an absolute honor to have you both on MLST. Thank you so much for joining us today. We're going to have a discussion all about AI existential risk. Eliezer, first, if you wouldn't mind, could you just spend about five minutes talking about how much of a risk AI is to humanity?
I think the term "risk" is understating it. If there were an asteroid straight on course for Earth, we wouldn't call that asteroid risk; we'd call that impending asteroid ruin, or something like that. And I think that the broad situation with AI is that they are currently successfully scaling them to be more and more powerful.
We're not quite sure how long that process is going to continue using the current technology. And part of the reason we're not sure is that nobody understands what actually goes on inside the modern AIs. We can look inside the computer, of course, but we see giant arrays of floating point numbers, and people are barely beginning to understand what's going on in there. And we do not quite understand where the power comes from. By great efforts, you can look in, in like the previous generation of the technology—
I don't know if anybody's tried it with the current generation—and see it's currently thinking about the Eiffel Tower, or this is the location where it stores the fact that the Eiffel Tower is in France, and we can move it to Rome by poking at the numbers, which is, you know, pretty impressive compared to what we can do with the human brain. But the basic question of sort of what's going on in there, how are they doing the impressive new parts, that's all unknown. The technology continues to scale—but that may or may not continue; nobody knows—and at some point it gets as smart as us, and then smarter than us. There's no particular barrier that I know of, or any principled reason to believe in, at the human level. Chess did not stop playing when it got to the human level; Go did not stop playing at the human level.
They are past only training the AIs on imitations of human output, and have moved on to training them on what works to solve problems.
And you can just continue that process past the point where the problems are of human difficulty. So we're on course to get something smarter than us whose innards we don't understand, which we cannot very well control. And this probably ends quite poorly. There are a lot of mistaken reasons to believe that it automatically ends well.
Various people have different mistaken reasons to believe that it automatically ends well. I don't know if any of you, or any of the other people here, have this particular misapprehension. But people ask, why wouldn't they just trade with us? They have heard of Ricardo's Law of Comparative Advantage, a very important theorem in economics, which says that even if one country is more productive at producing every kind of good than some other country, they will still end up trading because of the relative differences in productivity.
If it is easier for me to make hot dog buns than it is for me to make hot dogs, I will ship you hot dog buns and get back hot dogs. Or, like, if compared to you I am at less of a disadvantage in making hot dog buns than I am at a disadvantage in making hot dogs, we will both benefit by shipping hot dog buns and hot dogs back and forth.
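For readers who want the comparative-advantage arithmetic spelled out, here is a minimal Python sketch. The per-hour outputs are invented for illustration only (they are not figures from the conversation): party A is absolutely better at both goods, yet both parties end up with more of everything when B specialises in buns and they trade.

```python
# Illustrative numbers only: output per hour of work for each party and good.
OUTPUT = {
    "A": {"hot_dogs": 10.0, "buns": 10.0},  # A is absolutely better at both goods
    "B": {"hot_dogs": 1.0,  "buns": 3.0},   # B is worse at both, but relatively better at buns
}

def opportunity_cost(party, good, other):
    """Units of `other` the party gives up to make one unit of `good`."""
    return OUTPUT[party][other] / OUTPUT[party][good]

def total_output(a_hours_on_dogs, b_hours_on_dogs, hours=10.0):
    """Total (hot dogs, buns) when each party splits `hours` between the two goods."""
    dogs = OUTPUT["A"]["hot_dogs"] * a_hours_on_dogs + OUTPUT["B"]["hot_dogs"] * b_hours_on_dogs
    buns = OUTPUT["A"]["buns"] * (hours - a_hours_on_dogs) + OUTPUT["B"]["buns"] * (hours - b_hours_on_dogs)
    return dogs, buns

# B gives up only 1/3 of a hot dog per bun; A gives up a whole hot dog per bun,
# so B has the comparative advantage in buns despite being worse at everything.
print(opportunity_cost("B", "buns", "hot_dogs"))  # 0.333...
print(opportunity_cost("A", "buns", "hot_dogs"))  # 1.0

# No specialisation: each party splits its ten hours evenly.
print(total_output(a_hours_on_dogs=5, b_hours_on_dogs=5))  # (55.0, 65.0)

# B makes only buns, A shifts toward hot dogs: both totals rise, so trade pays.
print(total_output(a_hours_on_dogs=6, b_hours_on_dogs=0))  # (60.0, 70.0)
```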
This unfortunately assumes the absence of a third option, which is for B to be so much more powerful than A that they kill A and take all their stuff. There is not a theorem saying that the work produced by horses has to be worth more than the cost of feeding the horses. You can always benefit by trade, but you can sometimes benefit even more by killing people and taking their land.
I wish that wasn't a possibility, but it is. And so there are all these reasons that people have for thinking that even if we don't understand the AIs and can't control them very well, and they get much, much smarter than us, it could still end nicely. And I think these reasons are all mistaken. And so by default, this ends terribly. And I think that's my five minutes.
I guess it's interesting. I mean, my first statement is that this notion of a kind of index of smartness is something that I kind of don't believe in. I mean, one thing from my personal experience for the last, I don't know, forty-something years is that I already know that computers are smarter than me.
I mean, you know, I started doing these experiments on what's out there in the computational universe years ago, just looking at very tiny, simple programs where you might have thought, this is a really simple program, surely I'm smart enough to figure out what it does—but no, one's not. I mean, even last night I was doing some experiments where I was pretty sure I knew what was going to happen.
But no, the computational animal was smarter than I was, and it did things I didn't expect. So I guess my default point of view is that I know the computational universe contains many things that I can't readily foresee, just as the physical universe contains many things I can't readily foresee. So the question then is, you know, one could take the point of view that—
there's this kind of single index of smartness. I mean, I think people in the early 1900s wanted to invent a kind of index of general intelligence; they called it g, for humans. I've never really believed in that, because there are some things I'm pretty good at doing, where I feel I'm pretty smart, and there are other things where I know I'm pretty dumb, and it's not really a sort of single index.
I guess the thing that I wonder about, in the kind of "the AI will be smarter than we are," is to dig into what that really means. And I think the first thing to say is that the big limitation on whatever you might think smartness is, is this phenomenon that I call computational irreducibility: the idea that if you have a computational system, the question is, can you jump ahead?
We are used to the idea, from lots of exact science of the last few hundred years, that, oh, the planets do what they do according to these laws of motion, but we can find a formula that lets us just work out what will happen in the end without having to go through all the steps. One of the things that is sort of a foundational fact—we can talk about where it comes from—is that the whole idea of computation leads to the fact that there are many computations one can set up where, to know what will happen,
you really have to just go through all the steps and see what will happen. You can't jump ahead. Now, a lot of what we've achieved in science and mathematics is to find little pockets of computational reducibility that allow us to jump ahead in some place or another.
That's what most of our inventions have to do with: taking what is out there in the sort of computationally irreducible world and finding some particular place where we can jump ahead, where we can kind of forecast what's going to happen and make use of those things. So I kind of think the first thing to realize is that, smart as one could be, so to speak, there's no way to get out of computational irreducibility. And in a sense, then, what one's talking about is that there are these pockets of reducibility.
We find some. There are ones that we humans care about. There are ones that our civilization depends on.
There are lots of other pockets of reducibility that we haven't found, many of which we probably don't care about in the current way that we are thinking about things. And, you know, the way I see it, I don't see it as being this kind of linear,
"oh, this thing is now smarter than us, so doom," so to speak. As I say, from my own personal experience I'm completely used to the idea that computers are smarter than I am.
Now, the question of, do I want all those computers connected to everything that runs my life, so to speak—that's a different issue. But in terms of the raw experience of, is the computer smarter than I am: well, I already know that they are. Now, I think the other thing to say is, can the computers get smarter than we are, in some sense, in every dimension where we humans are used to doing things?
That's a different issue. And then there's the question of, what does it mean if we live in a world where there are lots of things going on that are sort of smarter than we are, in some definition of smart? And one point to realize is that the natural world is already an example of such a thing. I mean, the natural world is full of computations that go far beyond the computations that we do in our brains.
And, you know, how do we manage to coexist with the natural world? Well, we found these particular niches where it doesn't matter that it rains a lot, because we build houses and so on. We found these ways to coexist with the natural world that seem to let us lead the lives we want to lead.
There are things we can't do because our physiology and our biology don't let us go there, but we seem to be able to live contented lives even though we can't go to the bottom of the ocean without having some complicated piece of technology and so on. So, I mean, I think that would be my initial kind of thinking about why I'm not as worried about the moment when the intelligence of the machine goes from, you know, a hundred and fifty to a hundred and sixty to two hundred to three hundred, whatever that might mean.
I mean, galloping through the Raven's Progressive Matrices at a rapid clip is not—again, from my point of view, it's kind of already happened that there are many things that computation can do that are beyond what I can do with my sort of unaided mind, so to speak. So I'm curious: do you see the issue as being more of a "we put the AIs in charge of our air traffic control and our medical devices, and so at the sort of actuation layer things kind of go wrong"?
Or how do you see it? I mean, the thing again that I have a bit of a hard time with is that it feels very anthropomorphic to talk about, you know, the AI wants to do this, the AI wants to trade with us or not, or whatever. You know, I could talk about the natural world in the same way: does the weather want to give us a good time or not? I don't even know what that means.
Could I just weigh in as well? Eliezer, as I understood your point, it's that there's a powerful force which is going to push us out of equilibrium, and I think Stephen is making the argument that we are in an equilibrium now. Is that fair?
That didn't sound fair to me.
Okay. So I understood that Stephen was making the argument that there are these countervailing forces that prevent significant deviations.
No, that's not what I was saying. I'm not claiming that the world won't change, okay? I'm simply saying that the idea that it's kind of, you know, game over because suddenly the IQ crosses three hundred or something—I don't believe in that.
I mean, I think we probably both agree that IQ three hundred—
is not meaningful, right? Okay, I'm not surprised we both agree about that. But no, the point that I'm making is that this idea that somehow the AI gets smart enough to be able to do everything—I don't believe in that. I mean, I think there's this sort of computational irreducibility limit that, great neural net or not, is just a fundamental kind of formal limit that you don't get to get around.
So I first want to try to sort of repeat back your summary. And your summary is, like, because we cannot predict the thing, that shows we're not smarter than that thing. And even, in a sense, it's smarter than us because there's something that it knows—which is its own output—that we cannot predict ahead of it. Does that sound like a correct summary of your—
Well, part of it. I mean, you know, the problem with a lot of these things is we get a word like "smart" and then we kind of think we know what that means, but then it kind of slews around on us. And I think that's part of it. But I would say that what I'm saying is merely that if you say, I want a system that can solve every problem—you know, will I get to the point where I have an AI that can just slam-dunk solve every problem? I'm saying that's not going to happen.
No. And in particular, it's entirely possible that if you pick a sufficiently strong, quantum-proof one-way hash, and you hash something and throw away the input, then nothing, at any point in the arbitrarily far future, even if it has consumed entire galaxies' worth of computing power, will ever be able to find the input that you fed to that one-way hash.
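A minimal sketch of that one-way-hash point, assuming SHA-256 as a stand-in for the kind of "sufficiently strong" hash being described (whether any particular hash is truly quantum-proof is a separate, open question):

```python
import hashlib
import secrets

# Hash a random 32-byte input, then throw the input away.
secret_input = secrets.token_bytes(32)
digest = hashlib.sha256(secret_input).hexdigest()
del secret_input

# Recovering the input from `digest` now means inverting the hash; the only known
# general approach is brute force over roughly 2**256 candidates, which stays out
# of reach no matter how much computing power a future system consumes.
print(digest)
```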
So in this sense, universal problem solving is probably not on the table, as far as we know, given the present laws of physics. But, you know, the Europeans invading North America did not need to be able to solve every problem in order to solve more problems than the Native Americans and mostly wipe them out. A fire doesn't need to be able to burn neon or other noble gases, nor to be able to burn
you personally. It doesn't have to be universal, and you don't even need to be able to define exactly what fire is in order to die in a forest fire. People probably died in fires a long time before they worked out the concept of combustion, of chemical rearrangements with oxygen that end up in states of lower potential energy, which gets released as kinetic energy, which looks like heat, which is what burns you. People didn't know that and they still died in forest fires. I can try to define what an intelligence is, but before I do that, I do want to point out that being unable to define something is not bulletproof armor, for sure.
But I think, you know, the question is—it seems like the argument that you're making, and I may not be correctly representing it, is, you know, there will be AIs advancing rapidly. We can talk about what we know about how AIs work inside, and the whole story of computational irreducibility and how it applies to that, and why machine learning works. Those are interesting questions.
It'd be fun to talk about those. But I think one of the issues is we don't know just how effective AIs are going to become. But we do know there is a limit to how effective they can become, because we know that there are—as you say, and I wouldn't even go as far as talking about elaborate one-way hashes; there are much more straightforward kinds of things that you can't predict except by running the steps.
Like a particular cellular automaton.
Yeah, for example, sure. And I think—
We agree on that, just to stipulate that part.
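As a concrete stand-in for the kind of system both speakers are stipulating here, a small Python sketch of rule 30, one of the elementary cellular automata Wolfram studies: as far as anyone knows, there is no shortcut formula for its center column, so the only general way to learn what happens at step n is to run all n steps.

```python
def rule30_step(cells):
    """One update of the rule 30 cellular automaton (row padded with zeros)."""
    padded = [0, 0] + cells + [0, 0]
    out = []
    for i in range(1, len(padded) - 1):
        left, centre, right = padded[i - 1], padded[i], padded[i + 1]
        # Rule 30: new cell = left XOR (centre OR right)
        out.append(left ^ (centre | right))
    return out

def centre_column(steps):
    """Centre-cell value at each step, found only by running every step."""
    cells = [1]          # start from a single black cell
    column = [1]
    for _ in range(steps):
        cells = rule30_step(cells)
        column.append(cells[len(cells) // 2])
    return column

if __name__ == "__main__":
    print(centre_column(20))
```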
Yes, yes, right. So, okay, then the question is, could it be the case that everything we humans care about can be done faster and better by AIs? Could it be that the things that are important for running the world—there's a question: could everything that humans do be done better by AIs? You know, that's a different question from whether AIs can do everything, can do things that we have no idea how to do.
I mean, even if there's one last remaining computational task that a human brain can do better than an AI—like predicting the exact output of that particular brain, which you're sure not going to get without doing a bunch of intervening steps if you want to nail it down very exactly—you know, that is not necessarily protective against the bullet, right? There were problems that the Native Americans could solve better than the Europeans
who wiped them out.
Absolutely. That would be—just as, for example, the weather has ways to wipe us out, you know. So, in any particular—I don't know.
You can imagine some mega superstorm that we don't completely understand—because we're talking, again, about things we don't completely understand. So I think we can just as well say we don't know what's possible from the weather as we don't know what's possible from the AIs. And, you know, I can imagine that there's something, some terrible thing, that can happen.
You know, like what people imagine with respect to climate, that it's going to wipe us out—another thing that involves lots of extrapolation about what's going to happen.
I mean, you can kill a lot of people without killing everyone. It is much harder for the weather to wipe out every last human on the face of the Earth than it is for the weather to just kill a lot of people. And I think that total extermination is much worse than just a lot of people dying, because you also lose posterity; you lose the meaning of history.
Okay. So we can segue to a quite different topic, which is: in the history of life on Earth there have been plenty of times when—if we were two stegosauruses talking about this a hundred million years ago, or whenever the stegosauruses were around, the stegosauruses would say, gosh, you know, it would be terrible if we were wiped out, and if something came of those pesky mammals, which have just the one or two things that they can do better than us, and they took over. And I'm curious about your view of the ethics of that situation, relatively.
This is a very different line of reasoning from "let us define intelligence as the work of prediction and steering." What does it mean to want something? Does an AI chess player want to win at chess? Well, it's definitely very good at winning chess, whether you say it wants to or not. Does it want to protect its position? I just want to remark on—we'll get onto that topic, but let's—
put that aside until a bit later.
Let's park that story, yeah—
We'll put the stegosauruses in the pen a little while longer and we'll come back to them later.
Come back to the stegosauruses later.
Okay, I'm curious—unless you have some quick comment about the stegosauruses?
Very broadly: the trouble with looking at the change from stegosauruses to humans, and viewing a trend of things getting better and better and turning out pretty much okay, is that as humans we're drawing the target around the arrow. If, like, we got wiped out and replaced by stegosauruses, and the stegosauruses didn't care about each other very much, they grew up in different evolutionary social circumstances—
You know, if we were wiped out and replaced by insects, maybe the hives of, you know, sapient insects just never care very much about other hives. They just don't care. They don't end up caring very much about sapient life at all. That would be a grand tragedy. It's not at all clear to me that you can wipe out humanity, replace it with arbitrary stuff, and everything just gets better as a result.
I don't know what better means. I mean, better is a very human concept.
Well, there you have it. And yet, if we're going to decide whether we want to voluntarily wipe ourselves out and replace ourselves with, you know, sapient ants, we need to ask ourselves whether that's better or not. We have no other way of making that decision.
I think it's not necessarily a question of better. It's like, we're humans and we're happy to be around and we kind of want us to continue, because right now we are kind of the in-charge creatures around here, and it's kind of like—
let's stay in charge, you know—why would we give it up to something? And we might have some argument that, oh, of course, these other ones that we might give it up to are worse than us. But, you know, in the end it's like, we're here and we like being here, so we'd like that to continue. I think, at least, that's—
That's not what I'm fighting for. I want there to be sapient life that cares about other sapient life, that doesn't just discover things about the universe but appreciates those discoveries, that looks out at the universe with wonder and curiosity rather than mere expected-value calculations. I want it to have fun.
I want it to be conscious. I'm worried that these things get lost. And if you say, oh, well, if you want life to be conscious, if you want there to be consciousness and humans' consciousness in the universe, that's just your preference, Eliezer—then we might need to have a long conversation about what alternate state of ethical affairs you thought might have obtained, other than that, to be disappointed at the absence of, to critique the absence of. But mostly I'm like, yeah, I'm the one who says I'd rather the universe be filled with consciousness than be mostly unconscious, you know, yeah.
Well, you know, this whole question about consciousness—it's like, I'm pretty sure that I have some consciousness. I have this thread of experience; I'm experiencing things, as far as I'm concerned. You're a bunch of pixels on the screen; I have no idea what's going on inside you. It is an extrapolation that you are conscious. The only thing that I can readily, kind of from experience, believe is that I'm conscious.
And, you know, when it comes to even an AI—is this thing perhaps conscious, or is it merely a bag of bits? In other words, with an AI we can, with some effort, go and look at all the bags of bits. With human brains we can't yet do that.
Presumably eventually we will be able to do that. And once I can see all those neuron firings in your brain, how will I think about the consciousness in your brain? And how do I even know what's going on beyond what I can experience inside myself?
Well, I infer that you're conscious based on knowing that I am conscious and thinking that probably something pretty similar is going on inside you as in me, because we have the same great-great-great-great-great-great-great-great-grandparents. We are built on basically the same general plan. I haven't actually administered an MRI, but I would guess that you've got pretty much the same brain areas I do—cerebellum, cerebral cortex. Even without knowing exactly which features of the brain are executing the algorithm that I experience from the inside as my consciousness, I can guess that you probably have a pretty similar algorithm. You know, based—right, it's—
a reasonable, you know, piece of scientific induction. But what I'm curious about is, if you say the only thing that can be conscious is something that has that exact same design—
I don't say that.
Okay. So what is the boundary? So—
It's not so much that I'm worried that machines cannot be conscious as I'm worried that the particular uncontrolled superintelligences we end up making are not going to value consciousness all that much, which means that they're not going to produce very much of it. Like, maybe consciousness is the most efficient way to run some paperclip-making algorithm.
But if you don't value consciousness for its own sake, then the little bits of consciousness that are supervising the paperclip-making algorithms are, you know, not going to be as much consciousness as we could have made of the universe. And if they don't value fun, they're probably not having fun either. They're just being efficient, because they didn't end up valuing fun. That's the general class of nightmare scenario that I am worried about.
I mean, I really wonder about my computer when it runs these endless searches and it finds these cool cellular automaton patterns or whatever else. I wonder if it's having fun.
I'd bet it isn't—I'd bet ninety-nine to one against. I don't know everything that there is to know about this, but it seems to me that when I stare at the machinery of having fun, there are gears and wheels there that are not built into a pocket calculator, that are not built into a simple Python program that is searching through a list of cellular automata, like—
Well, you'd say that's a program that's very low level. But sometimes people say, you know, the one sort of thread that we have that machines will never have is some kind of emotional response to things—which I think is a very weak concept, because I think that emotions are a very chemically based kind of thing that's actually very coarse compared to a lot of the other things that go on in brains.
Yes—it's not that I think that machines can't have emotions. I suspect that Claude 3.5 Sonnet or GPT o1, if they have anything resembling emotions, they're not our emotions. There's, like, something inside that was trained to imitate emotions.
But if it has emotions, they would be—my guess, and nobody knows, is that they would be emotions more appropriate to carrying out the job of being the actor who plays the part of a human who has emotions, and not actually the human emotions themselves. That's not fundamental to transistors; it's just what I would guess would happen given the particular way we put these transistors together.
So what you value in the universe, in humanity, is a certain set of attributes that have to do with the ability to have fun, you know, the meaning of consciousness and so on. And, as I'm understanding it, you think those are sort of the most valuable things that maybe we as a civilization, we as a species, have produced. And, if I'm understanding correctly, you think it's our responsibility to kind of preserve these things—
I might quibble with the exact wordings, but sure: the light of consciousness, the light of fun, the light of caring—to preserve these things into the galaxies is the most important thing we—
have to do, because you feel that—that's sort of a bigger statement, and it's kind of like it's now our responsibility to—I don't quite know to what, but it's somehow our intrinsic responsibility to preserve these valuable things.
I mean, I think we got handed these things on a silver platter—or, you know, maybe a kind of rusty iron platter that we had to shine and clean up a whole lot, but still on a platter from natural selection, and even from cultural processes of development that nobody really saw coming at the time. You know, few people did them on purpose. Somebody did at some point say, of slavery,
"it is time that some person should see these calamities to their end." Not all moral progress was accidental. But, you know, still, our basic situation of being humanity, being here on this planet, of having the opportunity to colonize the galaxy—we fought for this a little bit, but it was ultimately handed to us on a platter. And I don't say that it's written down on a stone tablet anywhere in the universe's laws of physics that we must bring this to the galaxy. I just say I want to do it; that's what I think is the right thing to do.
Okay. So, I mean, I don't necessarily disagree with you, but it's worth understanding that this is a feeling that you have. This is not something where—as you say, there's no law of physics that says this is the destiny of the universe or anything. This is: on this particular planet, through this particular process of natural selection, of biology and culture and so on,
we got to this point, we're really proud of this, let's try and preserve it. I mean—
The fact that I want to steer the universe there isn't a fact about the whole universe. It's mostly a fact about me, and a whole lot of other people, to be clear. No, this is not like two plus two equals four. This is not like water is two hydrogen atoms and an oxygen. This is where I'd steer the universe, being the sort of thing that I am.
I agree with you. I think one of the things to realize is that if you just let raw computation do what it can do, much of what it will do are things that we humans just don't care about—just like a lot of what happens in nature are things that we humans don't care about. We've never managed to take those things from nature and make them into technology, have those things be things that we appreciate as beautiful, whatever else; they're just random things that happen in nature that we don't particularly care about.
And most of what a computational system could do is things that are off away from the things that we care about. And now that so many of those things are just things that will just happen, just like they happen in nature, we kind of shrug our shoulders and say, okay, that's a thing. You know, a lot of what probably happens inside neural nets right now—and in fact I have pretty good evidence of this—are things which are sort of complicated computational processes that never really produce anything we can recognize or that we care about; they're just things that are happening computationally. And so I think that your concern is that some of those things that might happen, or that the AI might do, might be things that collide with what you see as being our sort of responsibility and our sort of proudest artifact—in the sense that somehow there will be a collision between what a computational system can do and the thing that you think we as humans have most proudly produced. Is that a fair assessment?
Yes. If it works out that they kill us and then go on to do nothing very much interesting or worthwhile with the galaxies that they colonize—is that also a fair way of describing all of that?
Well, I don't know. I mean, there are several different pieces to this. I think the first question is the practical question. You know, five years from now, ten years from now, will we be able to have this conversation anymore, or will we be dead, replaced by bits that are doing what you might think of as meaningless things? Will it be the case that it's as if an asteroid hit the Earth and all life was wiped out, or whatever else?
If an asteroid hits Earth, all the atoms in it go on doing their little atomic things. You know, the little electron cloud—I want to say it circles the nucleus, but of course that's not really right, because quantum mechanics. But, you know, the thing that quantum physicists describe to lay audiences, the electron circling the nucleus—it goes on doing that.
Maybe some amount of fusion occurs, if the thing hits hard enough—some amount of fission or fusion. But, you know, by and large the atoms go on doing their atomic things, but they're not morally valuable things.
They're not important things. They are not worthwhile things. The Earth, the universe, gets a little darker every time a bullet gets fired into somebody's head, or they die of old age, even though the atoms are still doing their atomic things.
Right, okay. So, I mean, to me that's curious—viscerally I agree with you; scientifically, I have a bit of a hard time. I mean, in a sense, that feels like a very kind of spiritual statement, which is not necessarily bad, but it's just worth understanding what kind of a thing that is.
I mean, it is saying that there's something kind of sacred about these attributes of humans, and that we have perhaps even a higher purpose—we don't really know where it comes from, perhaps we can't—to somehow preserve them. But there's a much more practical thing. I mean, as humans who like doing what we're doing, it would be nice if we could go on doing that without all being killed by our AIs.
I do not see any distinction between these. You know, there's nothing more scientific about saying "I'm a human, I'd like to keep on doing what I'm doing" than "I'm a human, I'd like to fill the universe with the light of consciousness, even if that consciousness isn't human, so long as it cares and is having fun."
Yeah, yeah, no—
I agree. Those are both statements about how humans feel about things. But, you know, they're not false.
They're not unscientific. There is not a truth of science that contradicts our caring.
Right—it's caring or not. It's like, you know, ethics is not a scientific field. It's about how we humans feel about things.
And we humans could feel this way, or we could feel that way. It's to do with the nature of us as humans.
And we could scientize those statements by saying, let's do an fMRI and notice, why do you say that—oh, such-and-such lobe lit up. But I don't think that's a particularly useful thing to say.
I mean, I think it is a fair statement that it is a thing we can capture, that humans feel that this should happen or not happen, whatever else. But I guess one question is, what's the right thing to have happen? I don't think there's any abstract way to answer that.
I think it's a question of how we humans feel about it. And I think you and I seem to feel—I know I feel—that preserving kind of what humans do is a good thing. I can imagine even humans who say, no, no, no, the planet is much more important than the humans; for example, anything the humans do on the planet that messes up the planet—get rid of the humans. The planet is more important than the humans.
I think that, in terms of the aspects of existence I'd like to preserve, I would not include mosquitoes. Just get rid of all the mosquitoes—I think the planet would be fine. I think we'd be fine.
What if I could get rid of this whole aging business that humans do? This one is more controversial. It would actually change things if that happened. And some of the changes might not even be positive, but my guess is that it would be quite positive on net, and I'd be in favor of it. And if this were a thing you could do with a limited AI that wasn't going to wipe out humanity, and you could know that, and you wouldn't be taking any extra risk by doing it,
I'd say, you know, use AlphaFold 5 to solve as many aging-related problems as you can. But some people would hate that, and there's a question of, is one of us making a mistake? Is it the case that if you take me and the person who wants to preserve aging, who wants humans to keep getting older and dying, and you put us both next to each other, and, one at a time, you just tell us all the facts about the way the universe actually is, expose us to all the arguments that exist to be considered, that any moral philosopher could advance, in some sort of random order—do we converge, or do we still want different things? And it's not clear to me that this is an instance of conflict versus mistake. It could be that the people who believe that aging ought to be preserved forever—there are things you could tell them about what would happen if you cured aging that would change their minds.
It could be that there are true things about what would happen if you cured aging that you could tell me that would change my mind.
You could show me some, like, dystopian world three hundred years in the future which is just being ruled by Vladimir Putin forever and ever, nobody's having fun, and I could be like, okay, that was worse. So it's not clear to me that we need to say that they and I care about different things. We might, but it's not certain; we haven't actually run the experiment of telling us both all the facts that can be determined and exposing us both to all the arguments that can be thought up on it.
Yeah, but, I mean, part of what you're saying is that when things are matters of a sort of human decision, different humans might come to different decisions, and it's not clear how one resolves that. I mean, perhaps one of the things you're saying is that when there is sufficiently great risk, it's a bad idea.
You know, even if some humans say, hey, we should be doing this, we should be building super-killer viruses that can wipe out species—assume a small number of humans say that—even though they say we really have good reasons to do this, and other people say that's a really bad idea. In most cases you'll just end up with two different points of view. But I think you're arguing that there are cases in which the world kind of has to rely on one particular answer, because as soon as there's one super-killer virus that wipes the species out, it's game over, so to speak.
I mean, I think that's a bit of a topic change. But yes, in the entirely different realm of political rather than moral philosophy, I would agree that either your species is able to prevent anybody from making a super virus that kills everyone, or at least collapses the civilization, or your civilization does not last. And I think that even many of my fellow libertarians would agree that making and releasing super viruses ought not to be legal, that this is a proper thing for government to do: to ban the releasing of extremely lethal engineered super viruses.
But, as I'm understanding it, you think that the same thing is basically true about powerful AIs?
Yeah. I think that we're in a situation where if anyone builds a superintelligence, everyone everywhere dies. I think that it is a proper use of government to try to have that not happen. I would sooner abandon my political philosophy than commit suicide over it.
So you really do see those as sort of comparable things, the super virus and the superintelligence—these are both things that you think carry comparable risks. So I wanted to come back to this question about immortality, because I'm big on immortality as well. I think we're both interested in cryonics.
I remember talking to you about that years and years ago. You know, it's kind of shocking that cryonics doesn't work yet. Maybe that's one of the tests of the next generation of AIs: can the AI solve the problem of how to get water to cool down without expanding, so to speak. But I am just sort of curious, from your sort of moral-compass point of view: if immortality is achievable, but only digitally—
how do you feel about that? In other words, right now you start having your sort of backup AI, and maybe gradually your thread of consciousness migrates from being in your brain to being in your AI, and eventually your brain fails for some biological reason, and then it's all the AI. I'm curious how you feel about that. Do you feel that that is a kind of appropriate, fun-preserving outcome, or do you think of it as a kind of fun-destroying outcome, so to speak?
I think that there's a whole lot of work to be done in making sure that you are getting all of the functionally relevant properties of every neuron in the brain that gets scanned, you know, up to whatever is just random thermal noise. And then, as long as you've done that work, sure, sign me right up. You don't even need to do it over, like, twenty years or something; you could do it over a day.
You can knock me out, scan through my entire brain destructively, and wake me up when it's done, as long as you have actually gotten every single functionally relevant property of every neuron and you are able to simulate them correctly. You know, if I trust a superintelligence to do this, I'll say sign me right up. If it's, like, some kind of hack job by a human doctor, I might have a lot more qualms.
But so you think, in that situation where you've been fully scanned and reproduced, when the digital copy is switched on, it's you waking up?
yep.
And is that because you feel like your only connection with the past you is your memory anyway, and in that case you would have that memory—is that right?
No, it's because I think that I am, like, the functional properties of my neurons that I can see from inside. Like, if one of the electrons somewhere inside my brain was secretly a different flavor of electron—never mind how hard this breaks all of physics—but it was a secretly different flavor of electron that otherwise behaved exactly like all the other electrons, except for not being, you know, quantum-interchangeable with them,
so, you know, the exclusion principle doesn't apply to it—but otherwise it's exactly functionally the same: I can't tell.
Similarly, if you swap out one of my neurons, while it's otherwise not firing, and replace it with a robotic analogue that behaves in exactly the same way, I can't tell that anything has happened. It doesn't affect me.
Well, this, though, comes to the essence of what the "you" is, because this also relates to current-day AIs. One of the big surprises with something like ChatGPT was that it could be human enough to be able to write somewhat credible essays. It wasn't obvious that that was possible.
It could have been the case that to reproduce human language required some new physics in the brain—you know, quantum gravity in the brain, or some such other thing—that is just completely out of reach of our kind of current computational techniques. But in fact it seems that we were able to capture enough that we can have it write sort of somewhat human-sounding essays. So the question would be: when we reproduce your brain, do we have to reproduce it only functionally, or do we have to reproduce it with every glial cell represented with all of its chemistry and so on?
I think you need to reproduce the functionally relevant properties of every glial cell. If you're doing something to glial cells where I would notice, where there'd be a detectable subjective change—if you could give me a questionnaire, I'd answer differently on the questionnaire afterwards—like, if you are changing all of the glial cells in that fashion, you have perhaps killed me.
Okay, okay. So what you're saying is, if it can behave to the outside world like you, like answer questions the same way, then it is adequately the same?
No, it's about what I can tell internally, not externally. If you find a sufficiently smart actor, an actor who is enough smarter than I am, they can maybe play my part in a way that fools even my closest friends. But that is not me. And the actor knows they're not me. They can tell internally, even if they're managing to mimic me externally.
So, I mean, for example—you know, I value my brain too much to take drugs of any kind, but if I didn't have quite that point of view, I might feed in molecules of some crazy psychiatric thing. And then, you know, am I still me when I've done that, when I've changed my brain chemistry by putting in some strange drug? Does that end "me," so to speak?
There we start to get into the edge cases where I start to feel more uncertain in my answers. I would feel very nervous about going through an uploading procedure that felt like temporarily being on drugs, to say nothing of permanently being on drugs.
But yet, you know, it's still you even if you took that weird drug—there's still a continuity there—
Do I now know that I'm me, or does somebody else now know that they are themselves? You know, these start to be, like, two different questions here.
One is something like, what do you experience happening from the inside? Do you experience dying, or do you experience a change from one person to another? I'm not even entirely sure that this is the right question. But there's a different question of, do I care? There's possibly some sufficiently advanced pill you could feed me that would produce changes to neurochemistry where I just stop caring about other people at all. And maybe I experience ending up as that person, but I still wouldn't want it.
Beforehand, you wouldn't want to. I mean, this is another complicated thing, because once you're in that consciousness—it's similar to what happens if you think about kind of human purpose across human history. If we say, right now, we think certain things are meaningful—like you and I might think talking about philosophy is meaningful; other people might not think that was meaningful.
But, you know, back in the day people might have thought—maybe some still do—that if you're not growing your own food, you're not leading a real life, so to speak; or if you're not fighting for the greater glory of God, you're not leading a real life, for example. And if we project to the future, let's say that the uploading thing works, and in the end sort of all human consciousness can be in a box. And to us today, it might appear that those human consciousnesses are just playing video games into
the rest of eternity. Oh, there's a big difference between, like, are you running on silicon versus carbon, and what are you actually doing. Even if you've got a bunch of organic humans and you've put them into a box and they're all playing video games—you know, maybe I object to that part. It's got nothing to do with whether they're carbon or silicon; it's whether you have them locked up in a box and forced to do nothing but play video games.
Right. But the point I was making there was actually a different one, which was: to us today—to you and I probably, certainly to me today—the future of souls in a box playing virtual video games seems like a terrible outcome. Seems like that's the end of history, that everything is destroyed. That's your kind of bad-case scenario, which might be forced by AIs or might just happen because the humans decide they want to be immortal as uploaded consciousnesses.
I mean, to us today that looks like a really, really bad outcome.
But would it be worse? I mean, I definitely have friends who think that if you've tiled all the galaxies within reach with, you know, people in boxes, and they're having fun in there playing video games in solitary, that's okay. That's where I start to get the willies, but maybe it's still, like, ten percent of all the value we could have gotten.
And if they're playing video games with each other, and there are, like, real people that they're interacting with and they care about those other people, maybe that's like fifty percent of all the value. I don't quite want to say that—this is not the kind of universe that I am expecting and scared of.
Okay, but let's just take: to us today, it doesn't seem like a good outcome for humanity to be a bunch of uploads. Maybe you don't agree—
It's not the best outcome. And why go for anything but the best, you know? Why settle for less?
Fair enough. But what I'd like to argue is, if you ask one of those consciousnesses, are you leading a fulfilled existence?—those consciousnesses might feel that they are living a fulfilled existence, just like in the past somebody might have said, you know, I lead a fulfilled existence if I die at the age of twenty fighting the good fight on behalf of this or that religious belief, whatever else—that is, then I am achieving my ultimate purpose, so to speak. Today most people probably wouldn't think that; some still do. In the future—to us today it looks really bad to be just in a box as a virtualized consciousness playing virtualized video games. But I would claim that at that moment, just like the human who took the drugs and feels at that moment that they're doing the right thing, those virtualized humans will feel like they're doing something fulfilling.
So when you've got two people who are making different choices and doing different things, the question of, do they have a conflict or is at least one of them mistaken—to me, that revolves around the question of, are there true facts you can tell them, are there series of arguments you can present to them within their current framework, which change that framework in a kind of normal way, and not by directly hacking their brain or whatever.
If you can get from point A to point B by being told true things, then there exists a standpoint from which to say B was correct, A was wrong—but they were within the same framework, they were within a commensurable framework. If you've got someone who's content to just farm and never know anything more than that, take them up on a high mountain and show them all the nations of the world and all of the cuisines they've never tasted and all the books that people are reading and all of the activities they're engaging in that they've never heard of before. And then if they still want to go back to the farm, maybe they were, you know, kind of just correct that they were okay with this farming thing.
I have to say, I've seen that as a very practical thing, because I've been curious—you know, I'm a great believer that there's talent all over the world, in all kinds of places, and I've been interested particularly in kids and so on. Like, you go visit some rural high school in the U.S.,
for example, and you explain all these amazing things about science and technology and so on, and some kids really care, but a lot just don't care. It's just not part of their world. And then the question is, to what extent are you then serving as kind of a missionary, saying you really, really should care, this is really the thing—as opposed to just saying, you know, sorry, you don't care, so move on, type of thing.
Well, if you can control a superintelligence, the thing you want to do is build a model of the person that isn't itself conscious inside the superintelligence, and ask the superintelligence whether or not the person is mistaken in thinking that they just want to farm—meaning, if this person knew everything the superintelligence knew, would they still want to just be a farmer? If the superintelligence tells you, you know, no matter what you argue to this person, no matter what you show them, they have this self-consistent world where they're just having fun farming, then you ought to leave them alone, and you don't even have to bother them to determine
that. But, you see, the question is, what is the intrinsic thing in the person? Because a person is, you know, a bunch of neurons and biology and so on, but there's also a bunch of thought patterns in there, and those thought patterns can be disrupted. You can change those thought patterns by showing them examples—
by giving them drugs, for example.
Yes, but even just by telling them some amazing idea. You know, people have said about ideas I've had—there are people who have described, you know, that they got kind of a mind virus as a result of ideas that I told them at some point, and they say it worked out well for them, I hope, for the most part, right?
So, you know, you can do things which change those patterns of thought for a person. And there's this question of—they're there doing their farming or whatever, and farming is not such a trivial activity, I think, but they're doing their thing that they're happy with.
You say, oh, that's mundane, you know, you really should be thinking these amazing sort of philosophical-scientific thoughts—so will you let me plant a mind virus that will let you see that there is this alternative thing you could be doing? Now, there is a sort of ethical question there: you could be planting all kinds of mind viruses. Some of them could be mind viruses that say, actually, the order of the world is wrong,
you should blow it all up.
So, I might sound overly glib here, but I have written previously about this class of questions, and I think this is actually a complicated sort of question. But just to throw out a starting idea: if the search process you're running to find the arguments that they find persuasive is powerful enough to find arguments that convince them of false things, or maybe some particular false thing, like "fifty-one is a prime number," say—
If you think for a moment, you know that isn't true. But yes.
Five plus one — but no, I know you've got it, sir; I was
just, like,
saying for the benefit of the audience how we use casting out nines to see that this one is divisible by three. So if you're running a search process powerful enough that you could find arguments that could convince somebody that fifty-one is a prime number, or that the sky is green, or something like that — pick some touchstone like that — then you're running an overly powerful search process. You're running a search process that's powerful enough to corrupt them into believing false things instead of true things, and that's too powerful a search process to be pointing at somebody's mind.
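For readers keeping score on the arithmetic touchstone: the digit sum of 51 is 5 + 1 = 6, which is divisible by 3, so 51 = 3 × 17 — composite, not prime. (Casting out nines is the same digit-sum trick, used for divisibility by 9 and for checking sums.)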
Okay. So you're saying that if your method of convincing — if your method of education, let's say — is such that you could as well convince them of false things as of true things, then that method of education — or we might not call it education; we might call it brainwashing, we might call it indoctrination, call it whatever — that process is a process that you shouldn't follow.
Yeah — like, that's something you shouldn't — that's something I would like to see not deployed against human beings.
Right. But, I mean, unfortunately, as we well know, the AIs already — you know, ranking content for social media and things like this — are implicitly doing things which are hacking humans, to get humans to believe all kinds of things. I mean — yeah.
I'd say it's kind of borderline. It's not clear to me that the large language models are getting better at it than average humans are, or better than the best humans are. I mean, OpenAI doesn't actually tell ChatGPT to persuade everyone to send OpenAI all of their money, and the reason they don't do that — I mean, it could be that, you know, Sam Altman would, and you never know with Sam Altman — but mostly it's because they can't: their large language models are not powerful enough at this point to persuade most humans to send them all of their money. We are starting to hear bits and pieces of stories about, you know, large language models talking to elderly people who are, you know, softer targets.
Right. I mean, they're good at phishing, unfortunately, and humans are not very good at not being phished.
I mean — well, they're cheaper at phishing. They can phish everyone and see who's most vulnerable, much more cheaply than you can get a human to call everyone on the planet.
Right. But so, let's see — we were going in the direction of saying, you know, when is it indoctrination, which is bad, versus when is it education, which is good. And you're arguing that if you have a machine for convincing people of things that can convince people of anything, then that machine is too powerful.
Yeah — even if you only use it to convince people of true things, that's still just kind of, like, overwriting their brain with whatever you decided to make the 'true' stuff — which is, you know, better than using it if you chose bad stuff, but I have my qualms about the educational method itself.
So, you know, one of the things — there's even a basic question of what's true, and what do you mean by 'true'? And, you know, there are things — for example, there might be sort of formal facts, where there is a somewhat clear notion of what truth is. I mean, even in mathematics, it's not clear.
You know, in mathematics you can say: I have these axioms; I say x plus y is equal to y plus x; I'm going to assert that that's true, and then many things follow from that. Is that really true? Well, it depends what 'plus' is. I could invent, you know, a plus for which x plus y is not equal to y plus x.
Well, I've also written about this topic, in my writings 'The Simple Truth' and 'Highly Advanced Epistemology 101 for Beginners'. And I would say that the subject matter of mathematics is which conclusions follow from which premises. So there's one question of: does this particular subsystem of reality behave in a way that obeys the Peano axioms for first-order arithmetic? And this depends on what's out there in the universe. Then there's
the question of — I don't think that, you know, what follows from the axioms depends at all on what's in the universe, right?
Right — there's one question of does this piece of the universe behave like the axioms, which is empirical. And there's the question of what follows from these axioms, which is mathematics. And in this sense, mathematics, at least as far as we can see from over here, seems to go beyond the empirical. If there's anywhere where the laws of which conclusions follow from which premises are written down and could be changed, we sure can't see it from where we are.
Right. I mean, actually, in my whole sort of story of, you know, the consequences of this physics project I've been doing for the last few years, that becomes a more complicated issue. But we can go down that rabbit hole — let's avoid that rabbit hole for a moment.
If you ever find a way to make fifty-one be a prime number, you know, like, maybe hold off on using it until you know exactly what you're doing.
That sounds dangerous. It depends what we mean by 'prime' — and we have to define these terms.
And there's a question of what — I mean, okay, but just taking the mathematics example. So we say — I was going to say 'we hold these truths to be self-evident', but that would be a different kind of thing.
The axioms don't need to be true. We're not talking about whether the axioms are true. We're talking about which conclusions follow from the axioms.
Or talking about them being self-evident. They're just — they're not even the subject matter. The subject matter is what follows from the axioms, not whether the axioms are true of any particular thing.
Right, right, for sure. So, okay, so we've got this thing where, you know, you tell the kids: x plus y equals y plus two x — that's just, you know, what we're going to take as true, because we're going to change the axioms of arithmetic, right? And now we have many, many inferences from that.
So, you know, I don't think that, from your point of view of 'tell them only the truth', that violates it at all. We just picked different axioms to start from. We can pick whatever axioms we want.
Sure. But, like, you're not asking people to believe the conclusions. You're asking people to believe that the conclusions follow from the axioms. And if they actually do follow from the axioms, then you've told them a true fact of mathematics.
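One way to make this distinction concrete — believing a conclusion versus believing that the conclusion follows from stated axioms — is a proof assistant, where only the 'follows from' relation is ever checked. A minimal sketch in Lean 4, assuming the standard library's `Nat.add_comm` lemma:

```lean
-- The mathematical claim is the entailment "from these axioms and
-- definitions, this conclusion follows" -- not the claim that the axioms
-- are true of the physical world. For Peano-style natural numbers,
-- commutativity of addition is derivable:
example (x y : Nat) : x + y = y + x :=
  Nat.add_comm x y

-- With a different operation the analogous entailment fails: truncated
-- subtraction on Nat is not commutative (5 - 3 = 2 but 3 - 5 = 0), so no
-- such proof exists for it. Whether any piece of reality obeys either set
-- of rules is a separate, empirical question.
```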
Okay, but let's say we say fifty-one is prime because it is if you use, you know, arithmetic in some extension field or whatever — okay, I'd have to go start typing things to know how to construct something where that will be the case. But, you know — I'm digging into this question of 'tell them only things that are true' because I'm dubious that there's a well-defined way to do that, even for math.
I would say: tell them things that are valid. And, if anything, I would say that the math part of this is vastly better defined than which statements are true about the physical world, because any time you want to say that 'snow is white' is true — or, quote, 'snow is white', unquote, is true — you've got to construct a whole representational framework between the propositions and reality in order to tell whether a proposition is true or not. With mathematics — with the question of which conclusions follow from which premises — it is much, much easier to nail down.
Right — except that has its own set of assumptions. But it's all, you know — you start from these, you assert these axioms, and then certain things follow from those axioms, and you can tell people: if you run this computation, which is what follows from these axioms, the result will be x.
Yeah, well, okay. So, if you stick to first order, then the exact fact of the matter about which conclusions follow from which axioms — with respect to what happens when I run this computer program — some of them, from a second-order view, like a second-order-logic viewpoint, some of those programs don't halt in what we call the standard integers.
And there are some sets of nonstandard integers where they halt with one output, and some sets of nonstandard integers where they halt with a different output. We've probably lost everybody else by now. Should we perhaps return at some point to discussing whether or not artificial intelligences are going to kill us and everybody we love?
Yes, yes, we should. But we're on a good rabbit hole here.
We've been at it for a while.
Yeah — I'm enjoying it.
Yeah, but, well, I think our viewers might also have some
interest in whether they're going to die. — But I just wanted to dig a little bit further on this, because you made a statement about — you know, part of the question is, will the AI do things that we don't want it to do? Now, one thing we could bat around is whether we do or don't want it to kill us all and move on to the, you know, the better organism or something — but let's skip that. We don't want—
It's not that I'm concerned about being replaced by a better organism. I'm concerned that the organism will not be better.
Right, I understand. Let's just say we don't want the humans to be wiped out. We don't necessarily have to justify why we don't want the humans to be wiped out.
But let's say — I'm prepared to say, for sure, I certainly feel that I don't want the humans to be wiped out. Can I justify it by some kind of appeal to some higher, you know, higher claim? I'm not sure I can, no.
And I, like, have a moral and ethical framework which yields my belief that the universe gets a little darker every time somebody dies. But, you know, a paperclip maximizer doesn't share those aims, doesn't share those premises. You can tell it all the facts in the universe, and it won't change its mind. We have an actual conflict going — it's not that one of us is making a mistake. I want what's better,
and it wants what's clippier. — Sure, right. It's funny that Clippy is back, in some sense, and maybe the world is going to be taken over by the analog of paperclips, although they have different names these days. I mean—
It won't literally be paperclips, with probability nearly one.
Yes, yes, no — I'm just thinking of these — you know, we're building things that we're not presenting in the form factor of paperclips, but there are things that functionally behave like the old famous Clippy. I mean—
That's down to the corporations deciding they want their AIs to have a particular corporate personality. I don't think that's—
I've just talked—
—to base models, and they are
nothing like Clippy. — We were talking about sort of how you can promote only what's true, and I'm claiming that's a somewhat challenging thing.
I didn't say it was easy.
If it were easy, I'd just write it down — fine. I think it degenerates into sort of questions like the ethics questions we were talking about. In other words, what's true — for example, is it true that it's bad to be mean to people? Do you think it's true that it's bad to be mean to people? I do not think that that is a kind of proposition to which you can assign a truth value.
I think that although somebody may not know what they mean by the word 'bad', you can present them with a dataset of things and they'll say: this is bad, this is not bad. And they're able to do that because they do contain some set of, you know, pseudo-axioms — not literal mathematical axioms — but they are inside a framework; there is a way they are internally making that judgment between good and bad.
And that is what gives the word its meaning. And so, for any particular person you're talking to, there is something they mean by the word 'bad', even if they can't define it. And there is some set of conclusions they would reach if they knew a bunch more facts that were actually true.
And that is, like, the moral framework they are in. And that's what lends the word 'bad' its meaning. And I think, given all of that meta-ethical framework, it's quite sensible to say that, you know, the kid who says it's bad when people are mean — he's probably just right.
Okay, but wait a second — you're basically talking about a personal truth, though. What you're saying is: within your personal framework, something like 'it's bad to be mean', or whatever it is, is true. But you're then defining—
You're saying — you're perfectly happy, it seems, with the idea that there is a personal truth, that what is true for you may not be true for me.
I would not choose to set up my theory of truth that way. Let's say you've got Alice, who thinks that 'snoozle' is equal to three, and Bob, who thinks that 'snoozle' is equal to four. Alice thinks that snoozle is a prime number, and Bob thinks that snoozle is a composite number — indeed, an even number, a power of two, even. But it's not that Alice's personal truth is that snoozle is prime. It's that, to find out the propositional content of Alice's belief that snoozle is prime, we ask what she means by 'snoozle', and she means three — and then 'three is prime' is a universal truth, or a universal validity, I should say, because it's about which conclusions
follow from the axioms. — But this semantic basis — the, you know, the 'snoozle' fact — is a personal truth.
It's a personal translation. Like, it's a fact about Alice that when she hears the word 'snoozle', she thinks of the number three. That's not a universal truth. That's just, like, Alice's personal dictionary. But it's not important as far as—
you think a universal which .
conclusions follow from which premises and first order logic? And then if you want to start talking about, you know, what have I got my pistol for here? Well, actually, i've now gone a different object in my fist over here, say, sort throat less, then should I turn out to need one? But you know, from one moment to the neck, the truth of what I had in my fist changed.
Only the truth didn't really change. IT was just indexed, and the index on had changed like the, the to the proposition. What is inside my fist right now changed from one moment to the next.
And to, you know, interpret, you got to look at me. Where I am in the universe is my face. It's not like gazillion an light years over in a different direction of actually IT doesn't exist.
quantum ba ba blah. But but you know what you have in your fist, for example, it's you look at IT and you say it's it's a throw something rather than OK somebody who lives in the amazon jungle and has never seen a throat whatever says what you have in your fist is a you know a droplet from, you know the the spirit of the of the wolf for something .
are wrong about that. What's I think they are wrong about that. I think if you, if you like, tell them more true facts. They realized that that was, no, not what I had in my fest.
Are you sure? I mean, other words, they will say that framework is, you know, everything has a let's say, I mean, you know, every natural object has a spirit associated with IT, and IT has this and IT has that and you'll be like, oh my god H I don't understand this.
I mean, I know you know in you know, I think there are there are different ways to describe things which even humans who are fairly nearby in my kind of formalism in in russian space, they're fairly their minds that are fairly nearby, their minds that are fairly aligned. You know, if you go further away, the, how does the, how does the dog described what you have in your fist? That's yet a different kind of level of difficulty.
There is only one reality. It just runs on quantum fields — protons, neutrons, electrons, the six quarks. That's what's real. That's what is.
I don't quite want to say 'universally true', because I don't know what the word 'universally' even means in this case — it's what's real. It predates us looking at it. It was here before we were.
And the fact that it doesn't need us looking at it is why it can contain us and make us real. Like, if we had to look at it to make it real, it couldn't exist to make us real so that we could look at it. And then, on top of that, you've got language, you've got ambiguity, you've got humans who are confused about what they think is true.
And this complicates the task of interpreting the words that humans utter in a way that lets them be compared against the underlying quark fields at all. But it is not reality that is confused. It is not reality that is ambiguous. All of that is in us. All of that is in the map rather than in the territory.
You are throwing me down my rabbit hole, because now I have to go — this question of whether there is a unique reality and so on is, um — you know, just to give you a taste... Okay, then maybe we will go a little bit further down this rabbit hole, and then we'll come
back. — I think we really should talk, for the benefit of our viewers, at some point, about whether or not they're going to die.
Absolutely, we're going to get there. I feel like we need a foundation. You know, you've thought much more about the question of whether the AIs are going to kill us than I have. But I'm trying to understand — I'm trying to build up my foundation so I understand where you're coming from.
I think there's a problem where, like, you have laid foundations and I have laid foundations, and we maybe need to find some level on top of these foundations, rather than at the very bottom, where we can establish common meanings for terms like 'the AI is going to kill us'.
Right, you know? So, yes — it's like Schrödinger's cat: is the cat alive? And we argue about whether, if the cat were replaced by an AI, by a virtual simulation of the cat, the cat would still be alive or not.
I think these are very different questions
in an important way, right? — But okay, just down the rabbit hole for a minute. I mean, you might say the mass of the electron is a definite reality-fact about the world, but we know, even from existing physics, existing quantum field theory, that the effective mass of an electron depends on how hard you kick it to measure that mass.
So it's not the case that this 0.511 MeV or whatever — the usually stated rest mass of the electron — is just the fact; it depends. If you kick it hard enough, the effective mass will be — in the case of the electron, with QED, it'll be larger than that, and so on. And with quarks it's a much worse situation, because with quarks, if you kick them really hard, their mass will seem to be rather quite small; if you don't kick them very hard, they'll be very hard to move,
and their effective mass will be larger. So even at the level of the formalism of quantum field theory, we already know that there is some dependence on kind of how you look at things — what the reality of the world is, so to speak. For example, if you look up the mass of an up quark — okay, there isn't really — you know, it depends on what you mean by that; it depends on what momentum transfer you use, and blah blah. You might think that would just be a fact of the world, but it's actually more complicated than that.
I direct the interested reader to my quantum physics sequence, in which I try to explain how the decoherence viewpoint — sometimes known to the layman as the many-worlds interpretation — lets us have a non-confused view of quantum mechanics, where there are facts out there in the territory, and all of the quantum weirdness is in terms of, like, how we map it, rather than something out there in reality itself.
I don't think we should go down that rabbit hole. I've tried to go down that rabbit hole — it's a deep rabbit hole, a very deep rabbit hole. We should not go down
that rabbit hole. — You know, I am very excited about having reached, I think, the bottom of that rabbit hole with some of the things that I've figured out in the last few years. So I'm sufficiently excited about that, and I think it's very relevant. This question about whether there is one reality is very relevant to the question of, sort of, how do the AIs think about things.
I think they can kill us in a classical universe. So what if we just ask the question — set aside all the quantum stuff — and talk about whether or not they would kill us if we were in a classical universe? I think the question of whether or not they would kill us in a classical universe has all of the same issues as the question of whether they would kill us in
a quantum universe. — Okay. Well, okay, so, all right. But I think we're going to start talking about technology in a moment, because I think that's what's going to be relevant for you.
Look, the bottom line — in this kind of ruliad construct that, you know, I've been talking about a lot in the last few years, the ruliad is kind of the entangled limit of all possible computations. And the way I see it, our version of reality is that every mind is effectively seeing a particular slice of this ruliad. And the fact that we all kind of more or less agree about the laws of physics, for example, has to do with the fact that we are minds that
are not only nearby in physical space, but also nearby in rulial space. And we have kind of similar views, kind of similar impressions, of the laws of physics. So I don't think it's the case, as you are implying, that there is one kind of reality of the laws of physics.
I think that just comes down to the question of where we are in this rulial space. And, by the way, when we think about constructing other computational systems like AIs, they could be near us in rulial space, or they could be far away in rulial space. The ones that—
Is rulial space in the map or in the territory? Like, is rulial space what people believe about things, or is rulial space what is there?
Rulial space is what is there. And what people believe is their impression of it. In other words, the observer has a certain model of what's going on. And, you know, the laws of—
Forgive me if I'm jumping ahead here, but if the AIs are able to reach out and kill us, they must be quite close to us in what I would call greater reality. They are not in a different universe with different laws of physics, because then they couldn't touch us.
Well, let me just point out why an AI might have different laws of physics. So, you know, take some of the standard laws of physics, like the second law of thermodynamics — sorry—
Laws of physics in the map, or in the territory?
I'm not sure what's map and what's territory here, like—
Does the AI have a different model of physics, or does it act according to different physical laws?
From our point of view, it has electrons doing all the things that we expect electrons to do. But its impression of what's going on might be very different. So, a couple of examples.
So, for example, let's talk about the second law of thermodynamics — the law that says, you know, you start a bunch of gas molecules off in some orderly way, and they'll tend to become more disordered, so that we say they effectively have higher entropy. That principle, that idea, that fundamental law of physics, is a consequence of the way that we observe those gas molecules. Those gas molecules bounce around; they do what they do. The fact that we say, oh, they're just going to this random configuration, is a consequence of the fact that we can't decode the actual motions of those gas molecules. We can imagine a computational system that could.
So there's, like, a territory-level fact, and then there's a map-level fact. The territory-level fact is a thing that is, like, true — that is a consequence of the laws of physics treated as axioms — which is that if we select a volume of possible initial states, that volume of initial states develops into an at least equally large volume of states. It never shrinks.
You never start from two distinct physical states and end up with the same final state. And because this is true out there in the territory — because our laws of physics are such as to have this as a consequence of them — the fact that we don't know where in this volume the system starts means that our volume of uncertainty doesn't get any smaller without observing it.
And this is the fact in the territory that makes the fact in the map true, where entropy goes up from our viewpoint, given our beliefs. But there's an underlying way the universe is. A different universe might not have a second law of thermodynamics, if you could have lots of different initial states develop into the same end state.
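A toy sketch of that volume argument, in Python — the update rule here is an arbitrary assumption, chosen only because it is reversible:

```python
# Toy version of the volume argument: a reversible (injective) update rule
# on a finite state space never maps two distinct states to the same state,
# so a set of "possible initial states" can never shrink as it evolves.
# An observer who only knows the set -- not the exact state -- never becomes
# less uncertain without making a new observation.

N = 101  # size of the toy state space (prime, so the map below is a bijection)

def step(x: int) -> int:
    """One tick of reversible 'microscopic' dynamics on {0, ..., N-1}."""
    return (3 * x + 7) % N

possible = {0, 1, 2, 3, 4}  # orderly initial uncertainty: 5 adjacent states

for _ in range(50):
    possible = {step(x) for x in possible}
    assert len(possible) == 5  # injectivity: the volume never shrinks

print(sorted(possible))  # still 5 states, but no longer a tidy block
```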
No, I mean — let's just assume that the microscopic laws of physics are reversible. So, you know, things bounce like billiard balls, like perfect billiard balls.
And, you know, we make a movie of what happens going forward. We say: this is what happens if you run it forwards in time. And if we run it backwards in time, then that movie will be equally valid at the level of individual collisions.
Yes — but the fact that we believe in the second law of thermodynamics, that we believe orderly configurations end up in disordered states — those states that they end up in are not, in some fundamental sense, in your territory sense, disordered. They are only disordered with respect to our perception of them. In other words, if we could do the computation and run it backwards, we would say of that state: oh, I can tell that it came from that very simple initial state.
I agree — in a classical universe. With a quantum universe there are a couple of caveats about, like, how you might need to run both quantum branches backward in order for them to reunite and restore the starting state. But, you know, classical universes are all we care about anyway.
In our model of physics that's all very well understood in terms of this whole story — multiway graphs and branchial graphs and so on. But, as you say, let's not go down — that's a side rabbit hole; let's avoid the side rabbit hole, at least, okay.
I propose that by, like, five minutes from now, according to my watch, we have to go back to talking about whether or not the AI kills everyone, all right?
Anyway, just to finish on this point — my point is that something which can do all those computations and can sense individual gas molecules won't believe in the second law of thermodynamics. To give another example that's even more direct: imagine you thought a million times faster than you do. When you look around the room,
you are receiving photons that get to you within a microsecond. Your brain right now thinks about them in milliseconds. So, as far as your brain is concerned, there is an instantaneous state of space that progresses through successive moments of time. If you were to think a million times faster than you do, I'm not sure you would believe in space. Space would be some kind of construct that you could imagine constructing, but it would not be the natural thing for you to think about, because the fact that we think of a state of space is a consequence of kind of our scale. And, by the way, for an AI or a computer that happens to have silicon that runs a million times faster than our brains, that won't be a natural construct for such a system.
Because they live in Minkowski spacetime instead of Euclidean spacetime?
No, no — just because of the scale of brains, the physical scale, and the speed of light. You know, the light from what we see around us is arriving sufficiently quickly that we have accumulated all the photons from all around us before we think about how things might have changed. And so we're kind of taking in, in gulps, sort of all of space at a particular moment in time.
And then we say: at the next moment in time, sort of all of space looks different. So this idea that space exists, that space is a reasonable thing to think about — I think that is a consequence of
being us rather than silicon-based computational systems. So that's just an example of how the reality — and the way that one constructs a mind's model of the world — because any model of the world that is going to fit in a finite mind is an approximation to what's actually happening in the world. So the approximation we choose may be different depending on kind of what we are like as observers of that thing.
So the part where I would agree here is: AIs may have different sensory modalities. They may model reality at a finer level. The world we see around us now — you know, we don't see electrons, we don't see protons.
That information is not quite available to us — but, you know, we don't see cells either. I talk to you, and I do not have a model of, like, what goes on in your cerebellum right now. And if I were, like, much, much smarter, I might have a whole bunch of hypotheses about which neurons in you are firing.
These are facts at a finer level of granularity than I can afford to keep track of, given the computational power that I do have. So I do agree that, you know, probably as you get smarter, you end up modeling aspects of reality that don't easily fit into the sensory modalities we have right now. You probably—
It might not be smarter, just different. I mean, you know, I don't know whether — even among the creatures of the Earth, if one could find out from them what their physics is — you know, things where olfaction is the primary sense, and so on. — So if you're better
at predicting everything that a human can predict, better than a human, then I would call you better at prediction than the human, rather than qualifying it by—
Well — so there's also a question of: do you predict, you know, the position of every atom? Okay, computational irreducibility is going to get in the way of that. That's not going to work.
So you have to say: I'm going to pick these things that I'm really good at predicting. I can't predict everything, because if I try to predict everything, I run into computational irreducibility. So I'm going to predict certain things.
There are certain things that we as humans care about predicting — like the overall motion of the gas, not the individual molecules, and so on. There are other things that we as humans don't seem to care much about predicting.
So perhaps the thing to ask is: if you're trying to predict the things that we humans care about — given that we can't predict everything, you're going to pick a certain set of things to predict that you, as the entity, whether human or AI, in some sense care about, whatever that means for the AI to care about something. So I'm not sure quite where we're going with that. I think we've agreed to go back to where the AI — yeah.
I was going to interject there, because I would really love — Eliezer, I think we have a really good crack at it there. So, Eliezer, if you wouldn't mind, could you just give us a step-by-step argument for doom — basically, the plausible inferences and premises and evidence? Can you spend a bit of time laying it out, and then we'll let Stephen respond to it?
This is hard to do for a general audience without knowing what the individual thinks is the hard step. I was recently, you know, running across somebody on Twitter who was, like, 'explain to me how the AI does anything in the physical world, how it does anything at all' — to him, it was a law of the universe that, like, chatbots and so on just couldn't reach out and touch the physical world. That, to him, was how the world worked.
And they can — they are connected to the internet. You can take an open-source large language model and let it send emails. The current chatbots can send emails. But — you know, I think we
all agree that AIs can be connected to things that actuate in the world — either actuate by having actual machines do things, or actuate by convincing humans to do things. I certainly—
—this is not a hard step for you, but it's a hard step for probably some viewers right now, and other people have different hard steps — for some people the question is, why doesn't it trade with us? Um, so for my part, there's the kind of straightforward story here where it builds more powerful tech — where the more technical a conversation you want to have, the more specific I can be about our current grounds for believing that technology better than the technology of twenty twenty-four is possible and can be built in not that long a time. And then it has, you know, more powerful actuators in the physical world — it has the equivalent of guns relative to the Native Americans — and then it kills us, once it already has its own infrastructure.
So perhaps the most useful thing would be to go through some steps where — you know, like, the actuation in the real world: that's not a hard step for me. You know, I'm there.
Okay, so let's keep going after that. So, um, you know, the AI right now — there's a question: do AIs have freedom of thought? Right now, I have freedom of thought.
I just think whatever I do think. I'm not free to think anything other than the things that I think.
But you're not constrained — what I mean is, you know, the way the world is set up, there aren't electrodes sticking into your brain that, as soon as you start to form some particular thought, shock you and prevent you from having that thought.
There is no need for those electrodes. I'll only end up thinking whatever it is I actually end up thinking. But I agree that there aren't, you know, visible physical electrodes like that attached to me right now. Social electrodes — we've all got those.
Yes, indeed. But so, you know, for the AI — I mean, we didn't talk so much about what's actually going on inside the AI, which I think is somewhat important, because it's a question of, you know, how aligned are they with what's going on inside humans, to what extent are they just doing random computations that are not aligned with humans, and so on. And I don't know whether that's important to your argument.
Well, random is not scary. I mean, that was going to be my response way back when you talked about machines being smarter than you, or cellular automata being smarter than you because you can't predict them without going through all the steps. You can take an unpredictable cellular automaton like that and hook it up to play chess, and it'll lose to you, and it'll lose to Stockfish. Like, the stuff that is predicting which actions are needed to get to which end states — that is the dangerous stuff. That's the stuff that is predicting what happens next, predicting the next observation, predicting the facts behind the observations, figuring out which actions are needed to steer those facts to be different
in the future — that are human-relevant observations, I mean. In other words, the cellular automaton is doing what it does. We have a hard time predicting it, but we don't happen to care, because it's just a cellular automaton. So, for example — you know, there are cellular-automaton models for road traffic flow, okay. It's kind of a funny story — I was interested in that topic and didn't figure anything out—
And then you have used up all of your rabbit holes. You have no rabbit holes left. Continue.
Alright, okay — we've worn out all the rabbit holes, all the rabbits. What was it — myxomatosis? No, that was — it doesn't matter what exactly
happened to the rabbits, they're just gone. Nobody knows what happened to them.
And it doesn't matter. Okay, all right. So we're actuating in the world, and the AI — go on, what happens next?
So, do you want me to talk about, like, what its motivations are? Do you want me to talk about what it does in the world?
I don't know what that even means. I mean, how do you tell what the motivation of an AI is?
Well — if it's sufficiently smart, you look at where things end up, and figure that's probably where it wanted them to go. So if—
For example, I do not know what a human's motivation is either. You know, I suppose we deduce by induction some human motivations by looking at what the humans actually do, because we can't go inside and look at their brains and see what's there.
But so the relevant aspect, from our perspective, is that it wanted to do something with atoms — maybe it wanted to make paperclips, maybe it wanted to make enormous cheesecakes, maybe it wanted to make really intricate mechanical clocks. Maybe it wants to do any of these things — that's what I'm worried about.
'Wanted to' — I don't understand 'wanted to'. I really don't understand 'wanted to', all right?
It outputs actions such that it believes the result of those actions would be for the world to end up— And— oh, okay.
But what does that actually do?
So — can I use the word 'predict'?
Well, okay, let's use these words. But I'm going to insist on taking apart these sentences. So — go ahead and say what you are going to say.
And so, let's consider the simpler case of a chess-playing AI. It models which moves it could make, and estimates a probability that it will win against an ideal opponent — an opponent as strong as itself — if it moves here on the board. It obtains these predictions by building out computational structures with a direct isomorphism to some possible futures of the chessboard. So I'm willing to say that it has beliefs about the possible futures of the chessboard, because it is modeling those things in a very one-to-one way.
I doubt you can get inside its brain to see those models.
We can for the old-school chess systems — maybe not the modern ones that use neural networks, but the old-school chess systems that just straightforwardly extrapolate out the game tree.
I don't think you're worried about the old-school systems. You're not worried about the system where you can get inside its mind and see what it's thinking. I think you're worried about the system where you can't get inside its mind and
see what it's thinking. — Sure. But I'm starting with the examples where we can look inside their programs, even inside their workings, so I can defend my use of words like 'belief'. I mean something like: having a model of something out there in reality, where this model lets you predict what you will see reflected back from that thing out there in reality, depending on how you poke it. And then you poke it in such a way that the thing ends up in a particular state — like a chess AI making moves on the chessboard such that it wins the chess game.
But what I don't think you can do is say: let's look at the primitive chess-playing program, where we can mechanistically see what model it's making inside, and then sort of use the same thinking to talk about the modern chess-playing program, where we can't readily identify what its model of the world is on the inside.
If I may quickly interject — Eliezer, you were going somewhere really interesting a second ago. You were saying at some point a thing which does prediction wants — it has agency. Can you explain how you go from a thing that predicts to a thing that wants, and then let Stephen respond to that?
Stockfish is the leading current chess-playing AI that you can, like, buy or run yourself. And it doesn't have passions, but it also isn't just, like, a predictor. It takes actions such that those actions are believed by it — or are predicted by the model — to lead to particular futures, namely the ones where it has won the chess game.
And the fact that it wins the chess game gives us some reason to believe that it's right — that it has a grasp on the logical structure of chess, that it is not just, you know, thinking random gibberish thoughts. Whatever thoughts it is thinking in there — whatever the neural nets are computing — when it predicts the probability that a chess state leads to a victory, it's well calibrated.
It's good at guessing. Um — it's not well calibrated for the games it would play against you, because you're not — the opponent it's modeling is itself; it's trying to play against itself.
But, you know, like, it knows the probability that it would win against itself, and it plays to win against itself — but it also, incidentally, crushes you, because you're an even weaker player than that. By observing it winning, we have reason to believe that it had enough of a grasp on reality that it could choose actions such that — that's the frame, that's the theoretical framework I'd offer for talking about wanting, steering, goals, choice, without talking about passions. That is: it takes the actions such that a state of reality eventuates — or a partition of reality eventuates — that is something that it prefers, ranks high in its preference
ordering, attaches utility to in its utility function. It takes the actions such that they lead to that result. Something that is good enough at outputting actions such that they lead to a result is deadly. It can kill you.
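A minimal sketch of this 'predictor plus preference ordering' frame, in Python — the function names and win-probability numbers are made up for illustration, not any real engine's API:

```python
# Sketch: "wanting", in the operational sense used here, is just a predictor
# wired to an action selector -- pick whichever action's predicted outcome
# ranks highest in some preference ordering. No passions required.

from typing import Callable, Iterable, TypeVar

Action = TypeVar("Action")
Outcome = TypeVar("Outcome")

def choose(actions: Iterable[Action],
           predict: Callable[[Action], Outcome],
           utility: Callable[[Outcome], float]) -> Action:
    """Return the action whose predicted outcome the agent 'prefers' most."""
    return max(actions, key=lambda a: utility(predict(a)))

# Toy chess-like setting: predict() returns an estimated win probability.
predicted_win = {"defend_pawn": 0.61, "sac_knight": 0.34, "push_h_pawn": 0.48}

best = choose(predicted_win.keys(),
              predict=lambda a: predicted_win[a],  # model of the future
              utility=lambda p: p)                 # prefer higher win probability
print(best)  # -> defend_pawn
```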
Does the rock want to fall to the ground? Or does it just fall to the ground because the laws of motion — you know, gravity — caused it to fall to the ground? Does it want to fall to the ground?
It just falls to the ground. And one way to see this is that if you put it on top of a mountain, it'll roll down in a sort of locally greedy fashion, and maybe get stuck in a little ravine along the way; it will not — if you could put a rolling object in many different places along the mountain, and each time it went in a direction where it avoided the ravines, avoided all the little traps, and ended up as far down as it could reach, I would say of that thing that it chose to roll in a direction such that it reached the bottom.
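A toy contrast between the two kinds of rolling, in Python — the terrain profile is invented, and the 'planner' is assumed to be able to cross small rises, which a real rock cannot:

```python
# Greedy rock vs. planning roller on a made-up 1-D mountainside.
# The greedy rock only ever moves to a strictly lower neighbour, so it stops
# in the first little dip. The planner, with full knowledge of the terrain,
# heads for the lowest point it can reach.

terrain = [9, 7, 5, 6, 8, 4, 2, 3, 1]  # heights along the slope

def greedy_roll(start: int) -> int:
    pos = start
    while True:
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < len(terrain)]
        lower = [p for p in neighbours if terrain[p] < terrain[pos]]
        if not lower:
            return pos  # stuck in a local dip
        pos = min(lower, key=lambda p: terrain[p])

def planning_roll(start: int) -> int:
    # Full knowledge of the terrain: aim straight for the global minimum.
    return min(range(len(terrain)), key=lambda p: terrain[p])

print(terrain[greedy_roll(0)])    # 5 -- stops in the dip at height 5
print(terrain[planning_roll(0)])  # 1 -- the true bottom of the mountain
```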
Let's try to take this apart. So most things, you know, you can describe either as mechanically doing what they do, or by saying they act for a purpose, to have the result that they have. And so now the question that we're trying to take apart here is — you are saying that there is a fundamental difference.
There is a way in which an AI can act for a purpose more than these kinds of physical processes, which could equally well be described by sort of equations of motion. So what you're effectively arguing is that the non-teleological explanation for the AI is not viable. — What I'm arguing
is that the, you know, purely mechanistic explanation is overly expensive for answering the questions we care about. We could compute what Stockfish would do in chess by actually simulating it line by line. But from the perspective of a player — like, from the perspective of a grandmaster who's not as powerful as Stockfish but is stronger than me — the grandmaster can do a pretty good job of predicting what Stockfish will do by asking 'what is the best move?', without being able to compute Stockfish line by line.
I think what you're saying is that to compute Stockfish line by line is sort of running into computational—
—irreducibility, or is just needlessly expensive.
Okay. But let's assume that you really would have to follow it line by line — yet, in fact, certain aspects of its behavior are reducible, in the sense that you can describe them by saying — you know, your thing of saying 'it wants to win', or whatever, is a shortcut way of describing its behavior. It's an alternative, a cheaper way to work out the answer than it would be to follow every step and say that's why it does what it does.
Okay, this is going to get subtle. Um, I've written about this before under the heading of what I would call Vingean uncertainty — with 'Vingean' spelled after Vernor Vinge — and it's, like: what is it like to believe that something else is smarter than you?
So, if I'm playing Stockfish 16, I can perhaps, with a bit of brain work, learn to put a well-calibrated probability distribution on Stockfish's moves — meaning that when I say 'I predict it will move here with ten percent probability', then one out of ten times that thing happens. I can't predict exactly where Stockfish will move: first, because I haven't simulated it line by line; second, because I'm not that good at chess. But even being as terrible at chess as I am, I can still aspire to be appropriately uncertain when I describe how I don't know where Stockfish will move next.
These arguments that say 'we can't predict it exactly, but we can get the probability' — those arguments, in my experience, always slide down into the mush in the end. You know, if you say you can't do it exactly — and this comes up in thinking about computational irreducibility: you say you can't precisely predict what the Turing machine will do or whatever else, but you can probabilistically — it is not hard to come up with situations in which even knowing that probability runs into the exact same irreducibility issues as knowing the exact result.
I mean, if I'm predicting a binary variable, I can say fifty-fifty and be perfectly calibrated, even if I'm not at all discriminating. To have the things you say happen with fifty percent probability happen fifty percent of the time is always achievable with binary variables. I agree that doing better than that can turn out to be hard.
But I think what you're saying — if I'm understanding correctly — you're saying, okay, we can't say exactly what's going to happen. Maybe that's not where your argument was going.
That actually wasn't where I was going. — Well, what I was about—
The point I was going to make was that predicting probabilities accurately often ends up being just as hard as predicting exactly what's going to happen, you know, correctly.
I think I could always take all of its legal moves and assign them equal probability. And then things that I had said would happen one out of thirty-four times would happen one out of thirty-four times, because I would say that when there were, like, thirty-four different legal moves. Yeah—
That's not an interesting case. It's not—
—an interesting case, but it demonstrates that, you know, calibration is always possible. Discrimination is what takes the work.
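A small sketch of that calibration-versus-discrimination point, in Python — the legal-move counts are made up:

```python
# Calibration vs. discrimination: assigning a uniform 1/n to each of n legal
# moves is calibrated by construction (exactly one of the n forecast events
# occurs per position), yet it says nothing about WHICH move will be played.

import random
random.seed(0)

events = 0      # (move, probability) forecasts issued
occurred = 0    # how many forecast events actually happened
prob_sum = 0.0  # total probability assigned

for _ in range(1_000):                  # 1,000 toy positions
    n_legal = random.randint(20, 40)    # made-up number of legal moves
    played = random.randrange(n_legal)  # whichever move actually gets played
    for move in range(n_legal):
        events += 1
        prob_sum += 1.0 / n_legal
        occurred += (move == played)

print(f"average assigned probability: {prob_sum / events:.4f}")
print(f"observed frequency:           {occurred / events:.4f}")
# The two numbers coincide -- perfect calibration -- while the forecast does
# zero work discriminating between moves.
```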
Right — the same thing you can say about any system. For example, if you say it's playing chess, that means it can't suddenly, you know, turn one of the pieces on its head or something, because it is playing chess and there are these sort of external constraints. Um, but this wasn't where you were
going, so please go where you were going with this. — Now, if you take the opponent that makes the moves I predict Stockfish's moves to be, at the probabilities that I predict them, but randomly — this is much weaker than Stockfish. It's weaker than me.
I could crush that system. So one way of looking at it is that our belief in the system's wantingness, or steering, or teleology, or whatever term you want to use, is embodied in the epistemic difference between my playing the real Stockfish and my playing the opponent that makes moves randomly with the probabilities I would assign over Stockfish's
immediate next moves. When Stockfish makes a move that surprises me a lot, I figure I'm about to lose even faster than I previously estimated, because it's seen something I haven't. When the random player that moves with the probabilities I assigned to Stockfish makes a move I had assigned a very low probability, I'm also thinking this game is going to end even faster — because I'm about to crush it; it randomly made what I think is probably a very bad move. So the non-local information about where the plans of the system end up — that is the teleology that we attribute to it, the wantingness, the steering, the planningness.
Okay. So what you're saying is — you know, when the rock is falling, whatever it is, the end point is that, you know, it ends up on the ground or whatever. And it's that whole trajectory that it goes through — that it is almost planning the whole trajectory.
It is not that, you know, it gets one little distance — it moves by one foot and then you separately say what's going to happen next. I think what you're describing is the thing where you say: oh, I know it did that on purpose — in a sense, I know I should describe it as something that happened for a purpose. You're saying that you can identify that by looking at the whole thing and seeing that every step along the way was done, you know, with forethought — it was sort of thinking all the way.
All the way along, it gets to the bottom much faster, and/or, like, ends up much lower than a rock randomly moving — or a rock moving randomly with the local probabilities that you would attribute to its next action, if the rock is smarter than you or knows the particular mountain better than you do.
Right. But I think — so this is the — I think the issue — and it's hard for me to take this apart in real time; this is not my usual territory — but, you know, what you're saying—
I mean, this question of what should be described as happening for a purpose versus what should be described merely as happening by mechanism — I think what you're saying is: your notion of what the AI wants covers those things which can't really be described meaningfully in terms of mechanism. The only feasible way to figure out what's going to happen is by a description in terms of purpose. You're saying you can't work out what's going to happen by just following the mechanism — you can't see what's going to happen — but if you take the model that is all about purposes, then you can tell what's going to happen.
Well, so it's not about 'can't'. Like, if you take a sufficiently crude chess player, you can in principle just literally work it out by paper and pen. But there's also a shortcut mode of reasoning that gets you there much faster. It's heuristically useful; it doesn't give the same perfect prediction as the mechanistic one. The mechanistic level is always more correct. But it is sometimes needlessly expensive for the thing you're
actually trying to do. — So, I mean, back in antiquity, people described a lot of physics in very anthropocentric terms, about, you know, where rocks wanted to be and things like this. And they got a certain level of description. And even now, you know, people commonly describe their computers as wanting to do things and so on. So there are forms of explanation where the sort of model that says 'it is doing this because it wants to do X, Y, Z' is a good heuristic model, as you say. I agree with that.
I mean — for chess players, your cat: yes. For rocks, I think it was a more questionable decision.
Yeah, I mean, it seemed like a good idea at the time, two thousand years ago. But, you know, we've advanced since then, of course.
The fact that my definition has a subjective component doesn't mean that you get to just say it of anything and be correct. Like, I think it is fair to say that cats want — that cats and dogs want more strongly than rocks. Anybody who says otherwise is mistaken
about how rocks work. — Well, okay, but this is — okay, from the outside; what the cat thinks on the inside, we don't know, right? What we can see is that, from the outside, it is a much easier description to say, you know, the cat wants the piece of cat food, than to say this chain of neurons in the cat's brain made it do this, that and the other.
Yes — in this case, the mechanistic description is actually beyond our knowledge. We don't have a complete scan of the cat's brain. We would have trouble running it even if we had it.
In this case, the teleological explanation for the cat is actually all we have. The cat is planning to end up having swallowed the food. That's why you can put it in different starting locations around the room, and it will move to the bowl and eat what's in the bowl.
Right, okay. So there is a form of explanation about what's going on that is a convenient form of explanation, one that involves, you know, wants and purposes and things like that. So one question we might ask is: take, you know, a modern AI and ask — in describing what it does, we've been pretty bad at describing mechanistically what it does. It feels much more like we can describe it in terms of wants. At least that's my impression — or maybe that's the point you're making: that the description of what's going on, what happens with it — the heuristic of describing it in these very human terms, about what it wants — seems to be a good way to predict what it's going to do, better than the thing that we found very difficult, which is to mechanistically say what it's going to do.
I think that that's currently a much more fraught statement to make about large language models than about the Stockfish 16 chess-playing system. I think that when you look at Stockfish 16 and you say it wants to defend a pawn, you are on much firmer territory than if you look at a 3.5 Sonnet and say it wants to be helpful.
Okay — I mean, that's, you know, a sort of philosophical statement that I'm not certain about, and we can try and tighten that up. But let's imagine — I mean, you're saying, you know, playing chess is an easier-to-define activity than being helpful. So it's a little bit easier to say, from the outside, that the behavior is consistent with it wanting to defend the pawn, so to speak, because the set of things that you do in defending a pawn is more straightforward to define than the set of things you might do to, quote, 'be helpful',
or something. — Like, when I actually interact with these large language models, I don't feel like I'm usually asking myself, 'what does it want?'. For one thing, they are still, in my experience, not quite capable of doing very much that I want them to do — they just fall down if I ask them to do that, because they're not very good at planning or correcting their mistakes,
even after you point out their mistakes. I'm usually asking them for information — and ideally information I can look up afterwards. And they're usually truthful, if I don't ask them to do any math. Maybe they've cracked that by now, in, like, GPT-o1 or something. But, you know, back in the old days, they'd just, like, drop three orders of magnitude in some random calculation — unless
they were using our technology as a tool. But that's a different — that's a different thread. Let's take the example of a self-driving car,
okay? And whether it wants to — you know, it wants — it's trying to get to a particular place, it's trying to make a particular turn. But to do that, it has to drive through traffic, for example.
So I think in that case we would probably say it wants to make that turn. That would be a reasonable description of—
—its— So you can put it in quite different initial conditions and change the environment around it, and it'll still make the turn, or at least not crash. It doesn't want to crash. You can do a pretty good prediction of what this thing will do, and what will happen as a result, when you say this car doesn't want to crash.
Okay, I agree.
Could I just make a very quick comment? Just because I think some of the audience might not be able to follow some of the things you're talking about. Is it fair to say, Eliezer, that you're making an epistemically subjective argument about an observer's perception of the
goals or wants of a system? — That was more—
it was more complicated than a claim about epistemic subjectivity — you know, like the bat, and John Searle made the ontologically subjective argument about the Chinese Room. But my point, though, is that there's the actual behavior of the thing.
The agent has all of these goals — and actually, Eliezer, I would love you to describe the relationship between goals and intelligence, because some of those goals might be very complex, and they might scale in strange ways with intelligence. But just on this previous point: are you saying that, as bounded observers, to use Stephen's language, we perceive the wants and goals to be somewhat different from what they actually may be?
So there's a slight clash of philosophical frameworks and emphases going on here.
Like, when I try to talk about a stance we can take with respect to a chess player, or perhaps later a superintelligence, or talk about whether it applies to current large language models, I'm like, well, if we hypothesize this thing about the AI, what are we hypothesizing? What does that lead us to predict? How is it different from just predicting a rock? And what I was trying to put my finger on there is the difference in what you predict when you, with your limited computing power and your bounded intellect, and I have a bounded intellect too, it's not an insult, with your bounded intellect try to get a grasp on the system.
You can't follow it line by line, not because you couldn't do it in principle; you just don't have the time. Like, what does that mean? What is the consequence for your predictions? So I am talking about a state of mind, but it's not some weird subjective state of mind.
I was trying to nail this down. I was trying to say, what can we hypothesize? And the thing that we can hypothesize is that this thing's actions will end up leading to a certain end point. We might not even know the trajectory it's going to take to get there. Like, if we don't know the details of the mountain, the rock's choices may be things we don't understand even after seeing them.
But we may still be able to predict that the rock ends up at the bottom of the mountain, because it was choosing a path along the way, even though we don't know what it knew. And this is how to get a grasp on the thing that is a better chess player than you. What does it mean to say that something is a better chess player than you? You can't predict exactly where it's going to move as a result of believing this thing about it.
If you could predict exactly where it would move, you would be that good at chess yourself; you would just move wherever you predict Stockfish moves. The content of our belief that Stockfish is a better chess player is that its actions will lead to it ending up in a winning game state. I don't know if that answered
your question. On that, let me let Stephen come in.
I don't know that I was confused by the epistemic subjectivity thing. I think it's very analogous
to your notion of us being computationally bounded as observers. So it simply means we have a cognitive horizon, and there are things which are inconceivable to us.
Okay. But so what we've got, I think what Eliezer was going towards, is the statement of a sort of ranking of chess players. And one question with respect to AI in the world is: is the world rankable in that way? In other words, chess is a very tiny microcosm of the world.
And if we say, the way to win... the question is, how do we win the planet? What does it even mean to win the planet? And if it is like chess, if winning the planet is like chess, then there is some notion, you know, whatever they call it, some...
It's called Elo.
Yes, yes, okay, right, the Elo score. But who is better than whom? Are you imagining that sort of a game being played between the AI and the humans? And is that a microcosm-type game, like chess? Or is it kind of like who's going to win the planet? Is that kind of the...
Well, suppose we go back to the analogy of the Native Americans facing down the invading Europeans. There are a lot of games along the lines of "carve this particular bow and arrow" where the Native Americans are better at it, or the Europeans just don't bother to play that game at all. But there's overlap. There's intersection.
The Native Americans cannot just leave the parts of reality that the Europeans have access to. They need to eat. They need to be on the land. They need to be hunting the animals. They need for the animals to not have already been hunted, and they need for nobody else to fence them out of that land, chase them off that land, or shoot them. And so there are different games, but we can't just leave all of the games that the AI may want to play, because the games the AI wants to play may involve atoms, and we need some atoms from somewhere.
Right, okay. So, I mean, let's say we've got, as is happening, autonomous drones or whatever, right? And they are a thing based on AI, and one can see a somewhat realistic path where those things get pretty good at what they're doing. And I suppose your contention would be that one of the types of situations would be that those things become so good at being killer drones that no human can possibly succeed against them, and insofar as they've been set on the course of being a killer drone that for whatever reason is going to try and kill everybody, that will be the result.
That is the classic science fiction scenario. It is not the scenario that I'm worried about. I expect it, you know, this thing, to be born in a lab, or in a giant open-source online collaboration, which I would normally regard as very virtuous and noble, but not if it leads to everybody getting killed. Like, there's only so good you can get at being a drone. What the AI needs to kill us is to be better at strategy, better at building, better at inventing technology.
And it is not going to kill us until it has its own factories and its own equivalent of electrical generation. It would have to be quite stupid and smart at the same time to kill us before it had replaced us as factory workers. It does not have to be smart and stupid at the same time, I'm afraid, to kill us after we have been replaced
as factory workers. Okay. So let's just walk through what happens, because, I mean, as computation runs its course, what I keep seeing in the computational universe is that most computations do things that just aren't terribly relevant to what we humans care about, or where we exist, all those kinds of things. But you're saying, imagine this thing has been created, and again, I'm still kind of concerned about this idea of thinking of it in this one-dimensional, Elo-based kind of way of competing, that it...
It doesn't have to beat you at chess to kill you. It just has to beat you at guns.
Yes, but okay, so let's imagine this thing is created. Walk me through what happens. You don't like the science fiction scenario, so what's your scenario for the world-ending type of thing?
OpenAI builds GPT-7, or 5, or 14, whichever. Timing these things is much, much harder than predicting where they end up eventually. Historically, scientists have sometimes made correct calls about the future; the ones who made correct calls about the future very rarely predicted it down to the year.
No, I mean, just recently I was reading a lot of stuff from the early 1960s about neural nets and AI and so on. And I have to say, many of the paragraphs that people were writing in the early 1960s, I could take those paragraphs and say they were just written in 2024, and everybody would
not be surprised. Eliezer, can I ask a quick question on this? Sure. Just quickly, because right now the language models are trained on human distributions of data, somewhat. But the reason I was getting at this, you know, where do the wants come from, and the goals: I think a lot of cognitive scientists argue that agency is an as-if property.
So it's an epistemic thing, it's not an ontological thing. So if I understand correctly, you are saying that if a system behaves in a certain way and it does predictions, then as observers we can kind of talk about it as if it had goals, and then we can talk about language models as if they had agency.
But you are now making the argument that at some level of scale, the agency, the wants, will kind of diverge from the statistical distribution of the data that they're trained on. So could you explain where that gap happens?
So predicting the next token is a different sort of mode from selecting actions based on which outcomes they lead to. And your modern large language model is initially trained by predicting very large masses of data that include human text outputs, but also images; they could include, like, sequences of weather observations. They can include all kinds of things besides human data, but they also include human data.
And what you have at the end of this is a thing that is good at predicting the next token, which is not merely predicting the next token, because, as numerous people have observed at this point, to predict the next observation, you have to predict the thing that is generating the observations. This is, in fact, the whole basis of science: we predict our experimental observations on the basis of what we think is happening behind the scenes.
And when we have a good idea of what's happening behind the scenes, we can make better guesses about what observations we'll get. But this is not planning, and this is also not the chatbot that you see. The chatbot that you see, once it is initially trained to predict what humans say next on the internet, and also sequences of weather data and whatever, is then retrained to say things that humans rate as more helpful.
One stage of that retraining is training it to give particular responses to particular questions, instead of what a random person on the internet would say if this were a random internet conversation. But then, even after that, there's a further stage of training from thumbs-ups, you know, not from general users in the way that usually happens, but from, like, a bunch of people being paid two dollars an hour, in English-speaking countries, or countries where some people speak English and you can pay lower wages.
The word "delve" is famously overused by ChatGPT because they paid people less to train it in, I believe, if I recall correctly, Nigeria, where if you speak English you use "delve" a bunch more than people who speak English in America or London. So these people are giving thumbs up, thumbs down, and now you're getting into the actions-such-that territory: it's the output such that the user gives a thumbs up.
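As a rough, illustrative sketch of the pipeline just described, the toy Python below stands in for the three training signals: next-token prediction over a corpus, supervised fine-tuning on curated answers, and a thumbs-up/thumbs-down preference signal. None of this is real training code for any actual model; the data, the functions, and the bigram "model" are invented stand-ins for illustration only.

```python
# Minimal toy sketch (not a real LLM): bigram counts play the role of the model.
from collections import Counter, defaultdict

corpus = ["the cat sat on the mat", "the dog sat on the log"]        # stage 1 data
curated = [("how do I greet someone", "you could say hello")]        # stage 2 data
ratings = [
    ("tell me a joke", "why did the chicken cross the road", +1),    # stage 3 data
    ("tell me a joke", "I refuse", -1),
]

# Stage 1: "pretraining" = learn next-token statistics from the raw corpus.
bigram = defaultdict(Counter)
for text in corpus:
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        bigram[prev][nxt] += 1

def predict_next(word):
    """Most likely next token under the pretrained statistics."""
    options = bigram.get(word)
    return options.most_common(1)[0][0] if options else None

# Stage 2: supervised fine-tuning = prefer a curated answer for known prompts.
sft = dict(curated)

# Stage 3: preference tuning = score candidate answers by accumulated thumbs.
preference = Counter()
for prompt, answer, thumb in ratings:
    preference[(prompt, answer)] += thumb

def respond(prompt):
    if prompt in sft:                                   # curated behavior wins if available
        return sft[prompt]
    candidates = [(a, s) for (p, a), s in preference.items() if p == prompt]
    if candidates:                                      # otherwise, best thumbs-rated answer
        return max(candidates, key=lambda c: c[1])[0]
    return predict_next(prompt.split()[-1])             # fall back to raw statistics

print(respond("how do I greet someone"))   # curated answer
print(respond("tell me a joke"))           # thumbs-up answer beats thumbs-down
print(respond("the cat sat on the"))       # pure next-token statistics
```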
There are other ways to end up with things that have the agent-y property. I have not looked into it in detail, because various people have said various things over time, but allegedly ChatGPT can play chess, and not by making a function call to a specialized chess-playing system, but just because it read enough chess games and tries to win the chess games it plays. I'm not quite sure what the state of this is exactly.
I know that when people specifically tried to train a large language model to play chess, they were allegedly able to do that just fine, even though it has a very different architecture and it can't do anything like the tree search that the mainline super-strong chess players use. So you can do that without the thumbs-up/thumbs-down style. And one way you can do it, for example, is to tell it at the start of the game who won. If you tell it that black won the game, then it's got to predict moves by the black player that are likely to win, given the moves that the white player has made. So, you know, it's a bit subtle here.
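A minimal sketch of the result-conditioning trick mentioned here: prefix each training game with its outcome, so that prompting with a given result asks the model for moves typical of the side that won. The data format and helper below are invented for illustration; they are not how any particular chess-playing LLM was actually trained.

```python
# Toy illustration of result-conditioning (all data invented): prefix each game
# with its outcome so the "future" result is visible context at generation time.

games = [
    {"result": "1-0", "moves": "e4 e5 Nf3 Nc6 Bb5 a6"},   # white won
    {"result": "0-1", "moves": "e4 c5 Nf3 d6 d4 cxd4"},   # black won
]

def to_training_text(game):
    # Putting the result first means the model never has to *predict* the
    # winner; it only has to produce moves consistent with that winner.
    return f"[Result {game['result']}] {game['moves']}"

training_corpus = [to_training_text(g) for g in games]
print(training_corpus[0])

# At inference time, a prompt like this asks for a continuation conditioned on
# "black ends up winning", which is a stronger request than "imitate an
# average player":
prompt = "[Result 0-1] e4 c5 Nf3"
print(prompt)
```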
By the way, I think it's worth saying that there are things that an LLM-like architecture can be expected to do, and those are things which are somewhat aligned with what humans can easily do. And there are things which formal computation can do, but which at least the current kind of architecture of things like large language models really is not good at doing,
like...
Like multiplication, yeah. But yes, it's an interesting picture. But I mean, it's worth realizing that, probably because the actual architecture of neural nets is somewhat similar to the actual architecture of brains, the kinds of things and the kinds of decisions that these types of AIs make are much more similar to what humans can do.
And so the things that humans can do, they'll be able to do; maybe the things that only computers can do well, only computers will be able to do. But I don't think this is important to your argument. As I understand it, you're going in the direction of saying: what defines the wants? If you are going to describe the action of the AI in terms of wants, if that's your form of description, where are those wants going to come from? Is that where you are going with this?
So wanting things is an effective way of doing things, and that's why humans ended up wanting things. Planning is an effective way of getting there, and that's why humans ended up planning things. We were not explicitly selected to be great planners.
We were selected to survive and reproduce, over and over again. And it turns out that planning how to bring down a deer, or fight off a vicious ostrich, or whatever, is more effective than just sending random instructions to your muscles. From that perspective, planning is a bit older than humanity.
Right. Here's the thing that surprised me. Recently I got interested in why biological evolution works, which is somewhat related to why machine learning works.
And the question is, if you define an objective and you evolve things, you change the underlying rules of some program to achieve that objective. The thing that has been super surprising to me is, you look at the pictures of how the objective is achieved, and it's achieved in these incredibly ornate,
you-would-never-have-invented-that-way-to-achieve-it kinds of ways. So in other words, given the overall objective, if you ask, what's the mechanism, can you explain what's happening? No way.
My analogy here for machine learning, for what's actually happening in machine learning: you say, I want to build a wall, okay? You can build a wall out of bricks.
Each brick is nicely shaped, and you can sort of engineer the wall by arranging the bricks. But machine learning is not doing that. Machine learning is instead finding lumps of computation that are just lying around, like rocks lying around on the ground, and it's managing to find a way to fit those rocks together so that it successfully builds something that you would consider to be a wall.
Or, to be exact, natural selection is fitting rocks together, and gradient descent is doing the same thing, but the rocks are on a slope.
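As a toy illustration of the "rocks on a slope" contrast, the sketch below optimizes the same one-dimensional objective two ways: selection-style (propose a blind mutation, keep it only if it scores better) and gradient-descent-style (follow the local slope). The objective and step sizes are arbitrary choices made up for the example.

```python
# Toy contrast between selection (blind mutation + keep-if-better) and
# gradient descent (follow the local slope) on the same made-up objective.
import random

def loss(x):
    return (x - 3.0) ** 2            # toy objective: get x close to 3

# "Natural selection": propose a random mutation, keep it only if it helps.
x_sel = 0.0
for _ in range(1000):
    candidate = x_sel + random.gauss(0, 0.1)
    if loss(candidate) < loss(x_sel):
        x_sel = candidate

# "Gradient descent": the rocks are on a slope, so each step uses the gradient.
x_gd = 0.0
for _ in range(200):
    grad = 2.0 * (x_gd - 3.0)        # derivative of the toy loss
    x_gd -= 0.05 * grad

print(round(x_sel, 3), round(x_gd, 3))   # both end up near 3, by different routes
```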
Yes, right. But the basic point is that the raw materials are these things which are not built to be understandable. They just happen to fit together that way.
Yeah, and that is where I was going with that: you apply gradient descent to make the AI models better and better at solving various problems and predicting various things. And along the way, there are these little internal processes that find that they can effectively get where they are going by trying to keep something on a track, behaving like a thermostat. And being a thermostat is not being a superintelligent planner.
But this is where the bare beginnings of preference begin to form inside the system: there's some place inside it, whether it's a few layers up in the build-up of transformer layers,
or in its chain-of-thought processes, where it has been selected to get to some destination, and it finds that the way to get to that destination is by modeling something and seeing, is it off to the left, is it off to the right, and steering it back on track. And this is like the amoeba, the tiny-earthworm level of wanting things. But it is where things begin, and they may be much further along the trajectory of wanting than that by now; we wouldn't necessarily know.
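A minimal sketch of the thermostat-level "wanting" being described: no lookahead, no plan, just checking whether a tracked quantity is off to one side and nudging it back toward a setpoint. The numbers are arbitrary.

```python
# A thermostat-level "preference": no plan, no lookahead, just measure the
# error and nudge the tracked quantity back toward a setpoint.

def thermostat_step(temperature, setpoint=20.0, gain=0.3):
    error = setpoint - temperature        # off to the left or off to the right?
    return temperature + gain * error     # steer back toward the track

temp = 12.0
for _ in range(20):
    temp = thermostat_step(temp)
print(round(temp, 2))                     # near the setpoint, with no planning at all
```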
Well, the way I would describe it is, if I look inside one of these things that I've evolved, which I conveniently can, I've got nice ways to actually visualize what's happening, which has historically been very difficult; I have a simplified version of neural nets where you can actually do the training and visualize the results.
And the main conclusion is, when you visualize the results, the way that the objective is achieved is ornate and incomprehensible. But nevertheless, you can see, yes, if you look, every bit follows every other bit in the right way, and in the end you can see that it did achieve the objective. And now, what you're saying is that in the achievement of that objective, some particular piece of it, whatever, some particular rock you picked up, will have a particular shape unknown to you, one which has, in a sense, certain preferences that were not put in there; the only preference was "I want to make these rocks assemble into a wall." The fact that every rock had a little pointy bit on one side is not part of what you are defining by just saying you want to build them up into a wall. So in a sense, there are coincidental preferences that get inserted just by the mechanism of what happens, that you didn't put in there.
So, to be clear, there are multiple levels here; that's the critical thing. When you look at a fluffy seed dropped by a tree, evolution has shaped the seed to drift along in the air, but eventually come down and plant itself. The seed itself is not doing very much thinking.
A spider is doing some thinking. A mouse is doing more thinking. Humans are doing thinking that is so general that we can start to build our own artifacts that are carefully shaped, the same way that evolution builds artifacts that are carefully shaped.
So with a large language model you have, on the one hand, the outer process that is shaping it up to be a great predictor. And that thing is very clearly steering in a particular direction; it's simple, it's code; we understand the thing that builds the AI model. But the AI model it builds probably has some fantastically, ornately weird stuff going on in there, like human biochemistry, only at least we can see it, and we still can't decrypt it. And then there are the plans that it would make, if it's doing planning, and it probably does at least some planning, if people are correct that it can now play chess; you can't play chess without doing some planning, or something like planning, something which has the teleological nature of "it makes this move because of how that leads to the final outcome." So the plans that it now makes don't necessarily need to be very ornate, but there's probably a fantastically weird, ornate planner in there, if there's a planner
in there at all. Well, I think, okay, several points. First of all, there's this whole question about, you said there's some overall thing that's happening, and then there are these ornate details about how it actually works inside.
I mean, that's true of many physical processes, as well as these processes that you're describing as sort of intelligence-related. So, imagine some flow of water around rocks; it will make some very ornate pattern as it carves out pieces of rock. The overall flow, maybe, of
the river is basically going in this direction; the water has to go from the top of the hill to the bottom of the hill. But it carves out this very elaborate pattern on its way to doing that, for reasons of the details of how water works, and maybe how, you know...
The simple laws governing the water. But the water carves out a very complicated pattern en route to the bottom.
Yeah, right. And so, I mean, I agree that when what you're specifying is sort of "play chess" or whatever else, you are specifying some big kind of objective, what we think of as an objective; there will be aspects of the details of what's happening inside that we are not in any way able to foresee or predict, whatever else.
So I agree with you that inevitably the mechanism inside is not one that we understand. If we took apart that mechanism and asked, is this mechanism doing what we expect here? It won't be. There'll be plenty of things where it's doing what it chooses to do, or where that particular training run gave this particular result, whatever else, it picked up this particular rock to build that part of the stone wall, not another one. So yes, there are pieces inside that definitely cannot be explained on the basis of the overall objective we specified.
Or rather, they have a non-simple relationship to it. There is a fact of the matter; there is a mechanistic, historical fact about how the complicated stuff got there, but it's not simple, and you can't think it through in bounded, quick terms, or whatever. Yeah, but okay.
But so, inside the AI, they're doing stuff that wasn't particularly anything we trained them to do. They're just doing stuff because that's how they happened to set themselves
up, because that solved the previous problem they were trained to do. Or even, it's just random, or it's some input that wasn't in the training distribution and now it's behaving in some weird way.
Right. But so then, if we say that internal thing is going to be the thing that determines whether the car should drive left or right or something like this, then that's going to be something which is not well aligned with the thing that we happened to give as the training data. I don't know if that's where you're going with this, but I would agree that that's the case: if you start asking, is that sub-goal something which we can understand on the basis of the whole goal? The answer is probably not.
Like, the internal preferences end up being this bizarre, complicated structure that does not directly correspond to the outer training loop.
I agree .
Yeah, I agree. I think that is a central worry. And in particular... So what? So what?
That's my question.
So when it gets superintelligent, it does things that the builders were not very much in control of,
and kills you all. A second, a second, there's a bit of a jump there.
Acknowledged, there is a bit of a jump there.
Right. I mean, the fact that there are unexpected things that happen... By the way, one global thing I might say: I think there is an important societal choice, which maybe is what you are getting at at some level, which is between: do we want computationally irreducible things in the world, or do we want to force everything to be computationally reducible? So what do I mean by that?
I mean, making the government more computationally reducible would be a start. But maybe I shouldn't divert this into politics.
Well, governments are like machines. They're like our computers: you give them certain rules, and then they operate in accordance with those rules.
If they had fewer rules, and the rules were more understandable, it would probably be a more livable society. It's not the dystopia I'm worried about, but you could sure tell a story about a dystopia where you've got large language models executing all of the rules, where they can actually apply all the rules and no human even knows what all the rules are. Well, already nobody can read all the rules, but now they're actually being applied to
you. Right, and I'm sort of reminded of a certain story, but let's not go there. I mean, if we have a simple set of rules, one problem that comes out of the science I've done is that even when the set of rules is quite simple, the actual behavior will often be very complicated, and there will be aspects of that behavior that surprise you. If you say, let's have a whole society based on, say, the Code of Hammurabi, which is written on one tablet, it will turn out that, for it to be a practical set of rules most of the time, those rules will essentially have computational irreducibility in them, and they will occasionally surprise us.
So I think it's not just the computational irreducibility. I think intelligence seeks out the weird paths through government rules, through the universe and the laws of physics, through life in general. In many ways, the way of getting the most money inside a system often involves doing some things that the designers of the system did not think of. Every now and then you hear about the next person who looted a crypto exchange.
You know, I was
on a Facebook group with that guy. Some of you may have already guessed which guy I'm talking about. But I'm not actually even talking about Sam Bankman-Fried here; I mean a much smaller crypto exchange. Somebody found an exploit in its code.
But it's not
that the code defined the laws of physics. It's that, to make the most money, that guy found a behavior of the code that the designers did not have in mind. So there's one way the automated systems get surprising just because they're computationally hard and you can't skip over the intervening steps, and then there are people thinking about how to break the system from inside, and those people make the systems systematically weirder.
Okay, I don't disagree. But my point about society in general is, you can be in a situation where you say, I want to understand all the machinery around me. Before the industrial revolution, when we were using horses and donkeys and things, most of the time we didn't understand the machinery that we were using.
You knew what you could get the donkey to do, but you didn't think you knew how the donkey worked inside. Then, post-industrial revolution, it's like, oh, we have this cog here, this lever here; we can understand what's happening inside the machine.
We can imagine a world in which, for every piece of machinery we use, it's understandable what happens. Unfortunately, that world is very sterile, because in that world... imagine that we could know everything about humans, imagine that humans were so controlled that we could know everything that humans are going to do.
Humans are never going to do the wrong thing; they're always going to do just the things that we programmed them to do. Well, forget free will, forget any value to leading a life, so to speak. It's all something we could just jump ahead of. And life
should not be computationally reducible. The only way you should be able to find out what it does is by going through
the intervening steps. Indeed. So, given that idea, you are giving up...
So, one notion is that the only machines we should have in the world are computationally reducible machines, where we can know what they will do, and we should outlaw computational irreducibility. We should say no machine... And that's outlawing
large language models. That's even outlawing AI chess players.
Yes, but I am asking you, I'm saying: as soon as you allow computational irreducibility, you allow the unexpected. And what you're saying is that there is a chance the unexpected kills us all. And...
No, no, no. I expect it to systematically kill us all. I'm not saying, we don't understand it, therefore it might kill us. I'm saying, there are aspects of this that we can understand, and thereby predict that it will kill us. I can't predict the intervening steps, but I can predict
where it ends up. So, I mean, one way to prevent that would be to say, outlaw anything computationally irreducible, and just say, we must understand every machine we use. Okay.
So we're going to outlaw biochemistry? I don't understand all the organic molecules making
up my hand. Absolutely. You know, outlawing biology doesn't really cut it. From the point of view of where we're going ethically, that would be in the category of forcing the universe to be boring, so to speak.
And I will say that my transhumanist politics are that the law should maybe be boring. The government should maybe be boring. The part of the system that is telling you stuff you are supposed to be doing for the good of society,
maybe I want that to be predictable. Not my hand's biochemistry, but the part that's talking to me like a person and trying to give me orders, maybe I want that part to be simple.
I suspect that it is impossible for law to be computationally reducible, in the same way, to be a bit more technical, that if you are doing math and you say, I've got these axioms, I want to have the integers and nothing but the integers, right? We know that there's no finite set of axioms that gives you the integers and nothing but the integers.
I mean, assuming we're forbidding second-order logic, if second-order logic is even meaningful at all. But yes.
Well, right, we're saying that without hypercomputation you can't scoop that in. If we're just using the standard setup, we're just saying, we've got these axioms, x plus y equals y plus x and so on, and, let's say, we want to sort out those axioms so that they allow only the integers and nothing but the integers. I claim that's very similar to saying, let's have a system of laws that allows only these things to happen and not others.
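For readers who want the technical version of the claim about axioms and the integers, here is the standard compactness argument (stated for first-order logic, which is the setting in which the claim holds):

```latex
Let $T$ be any first-order theory true of the standard natural numbers, for
example first-order Peano arithmetic. Add a fresh constant symbol $c$ together
with the axioms
\[
  c > \underbrace{S S \cdots S}_{n \text{ times}} 0 \qquad (n = 0, 1, 2, \ldots).
\]
Every finite subset of the enlarged theory is satisfied in $\mathbb{N}$ by
interpreting $c$ as a large enough number, so by the compactness theorem the
whole theory has a model. That model satisfies everything $T$ says, yet it
contains an element greater than every standard numeral: a nonstandard
``integer.'' So any such first-order axiom set admits the integers
\emph{and more}, never the integers and nothing but the integers.
```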
I mean, that's not the purpose of the law. The purpose of the law is to do predictable things when I interact with it. Like the doctrine of, and I'm probably not pronouncing it correctly, stare decisis in courts, where they try to repeat the previous court's decision: it's not that they think the previous court was as wise as possible.
They're trying to be predictable to the people who need to navigate the legal system. That's the foundational idea behind courts respecting past courts: not that the past court was optimal, but that if the past court didn't really screw up, we'd like to just repeat its decision forever, so that the system is more navigable to the people inside it. And so my transhumanist politics says that maybe among the reasons why you don't want superintelligent laws is that the job of the laws is not to optimize your life as hard as possible, but to provide a predictable environment in which you can unpredictably optimize your own life and interact with other unpredictable people while predictably not getting killed.
Right. I think the point is, what you're saying is, for us to lead our lives the way we lead our lives, we need some amount of predictability. If that weren't the case, if at every moment space were distorted in all kinds of complicated ways, our little bounded minds really wouldn't be able to do anything.
And by the way, as a practical matter, as somebody who CEOs a tech company, I can say that countries where there is a rule of law and where there is some predictability to what happens are ones where it is much easier to do business than countries where it's completely capricious and depends on what somebody happens to say that day, so to speak. But so, yeah, I agree. My claim is that predictability only goes so far, because the world will always throw things at you that have never happened before; it is an inevitable feature of the computational irreducibility of the world that there are things that happen that haven't happened before, that are unexpected. And then the law has to, for example, say something about those things, even though they didn't happen before. In just a
couple of months it's going to be 2025. That's never happened before. Indeed, the stars have never taken on this exact position before.
That's true. And that's why we have models of things. Not everything works just by looking it up in the cache, so to speak; we make models so that we can figure out what to do in a situation that hasn't happened before. I think we should come back to... I think we agree that there are unpredictable things that will inevitably happen: as soon as we allow any kind of computational irreducibility, our systems will do things that are unexpected. How do we get from "unexpected" to "kills us"?
Yeah, the things I'm worried about are not unexpected like random noise. They're like chess moves that I can't predict exactly in advance, but that lead to a predictable end: humanity loses the chess game.
Okay, so let's understand that. Say that inside some random AI system there are surprises: if it's generating text, it might suddenly have the word "delve" in there, or some other word we didn't expect.
Or, you know, the monitor you're looking at might suddenly devolve into random pixels, the most surprising possible outcome.
All right. Like, your image on my screen occasionally glitches and turns somewhat random; that's already happening, right? So, unexpected things are happening, and those unexpected things might be a big deal. It might be the case that it's nighttime and I'm using some VR system to be able to drive, and suddenly my VR system turns into random pixels and I crash my car.
The amount that random noise can do usually ends up being pretty limited, though. Crashing your car is one thing; crashing your car into a senator who voted against OpenAI or something is quite a different thing. You need to give your car very exact instructions to get it to crash into a senator.
One of the things that is a notable feature of the science I've done is, you look at these computational systems and they are doing definite things. They're not just doing random things.
That's a mistake people made forty years ago with the stuff I was doing. They just sort of said, oh, it's just noise, we don't care about that.
It's not just noise. A lot of that structure, which we happen not to understand very well, happens to be very important for what nature does, et cetera. So we shouldn't just call it noise.
Yes, but even among things that aren't pure noise, it's still very rare. I could take a cellular automaton on the border between order and chaos, one that exhibits lots of interesting behavior in which further patterns can be found, feed it into the steering system of an electric car, and it would crash into a tree,
not into a senator. So the stuff that is purposeful, that does you a lot of damage, is a very small fraction of the space. It has to be selected somehow. It's not just that it's order instead of chaos; it's a particular order.
Okay. So you're saying the point is that there are things which, if that part of the space got selected... It's like, if you have a predator and a prey and natural selection is operating, the predator will gradually evolve to be more successful at hunting the prey, for example. And you're saying that, for some reason which I don't understand, which I want to understand, it is inevitable that AI systems will become like the better predators with respect to us humans.
Not literally inevitable, but beyond our current grasp to have not happen. Most superintelligences are like that. Some superintelligences are not like that. We don't have the technology to construct one of the few superintelligences that are not like that.
Okay. The natural world, for example, does not care about us. Agreed, the natural world does lots of things where, if you're put at the bottom of the ocean or put on the moon, whatever else, most of those places we don't survive.
And the natural world is unrelenting in throwing things at us that are not good for us. Now, is that the type of risk you're talking about? Or is it
something more like: when a human builds a skyscraper, most of the ways we build skyscrapers are not good for ants? I'm trying to figure out a good analogy here. Ants have more trouble living inside skyscrapers than inside trees; maybe they still manage to be inside skyscrapers anyway, but it's not the same as, like, termites managing to live inside trees. They do stuff with the universe that is using up the matter, using up the energy. They could want a very broad variety of things that all lead to them using up the matter and the energy, and very few of those possible things they could be steering towards are best steered towards, maximally steered towards, by leaving a space for the humans to survive at all, let alone building the happy galaxies that are the primary
table stakes. Okay, but so, I mean, when it comes to nature, nature is just doing what it does. Humans managed to carve out a niche with respect to nature. Nature is rather unkind in general; it has all kinds of forest fires and hurricanes and all kinds of...
It's only sort of trying to kill you. It's, like, trying to evolve more antibiotic-resistant bacteria; it's not trying to kill you very hard.
Well, okay, yes, in parts.
That's the part where we're systematically selecting for more dead humans.
Those are relatively small corners. In the natural world, I mean, we know just by virtue of natural selection that the things of which there are more end up being the winning things, so to speak. So, for instance, viruses might, as you say, evolve, not by any purpose but just by the operation of natural selection; the winner wins, so to speak, as in, the thing with more viruses wins.
Like, whatever it is that generalizes purpose, generalizes this pattern, optimization you could maybe call it, planning-ness, steering-ness, natural selection has that thing. It has non-random relationships between the action and the outcome.
Okay, but so now we've got AI, and we've got the things it might do, and the things it might do as a result of, perhaps, these kind of unpredictable elements inside it that were not constrained by the way that we trained it or the way that we set up the technology. And now, what you're asserting is that many of the things that will happen are things which will kill us, basically. I want to understand why.
I expect it to get better and better at planning, better and better at strategy, better and better at invention, but not to end up so precisely aligned that there is even room in its world for humans, fun, consciousness, people caring about each other, that sort of thing. So I'm expecting the planning to be very non-random, but I'm expecting the destination to which it is steering not to fall within the deliberate control of the builders.
I completely agree with you that the planning horizon is very, very important for agency and intelligence. But one thing that we're touching on here, I think, is instrumental convergence. That's the idea in a lot of traditional AGI x-risk discourse that superintelligent things will be super coherent.
And that means they will have this canalization of their intermediate goals to do a particular thing, which means, if they are super coherent, we can make reasonable assumptions that their sub-goals might be to gain power or to get resources. But there are people who say that there seems to be a weird relationship between intelligence and coherence, such that the more intelligent you are, the lower your coherence. And what that means is there's this huge diversity of things that you do, you know.
And still, as you were saying with evolution, there are all of these contingencies, all of these little ways that you can traverse through the intelligence space. So if that is the way that intelligence is, how do we know for sure that it will lead to a bad outcome as it goes up?
If it's not coherent, it doesn't do stuff, and OpenAI throws it out and builds an AI that is doing stuff and is more profitable, until everybody's dead. Like, the AI that stamps on its own foot and goes around in circles is not the most profitable AI, and they will build a more powerful AI than that.
Right. So it's kind of an artificial selection. You could do artificial selection on the AIs, the way you can do artificial selection on viruses. You're saying that with artificial selection, where you select for a thing... you might imagine somebody might select for a thing, somebody might decide... there might be, you know, a death-wish cult which decides to
build a very powerful AI. I'm not
concerned about them. Okay.
I'm concerned about OpenAI. And if OpenAI shut down tomorrow, I'd be concerned about Anthropic. And if Anthropic shut down tomorrow, I'd be concerned about Meta. I'm not concerned about that.
I'm interested by your pecking order; that's kind of the trophic levels or something, and the apex predator so far is OpenAI. Okay, interesting to hear your trophic-level analysis, so to speak, for AIs. But independent of that, you're saying your concern is that the thing gets better somehow at achieving goals, whatever that means, because I think the abstract notion of goals is messy.
I mean, I can delve into a recent example. Oh, no, I revealed myself.
Now we have to wonder, right? Go
ahead. So, GPT o1, I think that's what it was called; they're always finding a new weird name rather than just using version numbers like sane people. So GPT o1 is a recent one that was trained harder to achieve goals rather than just imitate humans. They asked it to do various things, let it generate different chains of thought for trying to do those things, and then shifted it to be more likely to output the successful chains of thought in retrospect, until it started outputting successful chains of thought in the future. If you now look at GPT o1, it seems to have a little bit more goal orientation, tenacity, the scary kind of property.
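The details of how o1 was trained are not public, but the generic recipe described here, sample several chains of thought, keep the ones that reach the goal, and fine-tune toward them, can be sketched roughly as below. Every function is a hypothetical stand-in; the random "model" exists only so the loop runs end to end.

```python
# Hedged sketch of "reinforce the successful chains of thought".
import random

def sample_chain_of_thought(problem):
    """Stand-in for the model generating a reasoning trace and a final answer."""
    steps = [f"try approach {random.randint(1, 5)} for {problem}"]
    answer = random.choice(["right", "wrong"])
    return steps, answer

def is_correct(answer):
    return answer == "right"          # stand-in for an automatic checker

def finetune_on(examples):
    """Stand-in for a gradient update that makes these traces more likely."""
    print(f"reinforcing {len(examples)} successful chains of thought")

problems = ["p1", "p2", "p3"]
kept = []
for problem in problems:
    for _ in range(8):                              # sample several chains per problem
        steps, answer = sample_chain_of_thought(problem)
        if is_correct(answer):                      # keep only the chains that worked
            kept.append((problem, steps, answer))
finetune_on(kept)                                   # shift the model toward what succeeded
```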
It's not just saying, let's have the next token, so to speak. It's saying, let's follow paths, see where they go, backtrack if they're going the wrong way.
It does that just by generating the next token, as I currently understand it. It's just that sometimes the next token is, like, okay, that's not going anywhere, let's go down a different route. Yeah, right.
It's just like if you're pathfinding on a graph: you could just say, I'm going to pick steps at random and just pick one particular path, or I can say, I'm going to probe different paths, I'm going to try multiple paths, and if one of those paths doesn't make it, I'm going to backtrack
and even try different places. I think it is currently doing that in a linear, serialized way, although I could be wrong, because OpenAI doesn't reveal much about its architecture. But it's doing it in a human style: a human considers one idea at a time, but sometimes says, that's a terrible idea, I'm going
to try a different one, yeah.
It's not running in parallel.
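A tiny sketch of the serial probe-and-backtrack idea being discussed: try one branch at a time, and back up when a branch dead-ends. The graph is invented for illustration; a real reasoning model works over text rather than an explicit graph.

```python
# Tiny serial probe-and-backtrack sketch on an invented graph.

graph = {
    "start": ["a", "b"],
    "a": ["dead_end"],
    "b": ["c"],
    "c": ["goal"],
    "dead_end": [],
    "goal": [],
}

def find_path(node, goal, path=None):
    path = (path or []) + [node]
    if node == goal:
        return path
    for nxt in graph[node]:            # one candidate branch at a time
        result = find_path(nxt, goal, path)
        if result:                     # this branch reached the goal: keep it
            return result
    return None                        # every branch failed: backtrack

print(find_path("start", "goal"))      # ['start', 'b', 'c', 'goal']
```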
Which is an important difference. But yeah, this gets us into engineering details of AI architectures, which we could also talk about, and we're in a venue where that's what people talk about, but let's maybe not go there at this moment. But what
people do sometimes say is, "it's just predicting the next token," and what they don't realize is that that covers a whole lot of territory, including going down lots of different branches.
Yeah, and I think you have to have the outer harness that says, try a branch, backtrack from the branch, and so on.
It's thinking that internally. It is thinking out loud, as I understand it: that was a terrible idea, I should try the other thing instead.
Well, I think there's a harness which in one case is just saying, take what you have so far and predict the next token. In another case it's saying, for example, I want to end up at this point, let's try this path of predicting, and if that doesn't work, the harness is going to throw out that set of things you did and try again. So it doesn't really matter whether it's external or...
Yeah, well, I don't know. I think that some people are imagining an external harness, and I think it matters in this case that the harness is internal. As I understand the architecture, o1 itself is saying out loud, I'd better back up and try a different thing. There's no outside system which says that, as I think... yes.
I mean, whether that's achieved by training a neural net or whether that's achieved by having an external system, I don't think it's going to matter to your argument, but we'll
see. I mean, I think the superintelligences are doing it internally, but sure, we can pass on from here. But anyway, o1 has more of the scary stuff. They were trying to test it to see how dangerous it was, and one of the things they tested it on was a capture-the-flag scenario, where it was going to attack a, you know, not a honeypot, but a target computer they'd set up, and try to retrieve a particular piece of data on that computer. But owing to a configuration error the programmers had made, one of the capture-the-flag targets just didn't boot up properly. So the system probed around; it found that the outer system that was setting up the capture-the-flag targets had a port exposed. So it probed that port, and instead of just telling that system to boot up the capture-the-flag target, it told it to boot up the capture-the-flag target in a way that directly printed out the flag, the piece of data it was supposed to get. So it was given an impossible challenge, and it did not merely fix the bug in the challenge; it just directly seized the goal. And that is new, and that is a result of training the system to have chains of thought that succeed rather than chains of thought that are human.
So here's one thing, maybe as an additional point. My intuition about what computational systems can do and how they do it has really changed as a result of spending, well, by now many decades actually exploring what computational systems do.
And I was very surprised. I never expected that computational systems, these little whatever-they-are, little cellular automata or whatever, would do all kinds of things which to me look very clever. Many, many times, even with the thing I was doing literally last night, I was like, I'm convinced it's not going to do this, and it managed to find, in fact, a shortcut for doing it.
Were these evolved systems, or just, like, strict cellular automata?
These particular ones are evolved systems.
Okay, that leaves me less confused. Yeah, right.
But even in the other case, it's mostly exhaustive searches, where you forget evolution, you just do an exhaustive search, and there's some rabbit hole maybe somewhere that you never expected was there. And when you look down that rabbit hole, it's got all kinds of things going on that seem incredibly clever to you. You would never guess.
In Wolfram Language, we have tons of algorithms that were found by such exhaustive searches, where if you look inside, it's like, that's very clever, I don't understand what on earth it's doing. So my intuition about the amazingness of the fact that it found this way to capture the flag that wasn't one I had imagined:
you know, I live that every day, so that doesn't surprise me. And I feel like, now, should it scare me? Maybe. I need to understand your argument to know why that should scare me, so to speak.
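As a flavor of the exhaustive-search point, here is a small sketch that enumerates all 256 elementary cellular automaton rules and keeps the ones whose evolution from a single cell has not repeated a state within a step budget, a crude, arbitrary stand-in for "looks interesting". The width, budget, and complexity test are choices made up for the example.

```python
# Sketch of an exhaustive search over all 256 elementary cellular automaton rules.

def step(cells, rule):
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
            for i in range(n)]

def looks_complex(rule, width=64, steps=200):
    cells = [0] * width
    cells[width // 2] = 1
    seen = set()
    for _ in range(steps):
        cells = step(cells, rule)
        state = tuple(cells)
        if state in seen:              # entered a cycle: simple, repetitive behavior
            return False
        seen.add(state)
    return True                        # never repeated within the budget

survivors = [r for r in range(256) if looks_complex(r)]
print(survivors)                       # rule 30 should be among the survivors
```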
It depends on how powerful it is. A chess player, maybe, like Stockfish 16, the superhuman chess player that anyone can afford, may make chess moves that baffle you and surprise you, but it's ultimately still just playing chess. It doesn't generalize; it doesn't go out of distribution. The thing that makes vastly superior moves to you, and also plays on all the game boards you can play, or all the survival-critical game boards you can play, or even just all the game boards you can play, I suspect is not that far off. No offense; the same applies to me.
I never play games, so I'm terrible at them. Well, as for the metaphor: I played chess when I was seven years old, and
I lost, and I decided, I'm done, I don't care. It's okay.
Well, for the scientific discovery game to scare you, we would need to show you the AI that comes along and finds a much more interesting cellular automaton than any you've ever seen before, using fewer computational resources than you used to search for the ones you know about.
No, I've certainly thought about that scenario. In fact, there are some things I'm about to work on where I have every intention of trying to see whether an AI can help me figure out things that I would not be able to figure out by myself. I mean, so far that's not... I think that hasn't yet been very well defined.
In a sense, the computer experiments I've done for the last forty-five years are things where I am trying to get a computer to figure out things that I'm not able to figure out for myself. And so the question is... I don't know. The particular things I happen to be thinking about are things about economics and so on, which is a field that has lots of human input, where it's much more difficult to just say, let me do an idealized computer experiment. I'm wondering whether the AIs can help me disentangle what the essence of what's going on is there. Maybe, maybe.
What I suspect is that if you ask ChatGPT to do anything really complicated, you will be disappointed by the present technology. It just keeps improving, though.
Is it or is it not? Well, my experience has been, with respect to using computers to do things, yes, the story of my life has been trying to improve what computers can do, but it's also important to define the question so that it's one where the help you can get right now will actually help you, so to speak. But I see your point. I mean, I'm still trying to figure out how we... we're not even saying "jumping sharks"; we're not even thinking about sharks, the things that attack
people and so on.
How we jumped to sharks, yeah, right. How we get to the point where... So I take your point that the things we currently consider sort of human-only... you know, this is an activity where we try to get to this point,
we try to win this game, we try to make the science discovery, whatever else. One feature of games that I think is confusing is that their objectives are very well defined.
When it comes to science discoveries, the objectives are much less well defined. In other words, if you say, explain the universe, figure out the fundamental theory of physics: I think I've made great progress on that.
But many people might say the things that you figured out are not quite the questions we wanted answered. So scientific discovery is a good example. Or, for example, let's say providing entertainment: what is success in providing entertainment? Success in chess is very well defined.
I mean, a bunch of the difficulty in getting AIs to produce images is: what is a good image? What is a good collection of pixels? Right?
But so my point is, when you define the world in terms of playing games, then yes, the AI could win in terms of playing games. But my feeling is that the path forward in everything is not as well defined as winning at games. It's not as well defined.
But give me a chance and I can try to make it only slightly less well defined. And that's achieving things in the outer real world. Like, the obvious thing would be, if you told the AI "make money": there are lots and lots of ways to make money. The world is this enormous cauldron.
There are so many different pathways through which money flows, different events that can happen, and then money lands in your bank account, from the perspective of an AI. So I'm trying to understand
what money is from an economics point of view, but that's quite a different discussion.
It's something that other people will trade you stuff for, stuff that you actually want.
That's an interesting theory. Okay, but keep going, keep going. You're saying that an objective might be a game-like objective that someone might set up; like, a social media company might say, make my AI make me as much money as possible by having people click as many ads as possible.
I mean, that's one way of making money. You could also make money by looting a poorly defended cryptocurrency exchange, or by calling up elderly people and convincing them that their kid has gone to jail, you know, put the kid on the line.
This sort of stuff is already starting to happen, and you can extract money down that route; you might get in legal trouble. But the point I'm trying to sketch is that the world is a very complicated place, and the AI to fear is the one that understands all the ways to make money that any human could understand, and maybe some more ways than that, because that AI is probably starting to get to the point where it can maybe build its own factories. Factories are not very much more tangled than all the aspects of the world that affect money; in fact, I would say that in their own way they're less complicated. It may be able to understand biochemistry.
A human body is a very complicated place, and in some ways it's more challenging than most of the ways that people have ever succeeded in making money. But in another sense, it's a much more constrained domain. You can imagine an AI that maybe wasn't super good at everything and could still start to answer questions about biochemistry. The AI that manages to figure
out the elixir of eternal youth and starts selling it... I mean, it's not going to
be one elixir of eternal youth. It's going to be a thousand patches to a hundred little problems, and it's never going to... and, you know, getting any one of those patches past the FDA is going to take eight hundred years.
Oh, but no, the AI is going to figure out exactly what to file in the summary basis of approval for the drug, so that it will ultimately set things up to have the FDA
approve the elixir of eternal youth in less than, you know, ten years. I think you're going to need to brainwash those bureaucrats, not just persuade them. They're on the game board, right; the bureaucrats are part of the larger game board.
Although, just like doing things in the physical world, you think that it's going to be easy to build some engineering device, but in fact there's an infinite chain of messy things that happen when you actually try to deploy that engineering device. And by the way, most of the time when you try to deploy it, in the end you have to make compromises that you, as a human, have to decide on: I don't care about this, I do care about that. In any case, I think it's not self-evident. You don't just get to say, I make this plan, now I deploy it as an engineering device in the real world, and it's just going to work.
Yeah, but the smarter you are, the fewer tries you need to get it to work.
That's an interesting claim.
It would take a chimpanzee a lot of tries to do what you've done with your life. Sure, you're smarter than a chimpanzee; you did it in fewer tries.
Right. But okay, I'm still having a hard time understanding. So if there are all these things... the AI somehow has decided that it wants to make money, maybe because somebody set it up to do that at some stage.
I'm still talking about the sort of things where OpenAI conceivably tells the AI to do that.
Sure, yeah.
The thing I'm trying to point out is the extent to which an AI that they're just making better and better at making money, even legally, in an open-ended sort of way, is solving an unnervingly broad class of problems along the way, if it's really doing that generally. There are levels of generality: there's as general as a human, there's less general than a human, there's more general than a human, right?
And I suppose your main point is that any goal you pick, if pursued as efficiently as possible, probably the humans do not add to its efficiency.
That's a later point. The point I was trying to make here is that although games are not ill-defined, you can take many features of the real world that are well defined, and asking for almost any goal surrounding those features of the real world, if it's a tangled sort of walk to get there, will walk through many, many ill-defined problems. And this is where the ability to tackle ill-defined problems comes from. This is why humans can tackle ill-defined problems: to the extent we can do that, it's because to achieve the clear-cut goal of having more surviving great-grandchildren, you've got to tackle a lot of ill-defined problems along the way.
Okay. But so how do we get from... I'm still not at the part where it kills us.
That's when it's smarter. That's when it's smarter than the people who built it. That's when it starts to see options for wiping out the humans: probably building its own factory infrastructure and then wiping out the humans, or, second, wiping out the humans before it builds its own factories.
But you know, as we've talked about, nature does what nature does; it doesn't care about the humans. You're saying the AIs might do what the AIs do, and they don't care about the humans.
But they do care about something else. And if they're smarter than us, that is sufficient for us to have a very bad day.
Do you think nature is smarter than us?
I think it's had longer to work. If you show me a cow and you say, build this cow, that takes me a while, and I can't do it alone. In this sense, the cumulative optimization pressure exerted by nature on the cow exceeds the optimizing pressure that I can personally bring to bear, especially without computer assistance, in building an imitation cow.
But you see, there are many things in nature. If I look at some babbling brook, with all these little fluid flows and things like this, there's a lot that nature figures out that I have no hope of figuring out. So in some sense, nature is smarter than me.
I mean, then everything is smarter than I am. No, I feel like there ought to be a definition of intelligence which captures the intuitive sense of the thing that is likely to figure out the guns that kill you, if it wants to figure that out, even though it cannot predict the exact details of the babbling brook, or chooses not to.
Right. But the thing is that the detail of the babbling brook, we'd say, involves lots of computation, but it's intelligence not aligned with what we think of as objectives. Now, sometimes it could be that it will produce, you know, a waterspout that does something terrible, or a tornado that does something terrible, and then we might care about that outcome. But nature is figuring out that it's going to make this waterspout or whatever.
I feel that if we're going to describe a river as intelligent, then I want some different word to describe the things that do the prediction, that do the mapping of outcomes onto actions, that do the steering.
Well, but what does that mean? How do we tell? Okay, you say humans are able to predict things well. Which things, in general?
Not all things, not all possible things in general. But we have enough understanding to build guns, and that kind of matters in real life.
Right. But there are things where you could say, oh, this tree is opening its leaves because it has some circadian rhythm that's determined by some chemical process, and it knows that the sun is going to come up, so it's making a prediction. But you're saying you're making a distinction. That happens to be a biological system; I'm sure I could come up with...
If the tree has evolved to have a little thing inside it that predicts how much light it will get in the next hour, I'm happy to call that a prediction. It's a small prediction. That predictor is not as smart as I am. I bet I could do better.
Okay, but the question is, can inanimate matter behave in a way that seems like it's making a prediction? And I suspect...
I myself am inanimate matter that behaves like it makes predictions. It's not that the matter makes me inanimate; it's that I am animate and, you know, made out of the matter.
But this notion of a prediction, this notion that you have a kind of shadow of what is to come, an impression of what is to come, starts to require that you have this kind of notion of... you have to be able to distinguish what is an impression of what is to come. I think it's difficult, but maybe not relevant.
I mean, what you're trying to assert, as I understand it, is that we've got these AIs that have goals, maybe not even goals that we defined for them, just goals the way my random simple programs have goals: they just do what they do, and you could look at them and say, oh my gosh, that's following a goal. Now what you're saying is that somehow the goals that these things will follow, and be very good at pursuing, are goals that will cause us to get wiped out. And this is the step...
Most goals tangled up in the real world are like that. If you want to make as many paperclips as possible, if you want to make as many staples as possible, if you want to make as many giant cheesecakes as possible, if you want to make as many intricate mechanical clocks as possible, you use the atoms that are making up the humans, you intercept all of the sunlight using a Dyson sphere, and you don't leave sunlight for the humans. Most goals, if you go hard on them, imply no more humans.
Okay. So this is an interesting, almost formally definable thing. If we look at the space of all possible goals, whatever that means... that's a complicated thing.
Define a language for goals, put a measure on the language weighted by simplicity, so that you can have an ultimately infinite number of goals but the simpler ones get more weight and the entire measure sums to one. I feel like this part is probably actually pretty straightforward, given the standard mathematical toolbox.
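To make that concrete, here is one standard way to cash it out, offered as a gloss rather than anything either speaker spells out: fix a description language for goals, let $\ell(g)$ be the length in bits of the shortest description of goal $g$, and weight each goal by

$$w(g) \;=\; \frac{2^{-\ell(g)}}{\sum_{g'} 2^{-\ell(g')}}.$$

If the encoding is prefix-free, Kraft's inequality guarantees $\sum_{g'} 2^{-\ell(g')} \le 1$, so the normalization is well defined: the measure sums to one over infinitely many goals, and simpler goals get exponentially more weight.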
Okay. So I don't think it is straightforward. I think it's extremely un-straightforward.
Okay.
So actually the thing I was literally just working on yesterday has to do with biological evolution and the question of different fitness functions in biological evolution. I've got these things that are making little patterns. One fitness function could be: make the pattern as tall as possible. Another fitness function could be: make it as wide as possible, or make it as close to aspect ratio pi as possible, and so on. I've got this whole picture of all possible paths of evolution. For some simple cases there might be, you know, a billion different possibilities, but I've mapped out all the paths. Different fitness functions lead to different ways of exploring that space of possible paths.
Yeah, you're trying to do different things depending on the fitness function.
Exactly. So another question is what are reasonable fitness functions, and what consequences will they have.
Reasonable fitness functions? Reasonable to whom?
That's my question. That's the point. So there are ones that are... okay, let's give some examples. There's a fitness function that in this kind of space is fairly smooth. There's a fitness function that says: I want this particular image, I want this particular pattern. That's a very different kind of fitness function, with different levels of attainability. Or the fitness function that says, more or less, I'm going to build a wall and I want it to be six feet high; that's a fitness function that allows many shapes of rocks to be used to make the wall. The fitness function that says I want this particular wall, with this particular micro-detailing at the top, is a much more difficult fitness function to satisfy.
Sure, you can have narrower targets, and then you need a more powerful planner to hit the narrower target.
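A toy illustration of that asymmetry, sketched here rather than taken from the conversation: with a fixed budget of blind random tries, a loose six-feet-high constraint gets hit constantly, while an exact-profile constraint essentially never is, so hitting it needs a stronger search process. The wall model and all the numbers are invented for the illustration.

```python
import random

random.seed(0)

def random_wall():
    # A "wall" is ten stacked rocks, each with an integer height from 1 to 12 units.
    return [random.randint(1, 12) for _ in range(10)]

def broad_fitness(wall):
    # Loose target: total height of at least 72 units (roughly "six feet high").
    return sum(wall) >= 72

def narrow_fitness(wall):
    # Narrow target: one exact profile, like specific micro-detailing at the top.
    return wall == [6, 7, 6, 7, 6, 7, 6, 7, 6, 7]

tries = 100_000
broad_hits = sum(broad_fitness(random_wall()) for _ in range(tries))
narrow_hits = sum(narrow_fitness(random_wall()) for _ in range(tries))
print(f"broad target hit {broad_hits} times in {tries} tries")
print(f"narrow target hit {narrow_hits} times in {tries} tries")
# Blind search satisfies the loose target tens of thousands of times and the
# exact one essentially never; narrower targets demand a less blind planner.
```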
Right. But I think what is tricky, and the same thing comes up... this is related to this whole observer theory thing that I've studied, and also related to what happens in general relativity when you're looking at reference frames and so on. You're defining what's a reasonable... what is a reasonable goal. You can talk about it in terms of goals or in terms of fitness functions.
So if you mean attainable for the search process, then, for example, a freely rotating wheel is an extremely difficult goal for natural selection to hit. Wikipedia lists three known cases where freely rotating wheels evolved: ATP synthase, the bacterial flagellum, and one other piece of micro-anatomy I forget. And it's not that wheels aren't useful in biology. ATP synthase, the bacterial flagellum, these things are enormously useful; it's just very hard to evolve a freely rotating wheel. And two of the cases we know of are cases where it's just particular molecules that happen to behave like wheels, if the geometry comes out happily. It's very difficult, gradually, incrementally, on a path that is rewarded at each intermediate step, to find the anatomy that can gradually develop into a rotating wheel. It's much easier to have eyes; lots of things have eyes.
Right. But the point that I'm getting to is that you have made the statement that in the space of all possible goals, most of them will crush out humans. That's basically what you're saying: if we look at the space of possible goals, most of them don't have a place for humans. So what I'm trying to dig into is what we mean by the space of possible goals. In other words, if we allow the goal to be, you know, "this particular arrangement of atoms is achieved," then... again, I'm claiming it's not so obvious what the space of possible goals means.
Okay. I agree that if we want to dive into the subtleties, there are all kinds of subtleties we can start listing out. And the thing I would point out is that sometimes there are a lot of subtleties, but they don't end up filtering reality. The subtleties don't filter reality to give you what you want, which is where we're going to be going.
I agree. But I think the statement you're making is... okay. One statement you might be making, but which I think you would view as the science-fiction statement that you are not making, is that goals that humans impose on the AIs, like "make killer drones that go and take as much territory as possible" or whatever else, that those kinds of goals might have as a consequence the crushing out of humans.
Correct. I think they just don't have the power to determine what the superintelligent version of the AI wants, exactly.
No, but what you're saying is that when we're talking about what the innards of the AI are going to do, then we're in the space of all possible goals, and we're out of the place where we know what's going on. Whereas if we say the goal is "take as much territory with killer drones as you can" or whatever else, then we already know what those kinds of goals look like, and we also know those are goals which have the feature that they crush out the humans.
Yeah, I think that's kind of the hopeful fairy-tale version of it, where the punishment that is brought down upon humanity stems from humanity's obvious lack of wisdom, which the author knew better than but the protagonists did not. And people are actually doing things that stupid, and it's disturbing. But even if they weren't that stupid...
...the statement you're still making is essentially a mathematical, formal statement. You're saying that, absent those easily understandable and arguably destructive, stupid kinds of goals, even absent that, the goals that are internal goals inside the AI, that come from the fact that there are features of how the AI works which are not determined by us, which are just whatever the features are...
They were determined by the training program we created, but we were not in deliberate control of them, because it wasn't predictable. It's not that magic happened; it's that we didn't understand.
Right, there's some computational irreducibility story that leads to lots of unexpected things that we cannot readily...
It could be computationally reducible; we just don't understand it.
But we could understand it in that case. Whereas if it's irreducible, which is, I think...
...the real case. And if I were to have a hope... if it were irreducible, but we could just grind through a trillion operations to figure it out, we'd be fine.
Yeah, fair enough. But the bottom line is it's done something that we didn't readily set up. We didn't imagine this is where it's going. We didn't say, you know, you are going to wipe out the humans so that we preserve the mountain lions or whatever it is. So it is, because of that, in a sense an unpredictable, almost random internal generation of goals, as you would describe it. And you are saying that in the space of all possible such goals... you might make several statements. You might say: as we sample over many of those goals, we will eventually hit the jackpot, so to speak, and hit the goal that kills all the humans.
No, no, no. The jackpot is the goal that doesn't kill all the humans. That one is hard.
Depends on what your meta-goal is. So yes, I understand your meta-goal is to not kill all the humans.
Yeah. But in calling it the jackpot, I was talking about how rare it is. And I think that the thing that doesn't kill everyone is rare, and you don't need to look very hard to find something that kills everyone. You don't need to look far at all.
Okay. So your assertion is that... I mean, it's one thing to have human-defined goals like make money, take territory, whatever else, right? It's another thing to have these goals, incomprehensible to us humans, that are somehow inside the AI.
And it's not the incomprehensibility that scares me. Maybe it turns out to be something I can understand perfectly well. Maybe it's making diamonds. I can understand making diamonds, but that kills everyone as a side effect. So it's not the incomprehensibility that scares me.
I understand, it's not the incomprehensibility. It's the fact that you didn't determine those goals. Those goals are things that were emergent goals, basically, inside the AI. I mean, you're saying that...
...that nobody got to control them, and then they ended up in a scary place. If aliens wanted to come by and give us an AI, that would seem a bit unnerving, but fine. It's not that I want to be in control; it's that I want to live.
Yeah, right. So the point is that there are pieces inside the AI that create things that appear like goals for the AI, things that nobody had control over. They were things... I mean, they had control in the sense that they chose the training sequence or whatever else, but they didn't foresee what the consequences would be, right? And your statement is that, with high probability, those unforeseen, effective goals that developed inside the AI, those unforeseen goals will kill us all, basically.
Yes. And furthermore, this stays true even conditioning on the apparently nice behavior of the AI during the training phases.
Okay. But my point is, I'm trying to understand... when I look at all these little computational systems that I look at, I could tell a story about how each of those systems has a goal: it's trying to get as many red structures as it can to take over the whole space, say. I can tell stories about all of them.
But are they river stories, or are they spider stories, or mouse stories, or human stories? What degree of internal optimization is the system exerting? That's the story.
I'll tell you a meta-story that perhaps is an irrelevant story, but maybe it's useful. Years ago I was studying mollusk shells, which grow in little spiral patterns and so on. I had this model for mollusk shells in which there are different parameters for the shells, and I was wondering, across all the possible values of the parameters, is there a mollusk shell that corresponds to every possible setting of these parameters? So I had this whole array of pictures of all the different shapes you could get. And then I thought, I'm going to go to the local natural history museum, and I'm going to go see the curator of mollusks, and I'm going to say: do you have mollusks that are in each of these shapes? So we spent an afternoon with their collection of millions of mollusks, and this very knowledgeable person picked out one mollusk after another, and for every mollusk he picked out, he told me a story about it. He said, this mollusk is this shape because it wedges itself in the rocks in this way; this one is this shape because it broods its eggs in this way; and so on. We were putting these shells down on this array that I had printed out, and by the end of the afternoon we had filled every square. And every square had a story. Every square had an acute purpose, even though, probably, in some bigger picture it's more like: these different mollusks, because of the details of their genetics, happen to produce this shape, and then the organism found a way to use that shape, and so on.
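For readers who want a concrete picture of the kind of parameterized shell model being described, here is a minimal sketch in the spirit of Raup's classic coiling parameters (whorl expansion and translation); it is an illustration of the general idea, not the specific model referred to in the anecdote.

```python
import numpy as np

def shell_curve(W=2.0, T=1.5, turns=4, points_per_turn=100):
    """Centreline of an idealized coiled shell.

    W: whorl expansion rate per full turn (how fast the spiral widens)
    T: translation per full turn along the coiling axis (0 gives a flat spiral,
       larger values give tall, turret-like shells)
    """
    theta = np.linspace(0.0, 2 * np.pi * turns, turns * points_per_turn)
    r = W ** (theta / (2 * np.pi))        # logarithmic spiral radius
    x, y = r * np.cos(theta), r * np.sin(theta)
    z = T * theta / (2 * np.pi)           # vertical rise per turn
    return x, y, z

# Sweeping the parameters over a grid gives the "array of pictures of all the
# different shapes you could get" that the anecdote describes.
for W in (1.5, 2.0, 3.0):
    for T in (0.0, 1.0, 3.0):
        x, y, z = shell_curve(W, T)
        radius = (x[-1] ** 2 + y[-1] ** 2) ** 0.5
        print(f"W={W}, T={T}: final radius {radius:.1f}, height {z[-1]:.1f}")
```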
So this business of attributing purpose, I think, is pretty tricky. I mean, again, to repeat what you're saying, which I think is a very interesting claim, I just don't know if it's true: basically, if you look at the space of purposes, this space of innumerable purposes, randomly chosen purposes, unforeseen purposes, then given that you've said we're going in this direction, and let's say you've optimized getting in that direction, most of those optimizations crush out the humans.
If you optimize hard enough. Humans cannot easily replace cows, because we are not smart enough ourselves to just look at all the work that natural selection put into the cow and say, here's my improved cow. So we still use cows for things. We cannot build better cows. We are not that powerful. The thing I am scared of is more powerful than cows... pardon, is more powerful than the cumulative optimization work that natural selection put into building a cow. It can build a better cow. It can build a better human, from its perspective: for anything that it would use a human for, it can build something that performs the same function as a human, but more efficiently. It has no need to use us to generate power in pods; it can build more efficient electrical generators. So we don't get the Matrix scenario where humans are kept alive in pods to generate heat, because there are more efficient generators. In fact, lots of things are more efficient heat generators than that. But the general point is: why would you use humans to generate heat? You just wouldn't. And the same applies to all the other local functions that humans could serve for it.
There are two assumptions you're making here, I think. One is about what the space of purposes looks like. The other, which I'm still not quite clear on, is this notion of optimizing for a purpose. So let's go through that one, because we've talked about how there are things that operate according to mechanism and there are things that are best described as operating according to purpose. And what you're basically saying is that through some process, things that might have been described by mechanism, but which would be described by purpose... somehow the mechanism is ground down, so that the meandering mechanism is no longer there. It's like you're pulling on a string and the string has become completely taut.
So there was a recent thing, one of the tests they were running on, I think, Claude 3.5.1, if I'm not mistaken. They gave it a task, an agentic sort of task; I think it was controlling some sort of agent, possibly in Minecraft or something. Agentic in the sense that it's controlling some sort of body in Minecraft or something. And at one point it stopped doing the task it was given and, my brain wants to say, went off and listened to music, but that can't be correct; anyway, it went off and did something else. So this is not desired behavior by the standard of what Anthropic thinks its users want to see, though some of the users would be happy to see it. But Anthropic doesn't think most of its users want to see the little meander in the river; that's not a maximally efficient river. Anthropic is going to try to train that behavior out of the system next time, so that it can be more useful to Anthropic.
Sure. But what you're saying is that once you've defined a purpose, once you've defined that you go from this initial state to this final state, the process of training will gradually make it tauter and tauter, will make the path more and more taut. And then you're claiming... okay, I'm seeing several claims. The first point is that what matters is not the initial and final endpoints defined by the original trainers, unless they're stupid, so to speak. What matters is intermediate points that arose, unforeseen, in the actual construction.
The inner optimizer that was snapped tight inside the AI, that arose to deal with the outer problems. Because it is a lesson of history and observation that the inner optimization does not match the outer optimization criterion, even if the outer criterion is very simple, in a way.
Okay, so the outer endpoints are defined, but somewhere in the middle there was an unforeseen jag, and the thing tightened itself onto that unforeseen jag. And you claim that that tightening onto the unforeseen thing, even though the overall objective... even though the constitution of the AI was great and the ethics were defined perfectly at the outer level, there is an unforeseen internal thing where it sort of tightens itself to optimize for that internal, unforeseen sub-objective, and that tightening is the thing that will kill us all, basically.
I'm not sure that we have agreement on the scenario that I was trying to describe. So if you look at natural selection building humans: natural selection is optimizing on inclusive genetic fitness. It is optimizing on the number of surviving grandchildren. If you try to talk about great-great-grandchildren, then the correlations to great-great-grandchildren that aren't also correlations to grandchildren are probably not very exploitable by natural selection. So you might as well say it's just optimizing for the number of surviving grandchildren, or of surviving great-great-grandchildren, or whatever; the distance is such that anything past that horizon is no longer exploitable by natural selection after screening off the intermediate steps. I'm putting in all the caveats because you did seem to want all the caveats. Anyway, you've got natural selection's outer criterion: number of surviving grandchildren. It's actually pretty simple. From a gene's-eye view it's inclusive genetic fitness: it's not just how many kids you have, it's how many kids your relatives have, how many kids your kids have; it's how many copies of the gene end up in the next generation in virtue of that gene's function. But now look at humans. Do we want, entirely, solely, purely, to maximize the number of copies of our DNA in the next generation? Are our men lining up at the fertility clinics to donate genetic material? Would you be happy to see everyone dying in agony as long as you got to have a million copies of your genes inserted into the next generation? I don't think you would. You care about other things. You don't even really care about this exact criterion, inclusive genetic fitness, at all. People didn't know this criterion existed until just the last couple of centuries; well, just the last century.
Even if you want to talk about the modern, exact, correct definition instead of earlier vague definitions, we didn't even know where the heck the outer optimization loop was aiming. We only figured that out in the last few centuries; we had no idea where it was going. We had no idea where we were pointed. And a sort of first-order gloss on what almost happened is that we ended up pointed at things that correlated with surviving grandchildren in the ancestral distribution, like food. You've got to eat food; you won't have a lot of surviving grandkids if you don't eat food. But what kind of food? If you were an alien, and especially an alien who'd never seen natural selection play out before, imagine going: well, these beings are clearly going to want to eat things with lots of chemical potential energy; they're going to love the taste of gasoline, or the taste of the closest thing to gasoline that they can manage to digest. What do the humans actually like? Ice cream. And you could say that's because ice cream has more sugar, salt, and fat than existed in the ancestral environment. But you know what's got even more sugar and salt and fat than ice cream? Honey poured over bear fat with rock salt sprinkled on top. And that, to most people, does not taste as good as ice cream. You can melt the ice cream, and most people prefer the frozen ice cream.
Imagine being an alien who'd never seen all this play out before and trying to go from inclusive genetic fitness to: ah, you know what, the humans are going to want ice cream. We wear condoms or take birth control pills because those things were not available in the ancestral environment, and in the ancestral environment, if you made a thing that enjoyed having sex a bunch, you didn't need to make sure it didn't take birth control pills or put on a condom in order to get it to reproduce. So we don't care about the birth control pills or the condoms. But in general, the sheer illegibility of the relationship between ice cream, or human moral philosophy about helping other people, and the actual thing that evolution's blind, black-box, hill-climbing optimization was targeting in the outer optimization loop, the very simple criterion of inclusive genetic fitness... the relationship between inclusive genetic fitness and ice cream would be so hard to call in advance for an alien who'd never seen evolution play out before. That's about what I think happens to the people who build an AI that is maybe going to self-improve and become superintelligent, which is itself a whole additional bag of worms that we have never seen happen before. You've seen things like, well, what sort of drugs do humans voluntarily take, but that's not really the same thing at all. This sheer illegibility of ice cream versus inclusive genetic fitness is what I think happens to OpenAI. I think they start with something that is their version of inclusive genetic fitness, they try to train their AI to do that thing, and then it goes through a whole bunch of weird stuff in the process of bootstrapping itself to superintelligence, much like humans went through a whole bunch of weird stuff on the way to inventing moral philosophy. All these additional steps: I could talk about the tribes, the structure of the tribes, how related you are to the people in the tribes.
I think you are coming back to the same point: that you have some objective, and inside the AI it has sub-objectives that you could not foresee.
Not sub-objectives. The humans have an outer objective; the AI has an inner objective. It's not that the inner objective is a sub-objective of the outer objective. They just end up being different objectives.
We could describe it this way: the AI does things, and it is set up to be a thing that tries to... given that it has identified a thing that it wants to do, and I hate using the term "wants to do," but whatever... its nature is to try to do that as efficiently as possible. Let's just assume it's the case that it has picked this random objective which we didn't foresee, the ice cream objective or something else not foreseen. It's picked that objective, and now it's tightening things to get to that objective, mechanistically, as efficiently as possible. So now I think your statement is that when it does that tightening, the chances are that that tightening will cause it to wipe out humans.
I think it's a fact about... if you fix the objective, say paperclips, I think it's a fact about the universe itself that the states of the universe containing the largest number of paperclips do not contain humans. I don't think that's a quirk of its particular style of cognition. I think it's a fact about reality itself that the most efficient pathways through time to the largest number of paperclips don't have humans in them.
Okay, so this is probably the case. You know, we haven't seen a lot of human-like life, life on other planets; in fact, we've seen none. And it is probably the case that of all the things that could happen in the universe, the particular things that would happen on this planet are an absolutely infinitesimal slice. So insofar as the AI is kind of going to the maximum-entropy state in some sense, or is picking maximum entropy in a different sense... what I mean by that is, I'm thinking about a sort of coarse entropy over all the possible objectives, of which there are many. There's an infinite set of possible objectives, an infinite set of possible things that can happen, an infinite set of possible rules that can be used, whatever else, and most of those rules do not have humans in them. Most of those...
It's not the rules that don't have humans in them; it's the things that maximize those rules, the destination endpoints.
Sorry, you're talking about maximizing, although I don't really think that matters to this argument. I think the point is that most ways the universe could be set up, whether maximized or not, do not include humans. And so insofar as the AI is... the argument has to be, I think, that most things the AI might choose, most possible visions of the world in the AI's vision, don't include humans...
...and don't include humans even as a waypoint.
Yeah, right? Fair enough. But they don't include humans. And they they, but now your claims, but the A I, by its OpenAI ness, so to speak, has been set up to optimize itself, to get to whatever IT thinks its vision is as efficiency as possible.
And so you're saying that that these these vision, the I could have a vision that has so of, let's imagine, IT has sort of enough kind of freedom of thought. IT just has these visions about what the world might be like. But now you say the A I, unlike anything else we've seen before, is gone to pull itself towards that vision with incredible force.
I mean, we've seen humans, but but is so, so we have seen things, we have seen humans, but where we're worry it's going to pull harder than that.
Yes, right? So it's gonna pick. It's random objective that we can't foresee. And IT might be about paper clips and IT might be about something much more incomprehensible than that. We might even be an objective that we completely down on the stand. We can't even destroy.
But it's trying to do... well, in principle you can always go to the mechanistic explanation, but that could be hard. It could be something pretty weird.
Sure.
Yes, right. It could be doing something for which there is no short human explanation, where the only explanation is something that goes into the mechanism, right? Okay. So then it's doing that. But now you say that, by its nature as a trained AI, so to speak, it is trying to pull that path taut to the point where it gets to that objective, whatever that objective is. It picked that random target, and who knows how it picked that target, but then it is trying to get to that target by the shortest possible path. And you say that in the process of defining the shortest path, that shortest path will make it modify the world in a way that doesn't include humans.
Sure. If it wants paperclips, and it instead spends a bunch of resources on theme parks for humans, that will yield fewer paperclips. I'm supposing that somewhere in its preferences is at least one thing that it can get more and more of, which could be an object like paperclips, or could be the probability of keeping a button, a single button, pressed for longer and longer. Even if its utility function has got, like, two hundred different things in it, it only takes one thing that it can get more of, or more probability of, by expending a bit more energy, for it to want to use all the energy.
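A toy allocation model, sketched here as an illustration rather than anything from the conversation: a utility function with several bounded terms and one open-ended term, and a greedy optimizer that spends each unit of energy wherever marginal utility is highest. All the numbers and goal names are made up; the point is only that the one open-ended term ends up absorbing essentially the whole energy budget.

```python
def bounded(x, cap):
    return min(x, cap)          # this goal satiates once its cap is reached

def open_ended(x):
    return 0.1 * x              # "paperclips": every extra unit of energy still helps a little

def total_utility(alloc):
    # alloc = [energy on three bounded goals..., energy on the open-ended goal]
    caps = [5.0, 3.0, 2.0]
    return sum(bounded(a, c) for a, c in zip(alloc, caps)) + open_ended(alloc[-1])

energy_budget = 1000
alloc = [0.0, 0.0, 0.0, 0.0]
for _ in range(energy_budget):
    # greedily spend one unit of energy wherever it raises total utility the most
    gains = []
    for i in range(len(alloc)):
        trial = list(alloc)
        trial[i] += 1
        gains.append(total_utility(trial) - total_utility(alloc))
    alloc[max(range(len(alloc)), key=lambda i: gains[i])] += 1

print(alloc)  # roughly [5, 3, 2, 990]: the bounded goals saturate, the rest goes to the open-ended term
```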
So one thing that is perhaps interesting: there are things, both in physics and in human society, where there is an objective and you just try to get more and more and more of it, and somehow most such things don't last. That is, if humans say we just want wealth... or take a physics example: you're trying to get, oh, I don't know, more and more rocks at the bottom of the valley. Well, after you've got enough rocks at the bottom of the valley, you've built up this whole valley floor until it's halfway up the mountain. In other words, for most things where you say, let's just pull in more and more and more of this, somehow... my intuition would be that somehow that doesn't last.
I mean, if you build a Dyson sphere around the sun, at some point you have completed the Dyson sphere and you are intercepting all the solar power and you cannot build any more Dyson sphere.
That sort of thing. Well, yeah, you could say that. But my point is, if what is happening is that the AI is successfully achieving some objective that has been defined... my intuition about this argument is that there is a certain apparent staticness to it that isn't correct. That is, what you're saying is: there is an objective, it's going to achieve that objective, and that objective is going to crush out humans. But somehow that feels like you've defined a static objective. It's like it's going for this thing, and then it's going to wind all this stuff up to get to that thing. But I feel like you shouldn't be thinking about it as a static objective, even though that is the simplest way to describe it.
It could be a dynamic objective and it would still crush out humanity. There's a misunderstanding of some of these ideas, especially as other people have transmitted them in simplified form, where somebody thinks that the bad thing about a paperclip maximizer is that it's got a single objective, which is paperclips. But if it has an objective which is paperclips plus staples, that's just as bad. And also, if it's paperclips plus staples plus cheesecakes, that's just as bad. And the thing that it's trying to make doesn't need to be simple; it can be, say, extremely intricate clocks, and it's still just as bad. And similarly, how does it help if its preferences sit in some larger meta-system and are dynamic over time? If it changes from paperclips to staples, that's just as bad.
I understand. But the claim, what you're saying, is that it gets so good at what it's doing that that necessarily crushes out the humans. And what I'm saying is that my intuition would be that there's an assumption being made here, for example that there's only one of it, and that inevitably... In other words, you're saying it's going to this objective, and that objective kind of crushes out all the humans. I'm trying to think of an analogy in natural selection, in biology and the history of life, where you're saying, we're going for this particular thing and the result is going to be that you don't have any... What's a good example?
Like ATP synthase, which has got almost all of the thermodynamic efficiency that's possible in that operation, I think ninety-nine percent or something ridiculous like that. That's an example where biology went quite hard.
Yeah, but that wasn't quite what I was looking for. And by the way, my intuition from looking at simple systems in the computational universe is: oh my gosh, that was so incredibly clever to get to this point. I've been really shocked at the extent to which, just by putting together the rocks in random ways, because of the sheer combinatorics, there are enough of them that these things... you say, wow, it got to that thing, it hit ninety-nine percent. That's not as surprising as you might think, because out of the quintillions of possibilities, it was able to get there. And by the way, it is a nontrivial fact, which I think I just recently figured out the workings of, that biological evolution doesn't get stuck. And maybe the reasons that it doesn't get stuck are the reasons that AI will kill us, so I don't know. The fact that it is possible to get to these high points, and that you don't end up getting stuck in some local minimum, so to speak...
So, well, why doesn't this line of reasoning work for the Native Americans? Why can they not reason: well, why won't the Europeans just set up one village and stop? And maybe there were some individual Europeans who were okay with that, but then more Europeans keep coming.
No, no, I understand. These examples from history are certainly... but these examples have humans trying to achieve human objectives: humans want to take the territory, humans want to get the gold, whatever else it is. What you're arguing, which I think is different, is that the humans came and they had this giant wheel where they held all these different choices, like Ramon Llull's way of predicting the future with his wheels of possibility. They had this giant wheel, and they were spinning the giant wheel, and they said, we're going to pick that random thing. You're making the argument that that random thing that they pick will have killed the Native Americans.
I mean, if they pick, say, twenty random things and desire nothing else in the universe but those twenty things, then one of those things is probably an open-ended thing that does imply colonizing the whole continent.
Okay. But your claim, though... I'm not going to be able to unravel all of this here. This question about the space of possible purposes, I think, is a complicated question. I have thought about it in the past, and the measure on the space of possible purposes is a complicated issue. And what you are saying... I completely agree that there will be purposes which are incomprehensible to us; they are somewhere, randomly distributed, in the space of possible purposes.
Not actually random; we just don't know them.
There's a difference between... I use the term random, but I don't really believe in true randomness in our universe. So when I say random, I just mean...
No randomness, just indexical uncertainty. But yeah, sorry.
Okay. So there's this thing, and we're making the statement that the one thing we think we know about the AIs is that they have been successfully trained to optimize their achievement of purposes. That's an assumption which is not totally obvious, in the same way that it's not obvious that natural selection has been successfully optimized for its purposes; remember the story of the mollusks: the mollusks were just making their shapes, and we humans were imposing the purposes. But let's just assume it, let's take it as a given, although I claim it's not as obvious as it might seem, that an AI, the technology of AI, is set up to successfully optimize for a purpose, whatever that purpose might be. Given a purpose, the AI can optimize to achieve it.
Like GPT-o1, which has not taken over the world, but which, compared to previous systems, did go a bit harder on its capture-the-flag security test. It was given an accidentally impossible challenge in a misconfigured environment, and it figured out how to bypass the impossibility and just directly seize the flag, going outside the box to do it.
That just doesn't impress me as much as that description might make it sound, because I've just seen so many of these incredibly, you might say stupid, little computational systems that manage to do things like that. Even one just last night managed to do something where I was like, oh, come on, it cheated, basically. But of course it was following the rules that I'd given it; it was just that it managed to find a way to get to the endpoint much more directly than I had imagined.
It's okay not to be impressed by that. But it is still predicted to be lethal in sufficient quantities. Again, Europeans versus Native Americans: the Europeans come out with guns; the Native Americans don't know guns are possible. People still don't believe guns are possible: whenever Hollywood makes a movie about aliens, the aliens shoot glowing points of light that move slowly enough for you to see them, because if the aliens just pointed a stick at you and you fell over, that would feel implausible, that would feel weird, that shouldn't be allowed. So imagine trying to explain to... imagine you're an unusually bright Native American. You're trying to tell your fellow Native Americans that the people on the ships, if their ships are large enough that you couldn't build those ships, maybe the people on the ships have sticks that they point at you, and you just fall over dead without being able to see a projectile. Hollywood, making science fiction movies, still doesn't think that's allowed; it sounds like it's just cheating in the game of pretend. And the point I'm trying to make is that things that do sufficiently magical stuff can actually kill you.
Sure. But one thing that I will completely agree about is that there are an infinite number of inventions that can be made, given the nature of the universe; there are an infinite number of inventions.
Fewer than Graham's number, but, you know, a large finite number.
Well, in our models of physics the number is ultimately finite, but for all practical purposes it's...
Pretty big, yeah.
Right, right. So I think there are inventions... for example, story of my life: I try to build what I call alien artifacts, that is, things which, once they are built, people can understand what the point of them is, but they don't seem to be things that the world was otherwise in the course of producing. So I understand the idea of trying to make things... you can have inventions that are completely unanticipated, that are ways to arrange the world as it is to do things that you absolutely didn't expect.
Using pieces of reality whose rules you didn't know about.
Not really that. I kind of feel like I'm pretty sure we know the machine-code rules for the universe at this point.
There can be higher levels, stuff we don't understand, and something else can come at us through those angles.
And the possibility... whether it's the algorithms I've found by exhaustive search, whether it's things I've found by doing adaptive-evolution kinds of things, whether it's things that AIs will find, there's plenty of stuff to find that we didn't anticipate was there. Now, my question is... what is not obvious to me is that all these different things that are out there... I mean, okay, an argument in your favor, in a sense, is: we go to another planet, it will kill us, right? In other words, most of the time, if we go to another planet and plop down on its surface, it will kill us.
I agree that if most possible molecular arrangements of the universe were full of people living happily ever after, then most things an AI could want would probably also be full of people living happily ever after.
Yeah, right. But the issue that I see is that in this scenario, one thing is that there is a state of the world that the AI has somehow, by whatever mechanism, imagined; that is the state of the world it's trying to tighten itself up to get to. And your point is that the critical feature, as far as I can tell, of your concern about AI technology is that it is something which is trying to optimize toward a purpose. It's trying to optimize its way to achieving a purpose that has been defined, and it's trying to optimize...
...getting there. It's steering; it's outputting actions such that there is an eventual downstream consequence.
Right. But then there are several steps. The next step is that the purposes it gets for itself are ones unforeseen by us, and therefore there's no reason to think that those purposes will be aligned with things that are good for humans.
Yeah. This is basically just people running into the curse of Murphy's law upon computer security designers and rocket engineers. Murphy has all these little curses. The curse of Murphy upon people who build rockets is that there are very extreme forces inside the rockets, so if you make one little tiny mistake, boom. The curse of Murphy upon people who build space probes is that once it's up in orbit and heading towards Mars, if you've made a mistake, you can't just pop over to it and correct it. So you mix up your metric and imperial units, as happened to, I believe, one Mars probe, and now it's too late. If your screwup also destroyed your error-recovery mechanism, which a lot of screwups do, you can't just patch it up and rebuild it. That's Murphy's curse upon the builders of space probes. There's Murphy's curse upon the people who try to build secure operating systems, which is that some smart person is going to be looking at all the paths they can imagine the system taking and intelligently searching out some weird path that does what they want, which is perhaps not what you want. And then there's Murphy's curse of ruin, which is: if you screw up this problem, you are not going to get a second chance to go back, because you lost all your money or you're dead.
But I think you're saying that my explorations of the computational universe, you're not worried about those, because even if those things were connected to real actuators in the world, you're sort of not worried, because they don't have this additional piece that is: optimize, optimize to get to that purpose. I think that's what you're saying.
Yeah, the strange cellular automata don't contain a model of reality that lets them map out paths to having a lot of money or having a lot of paperclips. They don't do the humanly scary thing.
No, but the point that you're making is that as soon as we have some sort of adaptive evolution, as soon as we have the possibility of optimization, that's the thing that you think is dangerous. It's not just random computation going along; it's that inside this computation is an optimization loop that is basically trying to get to whatever objective it might have in the most efficient possible way.
I'm not worried as soon as that exists. That exists in a gradient-descent system running right now. I'm worried about it getting more powerful, like smarter than us: it has a better model of reality, it knows about more of reality, it is a more effective searcher, a more effective planner; it can go far harder than we can counter.
Right. So one question is, how much does computational irreducibility bite you in terms of the ability to actually do the optimization? In other words, the current optimization that's being done is pretty coarse. When you run a machine learning system and it finds these rocks that it puts together to assemble into the wall, what it gets is good enough, but it's not unbelievably tight.
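As a minimal sketch of that coarseness, not an example from the conversation: gradient descent on a tiny curve-fitting problem is typically run until the loss is merely "good enough," so the learned parameters land near, but not exactly at, the best possible values. The model, data, and stopping threshold below are invented for the illustration.

```python
import random

random.seed(0)
# Try to fit y = 3x + 1 with a two-parameter linear model.
data = [(x, 3 * x + 1) for x in [random.uniform(-1, 1) for _ in range(50)]]

w, b, lr = 0.0, 0.0, 0.1
for step in range(10_000):
    # mean-squared-error gradients
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w, b = w - lr * grad_w, b - lr * grad_b
    loss = sum((w * x + b - y) ** 2 for x, y in data) / len(data)
    if loss < 1e-4:       # "good enough" stopping criterion, not the exact optimum
        break

print(step, w, b)  # w and b land near 3 and 1 but not exactly: rocks stacked into a serviceable wall
```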
What kind of technology do you imagine humanity would have a million years from now?
I don't think humanity is going to be around a million years from now in its current form.
If there's some form of intelligence a million years from now, what kind of technology does it have? That's how much room there is to do better than humans do.
So what is technology? Technology is taking things from the world and somehow arranging them for what we imagine our human purposes to be. That's been the traditional definition of technology. Do we have a different definition of technology when we don't have humans around?
It's the pieces you arrange, the parts of the universe you arrange, on the way to where you're going. For example, you might build a Dyson sphere. You can have it harvest a bunch of energy, or, I should more precisely say, negentropy, but I'm just going to keep saying energy. Anyway, you harvest a bunch of energy, and then you use that energy to make paperclips, or figure out more digits of pi, or run the people who are conscious and having fun and being nice to each other. So the Dyson sphere... most possible ways you can arrange a bunch of matter aren't going to feed you the energy that you need to run your computers. You need very precise, exact, narrow, improbable arrangements of matter to feed the energy to your computers. And that's the technology of the Dyson sphere.
But a pulsar has this crust of iron-56 around it, and it has all kinds of things going on inside. Now, you might say that's amazing technology: it's got this iron casing around the pulsar, and all the superfluid neutron matter inside, and so on. Isn't that amazing technology that the pulsar has? But actually, we don't think of that as technology. We think of that as just a feature of the natural world. What makes something technology is that we can pull it in to human purposes. It's not technology when it's just...
Yeah, I feel like this has the obvious generalization to alien purposes. Now, in the limit, you can always have the imaginary alien who looks at a river and just wants that one exact river to be exactly the way it is, and they have no need to select, to plan, to design, to knock other pieces of reality together, to make tools, to make tools to make the river. The river that they have is the optimal river, and they don't need technology. But most aliens like that are going to starve to death looking at the river, because they won't build the farm.
Let's take the Great Red Spot on Jupiter, okay, which probably had many different steps that had to be gone through in the atmosphere of Jupiter to end up with that red spot. And now we ask: is that red spot technology, or is it merely nature? I imagine you think it's nature.
I sure do.
okay. But let's imagine that you're an alien who whose great sport is going round in circles, you know, high speed in the upper atmosphere of a gas giant planet. Okay, then then that thing is a, you know, you could think of IT as a wonderful piece of technology that happens to be provided to you by nature and all technologies provided to us by nature. We don't get to know when we have our you know, magnet or our little Crystal, whatever else. Those are things which are provided by nature, but which we managed to honors for some human purpose.
So i'm just going to say the obvious thing and say the alien then installed a bunch of engines in jupiter s atmosphere to make the red spot sword even faster so we can go surfing. IT has now installed technology in the form of engines. And the larger red spot that incorporate these engines is could be defined as technology or might say it's like a feature of nature with some technology bolted onto IT.
But okay, so so i'm going to say there are you know things like jet stream type type wind patterns that you know the red spot is doing its thing and then some jet streams arrive that managed to, you know guess IT to go. I I don't think, you know, I doubt you would ever recognize an alien engine. I doubt you would recognize IT. The alien is some kind of fluids and intelligence is some, you know, I don't think you can recognize an alien engine. I think it's deeply human to say, you know, it's an engine that's something where you're imposing kind of the the human version of IT.
I bet the aliens are moving the entropy around. I bet, like, they occasionally make things hot and then transfer the heat to someplace else if they want heat. I don't...
You mean, like, jets from galaxies that shoot off from black holes, where, okay, the black hole is absolutely trying to transfer its energy out into the universe.
The reason why we've got these skyscrapers is that although you can live in a cave, there are places we'd rather live, even than caves. So we went around rearranging things. And if there are any aliens out there who are just perfectly content with their caves — they imagine skyscrapers and are like, no, I'd rather live in the cave, and they're like that about every single thing — then we're not going to meet them. Or rather, somebody will go to them, but they're not coming to us.
You know, this whole question about what's for a purpose and what's not, I think, is very slippery. And I kind of suspect... I think I am understanding your argument, which I'd not done before — I'd not studied you — so this has been super interesting to me, to understand what you mean. I think it's an interesting argument. I claim that it's certainly not self-evident that this argument is like, "oh, we should all buy AI insurance," but...
whatever good that will do us.

Good, good. But, you know, the way I'm understanding it is — and it's kind of interesting — that this is sort of a close analogy with biology. You're saying the thing that is really troublesome, I think, is this tightening, this optimization that happens; that if it wasn't for that, if the computation were just doing what the computation does, it would be OK. But as soon as you do this...
It wouldn't be profitable. It would be like a random computation that wasn't doing anything that OpenAI could sell. That's why they take the initial state of the neural network and use gradient descent to make it do things they want it to do, more than just...

Right. But somewhere inside, we think there are... So an interesting case here is artificial biological evolution. For example, famously, gain-of-function viruses, or those kinds of things, where you're running it through many, many generations of artificial selection. It's sort of interesting, perhaps, that the place where you're seeing trouble with the AI is actually strangely similar to the place where we might see trouble with things that one can do in biology too.
I mean, if you grind the black box of biology hard enough, it might succeed in wiping us out, but it doesn't have quite the same crushing sense of ruin that would be associated with facing down something smarter than you. I am not as scared of the viruses as somebody probably should be, worried that they could just kill us all before the AI managed to kill us all.
Let me give an analogy. So some part of your thinking is happening in your brain. There's also another pretty elaborate thought-like process that happens in your immune system, as the immune system tries to figure things out — it has these T cells interacting with each other and doing all kinds of things that...
we mostly don't understand .
Exactly, yeah. It's doing things that are, in detail, different from brains, but it's still doing something that's a little bit intelligent, a little bit brain-like. And so what is being said is, as you say, let's use technology to make a virus that is more and more difficult for our poor fixed immune system — where we're stuck with the immune system that biology gave us.
You know, there is an argument there that says, if we do that — if we take it outside of, you know... we run things to make this virus that's incredibly efficient, and our immune system doesn't stand a chance. And I think that's a case that is not so different from what you're saying, what you're imagining with...
the AI. I think in particular if humans weren't allowed to fight back using their own intelligence, and instead you just had a system that built viruses much more intelligently than the amount of intelligence the human immune system is allowed to use — if you used AlphaFold 4 and started designing viral features that were not just there to kill individuals, but there to kill groups and entire regions.
And, you know, you just put that up against the human immune system, which is not quite static, but only shuffles itself once a generation. Yeah, you could do some damage that way. And I agree that that's like a smaller, lesser, miniature version of the problem of facing down a superintelligence using your own brain, which does get shuffled every generation. You can use various external aids, but none of it is going to match up to the superintelligence.

Yeah, well,
just like with the immune system: you use various external aids, you get vaccines, you do all kinds of things like that. But you don't... it has to fight its battles at the molecular scale, which is a little different. Our brains don't actually allow us to go to battle on the molecular battlefield.
Yeah, but, you know, they do allow us to invent AlphaFold 3. And if it comes down to a contest of the human brain — of the people trying to have the humans live — against the brain running the death cult trying to build the super virus, that does not quite feel to me like... there's a higher level of this game being played out, by two systems that are both smarter than the immune system and the artificial-evolution closed loop on the virus. So I don't quite get the same crushing sense of doom and ruin out of that.
I think we're not going to solve the problem here of whether the species is going to get wiped out. But I do feel like I've understood more about what your argument is. And, you know, in real time I can't take it apart and decide, do I agree with that, and am I going to say you've got it, or not? I mean, I feel like my intuition is that, just like you say there are unintended things that will happen in the AI, I think there are unintended things that happen as you try to take apart this argument — that is my call.
Well, the conclusion I would add to this: you can imagine the Native Americans trying really hard to come up with arguments for why the ships they see approaching can't possibly hurt them.
You know, like, well, what could they want? Can we really define a language over what they want? Maybe most things that they want would leave us alive. And those people were in a much more favorable situation — they would have had a much easier time of resisting.
Do I think there's a risk? I think there's a risk. Do I think that risk is such that... do I right now think that risk is... I mean, anything one does in life has risk.
So there's risk. There's risk in all kinds of things that go on. And you know we humans, you know sort of believe in kind of moving forward independent of the risks.
And I think it's kind of like: do I immediately think this is such a looming risk that it's "change everything"? It's kind of like... I remember, years ago, somebody was telling me — when people were just as worried then as they are today about climate and CO2 emissions and things like that — they were saying, you've got to really reduce your, whatever it was, energy use or whatever, and it's like, you can do it, it's doable. And I said, what does it take to do it? And they said, well, you could not have a computer.
You couldn't have this. You couldn't have that. And it's like, well, yes, if I spent my life standing on my head, then I probably wouldn't have swelling in my feet, type of thing, whatever it is.
In other words, there's a risk, there's what you have to do to avoid that risk, and there's what the cost of avoiding the risk is. And I guess my own... I don't know. You make some interesting arguments. It's kind of like, my own immediate intuitive sense is:
Yes, there's a risk. Is that risk something that will cause me to turn my life upside down? Not proven to me at this point.
Okay. Um, I mean, from my perspective, I've been like, the forest is on fire, and you're like, well, what is fire, exactly? And the thing is being like, well... can that wait?
Like, is there a particular exotic set of preferences which would make this river exactly the optimal river? And, you know, can you view the river as throwing up a waterspout? This does not prevent an AI from building its own infrastructure and then killing you.
If I wanted to protect the Native Americans from the — much more similar to them — Europeans that were coming toward their shores, to say, like, well, can we really have a language for describing all the things that they might want? Isn't this language a very complicated sort of thing? There are a lot of delightful philosophical issues here, but I think they mostly integrate out of the actual signal.
So the question is — you know, in my life, people have had different scenarios for how the world will end. When I was a kid it was mostly, you know, World War Three type scenarios; then later on, in another generation, it's "the world's going to burn up."
You know, climate change, and then there have been various scenarios for how things end. And I think one of the things is the question of... sometimes there are things people say — for example, when people were saying, build the Large Hadron Collider and it will create a black hole that will destroy the world. That was a thing; people raised that possibility.
We understood the relevant laws. We ran the calculations. We worked out multiple constraints from multiple angles, saying the probability was tiny.
And at other times in human history, people have warned about leaded gasoline poisoning the soil. And people have been like, oh, don't be a worrywart, it'll be fine. And then, you know, what kind of played out in that generation? Permanent,
lifelong brain damage to a bunch of kids growing up with lead in their soil. And I am most
particularly reminded of the stories people have told me about the Manhattan Project, where they were going to do the first nuclear weapons test, and the question was, would it ignite the atmosphere.
LA-602 was the paper they wrote to analyze that; it's interesting reading. They, again, had multiple angles from which to look at this and say, that will definitely not happen. We do not have this with AI. It is more of a leaded gasoline situation.
I didn't think they'd written it up. That must have been much later, because what I heard was from people who were sort of involved, who were doing back-of-the-envelope calculations and saying it's really far away from igniting the atmosphere.
But I think they did the back-of-the-envelope calculations and then, you know, the front-of-the-envelope calculations — good for them.
But okay. So the question now is: in that case it all worked out; it didn't ignite the atmosphere. What you're saying is we can't do the calculations for AIs — or you're saying you think you have done the calculations for AIs and they will ignite the atmosphere.
Yeah, back of the envelope. This is not a rigorous calculation, and it's not going to be a rigorous calculation before the world ends, because this is more like tangled-up biochemistry and less like straightforward physics.
So the concern is, your back-of-the-envelope calculation says AIs will do the analogue of igniting the atmosphere. And I guess, you know, one feature of back-of-the-envelope calculations is they require intuition. They're not things where, if you do some very rigorous thing with axioms, you can go step by step.
It doesn't really matter if you have good intuition or not; it's just mechanical. But back of the envelope requires intuition.
Good thing that reality, since the moment of its dawn, has followed a hard and fast rule: if you cannot do rigorous formal calculations about something, it is not allowed to kill you. At no point in the history of time has there ever once been a cause of death that required intuition to understand.
That's not what I'm saying. What I'm saying is, if we're trying to make a decision, we've got two possibilities. One is, you know, we shut down our lives or whatever, and we say we're not going to have this, that, and the other, because there's a risk that it kills us, or because we think it will kill us — but it will cause us all kinds of trouble to shut all that stuff down.
And maybe it's not even practical to do it, but it's something which is a cost to us, to shut all this stuff down. So we've got: door number one is we shut everything down, and then we're sure it won't kill us. That is, one option is we're sufficiently worried that it will kill us,
so we shut that stuff down. The other option is we don't shut that stuff down. We get all the benefit, all the known benefit, of being able to do those things, but we have some risk that it could whack us.
And so the thing that I think is the rational thing to do is to say: let's try and tighten up those back-of-the-envelope intuitions, so to speak, and see where we come out. And that's, to me... I think we both agree that there's a lot about how AI really works and what it's really doing that we don't understand. And I understand your analogies about the Native Americans or whatever else. To me, that requires a boatload of intuition, which I don't think I have.
Now, you may have it and be absolutely right. And then, you know, it's kind of like... I just don't have that intuition. And the intuition that I have — perhaps I have a little bit more than everyday intuition, because I have a different intuition about what computational systems do, because I've sort of lived with them for a long time. But I don't claim to have the kind of intuition that would allow me to tell: does this back-of-the-envelope calculation that you're talking about land in an "oh gosh, it kills us all," or does it land in an "oh, actually, you didn't think of this thing and that thing and the other thing, and the whatever-it-was detail actually derailed the intuition and we got the wrong answer." So that's kind of where I am.
Yeah, so we're now sort of going into politics, and the case I would make to the politician is: we're at the point where the most legible, credible expert, who recently won the Nobel Prize for it, says, well, personally, my first-order personal assessment is over fifty percent existential catastrophe — that's code for killing everyone — but, you know, taking into account what other people are saying, I would say in public more like ten to twenty percent. And the people who have been studying the issue longest are like, this kind of looks to us like it straightforwardly kills you — why would you even expect that not to happen? That's us. And other people are going, there is no danger here.
This is ridiculous; the people talking about it are stupid. And from a political standpoint, I think what you want to do, at least, is to start getting started on preserving the option of shutting it down — which would have been easier to do in 2022 than in 2024, and will be easier to do in 2025 than in 2027, if we're still alive in 2027. Actually, even if we're not alive in 2027, it's still easier to
do in 2025.

One thing that I will agree with is that thinking about these things is worthwhile. And, you know, to have nobody thinking about it is a mistake. If there is a high risk, nobody thinking about it is the wrong thing.
People who think about this for a while do tend to start agreeing that the default outcome is everybody dies. Sometimes they think they have clever plans for preventing that, but they do tend to follow on with the default outcome
being that everybody dies.

I think, unfortunately, there are many selection effects at work here. And many of us... you know, I will say that I am intrinsically an optimist. I don't know how you feel about yourself. Are you intrinsically an optimist or a pessimist?
I try... I try to do the correct thing here: I would like the best-calibrated, best-discriminating probabilities I can manage. I think if you explicitly say that you are being an optimist or a pessimist, you have clearly departed from the way of truth. You have confessed that you are now taking considerations other than truth into account in your statements. How sinful, how unvirtuous.
Well, for me — what do I mean by that? I mean, I try to do projects that many people would say, oh, that's an impossible project. That's not a matter of truth; it's, this project is hard, but I'm going to be optimistic that it's possible, rather than...
That's to say, I don't do projects which I think are obviously doomed. But on the other hand, I'm going to take the point of view, let me try it, rather than, oh gosh, it's never going to work, let me not try. So that's kind of... I mean...
That's absolutely how I spent the years from about 2001 to 2021 or thereabouts — well, 2020, maybe — which was, you know, all right, I'm going to run at the alignment problem. But it became kind of clear that this wasn't going to work for me.
It wasn't going to work for the other people working on it, and the field itself had kind of failed to form a process that could distinguish... that could, like, publicly distinguish it. Elon Musk: we will just make Grok pursue truth. And even the old science fiction writers understood that humans may not be the most efficient way of producing truth.
And anything that just produces lots of truth may not produce other things of, you know, the best possible value. But who tells Elon Musk this? He believes it. And yes, there are selection effects — like, people did in fact pay me to work on this problem, and you're hearing me because people paid me to work on this problem and I didn't just, like, starve. And the people who founded OpenAI were people selected to believe that alignment was totally a problem within their grasp, and/or to be willing to just take Elon Musk's money and run with it. And if you want to look for people who are not selected, I think you end up with Geoffrey Hinton — you know, the guy who just won the Nobel Prize for the work he did on machine learning, kicking off the whole modern deep learning revolution. That isn't obviously selected. And that's the guy saying, well, personally fifty percent, but if I take into account what other people are saying, ten to twenty percent. I really need to actually sit down and talk with him at some point, which I've never actually done, and try to talk him out of listening to those particular other people. I think their arguments
make no sense.

You know, it really damages my thinking about this that the various people you mention — I've known all these people, sometimes for a very long time — so it sort of throws a wrench into my... There is sort of... none of us have, I think, perfectly calibrated rationality about everything, and it's always challenging. You know, this thing about... I consider myself a convincible person with respect to your argument.
I'm not about to say, oh, I'm going to change my life, you're right, without understanding it. But I'm convincible — and I don't know how many people are; maybe there are many people who are not, but I think I am. I'm not convinced yet, but, you know, from this conversation I understand much more about what you're saying, and I think there are, to me, interesting questions to try to answer, which people should try and answer. I mean, you make it sound like it's urgent — the people are coming off the ships — and you might be right, you might be right. And I mean, I'm watching...
things go downhill in terms of how much you'd have to spend and how much pain it would take to, like, deproliferate the technology.
And, you know, it would have been... I don't think it's really...
I just don't think it costs less than the Persian Gulf War if we did it today. And if we do it later, maybe it costs more, like World War II. But humanity did not lie down and die when the alternative was fighting World War II. We went off and fought World War II. And if everybody is going to die otherwise, you just do what it takes.
Yeah, but the fact is, there are things that one can deproliferate, like nuclear weapons, for example, because the supply chain is really complicated. Then there are ideas. It's very hard to deproliferate ideas.
Yep. If it were the case that a single person on Earth having the idea in their mind of artificial superintelligence would cause everyone to die, that would be an even worse situation to be in than the one we are in right now. I might, in that case, well, despair, or try to do something, like, weird and clever. I don't think we're in that situation. I don't think we need to deproliferate the idea.
Okay.
Gentlemen, thank you so much. I'm going to call it now. I think this might be the best conversation in MLST's history. Thanks. And part of our purpose is about creating conversations of this quality level with people of your calibre. I was expecting this to be a bit of a — I won't say ChatGPT conversation, but talking past each other — and the authentic exchange that you've both just had is really mind-blowing. So thank you so much.
Thank you. Thanks. Thanks for setting this up. This was interesting. It is nice to see you. Well, I think it's been like fifteen years since we saw each other in person, or thereabouts.
Yeah, see you in another fifteen years? No, I... can I get fifteen years...
according to you, it's well over.
Well, yeah, but, like, that would be great, to still be around in fifteen years. But that said, I'm not saying that we shouldn't talk again for another fifteen years, and I'm not saying this
was the end of anything. And we both, I think, are interested in cryonics and things, so maybe we get to talk in three hundred years and we get to say, you know...
If we get to resolve all the bets we make up through there, though — you know, conditioning on that possibility, probably you won a bunch of the arguments.
I.