Hey, it's Dave. Welcome back. I'm joined by James Douma, and we've got a whole host of things to talk about: Tesla's FSD V12 release that just happened this past month, Optimus, and this robotaxi reveal coming in August. Anyway, it's been a long time, at least half a year; it was August or something like that when we last met. And I remember, last time we talked about V12 because they'd done a demo, and we were quite excited about the potential, but also a little bit cautious in terms of
how it would first roll out and how capable it would be. But I'm curious, what have been your first experiences and impressions of V12? How long have you been driving it? - I got it a few Sundays back. I think I got it the first weekend that it really went wide, so I've had it three weeks or so, maybe four. Three, probably.
And of course I drove it out here to Austin from Los Angeles, and drove it quite a bit in Los Angeles before that. On the way out here, well, my wife has this hobby of visiting superchargers we've never been to, so every cross-country trip ends up being way longer than it otherwise would. But one of the cool things about that, on this FSD checkout tour, is that we ended up driving around all the cities on the way, because you're driving around to the different chargers and such.
So you get a chance to see what it's like in this town or that town, and different highways are different. We drive a lot of rural areas, so I got lots of rural; we did the whole backcountry tour coming out here across Texas. So I feel like it was a good experience for compressing a whole lot of FSD into a short time. And I've got to say, I'm really impressed. I was not expecting it to be this good, because it's a really...
This is not a small change to the planner. With V11, we had gotten to a point where the perception stack was good enough that we just weren't seeing perception failures. But almost all the complaints people had were about planning: not getting in the right lane, not being able to move far enough over, not knowing when it was its turn, stopping in the wrong place, creeping the wrong way. Those are all planning elements.
So if you're going to take a planning stack that you've been working on for years, that you've really invested in, and literally throw it away, not retaining any of it, at least that's what they tell us, they got rid of 300K lines, they went end-to-end. It's harder to mix heuristics into end-to-end, so it makes sense that they got rid of almost everything; anything heuristic in there now would have been built new from scratch for the end-to-end stack. And yet they managed to outdo the old stack,
and in what seems to me like a really short time, because they weren't just developing this, they were developing the way to develop it. They had to figure out what would work. There are all of these layers of stuff they had to do. So my
expectation was that the first version we saw would be roughly on par. It would have some improvements, it would have a couple of meaningful regressions, and they would be facing some challenges figuring out how to address them, because it makes sense that they want to get it out soon; the sooner they get it into the fleet, the faster they learn.
But the degree of polish on this was much higher than I expected. Bradford stopped by and I got a chance to see 12.2.1 as he was coming through. We only had about 40 minutes together, I think, just this spur-of-the-moment thing, but he was kind enough to take it places that I knew well, that I had driven on V11 a lot.
And I think it took me about three blocks to realize, and after 45 minutes I just knew, that this was going to be completely different. Everything I've experienced since getting it confirms that.
What have I got now? I must be at 50 hours in the seat with it, a few thousand miles, highly varied stuff. It's super solid. - Yeah, I wanted to dive into how big a jump this FSD V12 is, because when I drove it, I was shocked. This is not an ordinary point release.
I think "V12" is a little bit of a misnomer, because this is a drastic rewrite of their whole planning architecture and approach.
On perception, it seems like they probably kept a lot of their neural nets in the perception stack and added to them. But on the planning side, they're pretty much starting from, I wouldn't say scratch completely, but they're taking out all of the guardrails, all their heuristics, and putting in this
end-to-end neural approach, where the network decides where and how to navigate the perceived environment. And my expectation was that it would be better in some ways, more natural and so on, but that there would be some just weird mistakes, things it doesn't get because all the heuristic guardrails are off, so it would be more dangerous in some other ways.
And that, on balance, Tesla would wait until it was a little safer before releasing V12. But what we ended up getting was this V12 that just seems really polished. It's not easy to catch big mistakes in V12, and I'm kind of like, where did all the big mistakes go? That was my expectation, at least. So I'm wondering:
what was your experience? Did it catch you off guard, the small number of big mistakes, how polished this V12 is? And then I also want to get into how Tesla did it, because once you take off the heuristics, the guardrails,
I'm curious to hear your take on how you think they achieved this polish with V12. - Well, first, yeah,
there are two components to the starting-out experience. There's my abstract understanding of the system and what I rationally expected, and then there's my gut, because I've got like 200,000 miles on various layers of Autopilot, including maybe 50,000 miles on FSD. So I have this muscle memory, this sort of sense of the thing.
And I expected that to be dislocated. I mean, going from 10 to 11 was also a big change; this is not the first time they've made pretty substantive changes, though it's the biggest change for sure. So I was expecting it to feel a little bit weird and uncomfortable.
But intellectually, I was expecting all the old problems to go away and a new set of problems to come in, because it's a different product. The perception was pretty polished, and the things people were most aware of as failings of the system were essentially baked into the heuristic code. Well, of course, take the heuristic code away and all those failings go away too. But what do you get with the new thing, right?
And that did happen: all the old failings went away, as expected rationally. But it was weird to sit in the seat. There's this street you've driven over and over again, where there's this characteristic behavior the car had, maybe not terrible, but not comfortable, or less than ideal, or slow, or annoying, whatever the deal. And those are just gone. All of them, not just one or two. They're just gone, all of them.
It was such a big disconnect that it was kind of disquieting the first week or two. Delightful, but also disquieting, because now you're in uncharted territory. What demons are lurking here that I'm not prepared for? Because after you drive the heuristic thing for a while, you've got a sense of the character of its failures; even if you haven't seen a particular one before, you know the kind of thing that's not going to work.
But I didn't really find those. I haven't seen one. I was expecting to see a couple of things that were kind of worrisome, where it wasn't clear to me how they were going to go about addressing them, and I really haven't. So in that sense, I'm more optimistic about it than I expected to be at this point.
- How did they do it? Okay, let me give that question a bit more context, because I know it's open-ended. I would imagine that if you go end-to-end with planning, well, driving is very high stakes. You make one mistake, say you go into the center median, or there's a concrete wall, or a signpost you just drive into, or a tree; it seems like one second of mistake, even a split second, and for your car it's catastrophic. - It could be. - And with V11, up until V11, you had these guardrails: stay in the lane, do this, all that stuff. But with those guardrails off,
V12 could, when it's confused, just make a bad move and go into another car, another lane, another object. So what about it is preventing that,
without the guardrails? Is it just the data of mimicking humans, or is there something else attached on top of that, where they're actually doing some simulation showing what happens when you go out of your lane into the next lane, into oncoming traffic?
Are they pumping the neural nets with lots of examples of bad things that could happen if it doesn't follow a certain path? What's your take on that? - So that question prompts a couple of thoughts.
First of all, to preface it all: I don't know the nuts and bolts of how they're tuning the system. They've told us it's end-to-end, so that constrains the things they could be doing.
But when you train a system, you don't have to train it end-to-end. Some training will be done end-to-end, but you can break it into blocks and pre-train blocks in certain ways. And we know they can use simulation, and we know they can curate the data set. So what's the mix of stuff they're doing? That's really hard to predict. There are going to be a bunch of learned methods that work well that are really hard to predict externally, just from first principles.
This whole field is super empirical. One thing we keep learning about neural networks, even the language models, which we can talk about some if you want to, because that's also super exciting, is that they keep surprising us. You take somebody who knows the field pretty well, and they make predictions about what's going to be the best way to do something, and aside from some really basic things that are just prohibited by basic information theory,
when you start getting into the nuance, will this way of tweaking the system work better than that way? If I scale it, if I make this part bigger and that part smaller, will that be a win or a loss? There are so many small decisions. The training is like that too. How do you curate the data set? What in particular matters? What makes data good? That's a surprisingly subtle thing. We know that some training sets get you to a good result much faster than others do.
And we have theories about what makes one good and what makes one bad. On some kinds of things, like text datasets, a lot of work has been done trying to figure this out, and we have some ideas. But at the end of the day, this is super empirical, and we don't really have good theory behind it. So for me to sit here, not having seen what they have going on in the background, and guess, I'm just guessing. Frankly, I have ideas about what they could be doing,
but I would expect them to have many clever things that never would have occurred to me, that they've discovered are important and that they may be doubling down on. We actually don't know the fundamental mechanism of how they're doing the mimicry. We know what they've told us: that the final thing is photons in, controls out, end-to-end.
That's the final architecture. But how you get to the behavior you want, how you break the system down to train it, I don't know. There are many credible possibilities, they vary a lot, and picking the one that's going to be best is a hard thing to do sitting in a chair, not knowing. They clearly are doing it, though, and they're getting it to work.
- The reason it fascinates me, what kind of catastrophic scenarios or dangerous things they might be putting into the training, is that with driving, part of the driving intelligence is knowing that if your car is one foot into this lane and there's oncoming traffic, that's really, really bad, it's going to be a huge accident, whereas if there are no cars around, it's okay.
Driving intelligence requires an awareness of how serious mistakes are in different situations. In some situations they're really, really bad; in others the same driving maneuver is not that dangerous. So it seems to me like there has to be some way to train that, to teach the neural nets that. - So there's an interesting thing about the driving system that we have in people.
Okay, first: the failure you're describing is much more likely with heuristics. With heuristics, you build this logical framework, a set of rules. And when heuristic frameworks break, they break big, because you can get something logically wrong and there's this gaping hole, this scenario you didn't imagine, where the system does exactly the opposite of what you intended, because there was some logical flaw in the reasoning that got you there.
You know, bugs that crash the computer, that take it out. Computers generally don't fail gracefully, heuristic computers. Neural networks do tend to fail gracefully. They're less likely to crash, and more likely to give you a slightly wrong answer, to get almost everything right and have one thing be kind of wrong. That's the more characteristic failure.
So the way neural networks fail is going to be a little different than heuristic code, and by their nature they're somewhat less apt to that kind of catastrophic failure. Not that it's impossible, just that it's not the default. Whereas if you get an if statement wrong in a piece of code, catastrophic failures are kind of the norm in logical chains.
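To make that failure-mode contrast concrete, here's a toy sketch (purely illustrative, nothing to do with Tesla's actual code): a rule-based speed picker with one unanticipated case does exactly the wrong thing, while a smooth learned function just gets a little less accurate.

```python
# Toy illustration (hypothetical): how a logic gap in a heuristic breaks
# big, while a smooth learned function degrades gently.

def heuristic_speed(lane_width_m: float) -> float:
    """Rule-based speed choice with a hole in its logic."""
    if lane_width_m > 3.5:
        return 30.0          # wide lane: full speed
    elif lane_width_m > 2.5:
        return 20.0          # narrow lane: slow down
    # Author forgot lanes can measure <= 2.5 m (construction zones):
    return 30.0              # falls through to exactly the wrong answer

def learned_speed(lane_width_m: float) -> float:
    """Stand-in for a trained network: smooth in its inputs, so a weird
    input yields a slightly off answer, not an opposite one."""
    return max(5.0, min(30.0, 12.0 * (lane_width_m - 1.5)))

for width in (4.0, 3.0, 2.0):   # the 2.0 m case exposes the gap
    print(width, heuristic_speed(width), round(learned_speed(width), 1))
```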
Then there's this other thing, which is that the system we have has co-evolved with drivers. You learn, you develop reflexes, you read the traffic, you read the environment. When the lane gets narrow, people slow down. People
have a set of reflexes that adapt to the environment, to maximize the safety margin they have for what they're doing. When you're driving down a row of parked cars, if you have space, you move over to give yourself a little more room. If you're coming up on an intersection and you can't see what's coming, you may slow down, you may move over to give yourself more space to see. All of these unconscious behaviors, right? And the road system
has been developed over a lot of years to take advantage of the strengths of people and minimize their weaknesses. The amount of space we provide on roads, the way we shape our intersections and sight lines, the rationale for how our traffic controls work, all of that
has evolved around the strengths and weaknesses of human beings. So human beings are constantly trying, within certain margins, to maximize their safety margin, to make themselves more comfortable that they understand what's going on.
And now we have a system that's mimicking people. So there are funny things the car will do that really underscore this. You're in a line of cars, they suddenly slow down, and you have a truck in front of you. One of the most natural things is that people will pull over a little, if they can't see, to see what's happening up there, to help them prepare for what might be coming. It gives them more situational awareness. Well, you see the cars do this.
The funny thing is that the car's camera is in the center, so moving a little to the left doesn't let the car see around the car ahead of it. It still can't see, but it still mimics that action. Similarly coming up to an intersection: slowing down, moving over, preparing yourself.
So there's this interesting characteristic you get, which is that the planning system mimics all the little preparatory things that give you a little more margin, a little more situational awareness, that give you a little more time to react in case something happens. It's mimicking all of those now.
Instead of the heuristics having to be perfect, the system is learning to mimic drivers who already have all these reflexes and behaviors, in a really complicated contextual environment. And we're not talking about four or five behaviors; we're talking about four or five thousand behaviors, the kinds of things that people, as drivers, aren't even aware they're doing. And the car is mimicking that.
So they're going to fail more gracefully, and they're mimicking drivers who are cautious in situations where they need to be cautious, who are making small adjustments to give themselves more margin all the time. I think we may have underappreciated the degree to which human drivers with a lot of experience have reflexively developed a lot of behaviors, because we're talking about good drivers here, right? - Mm-hmm.
- They've unconsciously developed a lot of habits that have an appreciable impact on their safety, and the system is now getting those for free, because it's mimicking drivers. Even all the little nuance things that kind of don't make sense, like pulling over to see what's ahead of the car ahead of you. Or we see the behavior, the very charming behavior, where
it doesn't block the box. You come to an intersection, and if it's not clear that it can get across, it stops. Nobody had to program that. And if you look at intersections, when to do that and when not to is kind of subtle: is the car ahead of you going to move forward enough by the time you cross the intersection, or not? Looking at the flow of traffic, as a human, you're thinking,
"better than even odds there'll be space when I cross," or "no, I should definitely stop here, because I don't want to be caught in the intersection." The cars mimic all that, even in really complicated contexts. - Yeah. I would say it seems to go even a little beyond mimicking at times. I think this is the uncharted territory where V12 surprises me: it mimics with some level of understanding sometimes.
For example, you don't know whether to go into the intersection or not, or say you're making a left turn and pedestrians are there. Every situation is a little bit different.
And just because you have a bunch of examples in your data, you might not be able to mimic perfectly, because it's a new situation. You've got to infer, in this new situation, what should I do? And that's where I think it's not just mimicry. It could be just mimicry right now, but the big
jump in ability is kind of like LLMs: they can understand, to a certain extent, what you're asking for in a new situation or a new dialogue. - I think the word you're looking for is generalize. - Yeah, maybe generalize: taking the specific mimicry situations the data provides and generalizing from those. But in order to generalize, you do need a certain level of something
beyond just mimicry, right? Some level of, maybe, application. - So, we talk about mimicry. Mimicry is the training goal: do what a human would do in this situation. That's why we call it mimicry. But
the system doesn't have the capacity to record every single possibility. So it's frequently going to see a situation that's a combination of situations it's seen before, not a duplicate of any of them, and it has to figure out how to combine what it learned in those other, similar situations, to come up with something different that somehow follows the same rules. So a way you could think about it:
take the block-the-box thing. Depending on how many lanes of traffic there are, how aggressive the drivers are, what the weather is like, what the cross traffic is like, all of these variables, you as a human come up to the intersection and have to decide whether you're going to cross and maybe get stuck, or pause and wait for the other car to move up.
I've seen one where the box was blocked, and you could see the light at the end of the row of cars. And I'm like, this is a thing humans do: when this light turns red, you know you have plenty of time to cross, you're not going to get stuck. And you see the next light up there turn green; well, even if you get stuck in the box, it doesn't matter. I've been in that situation twice now, and the car moved into the intersection, even though it would block it, because it was confident the row of cars would move. Well, who coded that? Nobody coded that.
Now, as a human, I'm describing this thing. Here's a rule I just made up: if this light has just turned red, you know there will be no cross traffic; and if the light ahead turns green, the cars ahead are almost certainly going to move forward, unless there's a broken-down car or something. And you see humans do this. They move up because they know they'll be able to, and they want to reserve that space behind that car for themselves, to get their priority for crossing the intersection.
So they move forward. And I see the car mimic this behavior, and only where it's really appropriate. - Mm-hmm.
- So in a sense, when I described that to you, what I did was look at the situation and figure out what the rules were: oh, this light changed, that light changed, now I have time. But when I've done that in the past, I didn't think about the rules consciously. I'm not checking off a list; I see the conditions where it's safe for me to move forward, where I'm unlikely to block anyone, and I do it. So a way you can think about what the system is doing: we're training it to mimic,
but it has to compress that somehow, into a set of rules that are more general. You can think of the system as trying to figure out what the rules are. I've seen these 50 block-the-box situations; what rules say when it's good to go and when it's not?
It's essentially deriving, and "understanding" is a loaded word, so I don't like to use understanding, a representation of the rule set that humans use to cross. Now, when we write code, we want to minimize the rules, keep the code simple, so we don't have weird bugs and that kind of stuff. But for neural networks, if the simple version of the rules is 300 rules,
that's fine; 300 rules is no problem for them. So if humans unconsciously have 300 rules that we use to decide when to go across, and it can figure out what those are, that lets it generalize. It's extracting the principles, unconsciously, not rationally, just reflexively, in the same way people do. It's extracting the principles humans are using to make that decision,
and it's applying those to its own actions. And we see it manifest in some cute behaviors that are, perhaps, irrational for the car. But it also captures, as Parim had said, you get the puddle-avoiding for free, right? You get the U-turns for free: when is it safe to U-turn or not? That's hard to write. You just get it for free. But you also get,
oh, this guy wants to turn left into the parking lot, so I'm going to pause back here and let him go. Or somebody behind me wants to pass, so I'm going to move up a couple of feet so they can get in, or move over. You can see the car doing all of this stuff.
The Autopilot team is not picking and choosing the behaviors they want. It seems clear to me, looking at this, that they're grasping the whole spectrum of behaviors that people do: the polite things, the impolite things, the places where people are irrational. I mean, one thing I do notice:
it does mimic some things that I would prefer it didn't, but they're extremely human behaviors. On the highway, humans tend to follow other cars too closely in certain situations, where the traffic is kind of dense. I've been using the auto speed, letting the car pick its own spacing and such, and I noticed this.
Previously there was a heuristic: this many car lengths and no less. Maybe temporarily, braking and such, it might get closer, but it was really good at maintaining a really comfortable distance. Now I notice it's driving more like people, and I kind of preferred it when it kept more space. I liked the car's ability to maintain a bigger gap; also, you don't pick up rocks from trucks and stuff.
But now I'm finding it mimics human following behavior, which I personally find less than ideal. And that's definitely something that, if you were picking and choosing, you wouldn't have picked to add, because it's not a win. It's an irrational behavior that humans engage in, one that can lead to accidents and reduces your safety margin. But the car is going to mimic that too,
because they're taking the good with the bad in order to get everything, including the stuff they don't necessarily know is in there. I was suggesting there are all these unconscious rules we follow. Well, they're unconscious to the Autopilot team too; they don't know to go look for them. But the net-net is, they've got this thing, it's out there, and it's working incredibly well. - Yeah. It's interesting, I guess, on the topic of generalizing.
I think that's probably one of the most promising aspects of V12: the behaviors it picks up, some of which can be unexpected. Say you've got a hundred videos
on whether or not to go into an intersection at a yellow light or a green light, even when it's blocked. The neural nets are training on those videos, through billions of parameters, getting what they can out of them. But I also wonder, and this goes back to the earlier question: are they adding more types of data?
Are they adding onto those video clips, providing examples of, if the car actually does this, then there's a crash? Because it seems like if they're only providing a hundred video clips of it doing well, the signal for the negative, for the dangerous situation, isn't as strong as if you gave it that directly. - So that's useful in reinforcement learning, where having negative examples really helps, because you're trying to figure out what the score is, and you have good and bad.
In the case of human mimicking, the score is just how close you got. The way you rate how the neural network is doing in training is you show it a clip it hasn't seen before, you ask it, what do you do here, and you rate it by how close it was to what the human did. You take a recorded human example that the system wasn't trained on and has never seen before, and when I test, to decide
whether these other clips are helping or hurting, I give it one it's never seen. And good and bad is just: how close are you to the human? It's not, did you crash? In reinforcement learning you do that, or contrastive learning; there are other methods where you do that. But for simple mimicking, at least the way it's done in robotics, overwhelmingly, you just have a signal from a target that you want to get close to,
and your score is how close you are to it. The degree to which it mimics a recording of never-before-seen good-driver behavior, that's its score. So you don't need the crashes. - So do you think they're only doing that type of mimic training? You don't think they're adding on different kinds, contrastive, or say reinforcement learning? - Long-term, reinforcement learning is going to be really useful.
As I mentioned, there are various ways they could pose this. Fundamentally, the way you train neural networks is: you give them an example, they say what they would do in that situation, you give them a score, and based on the score you adjust all the weights. You do that over and over again, and the weights eventually get really good at giving you the answer you're looking for. Okay, so how do I pose the problem?
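As a minimal sketch of the generic loop he just described, posed in the mimicry setting — illustrative only; the model size, features, and loss are stand-ins, not anything Tesla has disclosed:

```python
import torch
import torch.nn as nn

# Stand-in policy: maps a perception feature vector to controls
# (steering, accel). Real systems are vastly larger; the training
# shape is the same.
policy = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()  # "score" = closeness to what the human did

def training_step(features, human_controls):
    """One mimicry update: predict, compare to the human, adjust weights."""
    predicted = policy(features)               # what would you do here?
    loss = loss_fn(predicted, human_controls)  # how far off the human?
    optimizer.zero_grad()
    loss.backward()                            # assign credit to every weight
    optimizer.step()                           # nudge toward the human answer
    return loss.item()

# Fake batch standing in for (camera-derived features, logged human controls)
print(training_step(torch.randn(32, 256), torch.randn(32, 2)))
```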
In reinforcement learning, what you do is play all these steps and then get a score for the game. This is how DeepMind did the Atari games and that kind of stuff: you do a whole bunch of actions, and then the challenge in reinforcement learning is that it's hard to know which of them mattered.
If you have to do a hundred things to get a point, how do you know which of the hundred things was important and which wasn't? That's a big challenge. Reinforcement learning handles all of that, but because of this challenge, it tends to be very sample inefficient, as we say. You need lots and lots and lots of games in order to learn a certain amount of stuff.
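You can see the credit-assignment problem in the simplest policy-gradient update: one scalar return gets smeared across every action in the episode. A toy sketch, not any particular lab's setup:

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(states, actions, episode_return):
    """REINFORCE: every action in the episode gets credited with the same
    final score, whether or not it actually helped -- hence the noise,
    hence the sample inefficiency."""
    log_probs = torch.log_softmax(policy(states), dim=-1)       # [T, 4]
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = -(episode_return * chosen).sum()  # one score spread over T steps
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# 100 steps, one point at the end: which of the 100 actions earned it?
reinforce_update(torch.randn(100, 8), torch.randint(0, 4, (100,)), 1.0)
```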
If, on the other hand, you were training Atari and your feedback signal was "put the paddle exactly where the expert human does," that's more sample efficient; it learns faster. Remember the AlphaGo example we've talked about before. When they first started training AlphaGo, the first step was to have it mimic humans. They took 600,000 expert human games, and the first stage of training the first version of AlphaGo
was pure human mimicry: do what the human did. That got them a certain distance, because they had 600,000 games from decently good human players, strong amateurs, roughly. How do you get to the next level? Well, in a game like Go or chess, a thing you can do is start doing reinforcement learning. And in those settings, in chess you've got
16, 30, 50 choices at any given point, and maybe only 10 of them are good. So the tree of possibilities doesn't expand that quickly.
So you can get a network that's trying to learn which of a dozen possibilities to pick to converge much faster than when the choice space is much bigger. And in the real world, we have these continuous spaces: you can turn the steering wheel to 45 degrees, 22 degrees, 13.457 degrees. The space of possibilities is really large. So this is a real challenge with reinforcement learning.
People have tried to do reinforcement learning with cars in games, car-driving video games and that kind of stuff. We know it works, but we also know it's very sample inefficient. Okay, so looking right now at where Tesla is,
I would guess they're doing human mimicry, and they might be doing a little bit of reinforcement learning on top of it. Maybe there's something you want the system to do, and it's not quite getting there with the mimicry, stopping at stop signs, say. So you layer a little reinforcement learning on top to tweak the behavior of the system. And incidentally, this is what
ChatGPT did originally. Remember, with ChatGPT there was the basic pre-training. Then there's instruct training, where you tell it: don't just predict the next token, pretend you're in a dialogue. And then there's one more step after that, which was reinforcement learning from human feedback,
where, after you get to that point, you do a little reinforcement learning and you train it: don't just pretend you're in a dialogue with me, you're in a dialogue with me and you want to please me, these are the answers humans prefer. That last step is the one that makes it polite and gets you
alignment and all that other stuff. Now, it's a tiny fraction of the overall training. The overwhelming bulk is the pre-training, just predict the next token, and then there's a big chunk of the instruct training. Okay, so you can do a similar thing with self-driving, and I would expect that that's how it would evolve. There's a ton of pre-training for the perception network:
they already have all this labeled data and they've got an auto-labeler, so they can take these recordings, generate maps of where all the street signs are, and ask the perception system: tell me where the sign is. That's a ton of training on supervised data, which is very sample efficient, the most sample-efficient kind. Then they go to maybe a more general thing where they're mimicking humans. That's also supervised, in a broader domain, but still much more sample efficient than reinforcement learning.
Then at the tail end, it's this layer cake: you build the foundational capabilities, you do some refinement and add some additional capabilities, and then maybe you fine-tune with yet another kind of training at the end. So if they're using reinforcement learning right now,
because of the sample-efficiency issue, I'd expect it to be the cherry on top, right at the end, the last little bit, where there are one or two things the mimicking isn't getting you, or it's mimicking a behavior you don't want, and you come up with a new game for the system to play,
where it has to get a score, and you do reinforcement learning on that. You could totally do that, and eventually they will, because if you really want to get deeply superhuman, that's how you do it. That's one of the lessons from Go. When AlphaGo was first playing Fan Hui, the European champion, it could kind of get to his level with that mimicry, plus a Monte Carlo search on top, which is basically
not just doing the first thing the neural network suggests, but exploring a few possibilities, somewhat heuristically. That got them there, and they could beat Fan Hui. But they weren't going to beat Lee Sedol that way. There aren't enough example games to train on; it has to play against itself, with reinforcement. And then the sky's the limit: how good it's possible to be becomes the limit of how good the system can be. Then it can become truly superhuman.
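To pull the layer cake he describes into one schematic: supervised pre-training, then mimicry, then a small RL fine-tune. Every function name below is hypothetical; this is just the shape of the recipe, not Tesla's pipeline.

```python
# Hypothetical shape of the "layer cake" described above.
# All names are made up for illustration; nothing here is Tesla's API.

def pretrain_perception(model, clips):
    """Stage 1: supervised pre-training on auto-labeled recordings
    (signs, lanes, vehicles). Cheapest signal, most sample efficient."""
    print(f"pretraining perception on {len(clips)} auto-labeled clips")

def train_mimicry(model, drives):
    """Stage 2: supervised human mimicry for planning. Broader domain,
    still far more sample efficient than RL."""
    print(f"behavior cloning on {len(drives)} curated human drives")

def finetune_rl(model, game):
    """Stage 3: the cherry on top. A little RL, posed as a scored
    'game', to fix behaviors mimicry gets wrong."""
    print(f"RL fine-tune on game: {game}")

def train_driving_stack(model):
    pretrain_perception(model, clips=range(1_000_000))
    train_mimicry(model, drives=range(100_000))
    finetune_rl(model, game="come to a full stop at stop signs")
    return model

train_driving_stack(model=object())
```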
So eventually we'll see self-driving systems do that. As we get more compute capacity, and as we learn how to do reinforcement learning in this domain, it will come to that. Long-term, I think that's very likely. There are techniques that do the same job as reinforcement learning that are a little different, but one of these techniques, self-play, so it can learn to be better than humans can ever learn to be, will become part of the formula. We're not there yet, though; there's still the low-hanging fruit of being as good as a really good human driver. Because if FSD
were equivalent to a really good human driver, but it never got tired, never got distracted, and could see in all directions at the same time, that's a great driver. That's superhuman
by itself. Its decision-making doesn't have to be superhuman; the combination of its perception and its indefatigability, it never gets tired, on top of good human decision-making: as a near-term goal, that's a great goal, and it will get us tremendous utility. And you don't necessarily need more than human mimicking to do that. - Okay. So on human mimicry:
when Tesla is training, feeding their neural nets all this video of good drivers driving, how does the training work? For example,
is it telling the neural network to predict what the human will do next, then showing it what the human actually does next, and correcting the weights? Something like that, basically auto-training itself off all of the videos? - Yes. I would guess they probably
take the human drive and break it down into some variables, positioning, timing, decisions for lane changes and whatnot, to create kind of a scoring system for
how close you are to what the human did. Do we just look at all the controls and take a least-mean-squares error of the car versus the human? You could do that; maybe that works great. Maybe you take a step farther back and say, what was the line the human took through the traffic, and what's your distance off that line at each point? Maybe that's the score, or the speed.
There might be other elements of the score, like how quickly you responded when the light changed or the pedestrian moved. You could layer other things on top. You'd start with the simplest thing, this mean-squares error, and then if that didn't work, you could add other terms to the scoring, because having a good scoring system is an important part. And this all comes down to sample efficiency too.
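As a hedged illustration of the kind of composite score he's speculating about — both terms and both weights here are guesses for illustration, not Tesla's disclosed loss function:

```python
import torch

def mimicry_loss(pred_controls, human_controls,
                 pred_path, human_path,
                 w_controls=1.0, w_path=0.5):
    """Toy composite score: least-mean-squares on the raw controls,
    plus average distance off the line the human took through traffic.
    Weights and terms are illustrative only."""
    control_term = torch.mean((pred_controls - human_controls) ** 2)
    # pred_path / human_path: [T, 2] sequences of (x, y) positions;
    # how far off the human's line you are at each point of the drive.
    path_term = torch.mean(torch.norm(pred_path - human_path, dim=-1))
    return w_controls * control_term + w_path * path_term

# Fake tensors standing in for one clip's predictions vs. the human log
loss = mimicry_loss(torch.randn(50, 2), torch.randn(50, 2),
                    torch.randn(50, 2), torch.randn(50, 2))
print(loss.item())
```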
Does my supercomputer run for a week to get me a good result? Does it run for a month? A year? That's sample efficiency: how fast do I get to the result I want? The system itself constrains how good it can get, but a good scoring system gets you there faster. It's economics. So there will definitely be a lot of tricks in that scoring function. We call it the loss function.
As a practitioner, I'd be really curious to know what they're doing, but they clearly have one; they've come up with a scoring system. And it's almost certain that, essentially, they're taking what the human did as the ideal, the ideal score you could get, and the system's score is just how close you are to what
our expert human did in this situation. - Yeah. What's exciting about being able to train like that, it reminds me of the whole transformer model with ChatGPT. You can give it so much data, and just by predicting the next token
and rearranging its own weights, it gets better and better. It's so scalable, in a sense: you just feed it more data,
more parameters, and it keeps getting better, because the training makes such efficient use of it. - That's actually a really interesting metaphor. A text model is learning to predict the next token, right? - Exactly. - Okay, well, those tokens were all written by humans. All the text that existed before there were language models was written by human beings; we didn't have automated systems generating any meaningful amount of the content.
So in a sense, it's just predicting what the next thing the human put down was. It's a kind of human mimicry, right? But if you look at what ChatGPT can do relative to what a human can do, there are still things it can't do that a human can; there are forms of reasoning it's still poor at.
But there are a lot of ways it's not only superhuman but vastly superhuman. Its ability to remember stuff: you can talk to it about any of 10,000 topics in a hundred different languages. It's deeply superhuman in certain respects already. And you could expect the same thing from the mimicking. If they're learning, predict the next steering-wheel movement, predict the next brake-pedal input, you get a similar kind of thing, in a sense.
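The analogy is literal at the level of the training objective: both are "predict what the human produced next," just over different vocabularies. A toy sketch; tokenizing controls into discrete bins is my assumption for illustration, not a known detail of Tesla's stack:

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, target_ids):
    """Cross-entropy on 'predict what the human produced next' --
    the same form whether 'next' is a word or a steering bin."""
    return F.cross_entropy(logits, target_ids)

# Text model: logits over a ~50k word-piece vocabulary
text_logits = torch.randn(32, 50_000)
next_words = torch.randint(0, 50_000, (32,))   # what the human wrote

# Driving model: logits over, say, 256 discretized steering-angle bins
steer_logits = torch.randn(32, 256)
next_steer = torch.randint(0, 256, (32,))      # what the human steered

print(next_token_loss(text_logits, next_words).item())
print(next_token_loss(steer_logits, next_steer).item())
```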
It's not necessarily constrained to just what a human could do, because its capacities are different; it's going to learn it a different way. It's not a human being. One thing about human beings is that we have really terrible working memories, which is one of the reasons our thought process is broken into these two layers, the unconscious thing and the conscious thing: consciously, we can only keep track of a few things at a time.
Well, FSD doesn't have that problem. When a human comes to an intersection, one of the challenges is that there are three pedestrians and two cars crossing, and you're turning your head to look at them, paying attention to a couple. FSD is simultaneously looking at a hundred pedestrians, all the street signs, all the cars, in all directions at once. It doesn't have attention the way we do.
So even given the same target to get to, because it's getting there a different way,
there's lots of potential for many of its behaviors to be greatly superhuman, even just in the planning sense. The human doesn't end up being the limit, in the same way the human isn't the limit for ChatGPT: the upper bound on how many languages ChatGPT can learn is much higher than the number of languages a human can be fluent in. Similarly: what can you tell me about the Wikipedia page on Winston Churchill? How many humans are going to know that? But it can tell you. - Yeah.
That's interesting, because of its ability to retain so much more information. For example, with ChatGPT, and you can apply this to FSD through the training too: if a human were trained like a transformer model, like an LLM, we wouldn't retain much of it. The amount
we get from looking at video clips ourselves is limited; we're looking at one aspect, maybe how the person's turning, a little bit about the environment. But a neural net is picking up a lot of subtler things, things we're maybe not completely conscious of, and retaining those as well.
So, two things. One, it just seems so scalable: you feed it a thousand times more data, across a variety of scenarios, and it gets that much better. - Yeah, that's the strength. The potential is just crazy. - The second thing is this crossover of abilities, where it does stuff that maybe you didn't expect it to do, because it's
learning from other scenarios and other situations and generalizing to new ones. So you get these emergent behaviors or abilities that you weren't planning for, that you didn't train for originally. And I think as you feed it more and more data,
we're probably going to see more and more of that. People will feel like it's superhuman in some ways, just a better driver than they are, and that's going to come out more and more as the data increases. - Yeah, we're going to see a lot of those. I already have lots of examples. I got this on V11 sometimes, and I'm getting it a lot more on V12, where
you come to an intersection and it does some behavior that surprises you. Like I told somebody the other day: on V11, early V11 for sure, if I intervened,
I want to say 80% of the time the intervention was the right thing to do. Every once in a while you'd intervene and then realize the car had been right: oh no, I needed to take that turn instead of this one; or I intervened because I thought it was slowing pointlessly for the stop sign, and I hadn't seen the pedestrian, or I hadn't seen the speed bump, or whatever the deal was.
On V12, I'm getting much more into the zone where it's 80-20 the other way: 80% of the time I intervene, it was my mistake. The car saw something, it was responding to something that ideally I would have seen and responded to, but I didn't. So when we disagree, it's often exposing my failings more than the system's.
And on the trajectory we're on right now, we could very quickly be getting into a world where, the odds are, you should still intervene, because the system is not perfect,
but 99% of the time you intervene, the car was right and it's you that's wrong. And that begs the question: at what point do we not let the human drive? Is it 99 or 99.9? How much more right does the car need to be? Of course, that's going to depend on the weighting of the errors, like if the 99 are trivial and the one is extreme. But
I think there's a good chance we're going to be there this year, at the current rate of progress, and that's going to be really exciting. - I think what can trick people is treating V12 as just the next iteration of V11. You went from V11 to V12, big jump, so you think, okay, in another year we'll have another big jump, V13 or something, and you project that out. But
I think the tricky part is that V12 was largely done under cover, as this stealth project not released to the public or really shown much, going back to, supposedly, December of, what, 2020? - It's also building on a lot of infrastructure that was built for those other projects, so it's a difficult comparison to make. But it's not unfair to say, yeah, this is a clean sheet for
at least the planning part. - And if you look at the trajectory of how fast the planning is improving,
you could probably map it out against the amount of data you're putting in and project the abilities. Tesla probably has the ability to see into the next 12 months, in terms of how much compute they have, how much data they can feed it, and what abilities they're expecting from it, don't you think? And I think that would surprise a lot of people.
- Well, one thing we don't know is what abilities are still missing. There are some things that have clearly been left out at this point, right? Parking lots. Actually Smart Summon, we're still waiting on that. Why are those held back? Is it because they had this part working well, and it's 95 percent of what people use it for, so they pushed it out? Or are they holding those back because there's something tricky about them and they want to get it right? Does that maybe indicate there are challenges we won't know about until it comes out?
Parking lots are really different from driving on surface streets, so it wouldn't be surprising if some novel problems occur in parking lots at high rates. I mean, there are benefits in parking lots too: you move really slowly, and it doesn't matter if you stop. It's not like driving on a surface street.
So I believe ultimately they're tractable and whatnot, but we don't know; it's feature-incomplete, I would say, at this point. When it's feature-complete, it'll be easier to predict the scaling. Have you heard the expression "the bitter lesson"? - No. - Okay, so it's this paper written by a machine-learning researcher named Richard Sutton. It's kind of famous inside the field. He basically wrote it as an observation about machine learning over the decades, and especially recently.
And it basically says that what the field has learned, over and over again, is that doing simple things that scale, things that maybe don't work great today but which predictably get better as you scale them up, always wins over doing exotic things that don't scale. The temptation as a researcher is always to get the best performance you can at whatever scale you're working at, in your lab or as a small company.
But Sutton observed that betting on techniques that scale, maybe it doesn't work great now, but it predictably improves as you scale up, always wins. They just always, always, always win. And he called it the bitter lesson because
researchers keep learning that you build this beautiful thing, but because it doesn't scale, it falls by the wayside and nobody ever uses it, while the simple thing that everybody has known since, like, 1920, which just scales well, is what people keep doubling down on. That's what the models are teaching us today.
The way this relates back to FSD is that heuristics aren't scalable: you need humans to write them. And the more heuristics you have, say you have 300,000 lines of heuristics with a certain number of bugs, when you get to 600,000 lines you don't have twice as many bugs, you have more like four times as many, because the interactions get more complicated. So there's poor scaling there.
Heuristics written by people don't scale. But if I take the same model and give it more video and it gets better, that scales: I just need more video and more compute time, and it gets better. So the bitter lesson would tell us that V12 is a fundamentally better approach to this problem than V11 with its heuristic planner. And if you go all the way back,
Andrej Karpathy was telling us this in his earliest talks: he foresaw what he called Software 2.0, the neural network just gradually taking over. And I think that's largely inspired by the same observation. The neural networks are going to take over because, as you get scale, they just become the right way to do everything, and eventually there's nothing left for the heuristics. - Yeah, I was thinking about that Karpathy idea, and I think
the intention was for Software 2.0 to eat away at the planning stack more gradually. V12's end-to-end approach was more drastic than maybe what he originally envisioned.
But to me it definitely makes sense, and if they can get it working, which they have, it's clearly, I think, going to be the way. - Well, there's another way to tell this story. People have asked me a few times, and I think the right way to think about it is that Tesla didn't suddenly stumble onto the idea of doing end-to-end. End-to-end is obvious. - Right, sure. - The problem is that it just doesn't work in really complex domains,
or rather, it doesn't work at all below a certain scale; you have to get to a certain scale before it starts working. So I think the more realistic way of thinking about Tesla's relationship with end-to-end is that they kept trying it. It didn't work. They'd try again. It didn't work.
It may be that the reason V11 got to 300,000 lines is that they expected end-to-end to start working a year ago, two years ago. They never thought they'd get to 300,000 lines, but it took longer than expected to get the neural network to do the planning part.
So essentially this is the dam breaking. When they finally found the technique that scales, the dam broke quickly, because it quickly overwhelms the downsides of having 300,000 lines of heuristics guiding your planning. - Yeah. Did you see that tweet by Ashok, something about "the beginning of the end"? Do you think it's related to FSD at all?
- It's completely speculative, but I think it is. And what does he comment on that's not FSD? It's mysterious. But the beginning of the end of people driving cars is kind of the way I look at it. - I kind of wonder, with the internal metrics Tesla is tracking with V12, as
they're on their next version, 12.4 or whatever, seeing the improvements and knowing what's coming down the line, how much compute and data and everything going forward, they must be really excited right now, just seeing the level of improvement, especially with the latest FSD, 12.3. - 12.3 was still,
I mean, you could tell from the firmware number, right? And generally, what we saw through V11 was that the builds getting into customers' hands were three, four, five, sometimes six months old. So Tesla is already looking at the one we're going to get in six months.
Why does it take them six months? Well, they do all this testing and validation, there's tweaking, there are all these waves of rolling it out to be super safe. So the pipe is deep between when they first build something and when we see it. But they're going to know the potential within the first couple of weeks after those initial builds. So
they already mostly know what we're going to have in six months. They don't really have to guess; it just takes six months for it to get through the safety pipe and reach us. - Yeah.
With V11, I remember it half fondly, half not fondly. When you're at some intersection or something, you're stopped or moving slowly, you get this jerky steering wheel thing. It's going left, going straight, going left, going straight. And when I think about that, I'm like, that's going to be something all
pre-V12 beta testers will have as their shared experience: the jerky steering wheel. - Have you seen the, so V12 has this thing where occasionally you'll be stopped in an intersection and it starts, you're totally stopped. - Yeah. - Not moving slowly. You're stopped, you're behind another car or something like that. And it just starts turning. - Yeah, it does that. Yeah, I thought it was just me. I guess it does it a little bit. - No, I've seen it two or three times. The first couple of times I saw it, I'm like,
What are you doing? And it's just slowly turning the steering wheel. I'm like, this will be interesting. The light changes and it goes... And it whips back straight and then it drives. It's like it's bored or something, playing with the steering wheel. That's funny. Okay, so moving from V11 to V12. V11, it just...
I interpreted the steering wheel thing at the intersection as it's debating between two options, right? It's like, oh, 60% this way, 40% that way, but then it flips to 60% the other way, and it goes back and forth. Literally as it changes the percentages of what it should do, it's changing the steering wheel. But why don't we see that behavior in V12? Why is it just confidently going in one direction without that?
- Okay, when you have heuristics, you come to an intersection and you've got a few options. Straight, left, right. Go, don't go. They're binary. So the output from the neural network is:
you're at an intersection and you can go left, you can go straight, or you can turn right, right? There is no 45-degree option. Okay, so the neural network in this case is functioning as a classifier. You choose this or choose that.
But for neural networks to work, they have to be continuous. So there has to exist in the system a very low probability option between the two, right? You have a sigmoid: the important parts are the zero and the one, but it has to be continuous, because if it's not continuous, it's not differentiable and you can't backpropagate. So this is a fundamental thing: neural networks have to be continuous. Okay.
So the system has a set of criteria where it's going to go forward, and it has a set of criteria where it's going to go right. There's a certain probability for this and a certain probability for that, and they add to almost one, with a tiny little bit of remaining probability in the stuff in between. That in-between region is intended just to connect the two states, so the neural network is differentiable, right?
Okay, this is actually kind of a weakness in a system where you have two states, right? Because every once in a while you're going to get to a situation where the system is balanced right on that 45-degree point, right? And as the shadows shift and the cars move around, the contextual cues just shift a little bit,
and the network is going to flip, 'cause that's a choice and this is a choice. And the system, the way it was built, the steering wheel reflected the choice that was upcoming for the intersection, right? So something is flickering back and forth. And yeah, as you say,
it's oscillating. It's a very tiny little oscillation in probability, but you have to have this huge disparity between going right and going straight, because going 45 is never an option; you have to make that super, super small. So if you're right on the boundary, it'll hop back and forth between two options that to a human being seem very disparate, right? The thing is, if you're mimicking a human being,
you no longer have that. Your goal is to just get as close to the human being as you can. You don't have this classifier thing where you have these A/B options.
So the system is not going to end up in those states. Like, a human being comes to the intersection. If they're going straight, their wheel might be here, might be here, might be here, right? For a turn, it might be here, might be here. They're fairly broad and continuous. It's not perfectly straight or hard over with a no man's land in between. Humans will come to an intersection, turn the wheel 45 degrees, let it sit there, and then when the light changes, turn it straight and keep going. That's not...
That's not a fail for the network. It's an option. So it never gets in these situations where it's oscillating between two states that the design of the neural network has to keep highly discrete for safety's sake, right? Because it's just mimicking a human being. I don't know if I'm explaining that very well. But it naturally falls out of the fact
that they have a target that they're tracking. And the goal is to be close. You don't have to be right on; being pretty close is good enough. - Would you say, because, let's say with FSD end-to-end, the neural nets, because they're mimicking, they just have so many points to mimic along the path. Whereas V11, it's...
deciding between left and right, or let's say straight and right, it's oscillating. And these are two big decisions to make. And once you're on them, it's going that certain path. So that's the big decision versus- - Let's put it this way, right? Okay, you're writing digits down. There's a one, a two, a three. There's nothing partway between the one and the two. Like it should either be a one or a two. There's no in-between option. That's okay.
But as a human, you can have a sloppy one or two. If what you're doing is mimicking the human, the success target is broad. It's not precisely one or precisely two with a no man's land in between. There's a whole bunch of different ways you could write a one, a whole bunch of ways you could write a two. There's not really a space in between.
But the network has the leeway to have slightly different ones and still be right. Whereas in the classifier way, you don't have that. You've got a very small number of extremely distinct decision points. And so if you're on the boundary between them, you're going to see oscillation. Interesting. All right.
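To make that concrete, here's a toy sketch of the difference (all numbers invented; this illustrates the argument, not Tesla's actual stack). A discrete planner picks between "straight" and "right" and maps each choice to a wheel angle, while an imitation-style model regresses the angle directly. Sitting right on the decision boundary, tiny input noise flips the classifier between two very disparate wheel positions but only nudges the regressor:

```python
import numpy as np

# Toy decision boundary: a classifier flips between disparate wheel
# angles near the boundary, while a regressor moves smoothly.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
WHEEL = {"straight": 0.0, "right": 45.0}   # wheel angle per discrete option

for step in range(6):
    x = 0.5 + rng.normal(scale=0.02)       # contextual cue sitting on the boundary
    p = softmax(np.array([10 * (1 - x), 10 * x]))  # P(straight), P(right)
    choice = ["straight", "right"][int(np.argmax(p))]
    angle = 45.0 * x                        # imitation-style continuous output
    print(f"classifier: {choice:8s} wheel={WHEEL[choice]:4.1f}   "
          f"regressor wheel={angle:4.1f}")
```

Run it and the classifier column hops between 0 and 45 as the noise crosses the boundary, while the regressor column just hovers around 22.5, which is exactly the human-like "rest the wheel partway" behavior described above.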
- Moving forward to Robotaxi, the August 8 reveal. What are your expectations? Why do you think they're revealing it now? - It seemed kind of forced after that Reuters article. Maybe that was a coincidence. I don't know.
You know, I've seen a couple of theories. My guess is that around August, that rough timeframe, is a good time for them to be introducing this vehicle. So there's the software angle of interpreting it, and there's a hardware angle. Like, it's about time for them to get the hardware out. Why would they need to get the hardware out? Why wouldn't they wait for a reveal like they did with the Y or the 3, where they waited longer,
until they were ready to start taking orders? I mean, the 3, it was early, but with the Y, they didn't want to Osborne the 3, so they waited and played it down until they got there. And up until now, it seems like, with the compact car, they'd been doing a similar kind of thing. So it's not to Osborne the 3 or the Y, presumably.
If they introduce it in August, they've either greatly accelerated the timeline or they're doing an introduction well ahead of the actual release of the vehicle, which kind of makes sense for RoboTaxi, because people aren't expecting it. Nobody's gonna skip buying a Model 3 because they're waiting for the RoboTaxi, right? At least that's unlikely to be a thing, whereas they might wait to buy a Model 3 for the compact car. So maybe it's less of an issue.
And maybe they want to get prototypes out on the road to start testing and gathering data. Like that's a theory I've seen. Yeah. Seems like not bad. So that's one. The other possibility is that
they think the software is getting really close and they want to demo the software on a platform to start sort of preparing the world and regulators for the fact that this is a real thing. It's really going to happen. And here's our status. I mean, that's obviously, it's good for the company, gathers attention,
It might get investors to take it more seriously. It might get regulators to start taking it more seriously. Like, this isn't pie in the sky, and this isn't us just dreaming. So don't put us at the bottom of your stack of work; put it at the top, because we really need to start working on the issue of,
what are you gonna require before you allow us to operate these things? So those all kind of make sense. - Yeah, yeah. I wonder if the robotaxi will be just Tesla-owned, right? For certain urban environments, in the beginning at least. I don't see why they would sell it to people initially when they have so much need to fill this vacuum of ride-hailing:
the gap between what human ride-hailing costs and what a robotaxi will cost is so big. Tesla could easily use the first few years of production, maybe 3 million vehicles, for that. - It's a really good question. And, you know, this is,
this is something that's been debated a long time. I have a 10-year standing bet with another guy about whether Tesla will stop selling cars to private parties when they start making robotaxis. You know, you can see it going...
I've tried to work this a couple of ways; I can see advantages either way. The wholly owned fleet model, its upside is that it's a simple model: predicting and understanding it are kind of straightforward, right? But I would argue it's not the best model for planning long-term. I also feel like, when I think about the whole sweep of this thing, like I've said before,
I feel like the robo-taxi is going to go through this period of time where a relatively small number of robo-taxis are really profitable, but as the fleet continues to grow and it continues to take more miles, it becomes commoditized. Now, the degree to which it becomes commoditized, ultimately, it's still a profitable business. It's a much bigger business, so the total profit being generated is bigger, but the gross margins are a lot lower as you get out into building out the fleet.
And that might be a relatively, like when I look at the numbers, I could see that transition from being, I could see they're super profitable, you know, because you're just taking ride hail business and there's a lot of demand and you, like you basically can't build enough cars to fill the demand. Like that could last a couple of years, easy. Like, will it last five years? Maybe, I don't know. That seems long to me. And it's not going to abruptly end, you know, it'll be this long taper into this long-term thing where, like I think,
You know, I mean, what is the end state? Is it 20 years, 50 years? You get different windows on different things. But the other point I like to think about is the point where it's commoditized, where the low-hanging fruit of vehicle miles traveled has moved over: your robotaxi costs 40, 50 cents a mile, it shows up in three minutes, it's super convenient, you can rent a two-seater, a four-seater, a minivan, and
You know, like there's a lot of variety, a lot of accessibility, and it's less expensive than owning your own vehicle. And half of all miles have moved over to that. And so why do I say half and not 100% or some other number? Yeah.
One is human habits change slowly, so people tend not to make transitions to new technologies as soon as the technology is ready. There's the tail end of the adopter curve, and there are aspects of the robotaxi adopter curve, like moving off of private vehicles onto robotaxis, which I think for various reasons are likely to be slower than, say, moving off of Galapagos dumb phones to smartphones was,
even though that took 10-plus years for us to make that transition. But it's an interesting point to talk about, because that's a point we're definitely gonna get to. When we have 25 million robotaxis on the streets in the United States, they'll be supplying like half of vehicle miles traveled. And I like that point because it's really hard to argue that we won't at least get to it. So you can talk about that model. You can talk about the model when you have one, two, three million robotaxis.
And that sort of gives you an overall spectrum to sort of think about what's going on. Okay. In state two, which I think probably comes five years after state one.
Maybe it's a bit longer. Maybe it's 10 years. I don't think it's 10 years, but maybe it is. Most of the car market is private vehicles, not robotaxis, because a smaller number of vehicles saturates the robotaxi market sooner, and you still have a lot of vehicle miles traveled. I mean, robotaxis drive, say, five times as many miles as privately owned vehicles do.
That means it takes five times as many private vehicles to satisfy the same demand that an equivalent number of robotaxis could.
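As a back-of-the-envelope check on that saturation math, using the 25-million-robotaxi, half-of-all-miles scenario from above plus an assumed figure of roughly 3.2 trillion US vehicle miles traveled per year (the outside assumption here):

```python
# Rough fleet arithmetic; US annual VMT is an assumed outside figure.
US_VMT = 3.2e12          # miles/year, assumption
robotaxis = 25e6         # the "half of all miles" scenario above
utilization = 5          # robotaxis drive ~5x a private car's miles

miles_per_robotaxi = (US_VMT / 2) / robotaxis             # ~64,000 mi/year
miles_per_private_car = miles_per_robotaxi / utilization  # ~12,800 mi/year
equivalent_private_cars = robotaxis * utilization         # 125 million cars

print(f"{miles_per_robotaxi:,.0f} miles/year per robotaxi")
print(f"{equivalent_private_cars:,.0f} private cars for the same miles")
```

The per-robotaxi figure lands near 64,000 miles a year, taxi-like utilization, and the replacement figure shows why a comparatively small robotaxi fleet saturates demand that a far larger private fleet serves today.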
So after you get out of this profitable zone, where you have a small number of robotaxis because you're production-constrained or jurisdiction-constrained, regulation-constrained, the way I see this thing is Tesla is gonna have this huge demand for robotaxis over some window of time, and that's gonna taper, and most of their business in the longer term is private vehicles again. So how do you manage that as a company? You don't wanna leave anything on the table during the gold rush, when the robotaxis are making a ton of money and you're rapidly scaling out the thing.
But you also don't want to gut your long-term prospects of continuing to be a viable manufacturer. You can't walk away from the car business for five years and feel like you're just going to pick it back up. You've got a Supercharger network to keep going. You've got to keep your service centers going. You have salespeople. You have all these channels, your manufacturing design goals, all that kind of stuff. They're different
between the two. Robotaxi, I think, will be crazy profitable through some window of time. I think it'll be decently profitable and huge long-term, right? So that's the arc I see for those things. But there are people who feel like the economics of robotaxis are so good that they expect a wholesale abandonment of private ownership, and I'm skeptical about that.
Is that possible? I think it's possible. It's just not the base case to me of what's going on. And I think whatever strategy
Tesla uses has to be prepared for both eventualities. And the flexible strategy that guarantees your future is to keep a foot solidly in the retail camp all the way through this transition. Sure. In terms of the timeline of when we can get unsupervised FSD or robotaxis starting to roll out, I know there are going to be different municipalities, different cities. Do you think
it's going to be a phased rollout, where you start with certain places that are more permissive, and it'll be a smaller fleet to try out, kind of like what Waymo is doing in a few cities, and then you gradually roll it out further?
I mean, I imagine Tesla's route will be a lot faster because I think their rate of improvement is going to be tremendously fast, especially once they get to that point. But would you say timeline of expectations...
When do you think Tesla will first test out unsupervised robotaxis on the streets, kind of like Waymo in a city? Do you think it's second half of 2025? Test? Like if they're... I think they... I'd say more than 50 vehicles in a city. This year? Yeah.
With Tesla employees behind the wheel? I'm talking about no one in the car and taking passengers, kind of like what Waymo is doing with no one in the car. Yeah, that...
Like, I wouldn't expect to see them doing it this year. We're seeing this sort of discontinuous rate of improvement. Yeah. And we don't know what the next six months hold. Tesla has a way better idea than we do. So it's conceivable that they're confident about this and feel like they could try to do it this year. That seems super aggressive to me. And they're going to do just as
Waymo and Uber did. They're gonna go through this long period where they have employees sitting in the cars, trying not to touch the wheel any more than they have to. And they're racking up miles and getting a sense of how well the thing works. And I don't think that's gonna be 10 cars. I think that's gonna be a 500-car kind of thing, in various places, maybe various countries. And that's gonna be a way of gathering data, a way of providing feedback to that AP team about things that have to be done.
It's going to be a way for management to develop a strategy or get data to help inform a strategy for how they're going to proceed. And I would expect that to happen this year.
Now, what fraction of the drives will be totally intervention-free? Will it be 99%? Will it be 99.99%? I think that's open to debate. And it very much depends on... We haven't seen the slope of improvement for V12 yet, so it's hard to have an informed opinion.
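One way to relate those numbers: if you assume interventions arrive roughly independently per mile (a Poisson assumption, purely illustrative, not Tesla data), miles-per-intervention converts directly into the fraction of drives that are intervention-free:

```python
import math

# Fraction of drives with zero interventions, assuming interventions
# arrive independently (Poisson). Illustrative numbers only.
def clean_fraction(miles_per_intervention, drive_miles=10):
    return math.exp(-drive_miles / miles_per_intervention)

for mpi in (100, 1_000, 10_000, 100_000):
    print(f"{mpi:>7,} miles/intervention -> "
          f"{clean_fraction(mpi):.4%} of 10-mile drives clean")
```

On this toy model, 99% of 10-mile drives being clean corresponds to about 1,000 miles per intervention, and 99.99% to about 100,000, which is why that extra decimal place is such a big deal.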
So do you think these Tesla employees in, let's say, these robotaxis, are they going to be picking up passengers and driving them? - So both Cruise and Waymo did a thing where they had company-internal passengers for years, I think. In San Francisco, Cruise had company-internal passengers for like two years or something. Waymo did it for quite a while. Yeah.
I think Waymo's doing that with employees in Austin now. That's like the first stage: your own employees get to use it. And then Waymo did a long thing in Chandler, Arizona, where they had customers under NDA as they were working through it. And it turned out to be long, because obviously they weren't making progress as fast as they wanted in terms of polishing off all the things, or maybe they became more conservative. Okay.
They were in that window for a really long time. I don't see why that wouldn't be a good idea for Tesla: you have internal people, and then you have external people, just like with the safety score thing. You have a population of people
who ride as passengers, maybe under NDA, maybe not. And as your confidence builds and you have more vehicles on the road, you gradually open up. You let people see what you're doing, partly because you have to, because as your scale grows, it's too hard to keep things under wraps.
I would expect them to be starting that process this year. And how quickly they move through the various stages of scaling up the vehicles, that's going to depend on the technology. I really do believe the tech is the fundamental thing. Yeah. I mean, that's interesting because I'm,
in the Bay Area, and then in Austin, they could roll out, you know, Tesla employee passengers, right? And employee drivers. Palo Alto first. Yeah. Yeah. Palo Alto, Fremont, Austin, the factories, whatever.
That would be, I mean, they have plenty of people. Yeah, there's plenty of employees that they could do. I mean, how many people do they have that commute to their factories every day? Yeah, exactly. I mean, imagine having a fleet that just brings your line workers in, you know? Yeah. And so you run a shuttle service for line workers and use robo-taxis. Yeah. I wonder if the August 8th reveal will share some of those details, you know? Like, what do you think? Yeah.
It'd be cool if it did. My guess is we won't get a ton of detail, because they usually don't. Occasionally we do get a lot, right? I mean, the AI Days have never given us a ton of detail on strategy. Battery Day kind of did. So there's precedent for maybe getting more. But the other thing is,
there's this variable about to what degree do people with Teslas get to participate in the Tesla network, right?
When Elon first announced the Tesla network was going to be a thing, the dedicated robotaxi was pretty far away. So there's a lot of incentive to get owners involved. The other thing is, when they initially announced it, they didn't have the cash reserves they have now. The idea of building your own fleet out of your own pockets, or borrowing money to do it, would have been a lot scarier back then. Now they could scale a moderate-sized fleet
with their existing cash reserves, and it could totally make sense. It could be a no-brainer of a thing. So my guess is the optimal strategy has probably shifted, but there are lots of people who expect to be able to participate in that and are looking forward to it. I didn't go back to read what the contract language was when we bought these things, but that was part of the promise that FSD got sold on in the early days.
So I'm still expecting that, to some extent, they'll allow participation. Now, what are the terms? How many people get involved? We don't know what that is. These are knobs they can turn to tune the strategy.
I mentioned the thing about how navigating this boom in robotaxi sales while maintaining your retail business is going to be challenging. And these are knobs that they can turn to try to keep the market orderly while all this stuff unfolds, and
gain as much benefit as they can, provide as much benefit as they can to their consumers, while not taking on unnecessary risk. - Yeah. Is there anything about the RoboTaxi, like what's the biggest difference between the RoboTaxi and the $25,000 vehicle, you think? - I would say self-closing doors. - You think that's that important? - When I think about it,
When I found out they were doing a Robotaxi, I did a couple of clean sheet things like what would be a good Robotaxi, like if you were making it. And when I think about this stuff, like what doesn't a Model 3 have or a Model Y have that you want in Robotaxi?
there's a bunch of things that I think are non-obvious, that have to do with fleet-operated vehicles, where they make sense, they're totally cost-effective, in a robotaxi. Like a self-closing door, I feel, is a highly cost-effective thing to put in a $25,000 robotaxi, right? Just so your passenger doesn't walk off and leave the door open, right? Or so it can make sure the door's actually properly closed and be able to close it.
But other stuff too: being able to check if somebody left packages behind in the car, making it easy to clean. One of the first things to wear out in taxi cabs is the back seat, 'cause people get in and get out. So you want to be able to easily swap out that kind of stuff.
I liked the idea of doing a Cybertruck-style, really unusual-looking vehicle, because for one thing, it's an advertisement. Oh, there's one of those. In the same way the Cybertruck's an advertisement: there's one of those Tesla robotaxis, right? But also being dent-free, not needing as much cleaning or care.
So there's that. Obviously there's sensor suite stuff, there's spending more money on the sensor suite, spending more money on the computer, like all that stuff.
is more justifiable in a vehicle that's using the sensors and the computer like 24/7. So the economic trade-offs of that kind of stuff, when it's a gimme that you're putting on cars and 90% of people aren't using it, that's harder to justify than in a robotaxi where you know they're gonna use it. - Right. - So sure. - Does it need to be four doors, like a four-seater? - That's a really interesting question. I went back and forth on this a couple of years back when I was looking at it. And my take was this:
the two-seater is attractive; the fundamental economics of the two-seater are pretty attractive. And it's true that most rides are one or two people. Right. But like 10% of rides are more than two people. Of course, if you have two-seaters, they can take two vehicles. But if you have two parents traveling with children, are they going to be happy with the two-vehicle kind of thing?
And a lot of people, if your drive is more than short and you're traveling with your family, you want to travel together so you can talk. I feel like from an operational-flexibility standpoint, if you're going to build one vehicle, the four-seater is the thing that makes the most sense, because of the overhead today, the way our streets are configured.
Right. I mean, there's no advantage to having a really tiny vehicle today. You're going to take a whole lane anyway. You're not lightening up congestion or whatnot. You're just reducing the cost of the vehicle. I feel like the four-door vehicle, if you're just going to build one vehicle and you're not going to make another one for two or three years, and this is going to be the first one and you're going to start scaling your robotaxi, I feel like there's a lot of...
argument to be made for doing a four-seater, because it lets you cover like 99.9% of the market, as opposed to 90% of the market. Interesting. - So I was thinking about the whole idea of Tesla becoming an AI company versus, let's say, an auto manufacturer. And I was thinking,
it seems like the pure auto-manufacturing business is just very cyclical, with typically low margins. The software component is the intriguing part, adding extra value. I mean, as an investor. Yeah, yeah. Rather than the human side,
the invested time and energy and focus of driving, you're pulling that out and offloading it onto an AI chip. That's interesting. That can drive margins, et cetera. But it seems like this transition from Tesla as an auto manufacturer to an AI company has been happening over time. And I would argue that Tesla's focus and priority and best engineers have all been on this AI
trajectory. But just like, for example, OpenAI, before ChatGPT, sure they were an AI company, but ChatGPT made them
kind of like a real AI company: an AI company whose products people use, a company that's immensely useful for people, rather than just a research lab. And I think in some ways, when I drive V12, I'm like, oh, it feels like Tesla is getting close to this point where FSD is going to be really, really useful, right? It's like unsupervised FSD is going to
you know, transform people's, you know, driving transport experiences. And it'll get to this point where Tesla's AI products are finally in the hands of lots of people in a very useful way. And that to me marks kind of like this big transformation in Tesla's history. When we look back,
20 years from now, we'll say, oh, that was kind of the moment where everything crossed over. It's not so much that, again, OpenAI wasn't an AI company before; they were more like a research lab. But when they came out with their product, it really transformed them. So in a sense, when I look at Tesla up until now, the AI part of Tesla still feels like it's before that point: the real products haven't come out for millions to use. So it just seems like we're getting closer and closer to this pivotal point in Tesla's history. - I wonder if people's impression will change. Like, we don't think of Apple as a software company, even though the software and the ecosystem that they build, the stores and all that kind of stuff, arguably
add more value than building laptops, the phones, that kind of stuff, right? I mean, not just the software, but the ecosystem the software enables, you know, both the cloud stuff and the software that goes on the...
But we still think of Apple as a phone company, as a laptop company and whatnot. Like the software becomes like an ingredient in the hardware, but the hardware is the thing that you see. So, you know, I mean, arguably Tesla's already, you know, the software content of the cars is super high and it has all these network features and stuff. And yet-
The world, even Tesla fans, they don't really see them as qualitatively different than other cars. It's a different kind of car and we still view it as a car. So even though the economic reality of the company and the operational reality of the company may shift away from being more about the car and more about like the ecosystem and the services and that kind of stuff. I don't know that, like I wonder if they will change. And by extension, like will investors change
who, you know, mostly they're ordinary people. They're not experts. Yeah. Right. Will their perception of the company shift? It might. I think a big part of it is going to come down to like a,
You know, we don't think of Amazon as a grocery store. We still think of it as an internet store, right? Because we went through this thing when the internet companies all took off back in the '90s, and Amazon just became an internet company. And Amazon is probably way more hardware than it is internet at this point. I mean, if you put aside the AWS part, which is a very important part of the thing, it's delivery vans and warehouses, right?
and vast inventories of stuff. And there's this other component too, but we think of it as an internet company. That's true. So it'll be interesting to see
if, and what, the trigger is, if Tesla ever escapes the carmaker thing. It's not clear to me that it ever will. - I mean, I guess Steve Jobs defined Apple as more of a device company. That's always been their thing. It's possible Tesla follows in that sense, where they're a car company, but also a humanoid robot company, those types of devices. - Yeah, that's gonna be interesting.
But yeah, on Optimus, I wanted to ask you your latest thoughts on kind of where Tesla's at. Do you think they're going to start some limited production run in the next year or so? Or are we still kind of a little bit farther out than that? That's a good question. I mean...
- Okay, so I think they're still getting up the curve on software. Everybody's still getting up the curve on software. The thing is, the humanoid robot software stack is evolving like the LLM stack: it's evolving crazy fast. The reason I thought Tesla should make humanoid robots is because I see the software as happening.
Now, you can make it happen sooner, but the underlying tech that makes the software possible, it's just coming. We can speed it up some, but it's coming for sure, right? And the ingredient that I thought was missing, to make humanoid robots happen big and happen soon, was that you want to be able to build them at scale and you want to be able to build them cheap. Right.
You want good, cheap robots built at scale. And I didn't see the industrial infrastructure out there in the world or anybody preparing at the point that we first talked about this to make that infrastructure. And that's the long pole on doing this stuff. Like the software is going to happen. It's kind of, I mean, we're going to pull it in now that there's a lot of interest. It's going to happen sooner than it would have otherwise, but it was going to happen. These techniques were going to be developed, right? And
And so the fact that there were no good robots out there was going to be the limiter, the reason why it didn't get adopted in 2028 and instead 2038 became the big year that it goes. So when I look at it through this lens, my sense is that there are people at Tesla who look at it the same way: they very clearly understand the challenge of doing industrial stuff at scale.
And they understand that that's a problem that needs to be properly addressed in order for this product to really fulfill its potential and that there's a big first mover advantage in getting there first. Not just a first mover advantage, but a sustainable advantage, right? Because you get there first and then you don't stop. You keep developing. You always have the best product, right? So you command the best margin and you also have the platform that lets your software people move forward the most quickly, right?
It lets you get to scale and keep the scale. A lot of building things at scale is about harnessing the advantages of scale, and maintaining that advantage means you want to maintain the lion's share of the market, because that gets you the scale that lets you
hold that position and maintain the margins associated with it. So I look at this and I imagine that if Tesla sees it the same way, and a lot of the stuff they say suggests to me that they do, then their focus right now is on getting the hardware down, right?
Building stuff and getting it out there. If it helps them get the manufacturing line up, if it helps them understand the product better so they can build a better product, so they can build the product that builds the product better, I think they'll do that. But it's a good question. We've just seen so little of Optimus in action. We've seen so little of it in terms of detailed development
information about the way that it's built, that understanding where they are in the process of industrializing it is tough. But my sense has been for some years now and still is that there's a lot of really fundamental improvement in the tech that's available, that you can keep turning that crank
And, you know, every single year, the product that you can make is going to be a lot better. So to some extent, timing the scale up of the manufacturing with when the software is really useful, like that's the thing that makes sense to me. Because if you build the robot a year early,
you're not going to have as good a robot as you will a year later, right? The longer you delay scaling, the better the product design, the more you're going to know, the better the core technology. When Teslas came out, for years the motors got better and better. Sometimes the motor in your car would get better: they'd do firmware updates because they'd figured out something new or they could change the margins. With early Teslas, if you had an early Model 3, the battery capacities would change because they'd change the software. Yeah.
But there's stuff that you can't change without actually changing the hardware out too, right? And we did see the motors that go in the cars today are much better than the motors that went in like two years ago, five years ago and so on, 'cause they're still learning that kind of stuff.
Yeah. Yeah. Whereas like I sort of expect them to not scale until the software is fairly mature, but I expect the focus to be on scaling the industrial capacity. I see. So, I mean, Elon has said like oftentimes takes three generations of a product before it gets really good. Oh, yeah.
They're on Gen 2, supposedly. So maybe one more generation. Well, was the first one a product? I would argue that Bumble C wasn't really. I would argue that the first Optimus wasn't really a product. I mean, they're going to make these test mules, essentially, where they're figuring stuff out. But I think they're calling it Gen 2, right? Yeah, sure. But I think the third-generation product is the third generation that customers get. I see. That's true. That could be.
I wonder if the internal thing, though, is developing three prototypes and then starting your first product after that. So maybe we see one more Gen 3 prototype, and then we start to see some type of initial production. A good question is when...
When do you get to the point where having more robots accelerates your development? This was a thing with the fleet for FSD cars, right? Once they got to the point where they had the data ingestion engine and data from the fleet was a major limiter on how fast it all could improve, having a bigger fleet is a really big advantage. I would guess that Optimus isn't at that point right now. And there's this interesting thing about gathering data, like,
you know, having the Optimus platform itself gathering data, to the extent you can do it efficiently, is pretty useful. But having humans put on sensor rigs and go around and do stuff, that's actually a not-unreasonable mechanism. In certain ways, it's better than having...
Like, for instance, if you want to do human mimicking, there are kind of two ways you can do it. You can have a human driving an Optimus, right? Or you can have an Optimus mimic a human. Those have different strengths and weaknesses, but they're both things you want to do. And they both involve a human in the loop, right? So if you've got 50 operators, there's no point in having a thousand Optimi, right? Because you can only use 50 of them at a time. If you get to a point where the software can start doing stuff on its own, then it makes sense to start scaling up. I would guess- What do you mean,
the point where the software does it on its own? - Say, for instance, that you have some basic tasks that Optimus can do in a factory. Yeah. Tasks that make sense to do, that are economically useful, or you have some space where you can set some Optimi aside
in order to work on this thing. Well, then they can kind of autonomously gather data by repetitively doing a task with some variation. And we've seen other robots do this: Google had a robot lab with hundreds of robot arms that were just doing repetitive tasks over and over, varying them to gather data. So you can do that kind of thing too. I don't know if it's compatible with the way they're trying to do the stack in Optimus right now, but if it is, then it would make sense to have a thousand of them and find something for them to do.
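A hedged sketch of what that arm-farm-style loop looks like in the abstract (every function here is a stand-in, not a real Tesla or Google API): the robot repeats a task with randomized variation and logs (observation, action, outcome) tuples for later training:

```python
import random

# Hypothetical autonomous data-gathering loop: repeat a task with
# randomized variation, log what happened. All functions are stand-ins.
def randomize_task():
    return {"object": random.choice(["block", "cup", "cable"]),
            "pose": [random.uniform(-0.3, 0.3) for _ in range(3)],
            "lighting": random.choice(["dim", "bright", "side-lit"])}

def attempt(task):
    obs = {"camera_frames": "...", "joint_states": "..."}  # placeholder logs
    action = {"grasp_target": task["pose"]}
    success = random.random() < 0.5        # stand-in for the real outcome
    return obs, action, success

dataset = []
for episode in range(1000):
    task = randomize_task()
    obs, action, success = attempt(task)
    dataset.append({"task": task, "obs": obs,
                    "action": action, "success": success})
print(len(dataset), "episodes collected")
```

The deliberate variation is the point: without it, the repetition mostly produces the same data point over and over.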
But that's a question. I think there's going to be a scale-up where they build a bunch of them and use them internally, prior to any customers getting them. So it's an interesting question when they do that. And when they do that depends on the development path they're on and the strategic path they see. I still...
don't see, like I still see FSD as a significantly more near-term product than I see Optimus. So how does Tesla, let's say, scale human mimicry of humanoid robots, like with Optimus? So let's say they need quantity of data and quality of data. So do you have, I mean, you're talking about the human can control the robot, but then would it be better just to
have a suit or a bunch of sensors on the key parts, so that you know how the human is doing it? - When you have a human... we know Tesla does this already. They've demonstrated a guy in a VR rig with hand controls that he's using to basically control it,
to do upper-body manipulation stuff, rearranging stuff on a table. We saw the shirt-folding thing; that was how that was being done. In fact, the shirt-folding video might've been somebody in the process of data capture, folding a shirt with Optimus. So you put on your VR rig, you take direct control of an Optimus body, and then you use it to fold. This is a thing that's done. This is one of the ways that's known to be an effective and fairly sample-efficient way of gathering data.
- Measuring it straight off of a human, you can also do that, and people do do that. It has some strengths. But because the exact operating constraints for a human are different, and you just have the hand targets and stuff, you don't have all the intermediate joint positions, you get less data. The data-gathering rig is a lot less expensive, though, so you can give it to a whole bunch of people and they can take it out into the real world. They can go down the street and pick up garbage.
They can fold cardboard boxes in a UPS store, because you can just take it someplace. So there are constraints with trying to do it with Optimus, using the body directly, but then there are advantages also. And the trade-off between the two is another one of those empirical things I was mentioning. There are gonna be some trade-offs: what's the right mix of which things you do? And then there's reinforcement
learning in simulation for robots, that's known to work well. And in fact, using reinforcement learning to train robots to mimic humans is like one of the primary modalities for doing it.
Robots have many more operating degrees of freedom than cars do. So a robot can mimic a human action in many different ways, some of which are much more preferable to others. Like if the goal is just like move your hand through this arc to pick this thing up, you know, what is your upper body doing? What is your head doing? These are all free variables.
So to train a robot in a sample-efficient way, you'd like to constrain all of those to something reasonable. Having a human control the entire body gathers the opinions of the human as to what all those things should be doing, and you make those targets too, even though they're not minimally necessary to the task; maybe the task is just to move the bottle over here or pour a drink, right? So, yeah.
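A minimal sketch of that "make the whole body a target" idea (shapes and weights are invented for illustration): the task loss only needs the hand trajectory, but an auxiliary loss against the human operator's recorded full-body pose pins down the otherwise-free degrees of freedom:

```python
import numpy as np

# Imitation loss with an auxiliary whole-body term. Shapes/weights are
# illustrative: 50 timesteps, 3D hand position, 28 joint angles.
def imitation_loss(pred, demo, w_hand=1.0, w_body=0.1):
    hand_err = np.mean((pred["hand_xyz"] - demo["hand_xyz"]) ** 2)
    body_err = np.mean((pred["joints"] - demo["joints"]) ** 2)
    # The hand term is the task; the body term constrains the free
    # degrees of freedom to whatever the human operator actually did.
    return w_hand * hand_err + w_body * body_err

demo = {"hand_xyz": np.zeros((50, 3)), "joints": np.zeros((50, 28))}
pred = {"hand_xyz": np.full((50, 3), 0.01),
        "joints": np.full((50, 28), 0.05)}
print(imitation_loss(pred, demo))
```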
The reality is that all of these processes get used in various combinations, because they all bring something to the table. And, as we were talking about, you've got pre-training, instruct tuning, and then RLHF, and other things now, that get used in training large language models. There are many other stages we're not mentioning.
It's not a matter of which is the best one. You use all of them to the degree that they contribute to a rapid, reliable solution. - So do you think, I mean, it seems like for Tesla to get a product out to people, they need to scale up the data,
whether it's human mimicry or whatever. Unless you're doing, I guess, some specialized factory tasks. But even then, if it's so specialized, why do you need a humanoid robot? It has to be somewhat generalized; there needs to be a need for that. - I mean, a lot of great progress is being made in robotics right now without a huge fleet of robots.
There's the scale thing we were talking about before: scale just wins if you can scale. But for scale to win with Optimus, you have to have a wide variety of real-world tasks that you're deploying Optimus into, where it's operating either without human supervision or where the humans supervising it would have been doing the task anyway.
Because the cost of paying 10,000 people to stand around and operate Optimi eight hours a day is super burdensome, right? And more importantly, one of the advantages of operating in the real world is you want to take advantage of the complexity and entropy of the world. If you've got 10,000 Optimi and they're all standing in white cubicles that are basically the same, just moving the same blocks around, you lose that.
And part of the benefit of operating in the world is the long tail of contexts and properties. So if you give Optimi to 10,000 people and tell them, hey, go use this on your farm, hey, try to use it as a carpenter or whatnot, you can find people who are enthusiastic about investing their own time in it, maybe finding something useful for it to do.
Now what you're doing is harnessing the variety of all these different people thinking about this stuff, and all the different settings and environments. That's where the data really starts to pay off. Having a ton of Optimi in a factory, all in approximately the same context, doing approximately the same thing, is not nearly as valuable as having lots of different ones. Because that's what the cars get. Sure.
Each of the cars is serving a different master. It's doing a different set of tasks on a different set of roads at different times of day and different weather and whatnot.
So the data it gathers brings all of that variety in, and that variety is really useful for training these things. - Tesla, I mean, that reminds me, Tesla recently had a job listing for 10 or so prototype vehicle drivers, in different cities all over the US. Why do you think they need that? - I assumed it was because they were checking out V12 and maybe gathering training data. You know, having some...
I mean, you get two things out of having a driver in Adelaide, Australia, right? One of them is you get to see, is there anything weird about Adelaide that breaks what we're doing that we should be paying attention to? And you get to gather data from Adelaide. Like I was saying, variety, right? And different countries just have things that are different. And different driving cultures. I mean, when Brad Ferguson went to New York, he noticed that FSD was driving like a New Yorker.
You know, humans change their driving behavior depending on context, right? Some of that is cultural.
You drive in Brazil, you drive in Italy, and then you go drive in England or Germany, and the driving cultures, the way people behave, are really different, right? So just being in those environments and gathering data on the driving culture there, that can also be useful. - I mean, why can't they just use their own 100-safety-score drivers in those different cities instead? Why do they need to hire separate drivers, you think? - Yeah, I wouldn't say they necessarily...
Say you wanna run a stack that you're not confident is safe yet, and you wanna give it control of the vehicle. The first thing I said is, is there something in Adelaide that breaks our current stack? Well, if you imagine that you wanted to go test V12, but you were like four months away from being able to roll it out, you can go there and test it to see if there are any big problems with it,
without taking the risk of giving it to a retail customer. And you can put it on a single vehicle. If you have a professional driver that you're paying, you get a ton of data in a small period of time, and you can choose the data. You can tell them, we want data from this situation, go there and do this; now go to this other place. Like the drivers doing Chuck's UPL. - Sure, sure. Yeah, interesting.
LLMs, so you talked about LLMs a little bit. So what's going on? Where is this all headed? In the bigger LLM picture, we have OpenAI,
they just released an update for GPT-4, I guess. A new Turbo update. Yeah. We'll see what the capabilities are. But then Claude Opus has been destroying GPT, at least in my personal use. It's beating it out on benchmarks and in a lot of people's personal experience. Yeah, yeah. That may be the reason we're getting this GPT-4 Turbo. Yeah,
you know, one of OpenAI's points of pride is they've managed to stay atop the leaderboard quite comfortably for a long time with GPT-4. - Yeah, I mean, does this change the LLM game, to have Anthropic able to challenge OpenAI, at least at this point? - Well, "game." Everybody likes a horse race, and that's why the horse race aspects of this stuff get played up.
So yeah, the game that newspaper reporters want to report on, the game that the bystanders want: it gets more exciting when the two horses are nose to nose. Does that change the long-term dynamics of the market in an important way? I don't think from a tech standpoint it does. From a regulatory standpoint, and from the perception of the markets, the breadth of the willingness of a wide range of people to get involved and participate,
I think it might change things, 'cause it'll change people's perceptions, and it might have an impact on the outcomes because of that. But most of it is, you know, people just like a race. So that's part of it. Also, Mixtral 8x22B came out, I don't know if you saw, that was yesterday. I'm gonna download that tonight. So,
that might be the first GPT-4-class open-source model. Yeah. And that would be exciting. Yeah, I doubt it's GPT-4 level, but... We'll see. Yeah, I mean, there's a range of GPT-4s, because with the current Turbo, from a benchmark standpoint, it's interesting how...
the performance on benchmarks and the performance in people's experience have kind of diverged over time. The Turbos, the later versions of GPT-4, continue to get better on benchmarks, right? But there are a lot of heavy users whose perception is that its performance on the jobs they're doing has degraded.
So that's really interesting. And I think one of the reasons there's so much enthusiasm about the stuff I was seeing is that a lot of heavy users, people who are building applications around this, were delighted that Claude, that Opus, wasn't having the problems they'd been experiencing with GPT-4 as they felt it had degraded. Now, you know, it's hard to know
how much of this is anecdotal and how much of it represents the real experience of everybody using the tools. Certainly having competitors out there gives you something to compare to, so you get alternatives. Having other models out there is definitely good for the field.
It's about time, if you look at the rate of improvement in the open-source models, for us to be getting there. Databricks had a model come out. That was another one, a four-of-16 mixture of experts, 160 billion parameters, is that right? That kind of scale, 150 billion parameters. Super performant on industrial workloads.
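For readers who haven't seen the mechanics, here's a toy version of that four-of-16 routing step (an illustration of the mixture-of-experts idea, not the actual Databricks code):

```python
import numpy as np

# Toy top-4-of-16 mixture-of-experts layer for a single token.
rng = np.random.default_rng(0)
d, n_experts, top_k = 64, 16, 4

x = rng.normal(size=d)                        # one token's hidden state
router = rng.normal(size=(n_experts, d))      # router weights
experts = rng.normal(size=(n_experts, d, d))  # one tiny "FFN" per expert

scores = router @ x
chosen = np.argsort(scores)[-top_k:]          # keep the 4 best experts
gates = np.exp(scores[chosen])
gates /= gates.sum()                          # softmax over the chosen 4

# Only 4 of 16 experts execute, so compute per token is a fraction
# of what the total parameter count suggests.
y = sum(g * (experts[i] @ x) for g, i in zip(gates, chosen))
print("output dim:", y.shape, "active experts:", sorted(chosen.tolist()))
```

That's how these models can carry well over 100 billion parameters while spending per-token compute closer to a much smaller dense model.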
We're starting to see the ecosystem fan out, where you see models where the people building them specifically have certain types of workloads, certain kinds of applications in mind. And so the models can get good at those without necessarily showing up in the benchmarks, for the people who work in that space, that set of applications. For example,
Command R Plus is out, which is another big open-source model. It just came out in the last couple of weeks, and it's optimized for doing RAG applications, back-office-type stuff, where you use it in agentic ways: you wrap an agent wrapper around it, and it's been specifically trained for those modalities. So,
we don't know how good it is right now, because the benchmarks aren't measuring the kinds of things it's optimized for. It does fine on the benchmarks for its size. Yeah.
But, you know, as Andrew Ng has been pointing out a lot recently, if you wrap a model in an agent, you get much higher performance on the same set of tasks, with somewhat lower reliability, though people are gradually figuring that out. So you can get GPT-4-level performance from like a 7-billion-parameter model if you wrap a good agent around it and direct it at a particular task.
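A minimal sketch of that agentic-wrapper pattern, with `llm()` as a placeholder for whichever completion API you're using (the loop structure is the point, not the prompts):

```python
# Draft -> critique -> revise loop around a plain completion call.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model's completion call")

def agentic_answer(task: str, rounds: int = 2) -> str:
    answer = llm(f"Task: {task}\nGive your best answer.")
    for _ in range(rounds):
        critique = llm(f"Task: {task}\nAnswer: {answer}\n"
                       "List concrete flaws in this answer.")
        answer = llm(f"Task: {task}\nAnswer: {answer}\nFlaws: {critique}\n"
                     "Rewrite the answer, fixing those flaws.")
    return answer
```

Each call is cheap with a small model, and the extra reflection passes are what recover much of the quality gap to a far larger model run once.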
So it's really exciting to think about what people wrapping agents around a 150-billion-parameter model, maybe not quite GPT-4 but getting close, are going to be able to do. And it's an open-source model. It's decentralizing the power structure, the knowledge base, right? Yeah. I think it's actually huge. Open
source getting to GPT-4 level, I think this year we'll get there. It seems like Mistral will deliver something, probably. They've been really impressive. Yeah, they've been super impressive. It just seems significant, because the GPT-4 level is kind of this benchmark where LLMs start to get really useful for a lot of things. And once you get that open-sourced,
the cost to access that intelligence just drops like crazy, because you can basically download it and run it on your computer, or eventually it will be shrunk down to run locally on different devices. The cost to access that base level of intelligence will basically go down to
negligible cost. - I'm impressed with what people demo running on iPhones these days. Apple has this group of researchers inside that developed this platform called MLX, which is basically like CUDA for Apple Silicon, with a PyTorch-like layer built on top of it.
The new Mixtral model came out, I mean, they literally just released it for download, and like three hours later people had it running optimized on Apple Silicon under MLX. It's designed to make bringing models in easy and performant. So people building on that platform can map it to iPhones and that kind of stuff. And there's a pretty good ecosystem of demos out there where people are taking Whisper, taking various other models, and demonstrating what you can do
by quantizing them. Apple themselves released a paper last year that was all about how you change the design of a transformer so you can run it out of flash: you don't even have to load it all into DRAM; you keep most of the weights in flash and it still runs at full speed. We're going to see, over the next year or two, a lot of performance coming into these small portable devices that you can carry around.
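For flavor, here's about the smallest possible MLX example (assuming a machine with the `mlx` package installed): arrays live in Apple Silicon's unified memory, and computation is lazy until you ask for a result:

```python
import mlx.core as mx

# Arrays live in unified memory (no host-to-device copies), and ops
# are lazy: nothing runs until mx.eval() forces the computation.
w = mx.random.normal((4096, 4096))
x = mx.random.normal((1, 4096))

y = x @ w          # builds the compute graph only
mx.eval(y)         # materializes the result on Apple Silicon
print(y.shape)     # (1, 4096)
```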
Yeah, definitely. And it's about time, because Siri sucks. Yeah, I think Apple is finally going to announce something at WWDC this year. It's so disappointing that it's taken them as long as it has. Yeah, they'll do something. - Sora. What's your take on OpenAI's text-to-video engine, Sora? - I'm, you know...
It's a really cool demonstration. I think it's a straightforward extrapolation of the trends we'd been seeing, taken to video.
It's a cool tool. It'll be great when more people get to use it. I mean, you're still not going to just dump a prompt into Sora and get a movie out. It's a point in the continuum, a nice step forward. But I think it's the kind of place you would have gotten to by throwing a lot of compute at the problem. So like...
I'm pleased to see it. One of the things about all of this is you see this arc of progress, right? And at every point along the arc, there's a part where you're like, right in line, that's what we expected. But then there's always this sense that we haven't hit the plateau yet.
And that's kind of how I feel about Sora, right? These methods continue to scale and they're going to keep getting better. The capabilities themselves are kind of in line with the trend. Yeah. It just seems like Sora is super impressive to me, but I think it's taking a lot of compute to run that thing, and it's not cheap. It's a proof of concept; it shows what's possible, and people are going to be able to do a similar thing with
different methods, and it'll be a lot cheaper over time and the capabilities will grow, but it'll take some time. I mean, the difference between demoing something and making it economical to get to customers can be pretty large with these things. I think OpenAI themselves have said they need time to get it to where they can offer it at a reasonable price. Yeah. Yeah. We're almost wrapped up with our two hours here. We're in Austin. Yeah.
How was your Eclipse viewing? Where did you see it? Did you stay in Austin? Oh, it was terrible. I'm so miserable. Oh no. I saw the 2017 one and it was mind blowing. I was super excited about this. And we ended up going out to Kerrville 'cause I looked at the map ahead of time. I was prepared to like go a long way if necessary to get good viewing conditions, but it ended up kind of being this toss up. I mean, you've just got these banks of clouds coming in and are you gonna get lucky and be between two clouds during totality?
So we picked Kerrville like the day before it looked like it had the best odds and we just totally struck out. I mean, it's still cool to be underneath the thing and see the sky go dark and hear the animals all change. You know what I mean? It's definitely interesting. I don't regret going. I don't feel like we made any bad decisions. Going back, looking at the data, it was still the best shot. It was just like, it struck out. Like I was inconsolable the whole day. I was so bummed out. So the clouds were over the-
The whole time? No, I mean- I mean, during totality. No, we had the whole thing where you'd see it in between the clouds here and there. Yeah, yeah, yeah. There were these couple of different layers of clouds moving back and forth, and they had holes between them, and occasionally you'd get a good view for a few seconds or a minute or something like that. Okay, yeah, yeah. But just before totality, this huge thick bank just moved in.
Oh, man. We just didn't get anything. I couldn't even look at eclipse pictures for like 24 hours afterward. I was so bummed out. That's funny. That's terrible. Yeah, it's too bad. Well, I've got to tell people: if you've never seen one, it is so worth it. They're so...
It's just a really incredible experience. To be out in an open space under a blue sky and watch the moon move in front of the sun, it will change the way you see the world. - Yeah, definitely. My kids were really into it. I had them watch a bunch of videos. We bought some books on solar eclipse. So they're really into it. They're just like so excited. Yeah, it's fun. - Yeah, bummer. So now, this morning I was like,
Where's the next one? Australia is going to get a lot over the next few years. Maybe we're going to be going to Australia. I really want to see another one. Yeah, yeah. I really do. Fun. All right, James, thanks for hanging out. Yeah, this was fun. Yeah, yeah. We'll talk again, hopefully soon. All right. See you guys. Bye.