NVIDIA's Plan To Build AI That Understands The Real World — With Rev Lebaredian

2025/2/5

Big Technology Podcast

People
Rev Lebaredian
Topics
Rev Lebaredian: I think Jevons Paradox explains well why lowering the cost of AI increases demand: it unlocks more applications. NVIDIA chose computer graphics as its first computing problem because it is an endless problem; the constant need for innovation keeps the market growing. Intelligence is the most "endless" of all computing problems: more compute can create more intelligence, so making AI more efficient will increase its economic value and market demand. Over the past decade, GPU performance on tensor operations has improved a million-fold, coming not just from hardware but from improvements in software algorithms. DeepSeek's advance continues that decade-long trend of GPU-driven AI performance gains. To create truly intelligent machines, we need to build common sense and knowledge of physics into AI rather than relying on text data alone. The next step for AI is to apply the same technology to the physical world, letting AI learn the rules of physics instead of the rules of language. The most valuable AIs of the future will be the ones that can interact with the physical world. Applying AI to the physical world will bring greater value than information technology, because the physical-world market is far larger than the IT market. Training physical-world AI involves more than feeding in text descriptions; it includes video, 3D data, and physics-simulation data. The goal of NVIDIA's Cosmos project is to build robot brains that understand the physical world, work that has been under way for about a decade. Training a robot brain requires experience data from the physical world, and simulation is an effective way to obtain it. The NVIDIA Omniverse platform is used to create physically accurate simulated worlds for training and testing AI. Cosmos provides open models, tools, and data pipelines to advance the development of physical AI, and it targets every application that needs to interact with the physical world, including robotics, autonomous driving, and sensors. Physical-world AI can draw on the knowledge base of text models and learn from other data modalities as well. AI learns the way humans do, taking in information through multiple senses simultaneously and building associations. Although video generation models have made surprising progress in understanding physics, they still have flaws, such as a lack of object permanence; today's video generation models may understand only 5-10% of the physical world, leaving enormous room for improvement. NVIDIA is not just a chip company; it also develops software and AI to support its accelerated-computing platform, and it played a key role in the early development of large language models, open-sourcing the related software. Robots will not replace all human work overnight; they will first help solve labor shortages, a problem the whole world faces. Humanoid robots will be widely adopted in industry first, because labor shortages are acute there and adoption is easier for companies. AI will transform Hollywood filmmaking, making more realistic, lower-cost films possible. Robotics has enormous potential in warfare but also real dangers, requiring rules and mechanisms to prevent misuse. NVIDIA's success comes from its long-term commitment to core technology and to its people.

Transcript

Let's talk about NVIDIA's push to generate AI that understands the real world with technology that can influence the future of robotics, labor, cars, Hollywood, and more. We're joined by the company's VP of Omniverse and Simulation Technologies right after this.

From LinkedIn News, I'm Leah Smart, host of Every Day Better, an award-winning podcast dedicated to personal development. Join me every week for captivating stories and research to find more fulfillment in your work and personal life. Listen to Every Day Better on the LinkedIn Podcast Network, Apple Podcasts, or wherever you get your podcasts.

I'm Tomer Cohen, LinkedIn's Chief Product Officer. If you're just as curious as I am about the way things are built, then tune in to my podcast, Building One. I speak with some of the best product builders out there. I've always been inspired by frustration. It came back to my own personal pain point. So we had to go out to farmers and convince them. Following that curiosity is a superpower. You have to be obsessed with the human condition. Listen to Building One on Apple Podcasts or wherever you get your podcasts.

Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world and beyond. Today, we're joined by Rev Lebaredian. He's the Vice President of Omniverse and Simulation Technology at NVIDIA for a fascinating conversation about what may well be the next stage of AI progress, the pursuit of world models that provide common sense to AIs.

Rev, I'm so happy to see you here. We actually spent some time at your headquarters a couple months back, and I'm really glad that you're here today so I can introduce you to the Big Technology audience. Welcome to the show. Thank you for having me. All right, before we jump into world models: obviously, we're having this conversation in the wake of the DeepSeek revolution, or whatever you want to call it, and everyone is talking about NVIDIA now. You're in a quiet period, so we're not going to go into financials. But I can and do want to ask you about

the technology side of this, specifically about Jevons Paradox. I keep hearing NVIDIA, Jevons Paradox, Jevons Paradox, NVIDIA. What is Jevons Paradox, and what do you think about it? My understanding of Jevons Paradox is that it's essentially an economic principle: as you reduce the cost of something, of running it, you create more demand,

because it unlocks essentially more uses of that technology when it becomes more economically feasible to use. I think that really does apply in this case, in the same way that it applies to almost every other important computing innovation over the last 40, 50 years, or at least as long as I've been alive. At the inception of NVIDIA in 1993, the company

very carefully selected the very first computing problem to address in order to create the conditions by which we could continue innovating and keep growing that market. And this was the problem of computer graphics and particularly rendering within computer graphics, generating these images. The reason we selected it is because it's an endless problem.

No matter how much compute you throw at it, no matter how much innovation we throw at it, you always want more. And throughout the time I've been at Nvidia, which is now 23 years, many times I've heard, well, graphics are good enough.

Rendering is good enough. And so soon, soon, NVIDIA's big GPUs and more computing power are not going to be necessary. We'll just get consumed by SoCs or integrated into another chip as integrated graphics, and it'll disappear. But that never happened, because the fundamental problem of simulating the physics of light and matter is endless.

We see this in almost every important computing domain. AI is one of these things. I mean, can we really say that we have now reached the point where our computers are intelligent enough or the intelligence we create is good enough and so it's just going to shrink? We're not going to have any more use for more compute power there? I don't think so. I think intelligence is something that is probably the most...

endless of all computing problems. If we can throw more compute at the problem, we can make more intelligence and do it better and better. So making AI more efficient will just increase its economic value in many of the applications we want to apply it to and increase demand.
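
To make the mechanism he describes concrete, here is a toy sketch of Jevons Paradox with constant-elasticity demand. This is our own illustration with made-up numbers, not an NVIDIA model: when demand is price-elastic (elasticity above one), every drop in unit cost raises total spending.

```python
# Toy illustration of Jevons Paradox with constant-elasticity demand.
# All numbers are invented for illustration.

def demand(price: float, elasticity: float = 1.5, scale: float = 100.0) -> float:
    """Units demanded at a given unit price: Q = scale * price^(-elasticity)."""
    return scale * price ** (-elasticity)

for price in [1.0, 0.5, 0.25, 0.125]:
    q = demand(price)
    print(f"unit cost {price:6.3f} -> demand {q:8.1f} -> total spend {price * q:8.1f}")

# With elasticity > 1, each halving of unit cost more than doubles demand,
# so total spending on the technology rises even as it gets cheaper.
```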

And can we talk about the progression of AI models becoming more efficient? I know it's like a hot topic right now, but it does seem to me that over the past couple of years, we've definitely seen models become more and more efficient. So what can you tell us about, we'll just talk about large language models on this front, the efficiency gains that we've seen over time with them? I mean, this isn't new. This has been happening for

for the past 10, 12 years or so, essentially since we first discovered deep learning on our GPUs with AlexNet. If you look at the computational curve, what our GPUs can do in terms of

tensor operations, the kind of math that AI needs, then over the last 10 years we've had essentially a million-x performance increase.

And that increase isn't just from the raw hardware. It's also through many layers of the software algorithms. So we're getting these benefits, these speedups, continuously at a very rapid rate, exponentially, by compounding many layers, all the different layers at which

this computing happens: the fundamental hardware, the chips themselves, the systems level, networking, system software, algorithms, frameworks, and so on. So what we've seen here with DeepSeek is a great advancement that's on the same curve we've been on for a decade now.
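
As a rough, hypothetical illustration of how compounding across layers can reach a million-x figure (the per-layer factors below are invented, not NVIDIA's actual breakdown):

```python
# Hypothetical compounding of yearly gains across layers of the stack.
# The factors are invented for illustration; the real breakdown differs.
years = 10
annual_gains = {
    "chips and systems": 1.6,         # hardware improvement factor per year
    "numerics and kernels": 1.3,      # lower precision, better tensor kernels
    "algorithms and frameworks": 1.9, # model and software efficiency
}

total = 1.0
for layer, gain in annual_gains.items():
    factor = gain ** years
    total *= factor
    print(f"{layer:26s} x{factor:10,.0f} over {years} years")
print(f"{'combined':26s} x{total:10,.0f}")  # on the order of a million
```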

Okay. And 23 years at NVIDIA. I'm going to save a question about that for later in the interview, because I'm very curious what your experience has been being at NVIDIA for so long, especially given that the company's technology, at least from the outside, was viewed as in favor, then people questioned it, then it was back in favor, then people questioned it again. Obviously, we see what's going on now; maybe we're living through a mini cycle at this point. So I'm very curious about your experience. But I want to talk about the technology first.

And let me just bring you into a conversation that we had here on the show with Yann LeCun, who is Meta's chief AI scientist, right after ChatGPT came out.

And one of the things that Yann did was he said, go ask ChatGPT what happens if you let go of a piece of paper with your left hand. And I typed it in. It gave a very convincing answer. It was completely wrong, because text alone doesn't carry that common sense about physics.

Try as you might to teach a model physics with text, you can't. There's just not enough literature that describes what happens when you drop a paper from a hand, and therefore the models are limited. And Yann's point here was basically this:

If you want to get to truly intelligent machines, you need to build something into the AI that teaches common sense, that teaches physics, and you need to look beyond words to do that. And so now I turn it over to you, Rev, because I do think that right now within NVIDIA, a big initiative is to build a picture of the world that

to teach AI models that common sense that Yann had mentioned was lacking. And I have some follow-ups about it, but I want to hear first a little bit about what you're doing and whether your efforts are geared towards solving the problem that Yann brought up. Well, what Yann said is absolutely true. And it makes intuitive sense, right? If an AI has only been trained

on words, on text that we've digitized, how can it possibly know about concepts from our physical world, like what the color red really is, or what it means to hear sound, or what it means to feel felt? It can't know those things, because it has never experienced them.

When we train a model, essentially what we're doing is we're providing life experience to that model and it's pulling apart patterns or it's discerning patterns from all of the experience that we give it. And what was really, really amazing about GPT, the advancements with LLMs, starting with the transformer, is that we could take...

this really, really complex set of rules that humans had no way of defining directly in a clear and robust manner: the rules of language. And we were able to pull that out of a corpus of data. We took all of this text, all these books, and whatever information you could scrape from the internet. And somehow,

this model figured out what all the patterns of language are, in many different languages, and could then, because it understands the fundamental rules of language, do some amazing things. It could generate new text. It could restyle some text that you give it. It can translate text from one form to another, from one language to another. It can do all of this awesome stuff.

But it lacks any information about our world other than what's been described in those words. And so the next step in AI is for us to take the same fundamental technology we have, this machine we have, where we can feed it life experience and it figures out what the patterns and the rules are, and feed it with actual data about our physical world.

and about how our world works, so that it can apply that same learning to the rules of physics instead of the rules of grammar, the rules of language. It's going to understand how the physical world around us works. And our thesis is that of all the AIs we're going to create in the future, the most valuable ones are gonna be the ones that can interact with our physical world,

the world that we experience around us, the world created out of atoms. Today, the AIs that we're creating are largely about our world of knowledge, our world of information, ones and zeros, things that you could easily represent inside a computer in the digital world. But if we can apply the same AI technology to the physical world around us, then essentially we unlock robotics. We can have these agents

with this intelligence, and even superintelligence in specific tasks, do amazing things in the world around us. If you look at global markets, at all of the commerce happening in the world, at GDP, the world of knowledge, of information technology, is somewhere between $2 and $5 trillion a year.

But everything else, transportation, manufacturing, supply chain, warehouse and logistics, creating drugs, all the stuff in the physical world, that's about $100 trillion. So the application of this kind of AI to the physical world is going to bring more value to us. So it's interesting. It's not just...

basically inputting that real-world knowledge into LLMs, right? So they can get the question about dropping the paper with the hand correct. It's also that you're building the foundation for robots to go out into our world and operate within it. So yes, it's not inputting it in the same way that we do for these text models. We're not just gonna describe

with words what happens when you drop a piece of paper. We're gonna give these models other senses during the learning process. So they'll watch videos of paper dropping. We can also give it more accurate, specific information in the 3D realm. Because we can simulate

these physical worlds inside a computer today, we have physics simulations of worlds, we can pull ground truth data about the position and orientation and state of things inside that 3D world and use that as another mode of input into these models.
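
For a sense of what "another mode of input" can mean in practice, here is a hypothetical sketch of one multimodal training sample pulled from a simulator. The field names are our own illustration, not an actual Omniverse or Cosmos schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SyntheticSample:
    """One hypothetical multimodal training example from a physics simulator."""
    rgb: np.ndarray           # (T, H, W, 3) rendered video frames
    depth: np.ndarray         # (T, H, W) per-pixel distance from the camera
    segmentation: np.ndarray  # (T, H, W) integer object ID per pixel
    object_poses: np.ndarray  # (T, N, 7) xyz position + orientation quaternion
    caption: str              # text description, linking back to language

# A simulator can emit every one of these channels as exact ground truth,
# something that is expensive and error-prone to label in real footage.
sample = SyntheticSample(
    rgb=np.zeros((16, 128, 128, 3), dtype=np.uint8),
    depth=np.zeros((16, 128, 128), dtype=np.float32),
    segmentation=np.zeros((16, 128, 128), dtype=np.int32),
    object_poses=np.zeros((16, 4, 7), dtype=np.float32),
    caption="a sheet of paper falling from a hand onto a table",
)
print(sample.rgb.shape, sample.caption)
```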

And so what we'll end up with is a world foundation model that was trained on many different modes of data, essentially different senses. It can see, it can hear, it can touch and feel, and do many of the things we can do, or many things other animals can do, or even things no creature can do, because we can provide it with sensors that don't exist inside the natural world. And from that, it can

kind of decipher what are the actual combined rules of the world. And this encoding of the knowledge of how the physical world works can then be the basis for us to build agents

inside the real world, to build the brains of these agents, otherwise known as physical robots. Right. And so this is your recently announced Cosmos project. So talk a little bit about what Cosmos is. I mean, obviously it's a world foundation model, but how long have you been building it, what types of companies and developers might use it, and what might they use it for?

We've been working towards Cosmos for probably about 10 years. We envisioned that eventually this new technology that had formed with deep learning was going to be the critical technology necessary for us to create robot brains.

And that is ultimately what's going to unlock this incredible amount of value for us. So we started working towards this a long time ago. We realized early on that the big problem we were going to have is in order to train such a model, to train a robot brain to understand the physical world and to work within it, we're going to have to give it experience.

We're going to have to give it the data that represents the physical world. And capturing this data from the real world is not really an easy thing to do. It's very expensive and in some cases very dangerous. For example, for self-driving cars,

which is a type of robot. It's a robot that can autonomously, on its own, figure out how to get from point A to point B by controlling this physical being, a car, by braking and accelerating and steering. How are we going to ensure that a self-driving car barreling down the street really understands that it should stop when a child runs into the street?

And how can we be sure that it's actually going to do that without actually doing that in the real world? We don't want to go capture data of a child running across the street. Well, we can do that by simulating it inside a computer.

And so we realized this early on. So we set about applying all of the technologies we'd been working on up until that point with computer graphics and for video games and video game engines and physics inside these worlds to create a system to do world simulation that was physically accurate so that we could then train these AIs. And so we call that...

operating system, if you will, Omniverse. It's a system to create these physics simulations, which we then use to train AIs that we could test in that same simulation before we put them out in the real world. So we use it for self-driving cars and other robots out there. So building Cosmos actually starts first with simulating the world.

And so we've been building that stack and those computers for quite a while. Once the transformer model was introduced and we started seeing the amazing things large language models can do and the ChatGPT moment came, we understood that this essentially unlocked the one thing that we needed to really push forward in robotics, which is the ability

to have this kind of general intelligence about a really complex set of things, complex set of rules. And so we set about building what is Cosmos today, essentially a few years ago, using all of the technology we had built before with simulation and AI training. And what Cosmos is, it's actually a few things. It's a collection of some open weight models

that we've made freely available. Along with it, we also provide essentially all of the tooling and pipelines you need to create a new World Foundation model. So we give you the World Foundation models that we've started training, which are world-class, especially for the purposes of building physical AI.

And we also have what's called a tokenizer, which is itself a world-class AI; it's a critical element of building world foundation models. And then we have curation pipelines. The data that you

select and curate to feed into the training of your world foundation model is critical, and just selecting the right data requires a lot of AI in and of itself. We released all of this and put it out there in the open so that the whole community can join us in building physical AI.
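
To give a flavor of what a curation pipeline does, here is a minimal hypothetical sketch. The stages and thresholds are our assumptions, not NVIDIA's actual pipeline, which itself uses AI models for filtering.

```python
# Hypothetical curation stage: keep only video clips that pass every filter.
# Filters and thresholds are invented for illustration.
from typing import Callable, Iterable

def curate(clips: Iterable[dict], filters: list[Callable[[dict], bool]]) -> list[dict]:
    """Return the clips that pass all quality filters."""
    return [clip for clip in clips if all(f(clip) for f in filters)]

filters = [
    lambda c: c["duration_s"] >= 2.0,     # drop very short clips
    lambda c: c["motion_score"] > 0.1,    # drop near-static footage
    lambda c: not c["has_overlay_text"],  # drop clips with burned-in text
]

clips = [
    {"duration_s": 5.0, "motion_score": 0.4, "has_overlay_text": False},
    {"duration_s": 1.0, "motion_score": 0.8, "has_overlay_text": False},
    {"duration_s": 8.0, "motion_score": 0.0, "has_overlay_text": True},
]
print(len(curate(clips, filters)))  # 1 -- only the first clip survives
```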

And so who's going to use it? Is it going to be robotics developers? Is it going to be somebody building, let's say, an LLM-based application who just wants it to be a little smarter? Both? It will be all of them. Yes. We feel that we, as the industry, as the world, are right at the beginning of this physical AI revolution. And no one company, no one organization, no one

is going to be able to build everything that we need. So we're building it out there in the open to encourage others to come build on top of what we've built and come build it with us. And this is going to be essentially anybody that has an application that involves the physical world.

So robotics companies are definitely part of this, and robotics in the very general sense: that includes self-driving car companies, robotaxi companies, as well as companies building robots for our factories and warehouses. Anybody that wants to make intelligent robots that have perception and operate autonomously inside the real world wants this. But

It's not only about robots in the way we think about them as these agents that move around. We have sensors that we're placing in our spaces, in our cities, in urban environments, inside buildings.

These sensors need to understand what's happening in that world, maybe for security reasons, for coordinating other robots, for managing the climate and energy efficiency of our buildings and data centers. So there are many applications of physical AI around

that are broader than what you imagine when you say a robotic application. There are going to be thousands and thousands of companies that build these physical AIs, and this is just the beginning. Now, you mentioned that the transformer model was an important development on this path, and that obviously was the thing that underpinned a lot of the real innovation we've seen with large language models.

Can the real-world AI learn from the knowledge base that has been sort of turned into these AI models with text? Like, if you have a model that's trying to understand the world with common sense, does it take text as an input? They take all of it as input.

How does it work then with text? I mean, it's very interesting because it seems like that's like when we talk about the progression towards general intelligence, that is a very, you know, kind of amazing application of being able to read something and then sort of intuit what it means in a physical space. Don't you think? Yeah, I think the way I think about it, and I think this is right, is these AIs learn the same way we do. When you're brought into this world...

You don't know who is mommy, who is daddy. You don't even know how to see yet. You don't have depth perception. You can't see color or understand what it is. You don't know language. You don't know these things. But you learn by being bombarded with all of this information simultaneously through the many different senses. So when your mommy looks at you and says, "I'm mommy," pointing,

you're getting multiple modes of information coming at you, including essentially that text that's coming through an audio form there. And then eventually when you learn how to read, you learn how to read because a teacher points at letters and then words and sounds them out. So you have this association that you build between the information that you're reading

You understand, like, "mommy" and the letters that mean that thing. AIs learn in the same way. When we train them, if you give them all of these modes of information associated with each other at the same time, it'll associate them together. That's how image generators work today, when you go generate an image using a text prompt.

The reason why it can generate, you know, an image of a red ball in a grass field on an overcast day is because when it was trained, there was an association of some text along with the images that were fed into it.

It learned during the training process that these words were related to that image, and so it can gather that understanding from that association.
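
One common way to build that association in code is a CLIP-style contrastive objective, where matched image/caption pairs are pushed together and mismatched ones apart. This is a generic sketch of the technique, not the training recipe of any particular generator.

```python
import numpy as np

def contrastive_loss(img_emb: np.ndarray, txt_emb: np.ndarray, temp: float = 0.07) -> float:
    """CLIP-style loss: pair i's image should match caption i, not the others."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temp                    # (B, B) cosine similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(img_emb))
    return float(-log_probs[idx, idx].mean())      # penalize wrong pairings

rng = np.random.default_rng(0)
images = rng.normal(size=(4, 32))
captions = images + 0.1 * rng.normal(size=(4, 32))  # captions near their images
print(contrastive_loss(images, captions))           # low loss: pairs associate
```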

What we're trying to do with World Foundation models is take it to the next level by giving it more modes of information and richer information. But part of that will still include text. We'll feed in the text along with the video and other ground truth information from the physical state of the world. Yeah, so this is going to be a multi-part question, and I apologize, but I

I don't really know another way to ask it. So what are the other modes of information that you're feeding in there? And do you really need to go through this simulation process? And I'll tell you, you know, it all sounds like a worthwhile endeavor to me, and I'm sure it is. But I also see video models today. And that is something that's really surprised me.

when we've seen the video generation models, is that they really seem to have an understanding of physics. An image generation is not moving, right? So you just know that, let's say, the guy sits on the chair. But with video, you can see people walking through a field and watch the grass move.

And that means that those models inherently have a concept of how physics works, I think. And I'm going to run it by you because you're the expert here. But again, Yann's going to come on the show in a couple of weeks, so maybe this is just in my mind because I'm gearing up and thinking about our last conversation. But I'm going to put this to you also; maybe I'll ask him to weigh in on your answers. The thing that he always talked about is

a human mind is able to sort of see infinite possibilities and accept that; it doesn't break us. So if you have a pencil and you hold it up, you know it's going to fall, but you know it could fall in infinite ways. It's still going to fall. For an AI that's been trained on different scenarios, it's very difficult to understand that the pencil might fall in infinite ways when asked to generate it. However,

They've been doing a very good job with the video generators of like showing that they understand that. So just to sort of reiterate, what different modes of information are you using and why do we need this broader simulation environment or this Cosmos tool if we are getting such good results from video generation already? All very, very good questions. So first off, we use many, many modes. The primary one, though, for training Cosmos is video.

just like the video generation models. But along with that, there's text. We also feed it extra information and labels that we can gather from data, particularly when we generate the data synthetically. If you use a simulator to generate the videos, you have...

perfect information about everything that's going on in every pixel of that video. We know how far each object is in each pixel; we know the depth; we know what the object is in each pixel; you can segment all of that out. Traditionally, what we've done

for perception training for autonomous vehicles is use humans to go through and label all that information from hours and hours of collected video, and it's inaccurate and incomplete. From simulation, we can get perfect information about the actual videos themselves. Now, that being said, to your question:

these video models do seem to really know physics, and know it well. I think it is pretty amazing how much physics they do know, and it's kind of surprising we're here at this point. Had you asked me five years ago whether we'd be able to generate videos with this much physical plausibility at this stage?

I wasn't sure, actually, because I had continually been wrong in the years prior. I didn't expect to see image classification solved in my lifetime until we saw it with AlexNet; I would have bet against it. So we're pretty far along. That being said, there are a lot of flaws in the physics we see in these videos. One of the basic things is object permanence. If

you direct the video to move the camera, point away, and come back, objects that were there at the beginning of the video are no longer there, or they're different. That is such a fundamental violation of the laws of physics that it's kind of hard to say these models currently understand physics well. And there's a whole bunch of other things in there.
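
With simulator ground truth, a failure like this is even measurable. Here is a toy sketch of our own hypothetical metric, flagging objects that vanish between a pan-away and a pan-back using segmentation masks.

```python
import numpy as np

def permanence_violations(seg_before: np.ndarray, seg_after: np.ndarray) -> set[int]:
    """Object IDs visible before a pan-away that are gone when the camera returns.

    Toy metric: ground-truth segmentation tells us which objects should still
    be in view, so any ID that silently vanishes is a physics violation.
    """
    before = {int(i) for i in np.unique(seg_before)} - {0}  # 0 = background
    after = {int(i) for i in np.unique(seg_after)} - {0}
    return before - after

seg_t0 = np.array([[0, 1, 1], [0, 2, 2]])  # objects 1 and 2 in view
seg_t1 = np.array([[0, 1, 1], [0, 0, 0]])  # object 2 has "disappeared"
print(permanence_violations(seg_t0, seg_t1))  # {2}
```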

My life's work has been primarily computer graphics, and specifically rendering. 3D rendering is essentially a physics simulation: the simulation of how light interacts with matter and eventually reaches a sensor of some sort. We simulate what a camera would do in a 3D world and what image it would gather from that world.
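
That physics has a compact mathematical statement: rendering amounts to solving the rendering equation (Kajiya, 1986), which says the light leaving a surface point is what it emits plus what it reflects from all incoming directions.

```latex
L_o(x, \omega_o) = L_e(x, \omega_o)
  + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot n)\, d\omega_i
```

Here f_r is the surface's reflectance (the BRDF) and n its normal; path tracers estimate this integral with Monte Carlo sampling, which is why more compute keeps buying better images.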

When I look at a lot of these videos that are generated, I see tons and tons of flaws because when we do those simulations and rendering, we're attuned to seeing when shadows are wrong and reflections are wrong and these sorts of things. To the untrained eye, it looks plausible. It looks correct.

But I think people can still kind of feel when something is wrong, you know, when it's AI generated and when it's not. In the same way that for decades now, since we introduced computer graphics to visual effects in the movies, you don't know what it is, but if the rendering's not great, it just feels CG, it feels wrong. We still have that kind of uncanny valley thing going on.

That all being said, I think we're going to rapidly get better and better. So the models today have an amazing amount of knowledge about the physical world, but they're maybe at like 5%, 10% of what they should understand. We need to get them to 90%, 95%.

Right. Yeah. I just saw a video of a tidal wave hitting some island, and it looked super realistic. Of course it was on Instagram, because that's all Instagram is right now, AI-generated video. And it took me a second, and it's more frequently taking me a minute, to be like, oh, that's AI generated. Sometimes I have to look in the comments and just sort of trust the wisdom of the crowds on that front. You might not be the best judge

of it as well. Humans, I mean, we're not particularly good at knowing whether physics will really be accurate or not. This is why directors can take such license with the physics in movies when they do explosions and all kinds of other fun stuff, like tidal waves.

Yeah. Well, it's just like some comedian made this joke. Neil deGrasse Tyson likes to come out after these movies, like Gravity, and talk about how they're scientifically incorrect.

And the comedian's like, yeah, well, how about the fact that George Clooney and Sandra Bullock are the astronauts? That didn't bother you at all? But it is interesting that we can watch these videos, watch these movies, and fully believe, at least in the moment, that they're real. Like we can allow ourselves to sort of lose ourselves in the moment. Exactly. And just be like, yep, I'm in this story. I feel emotion right now watching, you know, George Clooney in a spaceship, even though I know he's no astronaut.

And I think for that purpose, I mean, I worked on movies before I was at NVIDIA. That's what I did, computer graphics for visual effects. That is a perfectly legitimate use of that technology. It's just that that level of simulation is not sufficient for building physical AI that are going to be the underpinnings or the fundamental components of a robot brain.

I don't want my self-driving car or my robot operating heavy machinery in a factory to be trained on physics that doesn't match the real world. Even if it looks right to us, if it's not right, then it's not gonna behave correctly and that's dangerous. So it's a different purpose. That's why what we're doing with Cosmos

is really a different class of AI from video generators. You can use it to generate videos, but the purpose is different. It's not about generating beautiful imagery or interesting imagery for art. This is about simulating the physical world, using AI to create the simulation.

Rev, I want to ask you one more follow-up question about not the flaws but the video generator's ability to get things right.

And then we're going to move on from this topic. But it is just surprising and interesting for me to hear you and Demis Hassabis, the CEO of Google DeepMind, who was just on and commented on this, talk about how these video generators have been surprisingly good at understanding physics. And Yann, in our previous conversations, effectively said that it's very difficult for AI to solve these problems. I won't say they've solved it.

But everybody's surprised they've gotten to this point. So what is your best understanding of how they've gotten, though flawed, this good? You know, this is the trillion-dollar question, I guess. You know, we've been betting now for years that if we just throw more compute and more data at the problem, that...

These scaling laws are going to give us a level of intelligence that's really, really meaningful, that will be like step function changes in capabilities. There's no way for us to know for sure. It's very hard to predict that. It feels like we are on an exponential curve with this, but which part of the exponential curve we're on, we can't tell.
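
The scaling laws he's referring to are usually written as power laws; in the form popularized by Kaplan et al. (2020), test loss falls smoothly as a power of training compute C:

```latex
L(C) \approx \left( \frac{C_c}{C} \right)^{\alpha_C}
```

Here C_c is a fitted constant and alpha_C a small empirical exponent (on the order of 0.05 for language models in that study), which is why each step-function gain in capability demands a large multiple of compute.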

So we don't know how fast that's going to happen. Honestly, I've been surprised at how well these transformer models have been able to extract the laws of physics to this level by this point in time. At this point, I believe in a few years, we're going to get to a level of physics understanding with our AIs.

that's going to unlock, you know, the majority of the applications we need in robotics. Let me ask you one more question about this, then we're going to take a break and talk about some of the societal implications of putting robotics in the workforce and, I don't know, in all different areas of our lives.

There's definitely a sizable portion of the population that is going to be surprised. Maybe not our listeners, but a sizable portion of the population that would be surprised to hear that NVIDIA itself is building these world foundational models, releasing weights to help others build on top of them. The perception, I think, from some on the outside is, hey, isn't NVIDIA just a company that makes those chips? So what do you say to that, Rev?

Well, yeah, that's been the perception. It's been the perception since I started at NVIDIA 23 years ago. And it's never been true that we just build chips. Chips are a very, very important part of what we do; they're the foundation that we build on. But when I joined the company, there were about 1,000 employees at the time. The grand majority of them

were engineers, just like today. The majority of our employees are engineers, and the majority of those engineers are software engineers. I myself am a software engineer; I wouldn't know the first thing about making a chip. And so our form of computing, accelerated computing, the form of computing we invented, is a full-stack problem. It's not just a chip problem.

It's not just a chip that we throw over the fence and leave to others to figure out how to make use of. It doesn't work unless we have these layers of software, and these layers of software have to have algorithms that are harmonized with the architecture of our chips and our systems. So in these new markets that we enter, what Jensen calls $0 billion industries,

We have to actually go invent these new things kind of top to bottom because they don't exist yet and nobody else is likely to do it. So we build a lot of software and we build a lot of AI these days because that's what's necessary in order to build the computers to power all of this stuff. We did this...

with LLMs early on. Many, many years ago, we trained what was, at the time, the largest model in terms of number of parameters for an LLM. It was called Megatron. And because we did that, because we built our chips and computers, the system software, the frameworks and pipelines, everything along the way,

we were able to tune them to do these large-scale things, and we put all of that software out there, which was then used to create all the LLMs we enjoy today. Had we not done that, I don't think we would have had ChatGPT. This is essentially the same thing: we're creating a new market, a new capability that doesn't exist. We see

this as being an endeavor that is greater than NVIDIA. We need many, many others to participate in it. But there are some things that we're uniquely positioned to contribute, given our scale and our particular expertise. And so we're going to go do that, and then we're going to make it freely available to others so they can build on it.

Yeah. For those wondering why NVIDIA has such a hold in the market right now, I think you just heard the response. So I do want to take a break and then I want to talk about

the implications for society when we have, let's say, humanoid robots doing labor in that part of the economy that we simply haven't really put AI into yet, and what it means when that's many more trillions of dollars than the knowledge work. So we're going to do that when we're back right after this.

I'm Tomer Cohen, LinkedIn's Chief Product Officer. If you're just as curious as I am about the way things are built, the insights behind what it takes to create a world-renowned product, then tune in to my podcast, Building One. There's so much to learn, like how Patagonia innovates with its supply chain. We had to go out to farmers and convince them...

It was really damn hard. Or the way Adobe thinks about the first interaction somebody has with Photoshop. I was always so fascinated by how people navigate and find their way. Ever wanted to know how Nike builds emotion into the Jordan brand? You have to be obsessed with the current state of the human condition. And it doesn't stop there. What about how Glean reinvented knowledge search with AI? You can learn about how a Michelin star chef is redesigning seeds for flavor and how Pixar is nurturing a creative culture.

Listen to Building One on Apple Podcasts or wherever you get your podcasts. And we're back here on Big Technology Podcast with Rev Lebaredian. He's the Vice President of Omniverse and Simulation Technology at NVIDIA. Rev, I want to ask you the question that has been bouncing around my mind since we started talking about the fact that you're going to enable robotics to sort of take over. I don't know, is take over the right word?

take over a lot of what we do currently in the workforce. I mean, what do you think the labor implications are here? Because if you've spent your entire life working at a certain manual task, and next thing you know, someone uses the Cosmos platform or your new project, I think it's called Groot, what is it called? Groot. - Groot, that's our project for humanoid robots,

building and training humanoid robot brains. So, all right, some company uses Groot to put humanoid labor in, let's say, a factory, or even as a care robot. And I'm a nurse, and all of a sudden some Groot-built robot is now helping take care of the elderly. What are the labor implications of that? Well, first and foremost, I think we need to understand that

this is a really hard problem. It's not like overnight we're going to have robots replace everything humans do everywhere. It's a very, very difficult problem. We're just now at an inflection point where we finally see a line of sight to building the technology we need to unlock the possibility of these kinds of general-purpose robots. And that's because

we can now build a general purpose robot brain. 20 years ago, that was not true. We could have built the physical robot, the actual body of a robot, but it would have been useless because we couldn't give it a brain that would let it operate in the world in a general purpose manner. We couldn't interact with it or program it in a useful way to do anything. So that's what's been unlocked here.

I talk to a lot of CEOs and executives at companies in the industrial sector, in manufacturing, in warehousing, and in retail. In all of these companies I talk to, in every geography, there's a recurring theme.

There's a demographic problem the whole world is facing. We don't have as many young people who want to do the jobs that the older people who are retiring now have been doing. If you go to an automotive factory in Detroit or in Germany, go look around. Most of the factory workers are aging and they're quickly retiring.

And these CEOs that I'm talking to, their biggest concern is all of that knowledge they have on how to operate those factories and work in them. It's going to be lost. The young people don't want to come and do these jobs. And so we have to solve that problem. If we're going to maintain, not just grow our economy, but just maintain where the economy is at and produce the same amount of things, we need to find some solution to

this problem. We don't have enough workers. We've been seeing it in transportation: there aren't enough truck drivers in the world to deliver all the stuff that's moving around in our supply chains. We can't hire enough of them, and there are fewer and fewer young people who want to do that job every year. So we need to have self-driving trucks, we need to have self-driving cars, to solve that problem.

So I think before we talk about replacing jobs that humans want to do, we should first be talking about using these robots to fill in the gap that's being left by humans because they don't want to do it anymore. Right, and there could be specialization, like...

Take nursing, for example, the nurse that injects me with a vaccine or the nurse that like puts medication in my IV. Maybe we keep that human for a while, even though, you know, they make mistakes, too. But I'd feel a lot more comfortable if that was human. The nurse that takes me for a walk down the hall after I've gotten a knee replacement, that could be a robot.

Maybe better than a robot. We'll see how this plays out. We believe that the first place we're going to see general purpose robots like the humanoid robots really take off is in the industrial sector because of two things. One, the demand is great there because we have the shortage of workers. And also because...

It makes more sense to have them adopted in these spaces where a company just decides to put them in, and most warehouses and factories are kind of unseen. I think the last place we're going to start seeing humanoids show up is in our homes, in your kitchen. Don't tell Jeff Bezos that.

Well, they will show up there, and I think it's going to be uneven. It'll depend even geographically. They'll probably show up in a kitchen in somebody's home in Japan before they show up in a kitchen in somebody's home in Munich, Germany. And I think that's a cultural thing. You know, I personally don't even want another human in my kitchen. I like

being in my kitchen and preparing stuff myself. My wife and I are always in each other's space there, so we get kind of annoyed. So having a humanoid robot would be kind of weird. I don't even want to hire somebody else to do that; we kind of do that ourselves. So that's a kind of personal decision. I think jobs like caring for our elderly, and healthcare, those are very

human professions. You know, a lot of what the care is, it's not really about the physical thing that they're doing. It's about the emotional connection with another human. And for that,

I don't think robots are gonna take that away from us anytime soon. Well, the question is, do we have enough care professionals to take those jobs? That's the one that really seems in danger. And so what's likely to happen is it'll be a combination. The care professionals we do have will do the things that require EQ, that require empathy, that require really understanding the other human you're taking care of. And then they can instruct the robots around them

to assist them with all of the more mundane things, like cleaning and maybe giving the shots and IVs, I don't know. How far away is that future, Rev? How long do you think? You know, I wouldn't venture to guess on that kind of interaction in a hospital or a care situation

quite yet. I believe it's going to happen in the industrial sector first, and I believe that within a few years we're going to see humanoid robots widely used in the most advanced manufacturing and warehousing. Wild. Okay. I want to ask you about Hollywood before we go. I guess I have this question rattling in my mind, which is,

Are we just going to see, like, movies that look real but are computer generated? Like, we have computer-generated movies now, CGI, but they all look pretty CGI-y. But I imagine- Well, they don't all look CGI-y. Some of them look pretty amazing. Somewhat real. But I'm curious, do you think Hollywood is going to move to an area where it's super real and just simulated? Absolutely.

Go ahead. Absolutely. I mean, well, was it a year or two ago when the last Planet of the Apes came out? I went to go see it with my wife. Now, my wife and I have been together since I worked at Disney in the mid-90s working on visual effects and rendering. I had a startup company doing rendering, and she was a part of that. So she has a good eye, and she's been around computer graphics and rendering for decades now.

When we went to go see Planet of the Apes, even though obviously those apes were not real, at one point she turned around and said, that's all CG, right? She couldn't quite believe it. I think what Weta did there is amazing. It's indistinguishable from real life, except for the fact that the apes were talking. Like, other than that, it's indistinguishable.

The problem with that though is to do that level of CG in the traditional way that we've done it requires an incredible amount of artistry and skills that only a few studios in the world can do with the teams that they have and the pipelines they build. And it's incredibly expensive to produce that.

What we're building with AI, with generative AI, and particularly with world foundation models: once we get to the point where these models really understand the depth of the physics they need to produce something like Planet of the Apes, of course studios are going to use

those technologies to produce the same images because it's going to be a lot faster and it's going to be a lot less expensive to do the same things. It's already starting to happen. Rev, I know we're getting close to time. Do I have time for two more questions? Absolutely. Okay. So the more I think about robotics, the more I think about sort of what the application in war might be. I know that like

You can't think of every permutation when you're developing the foundational technology, but we are living in a world where war is becoming much more roboticized. And it's sort of like remarkable that we have some wars going on where people are still fighting in trenches. So I'm just curious if you've given any thought to like how robotics might be applied in warfare and whether there's a way to prevent some of like the bad uses that might come about because of it.

You know, I'm not really an expert in warfare, so I don't feel that I'm the best person to talk about that. But I can say this. This isn't the first time a new technology has been introduced that is so powerful that not only can we imagine

great uses of it that are beneficial to people, but also really scary, devastating consequences of it being used, particularly in warfare. And somehow we've managed not to have that kind of devastation.

And in general, the world has gotten better and better, more peaceful and safer, despite what it might feel like today. By almost any measure, we have fewer lives lost through wars and these sorts of tragedies than ever before in mankind's history. The big one, of course, that everybody always talks about is nuclear technology.

I mean, I grew up, I was a little kid in the 80s. This is kind of the height of the Cold War, the end of it. But every day I remember thinking, you know, it might happen. We might have some ICBMs arrive in Los Angeles at any point. And it hasn't happened because somehow...

there was a general understanding by everyone, collectively, that this would be so bad for everyone that we put systems together. Even though we had intense rivalry, even enmity, between the Soviet Union and the U.S.,

we somehow figured out that we should create a system that prevents that sort of thing. We've done the same with biological weapons and chemical weapons. Largely, they haven't been used, even though the technology has existed. And so I think that's a good indicator of how we should deal with this new, powerful technology of AI.

And a reason for us to be optimistic that it's possible to actually have this technology and not have it be so devastating. We can set up rules and conventions that say, even though it's possible to use AI in this way, that we shouldn't and we should all agree on that. And anybody that skirts the line on that

there should be ramifications, to disincentivize them from using it that way. Yeah, I hope you're right on that. It seems like something we're going to deal with more and more as a society as this stuff becomes more advanced. All right, so last one for you.

You've been at NVIDIA, as we've talked about a couple of times, for 23 years. I already teased this. So I just want to ask you: the technology has been in favor, it's been out of favor. You're at the top of the world right now, even though there was some hiccup last week, but whatever, it doesn't seem like it's going to be a long-term issue. What is one insight you can draw from your time at NVIDIA about the way the technology world works?

Well, first I can tell you about how NVIDIA works. Yeah, that's great. And the reason I'm here, I've been here for 23 years and this will be the last job I ever have. I'm positive of it. When I joined NVIDIA, that wasn't the plan. I thought I'd be here one year, two years max. And now it's been 23 years. When I hit my 20-year mark,

Jensen, at our next company meeting, rattled off a bunch of stats on how long various groups of people had been here: how many had been there for a year, two years, and so on. When you got to 20, there were more than 650 people at the 20-year mark. Now, earlier I said that when I joined the company there were about a thousand people. So this means that most of the people who were there when I started at NVIDIA were still there after 20 years.

I wasn't as special as I thought I was when I hit my 20-year mark. And this is actually a very strange thing about NVIDIA: we have people that have been here a long time and haven't left. It's strange in general for most companies, but particularly for Silicon Valley tech companies, where people move around a lot. And I believe the reason why we've stayed here through all of our trials and tribulations and whatnot is because,

fundamentally, what Jensen has built here is a company where people come to do their life's work. And we really mean it; you feel it when you're here. This is about more than just making some money or having a job. You come here to do great work, to do your life's work. And so the idea of leaving,

it feels painful to me, and I think it does to many others. That's what's actually behind why we've stayed, despite the fact that NVIDIA has had its ups and downs. You can go back and look at our stock chart from the mid-2000s. We introduced CUDA in 2006, and that was a really important thing, and we stuck to it.

The analysts, nobody, wanted us to keep sticking to it, but we kept investing in it. And our stock price took a huge hit and was flat there for a long time, flat or dropping. And then it finally happened: AI was born on our GPU. That's what we were waiting for, and we went all in on it. And we've had ups and downs since then; we'll continue to have ups and downs. But I think the trend is still going to be up and to the right, because

this is an amazing place where the best people in the world at what we do, people who want to do their life's work, come here and stay here.

Yep. Well, Rev, look, it's always such a pleasure to speak with you. I really enjoyed our time together at NVIDIA headquarters. It was a really fun day. We did some cool demos and I appreciate that. And I'm just thrilled to get a chance to speak with you about this technology today. It is fascinating technology. It is cutting edge. Obviously, it brings up a lot of questions, some of which we got to today. I'm sure we could have talked for three hours. And I hope to keep the conversation up. So thanks for coming on the show.

Thank you for inviting me, and I hope we do talk for three hours one day. That'd be great. All right, everybody, thank you for listening. Ranjan and I will be back to break down the news on Friday. Already a lot of news this week, with OpenAI's deep research coming out. I just paid $200 for ChatGPT, which is a lot more than I ever thought I would pay for a month. But that's where we are today. So we're going to talk about that and more on Friday. Thanks for listening, and we'll see you next time on Big Technology Podcast.