
How AI robots learn just like babies — but a million times faster w/ NVIDIA’s Rev Lebaredian

2024/12/3

The TED AI Show

People
Bilawal Sidhu
Rev Lebaredian
Topics
Bilawal Sidhu: This episode explores the current state of AI robot learning, in particular how robots learn to understand and interact with the physical world. Traditional robot learning is slow, but NVIDIA uses simulated environments to help robots learn quickly. Physically grounded AI has the potential to transform industries across the board, from autonomous driving to complex surgery and even household chores.

Rev Lebaredian: At NVIDIA, my role is Vice President of Omniverse and Simulation Technology. I apply my experience in film visual effects to simulation-based robot training. NVIDIA grew from a gaming hardware company into a leader in AI and simulation because of its focus on accelerated computing. We invented the GPU and, through programmable shading and CUDA, extended it to much broader computing problems. AlexNet marked the breakthrough moment for deep learning: by training on massive amounts of data, we could generate algorithms no human could have imagined. We believe that applying AI to the $100 trillion physical-world market requires robots as the bridge. A robot interacts with the physical world by perceiving, deciding, and acting. Simulated environments can supply vast amounts of data, overcoming the limits of real-world data collection and accelerating robot training. Reinforcement learning is an effective way for robots to learn, much like how human babies learn. Isaac Sim is a robot simulator built on Omniverse that can run physically accurate simulations.

Rev Lebaredian: Physical AI has broad applications, including autonomous driving, robot-assisted surgery, and automated warehousing. Autonomous driving is already a reality and is moving toward more general-purpose models. In the future we will see more physically grounded AI models that understand the basic laws of the physical world and can be fine-tuned for specific tasks. Humanoid robots are an ideal form factor for a general-purpose robot because they fit environments designed for humans. Industrial demand for humanoid robots is enormous, and they may eventually enter our daily lives as well. A virtual assistant can also be viewed as a robot: it interacts with the physical world by perceiving, deciding, and acting. Physical AI can augment personal devices and ultimately deliver a Jarvis-like immersive assistant experience. Deploying AI in the physical world requires ensuring safety and always keeping a mechanism for human intervention. Physical AI can dramatically increase productivity and ultimately bring about radical abundance.

Deep Dive

Key Insights

Why are robots struggling to master physical intelligence compared to humans?

Robots lack the years of practice and learned experiences that humans accumulate from childhood through trial and error in the physical world. While humans can instantly calculate trajectories and movements, robots require extensive training in simulated environments to achieve similar physical intuition.

How does NVIDIA's simulation technology help robots learn faster?

NVIDIA's simulated environments allow robots to practice and learn at a supercharged pace, compressing tens of millions of repetitions that would take humans years into minutes. This accelerates the development of physical intelligence, enabling robots to master new skills much more quickly.

What is the potential market size for physical AI applications?

The market for physical AI, which includes industries like transportation, manufacturing, and drug discovery, is estimated to be around $100 trillion. This is significantly larger than the $2-5 trillion global IT industry, highlighting the vast potential for AI to transform physical-world industries.

What is the role of simulation in training robots for the real world?

Simulation allows robots to gather the necessary data to learn the physics of the real world without the constraints of the physical environment. It enables robots to practice in virtual worlds where they can experience millions of scenarios, including rare and dangerous ones, that would be impossible or unethical to replicate in the real world.

How does reinforcement learning help robots develop physical intelligence?

Reinforcement learning allows robots to learn through experimentation, similar to how humans learn. By placing robots in virtual environments and giving them goals, they can practice millions of iterations of tasks, such as standing up or grasping objects, until they develop a deep understanding of the physical world.

What are some current applications of physical AI in industries?

Physical AI is currently being applied in autonomous vehicles, robotic-assisted surgery, automated warehousing, and drones. These technologies are already transforming industries by addressing labor shortages and improving efficiency in tasks that are tedious or dangerous for humans.

Why are humanoid robots gaining attention now?

Humanoid robots are becoming more relevant because they can operate in environments designed for humans, such as factories, hospitals, and homes. Their human-like shape allows them to navigate stairs, ramps, and shelves, making them versatile for a wide range of tasks in both industrial and personal spaces.

What are the potential risks of deploying AI in the physical world?

The primary risks include safety concerns and the need for human oversight. Ensuring that AI systems are safe and that humans can intervene if needed is crucial. This includes maintaining the ability to turn off or pause AI systems and ensuring that humans are part of the decision-making loop.

What are the positive outcomes of applying AI to the physical world?

The positive outcomes include increased productivity, reduced labor shortages, and the ability to perform tasks that are too tedious or dangerous for humans. This could lead to a world of radical abundance, where humans can focus on fulfilling and enriching work while robots handle the mundane and repetitive tasks.

Chapters
This chapter explores how NVIDIA uses simulated environments to train AI robots to learn and master new skills at an accelerated pace, far exceeding human learning speed. The discussion covers the concept of 'mirror worlds' and their role in robot training, and the potential impact of this technology on various industries.
  • Robots struggle with physical intelligence in the real world.
  • NVIDIA uses simulated environments to accelerate robot learning.
  • Simulations allow for millions of repetitions in minutes, compared to years for humans.

Transcript


Hey, Bilawal here. Before we start the show, I have a quick favor to ask. If you're enjoying The TED AI Show, please take a moment to rate and leave a comment in your podcast app. Which episodes have you loved and what topics do you want to hear more of? Your feedback helps us shape the show to satisfy your curiosity, bring in amazing guests, and give you the best experience possible.

The world of AI is advancing at an incredible pace. And it's no secret that in many areas, computers have long outperformed humans. But there's been one area that's been tough for robots to master.

Physical intelligence. We've talked a lot on this podcast about text and image generation, technologies that took years of research, immense computational power, and vast datasets to develop. But when compared to mapping 3D spaces and predicting the chaotic randomness of the real world, that's all child's play. So what gives humans the edge here, at least for now?

It's simple. We've had a lot of practice. Imagine you're a pro baseball player in the outfield watching a fly ball come your way. In an instant, your brain calculates the ball's speed, spin, and trajectory to predict where it will land. To you, it feels automatic.

But it's the result of years of practice and learned experiences, not just from baseball, but from a lifetime of physical interactions. From childhood, moments of trial and error in the physical world have trained your brain to understand how objects move and react. And for humans, mastering these skills takes time because real-world practice can't be rushed.

But fortunately for robots, it can be rushed. And NVIDIA, the AI giant historically known for its graphics cards, has developed incredibly powerful simulated environments where robots can practice and learn at a supercharged pace. Tens of millions of repetitions, which might take humans years, can be compressed into minutes. We're already seeing this in self-driving cars, but the potential goes far beyond that.

By building AI that understands the physical world, NVIDIA is setting the stage for machines that could revolutionize industries, assist in complex surgeries, and even help around the house. So what does it mean for robots to develop a kind of physical intuition? And what challenges and opportunities lie ahead as we continue to push the boundaries of robotics?

I'm Bilawal Sidhu, and this is The TED AI Show, where we figure out how to live and thrive in a world where AI is changing everything. Hi, I'm Bilawal Sidhu, host of TED's newest podcast, The TED AI Show, where I speak with the world's leading experts, artists, and journalists to help you live and thrive in a world where AI is changing everything. I'm stoked to be working with IBM, our official sponsor for this episode.

Now, the path from Gen AI pilots to real-world deployments is often filled with roadblocks, such as barriers to free data flow. But what if I told you there's a way to deploy AI wherever your data lives? With Watson X, you can deploy AI models across any environment, above the clouds helping pilots navigate flights, and on lots of clouds helping employees automate tasks, on-prem so designers can access proprietary data,

and on the edge so remote bank tellers can assist customers. Watson X helps you deploy AI wherever you need it so you can take your business wherever it needs to go. Learn more at ibm.com slash Watson X and start infusing intelligence where you need it the most.

Your business is modern, so why aren't your operations? It's time for an operations intervention. The PagerDuty Operations Cloud is the essential platform for automating and accelerating critical work across your company. Through automation and AI, PagerDuty helps you operate with more resilience, more security, and more savings. Are you ready to transform your operations? Get started at PagerDuty.com.

As we approach the 250th anniversary of the Declaration of Independence, TED is traveling to the birthplace of American democracy, Philadelphia, for an exciting new initiative. Together throughout 2024, TED and Visit Philadelphia started to explore democratic ideas in a series of three fireside chats that will shape our collective future as we work towards a more perfect union.

Our third and final event of 2024, about moving forward together, took place on November 20th at the historic Reading Terminal Market. Hosted by TED curator Whitney Pennington Rodgers, we featured TED Talks and a moderated Q&A with world champion debater Julia Dhar and the head of curiosity at the Eames Institute, Scott Shigeoka. Thanks to Visit Philadelphia and our supporting partners Bank of America, Comcast NBCUniversal, and Highmark.

Go to visitphilly.com/ted to learn more about this event and to hear about the exciting things we have coming up in 2025. Our guest today, Rev Lebaredian, began his career in Hollywood, where he worked on visual effects for films like Mighty Joe Young and Stuart Little. His experience in creating detailed, dynamic 3D worlds laid the foundation for his role today as VP of Omniverse and Simulation Technology at NVIDIA.

There, he's using that expertise to push the boundaries of robotics by applying simulation technology to teach robots physical intelligence. In other words, how to understand and interact with the real world. In our conversation, we explore how NVIDIA, known for its role in gaming technology, became a key player in the development of generative AI. What a robot even is, and Rev's vision for a future where robots enhance our lives.

So, Rev, welcome to the show. Thank you for having me, Bilawal. So in the first part of your career, you worked in entertainment, helping audiences become immersed in fantasy worlds. And now your work involves helping robots become immersed in simulations of the real world. Can you explain to our listeners what your role is at NVIDIA?

Technically, my role is, the title is Vice President of Omniverse and Simulation Technology. It's a weird title. I don't think there's many others like it out there. It's strange because it's a new concept, relatively speaking. I started my career, as you mentioned, in entertainment, media entertainment, doing visual effects and computer graphics for that purpose. I joined NVIDIA 23 years ago with the hope of taking what I was doing in movies,

creating this imagery of high fidelity, high quality fantasy worlds, and doing it in real time, doing it really fast, using our GPUs to power that computation, so that what's a linear experience in movies could become an interactive one, like in a video game or an immersive experience like XR.

It took a while for us to get there, though. Speaking of that, you've had a very unique vantage point over the years watching NVIDIA almost evolve from basically a gaming hardware company to a leader in AI and simulation. Could you share a little bit about your journey at NVIDIA and how NVIDIA's mission has transformed over the years?

That's a really great question. I think a lot of people don't really understand how Nvidia, this "gaming company" or this chip company that made chips for gaming PCs is now the most valuable company in the world and at the center of all of this AI stuff. But if you go back to what the idea behind the creation of the company was all the way at the beginning, it actually makes a lot of sense.

The founding principle of the company was this idea that general purpose computers, ones built around CPUs, the same architecture that we built all computers around since the 1960s, starting from the IBM System 360. They're really great, but there are certain computing problems that they just aren't fast enough to solve.

Now, at the time, we had this law called Moore's Law. It's not law like law of physics. It was more like an observation of how semiconductors were essentially providing double the compute for the same price or the same amount of power every year and a half or two. At its height, Moore's Law made it so that we could get 100 times speed increases for the same

price or the same power over a 10-year period. But we looked at Moore's law and said, well, if we wait for Moore's law to give us enough computing power to do certain things like rendering for computer graphics for video games, we would have to wait decades or maybe even hundreds of years before the computers would be fast enough to do some of the things we wanted to do. So NVIDIA set about creating this new form of computing,

that doesn't do everything, but it can do many things that would otherwise be impossible with this generic computer. We call that accelerated computing. We invented the idea of a GPU. The first problem we chose to tackle was the problem of 3D rendering for producing these images in video games.
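As a quick aside, the Moore's Law arithmetic Rev mentions, roughly 100x over a decade at its height, follows directly from a doubling every year and a half. A tiny back-of-the-envelope sketch to check it (illustrative numbers only, not from the episode):

```python
# Back-of-the-envelope check of the doubling math (illustrative, not from the episode).
for doubling_period_years in (1.5, 2.0):
    doublings = 10 / doubling_period_years   # number of doublings in a decade
    speedup = 2 ** doublings                 # compounded improvement at constant price/power
    print(f"doubling every {doubling_period_years} yrs -> ~{speedup:.0f}x over 10 years")

# doubling every 1.5 yrs -> ~102x over 10 years
# doubling every 2.0 yrs -> ~32x over 10 years
```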

At the time when NVIDIA was formed in 1993, there was no market for this. There were actually no 3D video games. They were just starting. There was Doom and Wolfenstein, like the first ones that just showed up. Yeah, that came a little bit later, I think. It was not '93, it was maybe '95, I think.

So we imagine that this problem, if we could help solve it, a market would form around that, and then we could expand into other markets with the same accelerated computing architecture. That's essentially what happened. Fast forward a few more years, in the early 2000s, we added a critical feature to our GPUs. It's called programmable shading.

which is simulating how the light interacts with the material inside a 3D world. That's what makes plastic look like plastic, aluminum look like aluminum, wood look like wood. Up until that point in time, the kinds of shaders we could have, the kinds of materials were very limited and they made the video games look very simple or cartoony, not quite realistic.

In the movie world, we weren't limited by time and how much time you have to render. We could spend hours and hours rendering. So there was this big disconnect between the quality of computer-generated imagery in a movie and what you could see in a video game. We introduced programmable shading, and that feature of making it programmable unlocked the possibility of using the same GPUs for more than computer graphics and rendering.

Very quickly, we saw researchers and other people who weren't doing computer graphics take advantage of all the computing capabilities that were in our GPUs.

They would take their problems, other sorts of physics problems like molecular dynamics and fluid dynamics, and phrase them as if they were computer graphics problems. And when we realized that was happening, that people were willing to contort themselves into using graphics APIs to do this other stuff, we said, let's make it easier for them.

and we introduced CUDA, which was a more natural way of programming general-purpose things that weren't graphics on our GPUs. We essentially waited for six, seven years to see what the killer app would be. We imagined some developer somewhere, probably a grad student, was going to figure out something amazing to do with these computing capabilities, and it took a while. We introduced CUDA in 2006,

At the end of 2012, almost seven years later, we finally had that moment. And what happened was two research students at the University of Toronto, Ilya Sutskever and Alex Krizhevsky, and their professor, Geoffrey Hinton, who just won the Nobel Prize,

They beat all of the benchmarks in image classification with a deep learning neural network called AlexNet at the end of 2012 when they published that. That essentially changed everything.

- And this is insane because up until that point, basically no other approach to the ImageNet benchmark was winning the way this deep learning approach did. This was the first time deep learning kind of blew everyone's mind in the realm of computer vision. And it's kind of wild to imagine it started off with programmable shaders and trying to make cinematic visuals from Hollywood run in real time on your computer. But that same capability, like you said, as you made it easier for developers,

unlocked this whole new world in computer vision and certainly caught the whole world's attention, particularly y'all's, probably sooner than everyone else, I assume. That's exactly right. It seems counterintuitive that this thing built to create images is somehow the same thing that you need to build intelligence. But really, it all just comes down to computing and

The form of computing we had to build for computer graphics, we process a lot of pixels, a lot of triangles, a lot of light rays bouncing around in a scene. That same form of computation is the same thing you need to do all of the tensor math, all of the matrix math. The problem of image classification, that's been a longstanding one that we've all known would be great if we could solve. They've been trying to solve it since the 1950s.

It's a really, really useful thing to do to be able to distinguish what's inside an image that you provide the computer automatically. Up until that point, we would take a really smart person, a computer scientist, that person would imagine an algorithm that can do image classification, and then transcode what's in their brain into the computer and produce a program. What changed here was for the first time,

We were able to create an algorithm to solve something that no human could actually imagine. The way we solved it was by taking a large computer, effectively a supercomputer. We gave it millions of examples of images and said, when you see an image that looks like this, that's a cat. When you look at an image that looks like this, it's a dog. When you look at this image, it's an airplane. So we did that enough times that it wrote the software, it wrote the algorithm,

that could do that image classification. And so it did it better than any algorithm that a human could imagine. - And that's wild, right? You're talking about this like era where humans have written software. Now software is writing software. - That's right. There's two basic ingredients, a supercomputer, lots of computation,

and you give it a whole bunch of data or examples of what you would like it to do, and it figures out the algorithm for you based on the examples you give it. The first one, building large computers, that's our happy place, right? That's what NVIDIA knows how to do. We love building powerful computers and scaling them up. And so that's what we set about doing over a decade ago. And the recent explosive growth of NVIDIA is essentially...

Because of the bet we placed over a decade ago that these big computers were going to be useful. That's what everybody is clamoring for right now. They're setting up these AI supercomputers.
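The two ingredients Rev keeps returning to, a big computer plus labeled examples, correspond to an ordinary supervised training loop. Here is a minimal sketch in PyTorch; the toy model, random data, and hyperparameters are placeholders, not the original AlexNet setup:

```python
# Minimal sketch of learning a classifier from labeled examples (PyTorch).
# Placeholder model and data; not the actual AlexNet training configuration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy "labeled examples": 32x32 RGB images with class labels (cat, dog, airplane, ...).
images = torch.randn(1024, 3, 32, 32)
labels = torch.randint(0, 10, (1024,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

# A small convolutional network standing in for AlexNet.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 8 * 8, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# Ingredient 1: compute (this loop, scaled up massively). Ingredient 2: labeled data.
for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)   # compare the prediction to the human-provided label
        loss.backward()               # gradient descent "writes the algorithm" from examples
        optimizer.step()
```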

Yeah, and every country and company wants more of your GPUs. And of course, the recent demand has really been driven by large language models and diffusion models, which we've talked about a bunch on the podcast. But it's interesting, like as cool as ChatGPT is and as cool as it is to be able to type a prompt and get an image out, this stuff isn't the holy grail. These systems have their limitations, right? Could you talk a little bit about that as we

transition this conversation towards physical AI. Yes, that's exactly right. At that moment when we realized how profound this change was, that we could now produce algorithms that we never imagined we would have in our lifetimes through this new technique of deep learning and AI. The next question we asked ourselves was,

Now that we have this possibility of creating these amazing new things, which ones should we go create? What are going to be the most valuable and impactful ones? Now, if you just take a step back and think about the computing industry, the IT industry, it's somewhere between $2 and $5 trillion a year globally, which is a huge number, right? That's a really big industry. However, all of the rest of the industries out there, the industries that are

about our physical world, the world of atoms, that's $100 trillion. That includes markets like transportation, transporting humans, transporting goods. It includes manufacturing, which is reassembling atoms into products. It includes drug discovery and design, reassembling atoms into medicines, so on and so forth. Like all these things about our physical world,

at least the way humans value them through markets, are of much greater value than information. Now, information is the easiest thing for us to digitize. So it makes sense that the first algorithms that we develop using this new machine learning, deep learning AI technique are going to use all the data that we have readily available to us, which is essentially what's on the Internet. But if we could somehow take this new superpower,

and apply it to the realm of atoms, we unlock that $100 trillion market. All of those markets take manufacturing, for example. We've applied IT and computing to those markets like manufacturing. But if you go into a factory, it's not that different from a factory 50 years ago. They've been largely untouched by computing.

The reason why we haven't been able to do that is because we haven't really had a bridge between the physical world and the computing world. Connecting bits and atoms, baby. Let's go. Yes. And if you think a little bit more about that, the bridge is essentially robotics. Totally. And so we thought about this and we said, this is now maybe possible. Robotics has been a dream for a long time. But what we've been missing are the fundamental algorithms we need to build

a truly useful robotic brain so that we could apply computing to the real world. And so what's a robot? A robot is essentially an agent out here in our real world that does three things and does these three things in a loop. A robot

perceives the world around us, the physical world. It inputs the world through sensors. They can be cameras and lidars and radars, all kinds of sensors, whatever the sensing mechanism is. It makes some sense out of what's coming in. It understands what's coming in. Essentially, that first neural network, AlexNet, was doing that. It's getting some information from the real world, an image,

photograph and making sense of what's inside it. The next thing it does, a robot agent inside the physical world, it takes this information, what is perceived, and makes some decisions. Makes a decision about how it should act, it plans and decides how it's going to affect the world.

The third thing is actuation. It actually does something inside the world. So once it's made the decision, it does something that actually moves or affects the physical world. Once that happens, then it's a loop. You perceive your changes to the world,

update your decisions and your plan and go actuate. By this definition, many things are robots, not just the things we normally think of as a robot, like a C-3PO or R2-D2. A self-driving car is definitely a robot. It has to perceive the world around it. Where are the other cars, the stop signs, pedestrians, bicyclists? How fast are they all moving? What's the state of the world

around the car, make some decisions on how it's going to get to the final destination, and actuates, steers, brakes, or accelerates, and this thing runs in a loop. Lots of things are robots if you define them this way. The building I'm in right now, which is our Endeavor building, our headquarters,

Every day when I enter it, in the reception area, we have turnstiles. There are sensors there. There's some cameras. They know when I walk up to the turnstile. It senses that I've approached and then decides who I am

based on an image classification algorithm, not dissimilar from that original AlexNet. Once it determines that I'm Rev, it can look me up in a database and check whether I should have access, and then it actuates in the world. It opens the turnstile so I can pass through and updates some count somewhere that now I'm in the main area. So this building is essentially a robot.
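Rev's three-step definition translates almost literally into code. A minimal sketch of the perceive, decide, actuate loop; the Sensor, Policy, and Actuator interfaces are hypothetical stand-ins, not a real robot API:

```python
# Minimal sketch of the perceive -> decide -> actuate loop described above.
# Sensor, Policy, and Actuator are hypothetical interfaces, not a real robot API.
from dataclasses import dataclass
from typing import Protocol, Any

class Sensor(Protocol):
    def read(self) -> Any: ...                        # camera frame, lidar sweep, badge scan...

class Policy(Protocol):
    def decide(self, observation: Any) -> Any: ...    # perception + planning

class Actuator(Protocol):
    def apply(self, action: Any) -> None: ...         # steer, brake, open the turnstile...

@dataclass
class Robot:
    sensor: Sensor
    policy: Policy
    actuator: Actuator

    def run(self, steps: int) -> None:
        for _ in range(steps):
            observation = self.sensor.read()            # 1. perceive the world
            action = self.policy.decide(observation)    # 2. decide and plan
            self.actuator.apply(action)                 # 3. actuate, changing the world
            # ...then the loop repeats, perceiving the changes it just made.
```

By this framing, a self-driving car, a warehouse picker, or the turnstile system in a building all instantiate the same loop with different sensors and actuators.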

So if you think about robots in this way and you think about robotic systems as essentially the bridge between computing and the $100 trillion worth of industries out there that deal with the physical world, you start to get pretty excited. You're like, wow, we now potentially have the opportunity to go make a big impact in many of these other industries. So on that note, I mean, it's interesting, right? You are talking about how factories haven't changed in decades and you're right.

Sure, there's enterprise resource planning software to keep track of the inventory of stuff and how it's moving around. But the world of atoms hasn't seen as much progress as the world of bits. So to unlock that massive opportunity in these physically based industries, what's the missing piece? What do we not have today? And what are y'all building to make that happen?

Yeah. So this is where simulation comes in. If we go back to what were the key differences between how we used to write software and this new form of AI, one is supercomputing, the other is you need that data or the set of examples to give it so we could go write the function. Well, where are we going to get that data?

to learn the physics of the world around us. How do you gather that data? It doesn't just exist on the internet. The stuff we have on the internet is largely the things that were easy to digitize, which is not stuff in the physical world.

So our thesis is that the only way we're going to get all the data that we need is by essentially taking the physical world and all the laws of the physical world and putting it in a computer, making a simulation of the physical world. Once you have that, you can produce all of the data you need, essentially the training grounds for these AIs to learn about the physical world. You're no longer constrained.

by all of the constraints that we have out here in the real world. We can train faster than time, than real-world time out here. By just adding more compute, for every real-world second we can do millions of seconds in the simulated world. Wow. Yeah. Collecting data from the real world is really expensive. Let's take one kind of robot, self-driving cars, autonomous vehicles.

If you want to train a network to perceive a child running across the street in any condition, any lighting condition, any city. Different times of year, so different weather. Yeah, different weather conditions. You're going to have to actually go out there in the real world and have a child run across the street as your car is barreling down the road and capture it.

I mean, first of all, obviously, this is unethical to do and we shouldn't do that.

But then just the tediousness of that, of capturing it in every possible long tail scenario, it's just untenable. You can't do that. It's too expensive and it's just impossible. You know, there are some really rare weather conditions. You might want to have that same condition with volcanic ash falling. That might happen in Hawaii. How can you even construct that scenario, right? But in simulation, we can create it all.

In addition, when you grab data from the real world, you only have half the data you need. We also need to know what's inside this information, this unstructured information. Labels. Labels, exactly. So with AlexNet, when they trained it, they had not only the image,

but they had the label that said that image is a cat or a dog. When we simulate a world, we can produce the labels perfectly and automatically. You get it for free pretty much. But when you do it in the real world, you have to have an army of humans or some other mechanism of adding the labels and they're going to be inaccurate. Before you deploy it out into the real world, you probably want to make sure it's going to work. We don't want to put a robot brain in a self-driving car,

and just hope that it's going to work when that child runs across the street. The best place to go test that is in a virtual world, in a simulation. That's a really long-winded way of getting to what I've essentially been working on in recent years.
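The "labels for free" point is easy to see in code: in a simulator you already know every object's class, pose, and distance, so ground truth can be emitted alongside each rendered frame. A schematic sketch; `sim` and its methods are hypothetical stand-ins, not the actual Omniverse or Isaac Sim API:

```python
# Schematic sketch of synthetic data generation with automatic ground-truth labels.
# `sim` and its methods are hypothetical stand-ins, not a real simulator API.
import random

def generate_labeled_frame(sim, frame_id: int) -> dict:
    # Domain randomization: cover rare, long-tail conditions that are hard to capture for real.
    sim.randomize(lighting=random.choice(["noon", "dusk", "night", "rain", "volcanic_ash"]),
                  spawn_pedestrian=random.random() < 0.3)
    image = sim.render_camera("front")          # the pixels a real sensor would see
    boxes = sim.get_bounding_boxes("front")     # per-object class + box, known exactly, no labelers
    depth = sim.render_depth("front")           # exact distance of every pixel
    return {"id": frame_id, "image": image, "boxes": boxes, "depth": depth}

# dataset = [generate_labeled_frame(sim, i) for i in range(1_000_000)]  # scales with compute
```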

Here at NVIDIA, we saw the need for this many years ago, so we started building what we call Omniverse. Omniverse is an "operating system" that we collect all of our simulation and virtual world technologies into. The goal of Omniverse is specifically about doing simulations that are as physically accurate as possible.

That's the key thing. It has to match the real world because otherwise our robots would be learning about laws of physics from something that's just wrong. This is distinctly different than what I did before.

In my work in movies, doing simulations to produce the amazing imagery that we see in visual effects and CGI movies or in video games, that's all about creating really cool-looking, fun images of fantasy worlds, of fake worlds.

there's all kinds of stuff that we're cheating. We add extra lights and makeup and we're breaking the laws of physics in order to make the movie fun and cool or exciting. There is something really poetic about that though. Like it,

It basically goes back to the start of your career, like all this stuff, all these capabilities you all built to emulate the laws of physics, let's say for light transport, and just get the material properties right. So the glint, the veneer, the reflections and refractions all look really good. That's exactly what you need. Obviously, tuned in a fashion that's physically accurate, as you said. So these robots have kind of a believable digital twin or copy or replica of the real world where they're free to make mistakes and

But also the time dilation aspect that you mentioned where you can scale up and have these like models go do things in the digital realm that like would take forever to do in the physical world. And it feels like there's another piece of this, too, is like you create these digital replicas of the world that becomes the training data. Because as you said, you don't have the Internet to go and pull all this text or image data from.

But then you have the robots try things, and there's this domain gap, this chasm, that you need to cross between the simulation and the real world. What are some of the other capabilities that y'all are building to make that happen? Yeah, I kind of oversimplified how we build these AIs, as if we just feed data

into the supercomputer and out comes this amazing robot brain. That's some of how we do it, but there's many different forms of learning. And I think the one you're touching upon is what's called reinforcement learning. It turns out that these robots, one of the best ways for them to learn is sort of how humans and creatures learn. When a baby is born, a human baby is born into the world,

it still doesn't understand the physics of the world around them. A baby can't see depth, they can't really see color yet, they have to learn how to see color. Over time, over weeks, they start learning those things. They start learning how to classify. They classify mom and dad and siblings and- Apple. Apple, all of those things around. They learn it just through experience.

They also learn about the laws of physics through a lot of experimentation. So when you first start giving your baby food and putting food in front of them, one of the first things they do is drop it or throw it, breaking things, throwing things, making a mess. Those are essentially science experiments. They're all little scientists that are trying things until they learn it. And once they understand how that physics works, they move on. Robots learn in the same way.

through this method called reinforcement learning, where we throw them into a virtual world or into, it could actually be in the real world, but it's too slow to do in the real world. Generally, we do it in the virtual world. We give this robot the ability to perceive and actuate inside that world.

but it doesn't actually know anything. But we give it a goal. We'll say, "Stand up." We have them try millions and millions of iterations of standing up. What you were alluding to, this Isaac Sim, that's our robotic simulator that we've built on top of our Omniverse platform on this "operating system" that allows you to do many of the things you need in order to build robot brains,

One of those things is reinforcement learning. It's almost like a training simulator built on top of Omniverse where it's free to make mistakes. And you're almost like, like you said, I love the notion of wall clock time and speeding that up. You're compressing all these like epochs of learning and evolution down into something that is manageable. And then you plop that into a real world robot and it still works. That's exactly right.
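The reinforcement-learning recipe Rev describes, drop an agent into a simulated world, give it a goal, and let it attempt the task millions of times, has a standard shape in code. A generic sketch using a Gymnasium-style environment; this is not the Isaac Sim API, and the random policy is a placeholder for a network trained by PPO or a similar algorithm:

```python
# Generic reinforcement-learning loop: act, observe, get rewarded, repeat millions of times.
# Gymnasium-style interface as a stand-in; not the actual Isaac Sim / Isaac Lab API.
import gymnasium as gym

env = gym.make("Humanoid-v4")            # reward encodes the goal, e.g. stay upright and move
obs, info = env.reset(seed=0)

for step in range(1_000_000):            # millions of attempts are cheap in simulation
    action = env.action_space.sample()   # placeholder policy; normally a neural network
    obs, reward, terminated, truncated, info = env.step(action)
    # A learner (e.g. PPO) would store (obs, action, reward) here and update the policy.
    if terminated or truncated:          # the robot fell over: reset and try again
        obs, info = env.reset()

env.close()
# In practice, thousands of such environments run in parallel on GPUs, so simulated
# hours scale with compute rather than with wall-clock time.
```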

Simulated time is not bound to wall clock time. If I double the amount of compute, double the size of my computer, that's twice the amount of simulation I can do. That's twice the number of simulation hours. So the scaling laws apply here in a profound way. That's pretty magical.

Let's talk a little bit about the applications of physical AI; it obviously applies to so many different fields. We talked about autonomous vehicles. There's robotic-assisted surgery. You alluded to automated warehousing. Could you share some examples of how physical AI is currently impacting these areas and what it's unlocking for these industries that have sort of been stuck in the past? I think the very first place that it's impacting the most, the first area, is autonomous vehicles.

They were the first robots: once we discovered this deep learning, machine learning thing, immediately you saw

all of these efforts from different companies to go build autonomous vehicles, whether they're robo-taxis or assistance inside commercial cars. And it's actually become a reality now. Like, I don't know if you've been to San Francisco or Phoenix or... We got Waymo in Austin here, too. Yeah, Waymo. I didn't realize they're in Austin as well. It's pretty awesome. I was in Phoenix a month or so ago at the airport, and...

I was waiting for my Uber and five Waymos picked up these people standing next to me. And it was super mundane. Just another day. Just another day staring at their phones and got into the car like it was nothing. This was unimaginable 10 years ago.

And now it's become mundane. And all of that is powered by these AI algorithms. Now, I don't know exactly what's inside Waymo or any of the other ones, but there's this trend that's happening where we're moving from the kind of earlier generations of AI, more specific AI like AlexNet, where we trained

these models on very specific datasets, and then we string these different models together to form a whole system. Like task-specific models that you clutch together. Yeah. Moving to these more general purpose, unified models that are built on the transformer architecture, the same thing that powers LLMs. We're starting to see these robotics models

that are more general purpose. And that's what we're talking about with physical AI being the next wave. Essentially having these kind of foundation models with general purpose understanding of the physics world around us

that you use as the basis, as the foundation, to then fine-tune for your specific purpose. Just like we have Llama and GPT and the Anthropic models, and then from there you go fine-tune those for specific kinds of tasks. We're going to start seeing a lot of new physical AI models that just understand the general laws of physics. Then we'll go take those and fine-tune them to specialize for different kinds of robotic tasks.
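The foundation-model-plus-fine-tune pattern Rev describes mirrors what's done with LLMs: keep a pretrained, general-purpose backbone and train a small task-specific head on top. A generic PyTorch sketch; the backbone, dimensions, and pick-and-place task are placeholder assumptions, not a specific NVIDIA model:

```python
# Generic sketch of fine-tuning: frozen general-purpose backbone + small task-specific head.
# Placeholder modules and shapes; not a specific physical-AI foundation model.
import torch
import torch.nn as nn

backbone = nn.Sequential(                 # stands in for a pretrained foundation model
    nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256),
)
for p in backbone.parameters():
    p.requires_grad = False               # keep the general "physics knowledge" frozen

task_head = nn.Linear(256, 7)             # e.g. 7 joint targets for a pick-and-place arm
optimizer = torch.optim.Adam(task_head.parameters(), lr=1e-4)

def fine_tune_step(obs_embedding: torch.Tensor, target_action: torch.Tensor) -> float:
    with torch.no_grad():
        features = backbone(obs_embedding)            # general-purpose representation
    pred = task_head(features)                        # task-specific specialization
    loss = nn.functional.mse_loss(pred, target_action)
    optimizer.zero_grad()
    loss.backward()                                   # only the head's weights are updated
    optimizer.step()
    return loss.item()

# loss = fine_tune_step(torch.randn(32, 512), torch.randn(32, 7))
```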

And so there's robotic tasks. It's like, you know, the Roomba in your freaking house versus, of course, a warehouse robot or even an autonomous vehicle. That's right. Yeah, they could be a pick-and-place robot in a warehouse. It could be an AMR; they're basically little driving platforms that zip around in these warehouses and factories. They could be drones that are flying around

inside factories, outside. That's what I want, by the way, is I want like a hot latte delivered like on my balcony by a drone, not having to navigate traffic. It's like actually hot and gets to you. Yeah, I'm not sure I'm with you on that one. I don't know if I want to have thousands of drones zipping around my neighborhood, just dropping off lattes everywhere. That's one of the few things that I do by hand and handcraft at home myself. Yeah.

You like your latte art? I make one every morning for my wife. That's like the first thing I do every day. And it kind of grounds me into the world. So I don't need a drone doing that. Fair enough. Fair enough. How do you think about where we are in terms of like physical AI capabilities today? I don't know if like the GPT-1234 nomenclature is the right way to think about it. But I'm curious as you think about where we are now and where we're headed, what do you think about the future?

What stage are we at in terms of the maturity of physical AI capabilities, especially this more general approach to agents that understand and can take action in the physical world? I think we're right at the beginning. I don't know how to relate it exactly to GPT-1234. I'm not sure that works, but we're at the very beginning of this.

That being said, we're also building on the GPT-1234, on the LLMs themselves. The information and data that's fed into these text-based or LLM models is actually still relevant to the physical AI models as well. Inside these descriptions in the text that was used to train them is information about the physical world. We talk about things like the color red and

putting a book on a shelf

And an object falling, those abstract ideas are still relevant. It's just insufficient. If a human has never seen any of those things, never touched or experienced it, only had the words describing the color red, they're not really going to understand it. It's not grounded in the physical world as you said previously. Right. And so they're going to take all of this different modes of information and fuse them together to get a more complete understanding of the physical world around us.

Is a good analogy, like, different parts of our brains? It seems like these LLMs are really good at reasoning about this sort of symbolic, textual world. And there's all this debate over how far the video models can go and reproduce the physics of the world. But it sounds like you just create another primitive that kind of works in concert with these other pieces, that is actually grounded in the real world and has seen examples of the physical world and all the edge cases that you talked about. And then that system as a whole is far more capable.

Exactly. I think, you know, there is debate over how far you can go with these video models because of the physics of the world. Now, even these, the current more limited video models we have, they're not trained with just video. They're multimodal. There's lots of information coming from non-video sources. There's text and captions and other things that are in there. And so if we can bring

in more modes of information, like the state of the world that you have inside a simulator. Inside a simulator, we know the position of every object in 3D space. We know the distance of every pixel. We don't just see things in the world, we can touch it, we can smell it, we can taste it. We have multiple sensory experiences that fuse together

to give us a more complete understanding of the world around us. Like right now, I'm sitting in this chair. I can't see behind my head, but I'm pretty sure if I put my hand behind me here, I'm going to be able to touch the back of the chair. That's proprioception. I know that because I have a model of what the world is around me because I've been able to synthesize that through all of my senses and there's some memory there.

We're essentially replicating the same process, the same basic idea, with how we train AIs. The first missing piece was this transformer model, this idea that we could just throw all kinds of unstructured data at this thing and it figures out, it creates this general purpose

function that can do all kinds of different things through understanding of complex patterns. So we had that, and we need all the right data to pump into it. And so our belief is that a lot, if not most, of this data is going to come from simulation, not from what happens to be on the internet. So interesting, your point about

Yeah, the state of the world. Like you have, to use nerd speak, the 3D scene graph. And as you mentioned, yeah, the vectors of all the various objects, all this stuff that you take for granted in video games, could then be thrown into a transformer along with other image data, maybe decimated to look like a real sensor. And then suddenly it'll build an understanding, or build what I've heard described as a universal function approximator, to figure out how to, yeah,

emulate all these other senses like proprioception and all these other things. I think there's like 30 or 40. I was kind of surprised to hear that we have so many. And maybe robots, I mean, they're not even limited by ours. You alluded to LIDAR and lasers earlier, right? Or infrared. And so it's like at some point these robots will be, going back to the start of our conversation, superhuman. Yeah. I mean, we have animals that are superhuman in this way too, right? Bats can see with sound. Yeah.

Yeah, eagles have got like very focused vision. They can kind of zoom in. Sure, why won't they be superhuman in certain dimensions of sensing the world and acting within the world? Of course, they already are in many respects. We have image classifiers that can classify animals, every breed of dog, and plants better than any human can. So true. So we'll certainly do that, at least in certain dimensions.

Hi, I'm Bilawal Sidhu, host of TED's newest podcast, The TED AI Show, where I talk with the world's leading experts, artists, and journalists to help you live and thrive in a world where AI is changing everything. I'm stoked to be working with IBM, our official sponsor for this episode. In a recent report published by the IBM Institute for Business Value, among those surveyed, one in three companies pause an AI use case after the pilot phase.

And we've all been there, right? You get hyped about the possibilities of AI, spin up a bunch of these pilot projects, and then crickets. Those pilots are trapped in silos, your resources are exhausted, and scaling feels daunting. What if instead of hundreds of pilots, you had a holistic strategy that's built to scale? That's what IBM can help with.

They have 65,000 consultants with generative AI expertise who can help you design, integrate, and optimize AI solutions. Learn more at ibm.com slash consulting. Because using AI is cool, but scaling AI across your business, that's the next level.

Your business is modern, so why aren't your operations? It's time for an operations intervention. The PagerDuty Operations Cloud is the essential platform for automating and accelerating critical work across your company. Through automation and AI, PagerDuty helps you operate with more resilience, more security, and more savings. Are you ready to transform your operations? Get started at PagerDuty.com.

So let's talk about looking towards the future a little bit here. So you talked about physical AI transforming factories and warehouses. What's your take on the potential in our everyday lives, right? Like, how do you see these technologies evolving to bring robots into our homes or personal spaces in really meaningful ways? It's like as intimate as it possibly can get, right? It's not really a controlled environment either. If you've been watching any of Jensen's keynotes this past year, within the last

10, 12 months or so, there's been a lot of talk of humanoid robots. Absolutely, yeah. And that's kind of all the rage. You're seeing them everywhere. I imagine for many people, when they see this, they could just kind of roll their eyes like, oh, yeah, yeah, humanoid robots. We've been talking about these forever. Why does it have to look like a humanoid? Doesn't it make more sense to build specialized robots that are really good at specific tasks?

We've had robots in our most advanced factories for a long time and they're not humanoids, they're like these large arms in automotive factories. Why are we talking about humanoid robots? The reason why this is coming up now is because if you take a step back and think about it, if you're going to build a general-purpose robot that can do many different things, the most useful one today is going to be one that's roughly shaped and behaves and acts like a human. Because we built all of these spaces

For humans. For humans. So we built our factories, our warehouses, our hospitals, our kitchens, our retail spaces. There's stairs and ramps and shelves. And so if we can build a general purpose robot brain, then the most natural kind of physical robot to build is...

to put that brain in for it to be useful would be something that's human-like because we could then take that robot and plop it into many different environments where it could be productive and do productive things. Many companies have realized this and they're going all in on that. We're bullish on it. I think even within this space though, there are specializations. Not every humanoid robot is going to be perfect for every task that

a human can do. Actually, not all humans are good at every task. Some humans are better at playing baseball and some are better at chopping onions. Astronauts have to meet certain criteria, right? That's right. So we're going to have many companies building more specialized kinds of humanoids, or different kinds of robots. The ones that we're immediately focused on are the ones in industry.

We think this is where they're going to be adopted the most, the quickest, and where it's going to make the most impact. Everywhere we look globally, including here in the US, there's labor shortages in factories, warehouses, transportation, retail. We don't have enough people to stock shelves.

And the demographics are such that that's just going to get worse and worse. So there's a huge demand for humanoid robots that could go work in some of these spaces. I think as far as in our personal space, a robot that can work side by side with a human in a factory or a warehouse should also be able to work inside your kitchen in your home. How quickly those kinds of humanoid robots are going to be accepted, there'll be a market for it.

I think it's going to depend on which country we're talking about because there's a very cultural element. Bringing a robot into your home, another entity, some other thing that's human-like into your home, that's very personal. And God forbid it makes your latte for you. Exactly. I don't want to do that in my kitchen. I don't even want other humans in there in the morning. But there's cultural elements here. In the U.S. and the West in general, we're probably a bit

more cautious or careful about robots. In the East, especially countries like Japan. Totally, that's where my head was going. They love them, right? And they want it. But industry everywhere needs it now. Right, yeah. And so for industrial applications, I think it makes sense to start there and then we can take those technologies into the consumer space and the markets will explore where they fit

the best at first, but eventually we'll have them everywhere. It's so fascinating to think about how many technologies they're early adopters of, including virtual avatars and things like that. But sort of bridging the virtual and the physical, the technologies you all are building aren't just limited to robots, right? As this tech improves spatial understanding, it could enhance our personal devices, sort of virtual assistants.

How close do you think we are to that sort of, you know, in real life Jarvis experience, a virtual assistant that can seamlessly understand and interact with our physical environment, even if it's not embodied as a robot? So this gets back to what I was saying earlier about the definition of a robot. What is a robot? Totally. The way you just talked about that, like to me, Jarvis is actually a robot. It does those three things. It perceives the world around us. Yep.

through many different sensors, it makes some decisions and it can even act upon the world. Like Jarvis inside the Avengers movies. Yeah. It can actually go activate the Iron Man suit. Right, yeah. And do things there, right? Like, so what is the difference between that and a C-3PO? Totally. Fundamentally. You're kind of inside a robot, sort of as you alluded to the NVIDIA building too, yeah. And if you think about some of these XR devices that immerse us into the world, they're half a robot. There's the perception...

part of it. There's the sensors along with some intelligence to do the perception, but then it's fed into a human brain and then the human makes some decisions and then it acts upon the world. Right. And when we act upon the world, there's maybe some more software, some even AI doing things inside the simulation of that world or that combination. So it's not black or white. What's a robot and

what's a human or human intelligence? There's kind of a spectrum between these things. We can augment humans with artificial intelligence. We're already doing it. Every time you use your phone to ask a question, you go to Google or Perplexity or something, or you ask ChatGPT a question, you're adding AI, you're augmenting yourself with AI there. It's that blend of

AI with a Jarvis experience that's immersive with XR, it's just making it so that that loop is faster with the augmentation. You beautifully set up my last question, which is, as AI is becoming infused in not just the digital world but the physical world, I have to ask you, what can go wrong and what can go right?

Well, with any powerful technology, there's always going to be ways things can go wrong. This is the most powerful of technologies potentially that we have ever seen. So we have to be, I think, very careful and deliberate about how we deploy these technologies to ensure that they're safe. So in terms of deploying AIs into the physical world,

I think one of the most important things we have to do is ensure that there's always some human in the loop somewhere in the process, that we have the ability to turn it off, that nothing happens without our explicit knowledge of it happening and without our permission.

We have a system here. We have sensors all around our building. We can kind of see where people are, which areas they're trafficking the most. At night, we have robotic cleaners. They're like huge Roombas.

that go clean our floors. We direct them to the areas that people have actually been and they don't bother the areas that haven't been trafficked at all to optimize them. We're going to have lots of systems like that. That's a robotic system. That's essentially a robot controlling other robots. But we need to make sure that there's humans inside that loop somewhere, deploying that, watching it, and ensuring that we can stop it, and pause it, and do whatever is necessary.

So the other part of the question was, what are the good things that are going to come out of this? We touched on a bunch of those things there, but ultimately, being able to apply all of this computing technology and intelligence to things around us in the physical world, I can't even begin to imagine the potential for the increase in productivity. Just look at something like agriculture. If you have effectively unlimited workers,

who can do extremely tedious things, like pull out one weed at a time in thousands of acres of fields, go through and just identify where there's a weed or a pest and take them out one by one. Then maybe we don't need to blanket these areas with pesticides, with all these other techniques that harm the environment around us, that harm humans. We can...

Essentially, the primary driver for economic productivity anywhere is the number of people we have in a country. I mean, we measure productivity with GDP, gross domestic product, and we look at GDP per head. That's the measure of efficiency, right? But it always correlates with the number of people. Countries that have more people

have more GDP. When we take physical AIs and apply them to the physical world around us, it's almost like we're adding more to the population.

And the productivity growth can increase. And it's even more so because the things that we can have them do are things that humans can't or won't do. They're just too tedious and boring and awful. So you find plenty of examples of this in manufacturing, in warehouses, in agriculture, in transportation. Look, we keep talking about transportation being the big issue right now. Truck drivers, we don't have enough of them out there.

This is essentially a bottleneck on productivity for a whole economy. Soon, we're effectively going to have an unlimited number of workers who can do those things. Then we can deploy our humans to go do all the things that are fun for us, that we like doing. I love that. It's like we're finally going to have technology that's fungible and general enough that we can reimagine all these industries and yet let humans do the things that are enriching and fulfilling,

and perhaps even have a world of radical abundance. I know that's a little trendy thing to say, but it feels like when you talk about that, it sounds like a world of radical abundance. Do you feel that way? I do. I do. I mean, if you just think about everything I said from first principles, why won't that happen? If we can manufacture intelligence and this intelligence can go drive, be embodied in the physical world and do things inside the physical world for us,

Why won't we have radical abundance? I mean, that's basically it. I love it. Thank you so much for joining us, Rev. Thank you for having me. It's always fun talking to you. Okay, as I wrap up my conversation with Rev, there are a few things that come to mind. Oh my God, NVIDIA has been playing the long game all along. They found just the right wedge, computer gaming, to de-risk a bunch of this fundamental technology that has now come full circle.

Companies and even governments all over the world are buying NVIDIA GPUs so they can train their own AI models, creating bigger and bigger computing clusters, effectively turning the CEO, Jensen Huang, into a bit of a kingmaker. But what's particularly poetic is how all the technologies they've invested in are the means by which they're going to have robots roaming the world. We are creating a digital twin of reality, a mirror world, if you will.

And it goes far beyond predicting an aspect of reality like the weather. It's really about creating a full fidelity approximation of reality where robots can be free to make mistakes and be free from the shackles of wall clock time. I'm also really excited about this because creating this type of synthetic training data has so many benefits for us as the consumer.

For instance, training robots in the home. Do we really want a bunch of data being collected in our most intimate locations inside our houses? Synthetic data provides a very interesting route to train these AI models in a privacy-preserving fashion. Of course, I'm left wondering if that gap between simulation and reality can truly be overcome. But what it seems is that gap is going to continually close further.

Who knew? Everyone was throwing shade on the metaverse when it first hit public consciousness. Like, who really wants this 3D successor to the internet? Now I'm thinking maybe the killer use case for the metaverse isn't for humans at all, but really it's for robots.

The TED AI Show is a part of the TED Audio Collective and is produced by TED with Cosmic Standard. Our producers are Dominic Girard and Alex Higgins. Our editor is Banban Cheng. Our showrunner is Ivana Tucker. And our engineer is Asia Pilar Simpson. Our researcher and fact checker is Christian Aparta. Our technical director is Jacob Winnick. And our executive producer is Eliza Smith.

And I'm Bilawal Sidhu. Don't forget to rate and comment, and I'll see you in the next one.