
How AI robots learn just like babies—but a million times faster w/ NVIDIA’s Rev Lebaredian

2024/12/3

The TED AI Show

People: Bilawal Sidhu, Rev Lebaredian


Key Insights

Why are robots struggling to master physical intelligence compared to humans?

Robots lack the years of practice and learned experiences that humans have accumulated through a lifetime of physical interactions. While humans can instinctively calculate trajectories and movements, robots require extensive training in simulated environments to achieve similar capabilities.

How does NVIDIA's simulation technology help robots learn faster?

NVIDIA's simulated environments allow robots to practice and learn at a supercharged pace, compressing tens of millions of repetitions that would take humans years into minutes. This accelerates the development of physical intelligence, enabling robots to master new skills much more quickly.

What is the potential market size for physical AI applications?

The market for physical AI is estimated to be around $100 trillion, encompassing industries like transportation, manufacturing, and drug discovery. This is significantly larger than the $2-5 trillion IT industry, highlighting the vast potential for AI to transform physical world industries.

What is the role of simulation in training robots for the real world?

Simulation allows robots to gather the necessary data to learn the physics of the real world without the constraints of the physical environment. It enables robots to practice in virtual worlds where they can make mistakes and learn from them, compressing real-world time into simulated time.

How does reinforcement learning help robots develop physical intelligence?

Reinforcement learning mimics how humans and animals learn, allowing robots to experiment and learn from their mistakes in a virtual environment. This method is particularly effective for robots to develop an understanding of the physical world through trial and error, similar to how babies learn.

What are some current applications of physical AI in industries?

Physical AI is currently transforming industries like autonomous vehicles, robotic-assisted surgery, and automated warehousing. For example, autonomous vehicles like Waymo are already being used in cities, and robots are being deployed in factories and warehouses to address labor shortages.

Why are humanoid robots gaining attention for general-purpose tasks?

Humanoid robots are seen as the most natural form for general-purpose tasks because they can navigate and interact with environments designed for humans. Their human-like shape allows them to be deployed in various settings, from factories to homes, making them versatile for multiple applications.

What are the potential benefits of physical AI in everyday life?

Physical AI has the potential to increase productivity by automating tedious and dangerous tasks, freeing humans to focus on more fulfilling work. It could also lead to a world of radical abundance by addressing labor shortages and improving efficiency across industries like agriculture, manufacturing, and transportation.

What challenges remain in bridging the gap between simulation and reality for robots?

The main challenge is ensuring that robots trained in simulations can effectively transfer their skills to the real world. While simulation provides a controlled environment for learning, the real world is unpredictable, requiring continuous refinement and testing to close the gap between simulation and reality.

Transcript

Hey, Bilawal here. Before we start the show, I have a quick favor to ask. If you're enjoying The TED AI Show, please take a moment to rate and leave a comment in your podcast app. Which episodes have you loved and what topics do you want to hear more of? Your feedback helps us shape the show to satisfy your curiosity, bring in amazing guests, and give you the best experience possible.

The world of AI is advancing at an incredible pace. And it's no secret that in many areas, computers have long outperformed humans. But there's been one area that's been tough for robots to master.

Physical intelligence. We've talked a lot on this podcast about text and image generation, technologies that took years of research, immense computational power, and vast datasets to develop. But when compared to mapping 3D spaces and predicting the chaotic randomness of the real world, that's all child's play. So what gives humans the edge here, at least for now?

It's simple. We've had a lot of practice. Imagine you're a pro baseball player in the outfield watching a fly ball come your way. In an instant, your brain calculates the ball's speed, spin, and trajectory to predict where it will land. To you, it feels automatic.

But it's the result of years of practice and learned experiences, not just from baseball, but from a lifetime of physical interactions. From childhood, moments of trial and error in the physical world have trained your brain to understand how objects move and react. And for humans, mastering these skills takes time because real-world practice can't be rushed.

But fortunately for robots, it can be rushed. And NVIDIA, the AI giant historically known for its graphics cards, has developed incredibly powerful simulated environments where robots can practice and learn at a supercharged pace. Tens of millions of repetitions, which might take humans years, can be compressed into minutes. We're already seeing this in self-driving cars, but the potential goes far beyond that.

By building AI that understands the physical world, NVIDIA is setting the stage for machines that could revolutionize industries, assist in complex surgeries, and even help around the house. So what does it mean for robots to develop a kind of physical intuition? And what challenges and opportunities lie ahead as we continue to push the boundaries of robotics?

I'm Bilawal Sidhu, and this is The TED AI Show, where we figure out how to live and thrive in a world where AI is changing everything.

Does your AI model really know code? Its specific syntax, its structure, its logic? IBM's Granite code models do. They're purpose-built for code and trained on 116 different programming languages to help you generate, translate, and explain code quickly. Because the more your AI model knows about code, the more it can help you do. Get started now at ibm.com slash granite. IBM, let's create.

Your business is modern, so why aren't your operations? It's time for an operations intervention. The PagerDuty Operations Cloud is the essential platform for automating and accelerating critical work across your company. Through automation and AI, PagerDuty helps you operate with more resilience, more security, and more savings. Are you ready to transform your operations? Get started at PagerDuty.com.

As we approach the 250th anniversary of the Declaration of Independence, TED is traveling to the birthplace of American democracy, Philadelphia, for an exciting new initiative. Together throughout 2024, TED and Visit Philadelphia started to explore democratic ideas in a series of three fireside chats that will shape our collective future as we work towards a more perfect union.

Our third and final event of 2024 about moving forward together took place on November 20th at the historic Reading Terminal Market. Hosted by TED curator Whitney Pennington Rodgers, we featured TED Talks and a moderated Q&A with world champion debater Julia Dhar and head of curiosity at the Eames Institute, Scott Shigeoka. Thanks to Visit Philadelphia and our supporting partners Bank of America, Comcast NBCUniversal and Highmark.

Go to visitphilly.com slash TED to learn more about this event and to hear about the exciting things we have coming up in 2025. Our guest today, Rev Lebaredian, began his career in Hollywood, where he worked on visual effects for films like Mighty Joe Young and Stuart Little. His experience in creating detailed, dynamic 3D worlds laid the foundation for his role today as VP of Omniverse and Simulation Technology at NVIDIA.

There, he's using that expertise to push the boundaries of robotics by applying simulation technology to teach robots physical intelligence. In other words, how to understand and interact with the real world. In our conversation, we explore how NVIDIA, known for its role in gaming technology, became a key player in the development of generative AI. What a robot even is, and Rev's vision for a future where robots enhance our lives.

So, Rev, welcome to the show. Thank you for having me, Bilawal. So in the first part of your career, you worked in entertainment, helping audiences become immersed in fantasy worlds. And now your work involves helping robots become immersed in simulations of the real world. Can you explain to our listeners what your role is at NVIDIA?

Technically, my role is, the title is Vice President of Omniverse and Simulation Technology. It's a weird title. I don't think there's many others like it out there. It's strange because it's a new concept relatively speaking. I started my career, as you mentioned, in media entertainment doing visual effects and computer graphics for that purpose. I joined NVIDIA 23 years ago with the hope of taking what I was doing in movies

creating this imagery of high-fidelity, high-quality fantasy worlds and doing it in real-time, doing it really fast using our GPUs to power that computation so that it could become what's a linear experience in movies could become an interactive one like in a video game or in an immersive experience like XR.

It took a while for us to get there, though. Speaking of that, you've had a very unique vantage point over the years watching NVIDIA almost evolve from basically a gaming hardware company to a leader in AI and simulation. Could you share a little bit about your journey at NVIDIA and how NVIDIA's mission has transformed over the years?

That's a really, really great question. I think a lot of people don't really understand how Nvidia, this "gaming company" or this chip company that made chips for gaming PCs is now the most valuable company in the world and at the center of all of this AI stuff. But if you go back to what the idea behind the creation of the company was all the way at the beginning, it actually makes a lot of sense.

The founding principle of the company was this idea that general purpose computers, ones built around CPUs, the same architecture that we built all computers around since the 1960s, starting from the IBM System 360. They're really great, but there are certain computing problems that they just aren't fast enough to solve.

Now, at the time, we had this law called Moore's Law. It's not a law like a law of physics. It was more like an observation of how semiconductors were essentially providing double the compute for the same price or the same amount of power every year and a half or two. At its height, Moore's Law made it so that we could get 100 times speed increases for the same

price or the same power over a 10-year period. But we looked at Moore's law and said, well, if we wait for Moore's law to give us enough computing power to do certain things like rendering for computer graphics for video games, we would have to wait decades or maybe even hundreds of years before the computers would be fast enough to do some of the things we wanted to do. So NVIDIA set about creating this new form of computing,

that doesn't do everything, but it can do many things that would otherwise be impossible with this generic kind of computer. We call that accelerated computing. We invented the idea of a GPU. The first problem we chose to tackle was the problem of 3D rendering for producing these images in video games.
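
A quick back-of-the-envelope check of the Moore's Law figures Rev cites above (doubling roughly every year and a half to two years, about a hundredfold over a decade). This is just illustrative arithmetic in Python, not anything from NVIDIA:

```python
# Compounding check: doubling compute every 1.5-2 years over a 10-year span.
for doubling_period_years in (1.5, 2.0):
    speedup_10y = 2 ** (10 / doubling_period_years)
    print(f"double every {doubling_period_years} yr -> ~{speedup_10y:.0f}x in 10 years")
# double every 1.5 yr -> ~102x in 10 years
# double every 2.0 yr -> ~32x in 10 years
```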

At the time when NVIDIA was formed in 1993, there was no market for this. There were actually no 3D video games. They were just starting. There was Doom and Wolfenstein, like the first ones that just showed up. Yeah, that came a little bit later, I think. It was not '93, maybe '95, I think.

So we imagine that this problem, if we could help solve it, a market would form around that, and then we could expand into other markets with the same accelerated computing architecture. That's essentially what happened. Fast forward a few more years, in the early 2000s, we added a critical feature to our GPUs. It's called programmable shading.

which is simulating how the light interacts with the material inside a 3D world. That's what makes plastic look like plastic, aluminum look like aluminum, wood look like wood. Up until that point in time, the kinds of shaders we could have, the kinds of materials were very limited and they made the video games look very simple or cartoony, not quite realistic.

In the movie world, we weren't limited by time, by how much time you have to render. We could spend hours and hours rendering. So there's this big disconnect between the quality of a computer-generated image in a movie and what you could see in a video game. We introduced programmable shading, and that feature of making it programmable unlocked the possibility of us using the same GPUs for more than computer graphics and rendering.
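
To make the programmable-shading idea concrete, here is a minimal sketch of the kind of per-pixel computation a shader performs: a simple Lambertian-plus-specular model in plain Python. It is an illustration of the concept, not NVIDIA's actual shading pipeline, and the parameter values are made up:

```python
import math

def shade(normal, light_dir, view_dir, base_color, shininess, spec_strength):
    """Toy per-pixel shader: one way light can interact with a surface point.

    Direction vectors are unit-length 3-tuples; base_color is RGB in [0, 1].
    Different "materials" (plastic vs. wood) are just different parameters
    plugged into the same programmable function.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    diffuse = max(dot(normal, light_dir), 0.0)                  # Lambertian term
    half = tuple(l + v for l, v in zip(light_dir, view_dir))    # Blinn-Phong half-vector
    length = math.sqrt(dot(half, half)) or 1.0
    half = tuple(h / length for h in half)
    specular = spec_strength * max(dot(normal, half), 0.0) ** shininess
    return tuple(min(c * diffuse + specular, 1.0) for c in base_color)

normal, light, view = (0, 0, 1), (0, 0, 1), (0.6, 0, 0.8)
print(shade(normal, light, view, (0.8, 0.1, 0.1), shininess=64, spec_strength=0.6))   # shiny "plastic"
print(shade(normal, light, view, (0.5, 0.35, 0.2), shininess=4, spec_strength=0.05))  # dull "wood"
```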

And very quickly, we saw researchers and other people who weren't doing computer graphics take advantage of all the computing capabilities that were in our GPUs.

They would take their problems, other sorts of physics problems like molecular dynamics and fluid dynamics, and phrase them like they're a computer graphics problem. And when we realized that was happening, that people were willing to contort themselves into using graphics APIs to do this other stuff, we said, let's make it easier for them.

and we introduced CUDA, which was a more natural way of programming general-purpose things that weren't graphics on our GPUs. We essentially waited for six, seven years to see what the killer app would be. We imagined some developer somewhere, probably a grad student, was going to go figure out something amazing to do with these computing capabilities, and it took a while. We introduced CUDA in 2006.

At the end of 2012, almost seven years later, we finally had that moment. And what happened was two research students and their professor at the University of Toronto, Ilya Sutskever, Alex Krizhevsky, and their professor, Geoffrey Hinton, who just won the Nobel Prize,

They beat all of the benchmarks in image classification with a deep learning neural network called AlexNet at the end of 2012 when they published that. That essentially changed everything.

And this is insane because up until that point, basically none of the other approaches to the ImageNet benchmark were really winning, because none of them used this deep learning approach. This was the first time deep learning kind of blew everyone's mind in the realm of computer vision. And it's kind of wild to imagine it started off with programmable shaders and trying to make cinematic visuals from Hollywood run in real time on your computer. But that same capability, like you said, once you made it easier for developers,

unlocked this whole new world in computer vision and certainly caught the whole world's attention, particularly y'all's, probably sooner than everyone else, I assume. That's exactly right. It seems counterintuitive that this thing built to create images is somehow the same thing that you need to build intelligence. But really, it all just comes down to computing.

The form of computing we had to build for computer graphics, we process a lot of pixels, a lot of triangles, a lot of light rays bouncing around in a scene. That same form of computation is the same thing you need to do all of the tensor math, all of the matrix math. The problem of image classification, that's been a longstanding one that we've all known would be great if we could solve. People have been trying to solve it since the 1950s.
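
A tiny illustration of that point about the computation being the same: transforming a batch of 3D vertices for rendering and evaluating a neural-network layer are both just matrix multiplies. A hedged sketch using NumPy (assumed to be available), not any NVIDIA library:

```python
import numpy as np

rng = np.random.default_rng(0)

# Graphics: rotate a batch of 3D vertices around the z-axis (a 3x3 matrix multiply).
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
vertices = rng.normal(size=(10_000, 3))          # 10k points in a scene
transformed = vertices @ rotation.T

# Deep learning: one fully connected layer over a batch of inputs (also a matrix multiply).
weights = rng.normal(size=(3, 128))
activations = np.maximum(vertices @ weights, 0)  # ReLU(x W)

print(transformed.shape, activations.shape)      # (10000, 3) (10000, 128)
```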

It's a really, really useful thing to do to be able to distinguish what's inside an image that you provide the computer automatically. Up until that point, we would take a really smart person, a computer scientist, that person would imagine an algorithm that can do image classification, and then transcode what's in their brain into the computer and produce a program. What changed here was for the first time,

We were able to create an algorithm to solve something that no human could actually imagine. The way we solved it was by taking a large computer, effectively a supercomputer. We gave it millions of examples of images and said, when you see an image that looks like this, that's a cat. When you look at an image that looks like this, it's a dog. When you look at this image, it's an airplane. So we did that enough times that it wrote the software, it wrote the algorithm,

that could do that image classification. And so it did it better than any algorithm that a human could imagine. - And that's wild, right? You're talking about this like era where humans have written software. Now software is writing software. - That's right. There's two basic ingredients, a supercomputer, lots of computation,

and you give it a whole bunch of data or examples of what you would like it to do, and it figures out the algorithm for you based on the examples you give it. The first one, building large computers, that's our happy place, right? That's what NVIDIA knows how to do. We love building powerful computers and scaling them up. And so that's what we set about doing over a decade ago. And the recent explosive growth of NVIDIA is essentially...

Because of the bet we placed over a decade ago that these big computers were going to be useful. That's what everybody is clamoring for right now. They're setting up these AI supercomputers.
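
As a toy version of the two-ingredient recipe Rev describes above (a big computer plus millions of labeled examples, and the machine writes the classifier), here is a hedged sketch that fits a tiny logistic-regression "cat versus dog" classifier on synthetic labeled data. It stands in for AlexNet-scale training only conceptually:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "images": two feature blobs standing in for cat-like and dog-like examples.
cats = rng.normal(loc=[-1.0, 0.5], scale=0.7, size=(500, 2))
dogs = rng.normal(loc=[1.0, -0.5], scale=0.7, size=(500, 2))
x = np.vstack([cats, dogs])
y = np.array([0] * 500 + [1] * 500)              # the labels that come with the examples

# "The computer writes the algorithm": gradient descent fits the decision rule.
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))       # predicted probability of "dog"
    w -= 0.5 * x.T @ (p - y) / len(y)
    b -= 0.5 * np.mean(p - y)

accuracy = np.mean((p > 0.5) == y)
print(f"learned weights {w}, training accuracy {accuracy:.2%}")
```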

Yeah, and every country and company wants more of your GPUs. And of course, the recent demand has really been driven by large language models and diffusion models, which we've talked about a bunch on the podcast. But it's interesting, like as cool as ChatGPT is and as cool as it is to be able to type a prompt and get an image out, this stuff isn't the holy grail. These systems have their limitations, right? Could you talk a little bit about that as we

transition this conversation towards physical AI. Yes, that's exactly right. At that moment when we realized how profound this change was, that we could now produce algorithms that we never imagined we would have in our lifetimes through this new technique of deep learning and AI. The next question we asked ourselves was,

Now that we have this possibility of creating these amazing new things, which ones should we go create? What are going to be the most valuable and impactful ones? Now, if you just take a step back and think about the computing industry, the IT industry, it's somewhere between $2 and $5 trillion a year globally, which is a huge number, right? That's a really big industry. However, all of the rest of the industries out there, the industries that are

about our physical world, the world of atoms, that's $100 trillion. That includes markets like transportation, transporting humans, transporting goods. It includes manufacturing, which is reassembling atoms into products. It includes drug discovery and design, reassembling atoms into medicines, so on and so forth. Like all these things about our physical world,

at least the way humans value them through markets, are much greater value than information. Now, information is the easiest things for us to digitize. So it makes sense that the first algorithms that we develop using this new machine learning, deep learning AI technique, it's going to use all the data that we have readily available to us, which is essentially what's on the Internet. But if we could somehow take this new superpower,

and apply it to the realm of atoms, we unlock that $100 trillion market. All of those markets take manufacturing, for example. We've applied IT and computing to those markets like manufacturing. But if you go into a factory, it's not that different from a factory 50 years ago. They've been largely untouched by computing.

The reason why we haven't been able to do that is because we haven't really had a bridge between the physical world and the computing world. Connecting bits and atoms, baby. Let's go. Yes. And if you think a little bit more about that, that bridge is essentially robotics. Totally. And so we thought about this and we said, this is now maybe possible. Robotics, it's been a dream for a long time. But what we've been missing are the fundamental algorithms we need to build a robot,

a truly useful robotic brain so that we could apply computing to the real world. And so what's a robot? A robot is essentially an agent out here in our real world that does three things and does these three things in a loop. A robot

perceives the world around us, the physical world. It inputs the world through sensors. They can be cameras and lidars and radars, all kinds of sensors, whatever the sensing mechanism is. It makes some sense out of what's coming in. It understands what's coming in. Essentially, that first neural network, AlexNet, was doing that. It's getting some information from the real world, an image,

a photograph, and making sense of what's inside it. The next thing it does, a robot agent inside the physical world: it takes this information, what is perceived, and makes some decisions. It makes a decision about how it should act; it plans and decides how it's going to affect the world.

The third thing is actuation. It actually does something inside the world. So once it's made the decision, it does something that actually moves or affects the physical world. Once that happens, then it's a loop. You perceive your changes to the world,

update your decisions and your plan and go actuate. By this definition, many things are robots, not just the things we normally think of as a robot, like a C-3PO or R2-D2. A self-driving car is definitely a robot. It has to perceive the world around it. Where are the other cars, the stop signs, pedestrians, bicyclists? How fast are they all moving? What's the state of the world around me?

around the car, make some decisions on how it's going to get to the final destination, and actuates, steers, brakes, or accelerates, and this thing runs in a loop. Lots of things are robots if you define them this way. The building I'm in right now, which is our Endeavor building, our headquarters,

Every day when I enter it, in the reception area, we have turnstiles. There are sensors there. There's some cameras. They know when I walk up to the turnstile. It senses that I've approached and then decides who I am

based on an image classification algorithm, not dissimilar from that original AlexNet. Once it determines that I'm Rev, it can look me up in a database, should I have access, and then it actuates in the world. It opens the turnstile so I can pass through and update some count somewhere that now I'm in the main area.
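
Rev's three-step definition maps directly onto code. Below is a minimal, hypothetical sketch of the turnstile "robot" as a perceive-decide-actuate loop; the camera, badge database, and gate are all stand-ins for illustration, not a real NVIDIA or building-security API:

```python
import random

random.seed(0)

# Perceive: stand-in for a camera plus image classifier at the turnstile.
def perceive():
    return random.choice(["rev", "unknown_visitor", None])

# Decide: look the person up and choose an action.
def decide(person_id, access_db):
    return "open" if person_id and access_db.get(person_id, False) else "stay_closed"

# Actuate: change something in the physical world (open the gate, bump a counter).
def actuate(action, person_id, entry_counts):
    if action == "open":
        entry_counts[person_id] = entry_counts.get(person_id, 0) + 1
        print(f"turnstile opens for {person_id} (entry #{entry_counts[person_id]})")

access_db = {"rev": True}        # hypothetical badge database
entry_counts = {}
for _ in range(10):              # perceive -> decide -> actuate, in a loop
    person = perceive()
    action = decide(person, access_db)
    actuate(action, person, entry_counts)
```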

So this building is essentially a robot. So if you think about robots in this way and you think about robotic systems as essentially the bridge between computing and the $100 trillion worth of industries out there that deal with the physical world, you start to get pretty excited. You're like, wow, we now potentially have the opportunity to go make a big impact in many of these other industries.

And so on that note, I mean, it's interesting, right? You are talking about how factories haven't changed in decades. And you're right. There's like enterprise resource planning software to keep track of the inventory of stuff and how it's moving around. But the world of atoms hasn't seen as much progress as the world of bits. And to unlock that massive opportunity in these physically based industries, what's the missing piece? What do we not have today? And what are y'all building to make that happen?

Yeah. So this is where simulation comes in. If we go back to what were the key differences between how we used to write software and this new form of AI, one is supercomputing, the other is you need that data or the set of examples to give it so we could go write the function. Well, where are we going to get that data?

to learn the physics of the world around us. How do you gather that data? It doesn't just exist on the internet. The stuff we have on the internet is largely the things that were easy to digitize, which is not stuff in the physical world.

So our thesis is that the only way we're going to get all the data that we need is by essentially taking the physical world and all the laws of the physical world and putting it in a computer, making a simulation of the physical world. Once you have that, you can produce all of the data you need, essentially the training grounds for these AIs to learn about the physical world. You're no longer constrained.

by all of the constraints that we have out here in the real world. We can train faster than time, than the real-world time out here. By just adding more compute, for every real-world second we can do millions of seconds in the simulated world. Wow. Yeah. Collecting data from the real world is really expensive. Let's take one kind of robot, self-driving cars, autonomous vehicles.

If you want to train a network to perceive a child running across the street in any condition, any lighting condition, any city. Different times of year, so different weather. Yeah, different weather conditions. You're going to have to actually go out there in the real world and have a child run across the street as your car is barreling down the road and capture it.

I mean, first of all, obviously, this is unethical to do and we shouldn't do that.

But then just the tediousness of that, of capturing it in every possible long tail scenario, it's just untenable. You can't do that. It's too expensive and it's just impossible. You know, there are some really rare weather conditions. You might want to have that same condition with volcanic ash falling that might happen in Hawaii. How can you even construct that scenario, right? But in simulation, we can create it all.

In addition, when you grab data from the real world, you only have half the data you need. We also need to know what's inside this information, this unstructured information. Labels. Labels, exactly. So with AlexNet, when they trained it, they had not only the image,

but they had the label that said that image is a cat or a dog. When we simulate a world, we can produce the labels perfectly and automatically. You get it for free pretty much. But when you do it in the real world, you have to have an army of humans or some other mechanism of adding the labels, and they're going to be inaccurate. Before you deploy it out into the real world, you probably want to make sure it's going to work. We don't want to put a robot brain in a self-driving car

and just hope that it's going to work when that child runs across the street. The best place to go test that is in a virtual world, in a simulation. That's a really long-winded way of getting to what is essentially what I've been working on in recent years.
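
The "labels for free" point is easy to see in code. The hedged sketch below fakes a simulator step that places objects in a scene and emits both a rendered observation and its ground-truth labels at the same time; the scene generator and record format are hypothetical, not Omniverse's actual API:

```python
import random

CLASSES = ["child", "cyclist", "car", "dog"]
WEATHER = ["clear", "rain", "fog", "snow", "volcanic_ash"]

def simulate_scene(seed):
    """Toy stand-in for one step of a physically based simulator.

    A real simulator would render sensor data; the point here is only that the
    labels (class, position, weather) come straight from the scene state, with
    no human annotators involved, and rare conditions can be generated at will.
    """
    rng = random.Random(seed)
    objects = [
        {"cls": rng.choice(CLASSES),
         "x_m": round(rng.uniform(-20, 20), 2),   # ground-truth position, known exactly
         "y_m": round(rng.uniform(0, 60), 2)}
        for _ in range(rng.randint(1, 5))
    ]
    scene = {"weather": rng.choice(WEATHER), "objects": objects}
    observation = f"<rendered frame for seed {seed}>"   # placeholder for sensor output
    return observation, scene                           # data and perfect labels, for free

dataset = [simulate_scene(seed) for seed in range(1_000)]
print(dataset[0][1])
```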

Here at NVIDIA, we saw the need for this many years ago, so we started building what we call Omniverse. Omniverse is an "operating system" that we collect all of our simulation and virtual world technologies into. The goal of Omniverse is specifically about doing simulations that are as physically accurate as possible.

That's the key thing. It has to match the real world because otherwise our robots would be learning about laws of physics from something that's just wrong. This is distinctly different than what I did before.

And my work in movies, doing simulations to produce the amazing imagery that we see in visual effects and CGI movies or in video games, that's all about creating really cool-looking, fun images of fantasy worlds, of fake worlds.

There's all kinds of stuff where we're cheating. We add extra lights and makeup, and we're breaking the laws of physics in order to make the movie fun and cool or exciting. There is something really poetic about that, though.

It basically goes back to the start of your career, like all this stuff, all these capabilities y'all built to emulate the laws of physics, let's say for light transport, and just get the material properties right. So the glint, the veneer, the reflections and refraction all look really good. That's exactly what you need. Obviously tuned in a fashion that's physically accurate, as you said. So these robots have kind of a believable digital twin or copy or replica of the real world where they're free to make mistakes.

But there's also the time dilation aspect that you mentioned, where you can scale up and have these models go do things in the digital realm that would take forever to do in the physical world. And it feels like there's another piece of this, too, which is that you create these digital replicas of the world that become the training data, because as you said, you don't have the Internet to go and pull all this text or image data from.

But then you have the robots try things, and there's this domain gap, this chasm that you need to cross between the simulation and the real world. What are some of the other capabilities that y'all are building to make that happen? Yeah, I kind of oversimplified how we build these AIs: just feed data

into the supercomputer and out comes this amazing robot brain. That's some of how we do it, but there's many different forms of learning. And I think the one you're touching upon is what's called reinforcement learning. It turns out that these robots, one of the best ways for them to learn is sort of how humans and creatures learn. When a baby is born, a human baby is born into the world,

it still doesn't understand the physics of the world around them. A baby can't see depth, they can't really see color yet, they have to learn how to see color. Over time, over weeks, they start learning those things. They start learning how to classify. They classify mom and dad and siblings and... apples, balls, all of those things around them. They learn it just through experience.

They also learn about the laws of physics through a lot of experimentation. So when you first start giving your baby food and putting food in front of them, one of the first things they do is drop it or throw it, breaking things, throwing things, making a mess. Those are essentially science experiments. They're all little scientists that are trying things until they learn it. And once they understand how that physics works, they move on. Robots learn in the same way.

through this method called reinforcement learning, where we throw them into a virtual world or into, it could actually be in the real world, but it's too slow to do in the real world. Generally, we do it in the virtual world. We give this robot the ability to perceive and actuate inside that world.

but it doesn't actually know anything. But we give it a goal. We'll say, "Stand up." We have them try millions and millions of iterations of standing up. What you were alluding to, this Isaac Sim, that's our robotic simulator that we've built on top of our Omniverse platform on this "operating system" that allows you to do many of the things you need in order to build robot brains,

One of those things is reinforcement learning. It's almost like a training simulator built on top of Omniverse where it's free to make mistakes. And you're almost like, like you said, I love the notion of wall clock time and speeding that up. You're compressing all these like epochs of learning and evolution down into something that is manageable. And then you plop that into a real world robot and it still works. That's exactly right.

Simulated time is not bound to wall clock time. If I double the amount of compute, double the size of my computer, that's twice the amount of simulation I can do, that's twice the number of simulation hours. So the scaling laws apply here in a profound way. That's pretty magical.
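
Here is a minimal, hedged sketch of the reinforcement-learning setup Rev describes: an agent dropped into a simulated world with nothing but a "stand up" style reward, improving over many rollouts. It uses a toy one-dimensional balance task and random-search hill climbing over a two-number policy rather than Isaac Sim or a real RL library:

```python
import random

def rollout(gains, steps=200, dt=0.05):
    """One episode in a toy 'stand up' world: push a body's height toward 1.0."""
    height, velocity, total_reward = 0.0, 0.0, 0.0
    k_pos, k_vel = gains
    for _ in range(steps):
        action = k_pos * (1.0 - height) - k_vel * velocity   # the whole policy: two numbers
        action = max(min(action, 5.0), -5.0)                 # actuator limits
        velocity += (action - 0.98) * dt                     # crude thrust-versus-gravity physics
        height = max(height + velocity * dt, 0.0)            # can't sink below the floor
        total_reward += dt * (1.0 - abs(1.0 - height))       # reward for staying near "standing"
    return total_reward

rng = random.Random(0)
best_gains, best_reward = [0.0, 0.0], rollout([0.0, 0.0])
for _ in range(500):                                         # millions, in the real thing
    candidate = [g + rng.gauss(0.0, 1.0) for g in best_gains]
    reward = rollout(candidate)
    if reward > best_reward:
        best_gains, best_reward = candidate, reward
print(f"learned gains {best_gains}, reward {best_reward:.2f} out of a possible 10.0")
```

The scaling point shows up directly: the candidate rollouts are independent, so doubling the compute just means evaluating twice as many simulated episodes in the same wall-clock time.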

Let's talk a little bit about the applications of physical AI, like obviously applies to so many different fields. We talked about autonomous vehicles. There's like robotic assisted surgery. You alluded to automated warehousing. Could you share some examples of how physical AI is currently impacting these areas and what it's unlocking for these industries that have sort of been stuck in the past? I think the very first place that it's impacting the most, the first area is autonomous vehicles.

Once we discovered this deep learning, machine learning thing, immediately you saw

all of these efforts from different companies to go build autonomous vehicles, whether they're robo-taxis or assistance inside commercial cars. And it's actually become a reality now. Like, I don't know if you've been to San Francisco or Phoenix or... We got Waymo in Austin here, too. Yeah, Waymo. I didn't realize they're in Austin as well. It's pretty awesome. I was in Phoenix a month or so ago at the airport, and...

I was waiting for my Uber and five Waymos picked up these people standing next to me. And it was super mundane. Just another day. Just another day staring at their phones and got into the car like it was nothing. This was unimaginable 10 years ago.

And now it's become mundane. And all of that is powered by these AI algorithms. Now, I don't know exactly what's inside Waymo or any of the other ones, but there's this trend that's happening where we're moving from the earlier generations of AI that are more specific, like AlexNet, where we trained

these models on very specific datasets and then string these different models together to form a whole system. Like task-specific models that you cobble together. Yeah, that you put together. We're moving to these more general-purpose, unified models that are built on the transformer architecture, the same thing that powers LLMs. We're starting to see these robotics models

that are more general purpose. That's what we're talking about with physical AI being the next wave. Essentially, having these foundation models with a general-purpose understanding of the physical world around us,

that you use as the basis, as the foundation to then fine-tune for your specific purpose. Just like we have Llama and GPT and the Anthropic models, and then from there you go fine-tune those for specific kinds of tasks. We're going to start seeing a lot of new physical AI models that just understand the general laws of physics. Then we'll go take those and fine-tune them to specialize for different kinds of robotic tasks.
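
The foundation-model-plus-fine-tuning pattern Rev describes looks roughly like this in code. A hedged sketch with a hypothetical frozen "physical AI" backbone and a small task-specific head that gets trained; the names and the grasp-scoring task are illustrative, not a real model or NVIDIA API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen backbone standing in for a pretrained physical-AI foundation model.
BACKBONE_W = rng.normal(size=(32, 64))

def backbone_features(observations):
    """Frozen general-purpose features; only the small head below gets trained."""
    return np.tanh(observations @ BACKBONE_W)

def finetune_head(features, labels, lr=0.1, epochs=500):
    """Fit a tiny task-specific head (logistic regression) on top of frozen features."""
    head = np.zeros(features.shape[1])
    for _ in range(epochs):
        pred = 1.0 / (1.0 + np.exp(-(features @ head)))
        head -= lr * features.T @ (pred - labels) / len(labels)
    return head

# Toy fine-tuning set for one robot job, e.g. scoring whether a grasp will succeed.
observations = rng.normal(size=(256, 32))
labels = (observations[:, 0] > 0).astype(float)

head = finetune_head(backbone_features(observations), labels)
pred = 1.0 / (1.0 + np.exp(-(backbone_features(observations) @ head)))
print(f"task accuracy after fine-tuning the head: {np.mean((pred > 0.5) == labels):.2%}")
```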

And so there are robotic tasks, like, you know, the Roomba in your freaking house versus, of course, a warehouse robot or even an autonomous vehicle. That's right. Yeah, they could be a pick-and-place robot in a warehouse, they could be an AMR, which are basically little driving platforms that zip around in these warehouses and factories, they could be drones that are flying around

inside factories, outside. That's what I want, by the way, is I want like a hot latte delivered like on my balcony by a drone, not having to navigate traffic. And it's like, it's actually hot and gets to you. Yeah, I'm not sure I'm with you on that one. I don't know if I want to have thousands of drones zipping around my neighborhood, just dropping off lattes everywhere. That's one of the few things that I do by hand and handcraft at home myself. Yeah.

You like your latte art? I make one every morning for my wife. That's like the first thing I do every day. And it kind of grounds me into the world. So I don't need a drone doing that. Fair enough. Fair enough. How do you think about where we are in terms of like physical AI capabilities today? I don't know if like the GPT-1234 nomenclature is the right way to think about it. But I'm curious as you think about where we are now and where we're headed, what do you think about the future?

What stage are we at in terms of the maturity of physical AI capabilities, especially this more general approach to agents that understand and can take action in the physical world? I think we're right at the beginning. I don't know how to relate it exactly to GPT-1234. I'm not sure that works, but we're at the very beginning of this.

That being said, we're also building on the GPT-1234, on the LLMs themselves. The information and data that's fed into these text-based or LLM models is actually still relevant to the physical AI models as well. Inside these descriptions in the text that was used to train them is information about the physical world. We talk about things like the color red and

putting a book on a shelf

And an object falling, those abstract ideas are still relevant. It's just insufficient. If a human has never seen any of those things, never touched or experienced it, only had the words describing the color red, they're not really going to understand it. It's not grounded in the physical world, as you said previously. Right. And so they're going to take all of these different modes of information and fuse them together to get a more complete understanding of the physical world around us.

Is a good analogy like different parts of our brains? It seems like these LLMs are really good at reasoning about sort of this symbolic, textual world. And there's all this debate over how far the video models can go and, like, reproduce the physics of the world. But it sounds like you just create another primitive that kind of works in concert with these other pieces, that is actually grounded in the real world and has seen examples of the physical world and all the edge cases that you talked about. And then that system as a whole is far more capable.

Exactly. I think there is debate over how far you can go with these video models because of the physics of the world. Now, even the current more limited video models we have, they're not trained with just video. They're multimodal. There's lots of information coming from non-video sources. There's text and captions and other things that are in there. And so if we can

bring in more modes of information like the state of the world that you have inside a simulator. Inside a simulator, we know the position of every object in 3D space. We know the distance of every pixel. We don't just see things in the world, we can touch it, we can smell it, we can taste it. We have multiple sensory experiences that fuse together.

to give us a more complete understanding of the world around us. Like right now, I'm sitting in this chair. I can't see behind my head, but I'm pretty sure if I put my hand behind me here, I'm going to be able to touch the back of the chair. That's proprioception. I know that because I have a model of what the world is around me because I've been able to synthesize that through all of my senses and there's some memory there.

We're essentially replicating the same kind of process, the same basic idea, with how we train AIs. First, the missing piece was this transformer model, this idea that we just throw all kinds of unstructured data at this thing, and it figures it out, it creates this general-purpose model

that can do all kinds of different things through understanding of complex patterns. So we had that, and we need all the right data to pump into it. And so our belief is that a lot, if not most, of this data is going to come from simulation, not from what happens to be on the internet. So it's interesting, your point about

Yeah, the state of the world. Like you have, to use nerd speak, the 3D scene graph. And as you mentioned, yeah, like the vectors of all the various objects, all this stuff that you take for granted in video games could then be thrown into a transformer along with other image data, maybe decimated to look like a real sensor. And then suddenly you can, like, it'll build an understanding, or build a, I've heard it described as like a universal function approximator, to figure out how to, yeah,

emulate all these other senses like proprioception and all these other things. I think there's like 30 or 40. I was kind of surprised to hear that we have so many. And maybe robots could, I mean, they're not even limited to ours. You alluded to LIDAR and lasers earlier, right? Or infrared. And so it's like at some point, these robots will be, going back to the start of our conversation, superhuman. Yeah. I mean, we have animals that are superhuman in this way too, right? Bats can see with sound. Yeah.

Yeah, eagles have got like varifocal vision. They can kind of zoom in. Sure, why wouldn't they be superhuman in certain dimensions of sensing the world and acting within the world? Of course, they already are in many respects. We have image classifiers that can classify animals, every breed of dog, and plants better than any human can. So true. So we'll certainly do that, at least in certain dimensions. ♪
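
Tying the scene-graph point above back to training data: in a simulator you hold the full world state (object poses, per-pixel depth, and so on), and you can degrade it into something that looks more like a real, noisy sensor while keeping the perfect ground truth alongside as the training target. A hedged, toy sketch; the record format is made up for illustration:

```python
import random

def simulator_world_state(seed):
    """Toy scene graph: the simulator knows everything about its world exactly."""
    rng = random.Random(seed)
    return {
        "objects": [{"name": f"obj_{i}",
                     "pose_xyz": [round(rng.uniform(-2, 2), 3) for _ in range(3)]}
                    for i in range(3)],
        "depth_m": [[round(rng.uniform(0.5, 10.0), 3) for _ in range(4)]
                    for _ in range(3)],              # tiny stand-in for a depth image
    }

def decimate_to_sensor(depth, noise=0.05, seed=0):
    """Degrade perfect simulated depth into something closer to a real sensor."""
    rng = random.Random(seed)
    return [[None if rng.random() < 0.1              # random dropouts
             else round(d * (1 + rng.gauss(0, noise)), 3)
             for d in row] for row in depth]

state = simulator_world_state(42)
training_record = {
    "sensor_input": decimate_to_sensor(state["depth_m"]),   # what the model sees
    "ground_truth": state,                                   # what the simulator knows
}
print(training_record["sensor_input"][0])
```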

Hi, I'm Bilawal Sidhu, host of TED's newest podcast, The TED AI Show, where I speak with the world's leading experts, artists, journalists, to help you live and thrive in a world where AI is changing everything. I'm stoked to be working with IBM, our official sponsor for this episode.

Now, the path from Gen AI pilots to real-world deployments is often filled with roadblocks, such as barriers to free data flow. But what if I told you there's a way to deploy AI wherever your data lives? With Watson X, you can deploy AI models across any environment, above the clouds helping pilots navigate flights, and on lots of clouds helping employees automate tasks, on-prem so designers can access proprietary data,

and on the edge so remote bank tellers can assist customers. Watson X helps you deploy AI wherever you need it so you can take your business wherever it needs to go. Learn more at ibm.com slash Watson X and start infusing intelligence where you need it the most.

Your business is modern, so why aren't your operations? It's time for an operations intervention. The PagerDuty Operations Cloud is the essential platform for automating and accelerating critical work across your company. Through automation and AI, PagerDuty helps you operate with more resilience, more security, and more savings. Are you ready to transform your operations? Get started at PagerDuty.com.

So let's talk about looking towards the future a little bit here. So you talked about physical AI transforming factories and warehouses. What's your take on the potential in our everyday lives, right? Like, how do you see these technologies evolving to bring robots into our home or personal spaces in really meaningful ways? It's like as intimate as it possibly can get, right? It's not really a controlled environment either. If you've been watching any of Jensen's keynotes this past year, within the last

10, 12 months or so, there's been a lot of talk of humanoid robots. Absolutely, yeah. And that's kind of all the rage. You're seeing them everywhere. I imagine for many people when they see this, they could just kind of roll their eyes like, oh yeah, yeah, humanoid robots, we've been

talking about these forever. Why does it have to look like a humanoid? Doesn't it make more sense to build specialized robots that are really good at specific tasks? And we've had robots in our most advanced factories for a long time, and they're not humanoids. They're like these large arms in automotive factories. And why are we talking about humanoid robots? The reason why this is coming up now is because if you take a step back and think about it,

If you're going to build a general-purpose robot that can do many different things, the most useful one today is going to be one that's roughly shaped and behaves and acts like a human because we built all of these spaces around us for humans.

So we built our factories, our warehouses, our hospitals, our kitchens, our retail spaces, there's stairs and ramps and shelves. So if we can build a general purpose robot brain, then the most natural physical robot to build, to put that brain in for it to be useful would be something that's human-like because we could then take that robot and plop it into many different environments

where it could be productive and do productive things. And so many companies have realized this and they're going all in on that. We're bullish on it. I think even within this space, though, there are specializations. Not every humanoid robot is going to be perfect for every task

that a human can do. Actually, not all humans are good at every task. Some humans are better at playing baseball and some are better at chopping onions. Astronauts have a certain criteria, right? That's right. So we're going to have many companies building more specialized kind of humanoids or in different kinds of robots. The ones that we're immediately focused on are the ones in industry.

We think this is where they're going to be adopted the most, the quickest, and where it's going to make the most impact. Everywhere we look globally, including here in the US, there's labor shortages in factories, warehouses, transportation, retail. We don't have enough people to stock shelves.

And the demographics are such that that's just going to get worse and worse. So there's a huge demand for humanoid robots that could go work in some of these spaces. I think as far as in our personal space, a robot that can work side by side with a human in a factory or a warehouse should also be able to work inside your kitchen in your home. How quickly those kinds of humanoid robots are going to be accepted, there'll be a market for it.

I think it's going to depend on which country we're talking about because there's a very cultural element. Bring a robot into your home, some other thing that's human-like into your home, that's very personal. God forbid it makes your latte for you. Exactly. I don't want to do that in my kitchen. I don't even want other humans in there in the morning. But there's cultural elements here. In the U.S. and the West in general, we're probably a bit,

more cautious or careful about robots. In the East, especially countries like Japan, they love them, right? And they want it. But industry everywhere needs it now. And so for industrial applications, I think it makes sense to start there and then we can take those technologies into the consumer space and the markets will explore where they fit

best at first, but eventually we'll have them everywhere. It's so fascinating to think about how many technologies they're the early adopters of, including virtual avatars and things like that. But, sort of bridging the virtual and the physical,

the technologies you all are building aren't just limited to robots, right? As this tech improves spatial understanding, it could enhance our personal devices, sort of like virtual assistants. How close do you think we are to that sort of, you know, real-life Jarvis experience: a virtual assistant that can seamlessly understand and interact with our physical environment, even if it's not embodied as a robot?

So this gets back to what I was saying earlier about the definition of a robot. What is a robot? Totally. The way you just talked about that, like to me, Jarvis is actually a robot. It does those three things. It perceives the world around us. Yep.

through many different sensors. It makes some decisions, and it can even act upon the world. Like Jarvis inside the Avengers movies. Yeah. It can actually go activate the Iron Man suit. Right, yeah. And do things there, right? So what is the difference between that and a C-3PO? Totally. Fundamentally. You're kind of inside a robot, sort of as you alluded to with the NVIDIA building, too, yeah. And if you think about some of these XR devices that immerse us into a world, they're half a robot. There's the perception

part of it. There are the sensors, along with some intelligence to do the perception, but then it's fed into a human brain, and then the human makes some decisions, and then it acts upon the world. Right. And when we act upon the world, there's maybe some more software, maybe even some AI, doing things inside the simulation of that world, or some combination. So it's not black or white, what's a robot and

what's a human or human intelligence; there's kind of a spectrum between these things. We can augment humans with artificial intelligence. We're already doing it. Every time you use your phone to ask a question, you go to Google or Perplexity or something, you're adding AI. You're augmenting yourself with AI there by asking ChatGPT a question. It's that blend of

AI with a Jarvis experience that's immersive with XR; it's just making it so that loop is faster with the augmentation. You beautifully set up my last question, which is, as AI is becoming infused in not just the digital world but the physical world, I have to ask you: what can go wrong, and what can go right?

Well, with any powerful technology, there are always going to be ways things can go wrong. This is potentially the most powerful technology we have ever seen. We have to be, I think, very careful and deliberate about how we deploy these technologies to ensure that they're safe. In terms of deploying AIs into the physical world,

I think one of the most important things we have to do is ensure that there's always some human in the loop somewhere in the process, that we have the ability to turn it off, that nothing happens without our explicit knowledge of it happening and without our permission.

We have a system here. We have sensors all around our building. We can see where people are, which areas they're trafficking the most. At night, we have robotic cleaners. They're like huge Roombas that go clean our floors.

And we direct them to the areas where people have actually been, and they don't bother with the areas that haven't been trafficked at all, to optimize the cleaning. We're going to have lots of systems like that. That's a robotic system; that's essentially a robot controlling other robots. But we need to make sure that there are humans inside that loop somewhere, deploying it, watching it, and ensuring that we can stop it, pause it, and do whatever is necessary.
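To make the kind of loop Rev is describing concrete, here is a minimal sketch of a supervised perceive-decide-act cycle: a planner assigns cleaning robots only to the zones the building's sensors saw people in, and nothing runs without an operator's approval or past an emergency stop. All of the names here (FloorCleaner, run_shift, the zone counts) are hypothetical illustrations, not NVIDIA's actual system.

```python
from dataclasses import dataclass, field

@dataclass
class FloorCleaner:
    robot_id: str
    assigned_zones: list = field(default_factory=list)

def plan_cleaning(occupancy_map: dict, robots: list) -> None:
    """Decide: spread the trafficked zones across the available cleaners."""
    busy_zones = [zone for zone, visits in occupancy_map.items() if visits > 0]
    for i, zone in enumerate(busy_zones):
        robots[i % len(robots)].assigned_zones.append(zone)

def run_shift(occupancy_map, robots, human_approves, emergency_stop) -> str:
    """Perceive (occupancy_map) -> decide (plan) -> act, with a human in the loop."""
    plan_cleaning(occupancy_map, robots)
    if not human_approves(robots):          # nothing moves without explicit approval
        return "shift cancelled by operator"
    for robot in robots:
        for zone in robot.assigned_zones:
            if emergency_stop():            # operator can halt the fleet at any time
                return f"stopped before {robot.robot_id} reached {zone}"
            print(f"{robot.robot_id} cleaning {zone}")
    return "shift complete"

if __name__ == "__main__":
    sensors = {"lobby": 42, "cafeteria": 17, "lab-3": 0}   # zone -> people counted today
    fleet = [FloorCleaner("cleaner-1"), FloorCleaner("cleaner-2")]
    print(run_shift(sensors, fleet,
                    human_approves=lambda plan: True,      # stand-in for a real operator check
                    emergency_stop=lambda: False))
```

Note that "lab-3" never gets a cleaner because nobody walked through it, and the operator callbacks are the places where a real system would surface the plan for review and expose a kill switch.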

So the other part of the question was, what are the good things that are going to come out of this? We touched on a bunch of those things there, but ultimately, being able to apply all of this computing technology and intelligence to things around us in the physical world, I can't even begin to imagine the potential increase in productivity. Just look at something like agriculture. If you have effectively unlimited workers,

who can do extremely tedious things, like pulling out one weed at a time: going through thousands of acres of fields, identifying where there's a weed or a pest, and taking them out one by one. Then maybe we don't need to blanket these areas with pesticides and all these other techniques that harm the environment around us, that harm humans. We can...

Essentially, the primary driver of economic productivity anywhere is the number of people we have in a country. I mean, we measure productivity with GDP, gross domestic product, and we look at GDP per head. That's the measure of efficiency, right? But total GDP always correlates with the number of people. Countries that have more people

have more GDP. When we take physical AIs and apply them to the physical world around us, it's almost like we're adding more to the population.
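As a back-of-the-envelope illustration of that population point: if total output is roughly the number of workers times their average output, then adding robot workers to a fixed human population raises GDP much the way population growth would. The numbers below are made up purely for illustration.

```python
def total_output(human_workers: int, robot_workers: int,
                 output_per_human: float, output_per_robot: float) -> float:
    """Toy model: total output ~ sum of workers times their average output."""
    return human_workers * output_per_human + robot_workers * output_per_robot

# 100M human workers at $80k of output each, then add 20M robots at $40k each.
baseline  = total_output(100_000_000, 0,          80_000, 40_000)
augmented = total_output(100_000_000, 20_000_000, 80_000, 40_000)

print(f"Output without robots: ${baseline / 1e12:.1f}T")
print(f"Output with robots:    ${augmented / 1e12:.1f}T "
      f"(+{100 * (augmented / baseline - 1):.0f}%)")
```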

And productivity growth can increase. It's even more so because the things that we can have them do are things that humans can't or won't do; they're just too tedious and boring and awful. You find plenty of examples of this in manufacturing, in warehouses, in agriculture, in transportation. Look, we keep talking about transportation being the big issue right now. Truck drivers: we don't have enough of them out there.

This is essentially a bottleneck on productivity for a whole economy. Soon, we're effectively going to have an unlimited number of workers who can do those things. And then we can deploy our humans to go do all the things that are fun for us, that we like doing. I love that. It's like we're finally going to have technology that's fungible and general enough that we can reimagine all these industries and, yeah, let humans do the things that are enriching and fulfilling,

and perhaps even have a world of radical abundance. I know that's a trendy thing to say, but when you talk about it, it sounds like a world of radical abundance. Do you feel that way? I do. I do. I mean, if you just think about everything I said from first principles, why won't that happen? If we can manufacture intelligence, and this intelligence can go be embodied in the physical world and do things inside the physical world for us,

Why won't we have radical abundance? I mean, that's basically it. I love it. Thank you so much for joining us, Rev. Thank you for having me. It's always fun talking to you. Okay, as I wrap up my conversation with Rev, there are a few things that come to mind. Oh my God, NVIDIA has been playing the long game all along. They found just the right wedge, computer gaming, to de-risk a bunch of this fundamental technology that has now come full circle.

Companies and even governments all over the world are buying NVIDIA GPUs so they can train their own AI models, creating bigger and bigger computing clusters, effectively turning the CEO, Jensen Huang, into a bit of a kingmaker. But what's particularly poetic is how all the technologies they've invested in are the means by which they're going to have robots roaming the world.

We are creating a digital twin of reality, a mirror world if you will. And it goes far beyond predicting one aspect of reality, like the weather. It's really about creating a full-fidelity approximation of reality where robots are free to make mistakes and free from the shackles of wall-clock time. I'm also really excited about this because creating this type of synthetic training data has so many benefits for us as consumers.

For instance, training robots in the home. Do we really want a bunch of data being collected in our most intimate locations, inside our houses? Synthetic data provides a very interesting route to train these AI models in a privacy-preserving fashion. Of course, I'm left wondering if that gap between simulation and reality can truly be overcome. But it seems that gap is only going to keep closing.

Who knew? Everyone was throwing shade on the metaverse when it first hit public consciousness. Like, who really wants this 3D successor to the internet? Now I'm thinking maybe the killer use case for the metaverse isn't for humans at all; it's for robots.

The TED AI Show is a part of the TED Audio Collective and is produced by TED with Cosmic Standard. Our producers are Dominic Girard and Alex Higgins. Our editor is Banban Cheng. Our showrunner is Ivana Tucker. And our engineer is Asia Pilar Simpson. Our researcher and fact checker is Christian Aparta. Our technical director is Jacob Winnick. And our executive producer is Eliza Smith.

And I'm Bilawal Sidhu. Don't forget to rate and comment, and I'll see you in the next one.