The Robotics Revolution, with Physical Intelligence’s Cofounder Chelsea Finn

2025/3/20

No Priors: Artificial Intelligence | Technology | Startups

People

Chelsea Finn

Topics

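I was drawn to robotics by its potential impact on the world and by the question of how machine perception and intelligence develop, and robotics brings those two together. I started working on robotics in earnest during my PhD, when we focused on neural-network control: training neural networks to map image pixels directly to the motor torques of a robot arm. Training a robot to perform one specific task is relatively easy, but getting it to perform that same task across many different scenes and objects is extremely challenging. Since then I have worked on how to build broader datasets, how to train on them, and on a range of learning methods, including reinforcement learning, video prediction, and imitation learning.

Physical Intelligence is building a large neural network model that can eventually control any robot to perform any task in any scenario. Unlike traditional robotics, which focuses on a single application, we are tackling the broader problem of physical intelligence in the real world, with an emphasis on generalization and general-purpose robots. We think it is essential to use all the data we can get: not only data from a single robot, but also data from any robot platform, whatever its number of joints or arms, because that enables knowledge transfer across platforms. Unlike language models, we have no “Wikipedia” or internet of robot motion, so making progress in machine learning for robots requires collecting real robot data in the real world. The key to generalization is collecting more diverse robot data, which matters more than simply collecting more of it.

We chose to open-source our models and software packages because we believe the field is still at an early stage, and we want to support research and community building so the field is ready for more capable general-purpose models. I worry more that nobody will solve the hard problems in robotics than I worry about competitors.

I can't predict where these models will be applied first. One challenge in robotics is that the output is usually carried out autonomously by the robot itself rather than checked by a human, which calls for new ways to tolerate errors or to enable human-robot collaboration. Humanoid robots are cool, but I think they are overrated: we currently have limited data, and making data collection more efficient matters more than pursuing humanoids. People underestimate the complexity and intelligence involved in motor control; even simple actions like eating cereal or pouring water require a great deal of both.

Research results such as SayCan, RT-2, RT-X, and ALOHA demonstrated major progress in robotics and have driven both the field's rapid development and a wave of new companies. We developed a hierarchical interactive robot system that combines a high-level model (which plans the steps of a task) with a low-level model (which executes motor control), enabling the robot to carry out longer-horizon tasks and to interact with people. Vision has come a long way, but I hope future robots will incorporate more advanced tactile sensors and other sensors to improve robustness and capability.

Unlike self-driving, robotics has recently seen many new entrants, which suggests the field may be younger than self-driving and advancing faster technically. My advice to founders starting a robotics company is to learn fast, deploy fast, iterate fast, and learn from real-world experience. Observational data (such as YouTube videos) is valuable for training robot models, but a robot's own physical experience is crucial for learning, so data from the robot itself remains indispensable. I think the future will bring a wide variety of robot platforms, much as a kitchen holds many different appliances, and that this will be more efficient than a single type of general-purpose robot.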
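The hierarchical interactive system described above splits control between a high-level model that plans the steps of a task and a low-level model that executes motor control. The sketch below only illustrates that division of labor under stated assumptions: the class names, the toy planner that splits an instruction on " then ", the `feedback` string standing in for human interaction, and the fake observer are all hypothetical placeholders for learned models, not Physical Intelligence's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Action = List[float]


@dataclass
class HighLevelPlanner:
    """Turns an instruction (plus optional human feedback) into language subtasks.

    Hypothetical stand-in: in practice this would be a learned model.
    """
    plan_fn: Callable[[str, str], List[str]]

    def plan(self, instruction: str, feedback: str = "") -> List[str]:
        return self.plan_fn(instruction, feedback)


@dataclass
class LowLevelPolicy:
    """Maps (observation, subtask) to a motor command.

    Hypothetical stand-in: in practice this would be a learned visuomotor policy.
    """
    act_fn: Callable[[dict, str], Action]

    def act(self, observation: dict, subtask: str) -> Action:
        return self.act_fn(observation, subtask)


def run_hierarchical(planner: HighLevelPlanner, policy: LowLevelPolicy,
                     instruction: str, observe: Callable[[], dict],
                     steps_per_subtask: int = 2,
                     feedback: str = "") -> List[Tuple[str, Action]]:
    """Plan once at the high level, then run each subtask with the low-level policy."""
    log: List[Tuple[str, Action]] = []
    for subtask in planner.plan(instruction, feedback):
        for _ in range(steps_per_subtask):
            log.append((subtask, policy.act(observe(), subtask)))
    return log


# Toy stand-ins so the sketch runs end to end (all hypothetical).
def toy_plan(instruction: str, feedback: str) -> List[str]:
    steps = [s.strip() for s in instruction.split(" then ")]
    # "Interaction": drop any subtask that mentions what the human asked to skip.
    return [s for s in steps if not (feedback and feedback in s)]


def toy_act(observation: dict, subtask: str) -> Action:
    return [observation["t"], float(len(subtask))]


def make_toy_observer() -> Callable[[], dict]:
    t = 0.0

    def observe() -> dict:
        nonlocal t
        t += 1.0  # each call advances a fake clock
        return {"t": t}

    return observe


log = run_hierarchical(HighLevelPlanner(toy_plan), LowLevelPolicy(toy_act),
                       "wipe table then add pickles then plate sandwich",
                       make_toy_observer(), feedback="pickles")
print([subtask for subtask, _ in log])
# → ['wipe table', 'wipe table', 'plate sandwich', 'plate sandwich']
```

A real hierarchical system would presumably replan as the scene and the person's feedback change during execution; this sketch plans once so the split between "what to do next" and "how to move" stays easy to see.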

Chapters

Physical Intelligence is building a large neural network model to control any robot for any task in any scenario. Unlike other companies focusing on single applications, they aim for long-term generalizability across various robot platforms and data sources.

Physical Intelligence aims to build a general-purpose AI for robots
They focus on generalization and leverage data from various robot platforms
Their approach contrasts with traditional robotics focusing on single applications
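Pooling data from robot platforms with different numbers of joints or arms raises a practical question: how can one network train on action vectors of different lengths? One common approach, assumed here and not described in the episode, is to zero-pad every action to a shared maximum dimension and mask the padded slots out of the training loss. The function names and the 3-dim and 5-dim example robots below are hypothetical.

```python
from typing import List, Tuple


def pad_actions(actions: List[List[float]],
                max_dim: int) -> Tuple[List[List[float]], List[List[bool]]]:
    """Zero-pad variable-length action vectors to max_dim and return a validity mask."""
    padded: List[List[float]] = []
    mask: List[List[bool]] = []
    for a in actions:
        if len(a) > max_dim:
            raise ValueError(f"action has {len(a)} dims, exceeds max_dim={max_dim}")
        n_pad = max_dim - len(a)
        padded.append(list(a) + [0.0] * n_pad)
        mask.append([True] * len(a) + [False] * n_pad)
    return padded, mask


def masked_mse(pred: List[List[float]], target: List[List[float]],
               mask: List[List[bool]]) -> float:
    """Mean squared error over real (unmasked) action dimensions only."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


# Hypothetical example: a 3-dim "single-arm" action and a 5-dim "bimanual" action.
single_arm = [0.1, 0.2, 0.3]
bimanual = [0.0, 0.5, 1.0, 0.5, 0.0]
padded, mask = pad_actions([single_arm, bimanual], max_dim=5)
print(padded[0])  # → [0.1, 0.2, 0.3, 0.0, 0.0]

# Predictions that are wrong only in the padded slots incur zero loss.
pred = [[0.1, 0.2, 0.3, 9.0, 9.0], [0.0, 0.5, 1.0, 0.5, 0.0]]
print(masked_mse(pred, padded, mask))  # → 0.0
```

Masking keeps the model from being rewarded or penalized for dimensions a given robot does not have, so a single-arm episode and a bimanual episode can share one batch.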

This week on No Priors, Elad speaks with Chelsea Finn, cofounder of Physical Intelligence and currently an Associate Professor at Stanford, where she leads the Intelligence through Learning and Interaction Lab. They dive into how robots learn, the challenges of training AI models for the physical world, and the importance of diverse data in reaching generalizable intelligence. Chelsea explains the evolving landscape of open-source vs. closed-source robotics and where AI models are likely to have the biggest impact first. They also compare the development of robotics to self-driving cars, explore the future of humanoid and non-humanoid robots, and discuss what’s still missing for AI to function effectively in the real world. If you’re curious about the next phase of AI beyond the digital space, this episode is a must-listen.

Show Notes:

0:00 Introduction

0:31 Chelsea’s background in robotics

3:10 Physical Intelligence

5:13 Defining their approach and model architecture

7:39 Reaching generalizability and diversifying robot data

9:46 Open source vs. closed source

12:32 Where will PI’s models integrate first?

14:34 Humanoid as a form factor

16:28 Embodied intelligence

17:36 Key turning points in robotics progress

20:05 Hierarchical interactive robot and decision-making

22:21 Choosing data inputs

26:25 Self-driving vs. robotics market

28:37 Advice to robotics founders

29:24 Observational data and data generation

31:57 Future robotic forms

Runtime: 35:14