
Is robotics about to have its own ChatGPT moment?

2024/11/13

MIT Technology Review Narrated

People
Charlie Kemp
Chelsea Finn
Deepak Pathak
Henry Evans
Jane Evans
Ken Goldberg
Lerrel Pinto
Russ Tedrake
Vincent Vanhoucke
Narrator
Topics
Jane Evans points out that the main challenge for home robots is coping with the complexity and unpredictability of real-world home environments, such as furniture arrangements, floor plans, and the activity of pets and children, which make it hard for robots to operate effectively outside the lab. Charlie Kemp believes the field is at an inflection point: cheap hardware, data sharing, and advances in generative AI are making robots more competent and helpful. Ken Goldberg explains Moravec's paradox, that what is easy for humans is hard for machines and vice versa, and identifies three major challenges robots face: a lack of precise control and coordination, a limited understanding of the surrounding world, and the absence of an innate grasp of physics. Deepak Pathak highlights the limitations of traditional robot training methods and notes that the rise of AI is transforming the field, shifting the focus from physical dexterity to building general-purpose robot "brains." He introduces reinforcement learning and imitation learning, the two common AI approaches to training robots, and describes how his team used reinforcement learning to train four-legged robots to perform complex locomotion. Russ Tedrake argues that imitation learning combined with generative AI can quickly teach robots many new tasks, and that generative AI could have an impact on robotics similar to the one ChatGPT had on language models. Chelsea Finn observes that learning-based methods are increasingly popular in robotics but require large amounts of robot-specific data, which is currently scarce and time-consuming to collect. Lerrel Pinto elaborates on the data-scarcity problem and describes a cheap data collection method his team developed, arguing that training robots to do more complex tasks will require more data and demonstrations. Vincent Vanhoucke believes large vision-language models can help robots better understand the world around them, reason, and learn; Google DeepMind is using techniques similar to machine translation to turn natural-language instructions into robot actions. Henry Evans shares his experience using robots and emphasizes the independence they give him, letting him do some things for himself.


Chapters
The dream of useful home robots has been elusive due to the unpredictability of home environments. However, a new generation of researchers believes that generative AI could revolutionize robotics by enabling robots to learn and adapt faster.
  • Home environments are unpredictable due to varying furniture, floor plans, and household activities.
  • Generative AI could give robots the ability to learn new skills and adapt to new environments quickly.
  • This new approach might finally bring robots out of the factory and into homes.

Transcript


Welcome to MIT Technology Review Narrated. My name is Mat Honan. I'm our editor in chief.

Every week, we will bring you a fascinating new in-depth story from the leading edge of science and technology, covering topics like AI, biotech, climate, energy, robotics, and more. Here's this week's story. I hope you enjoy it. My name is Melissa Heikkilä, and I'm a senior reporter for AI here at MIT Technology Review.

The story you're about to listen to is about how AI is revolutionizing robotics and bringing us closer to a decades-long dream of building useful home robots. In the story, you'll hear from pioneering robotics researchers and from a couple in California, Henry and Jane Evans, who have been testing home robots for over a decade to help with Henry's disability. Their story is a great example of how this technology could not only help with mundane tasks like laundry but actually change people's lives. I'm really excited to see where this field goes next.

And thanks for listening. Listen to more of the best articles from the world's biggest publishers on the Noa app, or at newsoveraudio.com.

Henry and Jane Evans are used to awkward houseguests. For more than a decade, the couple, who live in Los Altos Hills, California, have hosted a slew of robots in their home. In 2002, at age 40, Henry had a massive stroke, which left him with quadriplegia and an inability to speak. Since then, he's learned how to communicate by moving his eyes over a letter board.

But he is highly reliant on caregivers and his wife, Jane. Henry got a glimmer of a different kind of life when he saw Charlie Kemp on CNN in 2010. Kemp, a robotics professor at Georgia Tech, was on TV talking about PR2, a robot developed by the company Willow Garage. PR2 was a massive two-armed machine on wheels that looked like a crude metal butler. Kemp was demonstrating how the robot worked and talking about his research on how health-care robots could help people.

He showed how the PR2 robot could hand some medicine to the television host. "All of a sudden, Henry turns to me and says, 'Why can't that robot be an extension of my body?' And I said, 'Why not?'" Jane says.

There was a solid reason why not. While engineers have made great progress in getting robots to work in tightly controlled environments like labs and factories, the home has proved difficult to design for. Out in the real, messy world, furniture and floor plans differ wildly.

Children and pets can jump in a robot's way, and clothes that need folding come in different shapes, colors, and sizes. Managing such unpredictable settings and varied conditions has been beyond the capabilities of even the most advanced robot prototypes. That seems to be finally changing, in large part thanks to artificial intelligence. For decades, roboticists have more or less focused on controlling robots' bodies, their arms, legs, levers, wheels, and the like, via purpose-driven software.

But a new generation of scientists and inventors believes that the previously missing ingredient of AI can give robots the ability to learn new skills and adapt to new environments faster than ever before. This new approach, just maybe, can finally bring robots out of the factory and into our homes. Progress won't happen overnight, though, as the Evanses know far too well from their many years of using various robot prototypes.

PR2 was the first robot they brought in, and it opened entirely new skills for Henry. It would hold a beard shaver and Henry would move his face against it, allowing him to shave and scratch an itch by himself for the first time in a decade. But at 450 pounds (200 kilograms) and $400,000, the robot was difficult to have around. It could easily take out a wall in your house.

"I wasn't a big fan," Jane says. More recently, the Evanses have been testing out a smaller robot called Stretch, which Kemp developed through his startup, Hello Robot. The first iteration launched during the pandemic with a much more reasonable price tag of around $18,000. Stretch weighs about 50 pounds.

It has a small mobile base, a stick with a camera dangling off it, and an adjustable arm featuring a gripper with suction cups at the ends. It can be controlled with a console controller. Henry controls Stretch using a laptop, with a tool that tracks his head movements to move a cursor around.

He is able to move his thumb and index finger enough to click a computer mouse. Last summer, Stretch stayed with the couple for more than a month, and Henry says it gave him a whole new level of autonomy.

"It was practical, and I could see using it every day," he says. Using his laptop, he could get the robot to brush his hair and have it hold fruit kebabs for him to snack on. It also opened up Henry's relationship with his granddaughter, Teddie. Before, they barely interacted.

"He didn't hug her goodbye. Nothing like that," Jane says. But "Papa Wheelie" and Teddie used Stretch to play, engaging in relay races, bowling, and magnetic fishing.

Stretch doesn't have much in the way of smarts. It comes with some preinstalled software, such as the web interface that Henry uses to control it, and other capabilities such as AI-enabled navigation.

The main benefit of Stretch is that people can plug in their own AI models and use them to do experiments. But it offers a glimpse of what a world with useful home robots could look like: robots that can do many of the things humans do in the home.

Tasks such as folding laundry, cooking meals, and cleaning have been a dream of robotics research since the inception of the field in the 1950s. For a long time, it's been just that, a dream. "Robotics is full of dreamers," says Kemp. But the field is at an inflection point, says Ken Goldberg, a robotics professor at the University of California, Berkeley. Previous efforts to build a useful home robot, he says, have emphatically failed to meet the expectations set by popular culture.

Think of the robotic maid from The Jetsons. Now things are very different, thanks to cheap hardware like Stretch, along with efforts to collect and share data and advances in generative AI. Robots are getting more competent and helpful faster than ever before. "We're at a point where we're very close to getting capability that is really going to be useful," Goldberg says.

Folding laundry, cooking shrimp, wiping surfaces, unloading shopping baskets.

Today's AI-powered robots are learning to do tasks that for their predecessors would have been extremely difficult. There's a well-known observation among roboticists: what is hard for humans is easy for machines, and what is easy for humans is hard for machines.

Called Moravec's paradox, it was first articulated in the 1980s by Hans Moravec, then a roboticist at the Robotics Institute of Carnegie Mellon University. A robot can play chess or hold an object still for hours on end with no problem. Tying a shoelace, catching a ball, or having a conversation is another matter. There are three reasons for this, says Goldberg.

First, robots lack precise control and coordination. Second, their understanding of the surrounding world is limited, because they are reliant on cameras and sensors to perceive it. Third, they lack an innate sense of practical physics. "Pick up a hammer, and it will probably fall out of your gripper unless you grab it near the heavy part.

But you don't know that if you just look at it, unless you know how hammers work," Goldberg says. On top of these basic considerations, many other technical things need to be just right, from motors to cameras to Wi-Fi connectivity, and hardware can be prohibitively expensive. Mechanically, we've been able to do fairly complex things for a while.

In a video from 1957, two large robotic arms are dexterous enough to pinch a cigarette, place it in the mouth of a woman at a typewriter, and reapply her lipstick. But the intelligence and the spatial awareness of that robot came from the person who was operating it. "The missing piece is: how do we get software to do these things automatically?" says Deepak Pathak, an assistant professor of computer science at Carnegie Mellon.

Researchers training robots have traditionally approached this problem by planning everything the robot does in excruciating detail. Robotics giant Boston Dynamics used this approach when it developed its boogying and parkouring humanoid robot, Atlas. Cameras and computer vision are used to identify objects and scenes. Researchers then use that data to make models that can be used to predict with extreme precision what will happen if a robot moves a certain way.

Using these models, roboticists plan the motions of their machines by writing a very specific list of actions for them to take. The engineers then test these motions in the laboratory many times and tweak them to perfection. This approach has its limits.

Robots trained like this are strictly choreographed to work in one specific setting. Take them out of the laboratory and into an unfamiliar location, and they are likely to topple over. Compared with other fields, such as computer vision, robotics has been in the dark ages, Pathak says.

But that might not be the case for much longer, because the field is seeing a big shakeup thanks to the AI boom, he says. The focus is now shifting from feats of physical dexterity to building general-purpose robot "brains" in the form of neural networks.

Much as the human brain is adaptable and can control different aspects of the human body, these networks can be adapted to work in different robots and different scenarios. Early signs of this work show promising results. For a long time, robotics research was an unforgiving field plagued by slow progress.

At the Robotics Institute at Carnegie Mellon, where Pathak works, he says, there used to be a saying that "if you touch a robot, you add one year to your PhD." Now, he says, students get exposure to many robots and see results in a matter of weeks.

What separates this new crop of robots is their software. Instead of the traditional painstaking planning and training, roboticists have started using deep learning and neural networks to create systems that learn from their environment on the go and adjust their behavior accordingly. At the same time, new, cheaper hardware, such as off-the-shelf components and robots like Stretch, is making this sort of experimentation more accessible.

Broadly speaking, there are two popular ways researchers are using AI to train robots. Pathak has been using reinforcement learning, an AI technique that allows systems to improve through trial and error, to get robots to adapt their movements in new environments. This is a technique that Boston Dynamics has also started using in its robot dogs, called Spot.

In 2022, Pathak's team used this method to create four-legged robot dogs capable of scrambling up steps and navigating tricky terrain. The robots were first trained to move around in a general way in a simulator. Then they were set loose in the real world, with a single onboard camera and computer vision software to guide them.
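To make the trial-and-error idea concrete, here is a minimal sketch in Python. It is not Pathak's training code: the one-dimensional "simulator" and the random-perturbation search are invented stand-ins for the physics engines and policy-gradient methods such teams typically use, but the loop shows the essence of reinforcement learning, which is to try a variation of the current behavior, score it with a reward, and keep it only if it does better.

```python
# A toy sketch of reinforcement learning's trial-and-error loop -- an
# invented 1-D "simulator" and a crude random-perturbation search, not
# Pathak's actual physics simulator or training algorithm.

import numpy as np

rng = np.random.default_rng(0)

def simulate(policy_weights: np.ndarray, steps: int = 100) -> float:
    """Stand-in for a physics simulator: the reward is higher the closer
    the policy keeps a 1-D 'robot' to a target position of 5.0."""
    position, total_reward = 0.0, 0.0
    for _ in range(steps):
        observation = np.array([position, 1.0])        # what the robot senses
        action = float(observation @ policy_weights)   # policy: obs -> action
        position += float(np.clip(action, -1.0, 1.0))  # simulator applies action
        total_reward -= abs(position - 5.0)            # penalty for being off target
    return total_reward

# Trial and error: perturb the policy, keep changes that score better.
policy = rng.normal(size=2)
best = simulate(policy)
for _ in range(500):
    candidate = policy + 0.1 * rng.normal(size=2)
    score = simulate(candidate)
    if score > best:  # keep improvements, discard failed experiments
        policy, best = candidate, score

print(f"best simulated reward after 500 trials: {best:.1f}")
```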

Other, similar robots rely on tightly prescribed internal maps of the world and cannot navigate beyond them. Pathak says the team's approach was inspired by human navigation. Humans receive information about the surrounding world from their eyes, and this helps them instinctively place one foot in front of the other to get around in an appropriate way.

Humans don't typically look down at the ground under their feet when they walk, but a few steps ahead, at a spot where they want to go. Pathak's team trained its robots to take a similar approach to walking: each one used its camera to look ahead.

The robot was then able to memorize what was in front of it for long enough to guide its leg placement. The robots learned about the world in real time, without internal maps, and adjusted their behavior accordingly.

At the time, experts told MIT Technology Review the technique was a breakthrough in robot learning and autonomy and could allow researchers to build legged robots capable of being deployed in the wild. Pathak's robot dogs have since leveled up. The team's latest algorithm allows a quadruped robot to learn to do extreme parkour.

The robot was again trained to move around in a general way in a simulation. But using reinforcement learning, it was able to teach itself new skills on the go, such as how to jump long distances, walk on its front legs, and clamber up tall boxes twice its height. These behaviors were not something the researchers programmed. Instead, the robot learned through trial and error and visual input from its front camera.

"I didn't believe it was possible three years ago," Pathak says. In the other popular technique, called imitation learning, models learn to perform tasks by, for example, imitating the actions of a human teleoperating a robot or using a VR headset to collect data on a robot. It's a technique that has gone in and out of fashion over decades but has recently become more popular with robots that do manipulation tasks, says Russ Tedrake, vice president of robotics research at the Toyota Research Institute and an MIT professor.

By pairing this technique with generative AI, researchers at the Toyota Research Institute, Columbia University, and MIT have been able to quickly teach robots to do many new tasks. They believe they have found a way to extend the technology propelling generative AI from the realm of text, images, and videos into the domain of robot movements.

The idea is to start with a human who manually controls the robot to demonstrate behaviors such as whisking eggs or picking up plates. Using a technique called diffusion policy, the robot is then able to use the data fed into it to learn skills.
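As a rough illustration of imitation learning, the sketch below clones behavior from recorded demonstrations. One caveat: diffusion policy itself is a generative model that produces action sequences by denoising; this toy version, with invented data and a linear policy, shows only the underlying recipe of supervised learning from human demonstrations.

```python
# A toy sketch of imitation learning (behavior cloning) with invented
# data and a linear policy -- the underlying recipe, not the diffusion
# policy technique itself, which generates action sequences by denoising.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical demonstrations: each row pairs what the robot saw (a 4-D
# observation) with the action a human teleoperator took at that moment.
demo_obs = rng.normal(size=(200, 4))
true_behavior = np.array([0.5, -1.0, 0.2, 0.8])
demo_actions = demo_obs @ true_behavior + 0.05 * rng.normal(size=200)

# "Training" = fitting a policy to reproduce the demonstrated actions.
policy_weights, *_ = np.linalg.lstsq(demo_obs, demo_actions, rcond=None)

# At run time, the robot applies the cloned policy to a new observation.
new_obs = rng.normal(size=4)
print("cloned action for new observation:", new_obs @ policy_weights)
```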

The researchers have taught robots more than 200 skills, such as peeling vegetables and pouring liquids, and say they are working toward teaching a thousand skills by the end of the year. Many others have taken advantage of generative AI as well. Covariant, a robotics startup that spun off from OpenAI's now-shuttered robotics research unit, has built a multimodal model called RFM-1. It can accept prompts in the form of text, images, videos, robot instructions, or measurements.

Generative AI allows the robot to both understand instructions and generate images or videos relating to those tasks. The Toyota Research Institute team hopes this will one day lead to "large behavior models," which are analogous to large language models, says Tedrake.

"People think behavior cloning is gonna get us to a ChatGPT moment for robotics," he says. In a similar demonstration earlier this year, a team at Stanford managed to use a relatively cheap off-the-shelf robot costing $32,000 to do complex manipulation tasks such as cooking shrimp and cleaning stains. It learned those new skills quickly with AI.

Called Mobile ALOHA (a loose acronym for "a low-cost open-source hardware teleoperation system"), the robot learned to cook shrimp with the help of just 20 human demonstrations and data from other tasks, such as tearing off a paper towel or a piece of tape. The Stanford researchers found that AI can help robots acquire transferable skills: training on one task can improve its performance for others. This is all laying the groundwork for robots that can be useful in homes.

Human needs change over time, and teaching robots to reliably do a wide range of tasks is important, as it will help them adapt to us. That is also crucial to commercialization. First-generation home robots will come with a hefty price tag, and the robots need to have enough useful skills for regular consumers to want to invest in them.

"For a long time, a lot of the robotics community was very skeptical of these kinds of approaches," says Chelsea Finn, an assistant professor of computer science and electrical engineering at Stanford University and an adviser for the Mobile ALOHA project. Finn says that nearly a decade ago, learning-based approaches were rare at robotics conferences and disparaged in the robotics community. "The natural-language-processing boom has been convincing more of the community that this approach is really, really powerful," she says.

There is one catch, however: in order to imitate new behaviors, the AI models need plenty of data. Unlike chatbots, which can be trained using billions of data points hoovered from the internet, robots need data specifically created for robots.

They need physical demonstrations of how washing machines and fridges are opened, dishes picked up, or laundry folded, says Lerrel Pinto, an assistant professor of computer science at New York University. Right now, that data is very scarce, and it takes a long time for humans to collect. Some researchers are trying to use existing videos of humans doing things to train robots, hoping the machines will be able to copy the actions without the need for physical demonstrations.

Pinto's lab has also developed a neat, cheap data collection approach that connects robotic movements to desired actions. Researchers took a reacher-grabber stick, similar to the ones used to pick up trash, and attached an iPhone to it. Human volunteers can use this system to film themselves doing household chores, mimicking the robot's view of the end of its robotic arm.

Using this stand-in for Stretch's robotic arm and an open-source system called Dobb-E, Pinto's team was able to get a Stretch robot to learn tasks such as pouring from a cup and opening shower curtains with just 20 minutes of iPhone data. But for more complex tasks, robots would need even more data and more demonstrations.
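The general recipe behind this kind of phone-based collection can be sketched in a few lines. This is not the Dobb-E codebase, and the data structures are hypothetical, but they show how a stream of camera frames paired with gripper poses becomes (observation, action) training examples.

```python
# A sketch of phone-based demonstration collection with hypothetical
# data structures -- not the Dobb-E codebase. Each camera frame is
# labeled with the gripper's motion to the next frame, turning a video
# of a chore into (observation, action) training pairs.

from dataclasses import dataclass

@dataclass
class Frame:
    image: bytes          # camera frame from the phone on the grabber stick
    gripper_pose: tuple   # (x, y, z) position estimated by the phone's sensors

def frames_to_examples(frames: list[Frame]) -> list[tuple[bytes, tuple]]:
    """Pair each frame with the pose change to the next frame: 'when you
    see this image, move the gripper by this much.'"""
    examples = []
    for before, after in zip(frames, frames[1:]):
        delta = tuple(a - b for a, b in zip(after.gripper_pose, before.gripper_pose))
        examples.append((before.image, delta))
    return examples

demo = [Frame(b"<jpeg>", (0.00, 0.00, 0.00)),
        Frame(b"<jpeg>", (0.10, 0.00, 0.02))]
print(frames_to_examples(demo))  # one example: move the gripper by (0.1, 0.0, 0.02)
```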

The requisite scale would be hard to reach with Dobb-E, says Pinto, because you'd basically need to persuade every human on Earth to buy the reacher-grabber system, collect data, and upload it to the internet. A new initiative kick-started by Google DeepMind, called the Open X-Embodiment Collaboration, aims to change that. In 2023, the company partnered with 34 research labs and about 150 researchers to collect data from 22 different robots, including Hello Robot's Stretch. The resulting data set, which was published in October 2023, consists of robots demonstrating 527 skills, such as picking, pushing, and moving.

Sergey Levine, a computer scientist at UC Berkeley who participated in the project, says the goal was to create a "robot internet" by collecting data from labs around the world. This would give researchers access to bigger, more scalable, and more diverse data sets. The deep-learning revolution that led to the generative AI of today started in 2012 with the rise of ImageNet, a vast online data set of images.

The Open X-Embodiment Collaboration is an attempt by the robotics community to do something similar for robot data. Early signs show that more data is leading to smarter robots. The researchers built two versions of a model for robots, called RT-X, that could be either run locally on individual labs' computers or accessed via the web.

The larger, web-accessible model was pretrained with internet data to develop a "visual common sense," or a baseline understanding of the world, from the large language and image models. When the researchers ran the RT-X model on many different robots, they discovered that the robots were able to learn skills 50 percent more successfully than with the systems each individual lab was developing.

"I don't think anybody saw that coming," says Vincent Vanhoucke, Google DeepMind's head of robotics. "Suddenly there is a path to basically leveraging all these other sources of data to bring about very intelligent behaviors in robotics." Many roboticists think that large vision-language models, which are able to analyze image and language data, might offer robots important hints about how the surrounding world works.

Vanhoucke says they offer "semantic clues about the world" and could help robots with reasoning, deducing things, and learning by interpreting images. To test this, researchers took a robot that had been trained on the larger model and asked it to point to a picture of Taylor Swift.

The researchers had not shown the robot pictures of Swift, but it was still able to identify the pop star, because it had a web-scale understanding of who she was, even without photos of her in its data set, says Vanhoucke. He says Google DeepMind is increasingly using techniques similar to those it would use for machine translation to translate from English to robotics. Last summer, Google introduced a vision-language-action model called RT-2.

This model gets its general understanding of the world from the online text and images it has been trained on, as well as its own interactions in the real world. It translates that data into robotic actions. Each robot has a slightly different way of translating English into action, he adds.
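One way to picture that translation, purely as an illustration rather than Google's implementation, is to treat each robot action as a short sequence of tokens, the way a language model treats words. The sketch below discretizes continuous motor commands into a small vocabulary and decodes them back.

```python
# An illustrative sketch of translating between language-model tokens
# and robot motion -- not Google's RT-2 implementation. Continuous motor
# commands are discretized into a small vocabulary of integer tokens, so
# a model could emit actions the way it emits words, and a decoder turns
# the tokens back into motion.

import numpy as np

N_BINS = 256  # each action dimension becomes one of 256 "words"

def action_to_tokens(action, low=-1.0, high=1.0):
    """Quantize each continuous action dimension into an integer token."""
    scaled = (np.clip(action, low, high) - low) / (high - low)
    return [int(x * (N_BINS - 1)) for x in scaled]

def tokens_to_action(tokens, low=-1.0, high=1.0):
    """Decode integer tokens back into approximate continuous commands."""
    return np.array([low + t / (N_BINS - 1) * (high - low) for t in tokens])

action = np.array([0.25, -0.8, 0.0])     # e.g. gripper dx, dy, and rotation
tokens = action_to_tokens(action)        # what the model would "say"
print(tokens, tokens_to_action(tokens))  # round-trips to nearly the same action
```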

"We increasingly feel like a robot is essentially a chatbot that speaks robotese," Vanhoucke says. Despite the fast pace of development, robots still face many challenges before they can be released into the real world. They are still way too clumsy for regular consumers to justify spending tens of thousands of dollars on them.

Robots also still lack the sort of common sense that would allow them to multitask, and they need to move from just picking things up and placing them somewhere to putting things together, says Goldberg, for example, putting a deck of cards or a board game back in its box and then into the games cupboard. But to judge from the early results of integrating AI into robots, roboticists are not wasting their time, says Pinto. "I feel fairly confident that we will see some semblance of a general-purpose home robot. Now, will it be accessible to the general public? I don't think so," he says.

"But in terms of raw intelligence, we are already seeing signs right now," he says. Building the next generation of robots might not just assist humans in their everyday chores or help people like Henry Evans live a more independent life.

For researchers like Pinto, there is an even bigger goal in sight. Home robotics offers one of the best benchmarks for human-level machine intelligence, he says. The fact that a human can operate intelligently in the home environment means we know this is a level of intelligence that can be reached.

"It's something which we can potentially solve. We just don't know how to solve it," he says. For Henry and Jane Evans, a big win would be to get a robot that simply works reliably. The Stretch robot that the Evanses experimented with is still too buggy to use without researchers present to troubleshoot, and their home doesn't always have the dependable Wi-Fi connectivity Henry needs in order to communicate with Stretch using a laptop.

Even so, Henry says one of the greatest benefits of his experiments with robots has been independence. "All I do is lay in bed, and now I can do things for myself that involve manipulating my physical environment," he says.

Thanks to Stretch, for the first time in two decades, Henry was able to hold his own playing cards during a match. "I kicked everyone's butt several times," he says. "Okay, let's not talk too big here," Jane says, and laughs.

You were listening to "The Robots We've Always Wanted," written by Melissa Heikkilä. This article was published in the May/June 2024 issue of MIT Technology Review and was read by.