People
Host
A podcast host and content creator focused on electric vehicles and energy.
Topics
Host: Google has released the Gemini Robotics family of AI models, designed specifically for humanoid robots, a sign of how quickly the embodied AI field is developing. Embodied AI, particularly AI models for general-purpose tasks, is very difficult to develop. Current humanoid robots require dedicated training for each individual action, as with Tesla's Optimus robot. Google DeepMind's new AI model, Gemini Robotics, aims to solve this problem and has three principal qualities: generality, interactivity, and dexterity. Gemini Robotics consists of two models, an advanced vision-language-action model and Gemini Robotics ER (Embodied Reasoning); the former handles multimodal inputs, while the latter is responsible for spatial reasoning and planning. Gemini Robotics can handle a wide variety of tasks, including tasks it has never seen in training, such as placing fruit, folding a plastic bag, and making an origami crane. The reasoning model Gemini Robotics ER helps improve the robot's ability to execute new tasks, enabling more complex planning and manipulation, such as playing tic-tac-toe or completing word puzzles. Google's breakthrough has implications for the whole industry; companies such as Figure AI are developing similar models and beginning to deploy robots in the real world. Chinese companies are also producing robots, though their technology may not be as advanced as Google's. NVIDIA does not build robots, but its AI technology can be used for robot training, and its Cosmos World Foundation model can create virtual simulation environments for that purpose. Venture capitalists believe embodied AI is approaching an inflection point, with companies such as Dexterity Inc. and Apptronik raising huge rounds, a sign of investor confidence in the field. Apptronik is partnering with Google, using Google's AI models to drive its robots, which suggests Google's models are leading the field.

Deep Dive

Shownotes Transcript


Today on the AI Daily Brief, Google's new model for embodied AI. Before that in the headlines, more information on Google's investment in Anthropic. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. ♪

We kick off today with a report from the New York Times around Google's relationship with Anthropic. The headliner statistic was that documents obtained by the New York Times show that Google owns about 14% of Anthropic. Now, of course, we knew that Google had been an investor in Anthropic, so that's nothing new. Instead, this just gives a little bit more of a background picture around one of these very interesting deals that is, quite frankly, novel to the AI space.

OpenAI's deal with Microsoft set the template for this, and the catalyst for it is the fact that AI needs so much money that the traditional venture capital establishment, which kind of taps out at a billion or two billion dollars usually, just couldn't keep up with the demand for tens of billions of dollars of capital. That effectively left the frontier labs with no choice but to team up with one of the big tech giants. Part of the reason that the news media is so interested in this is that it's caught up in the Google antitrust case.

You might remember that back in August, a federal court found that Google had acted as a monopolist in internet search, and the Justice Department has made a set of proposals around how to remedy the situation, including forcing Google to sell any AI products that could possibly compete with search. That puts their relationship with Anthropic, whose Claude chatbot is used as a form of search by some, squarely in the crosshairs. Now, Anthropic has argued that Google should not be forced to divest.

They said that a forced divestment would, quote, harm both Anthropic and competition more generally. They said that it would depress Anthropic's value and hinder its ability to raise capital. Ultimately, this is just another interesting artifact in what is a fast-changing financial landscape alongside the AI startup scene. Speaking of the fast-changing AI startup scene, a company that has gotten more attention than just about any other over the last week or two is, of course, the AI agent startup Manus.

Well, that company has now announced that it's teaming up with Alibaba to be officially able to launch their product in China. In a statement, they said that they were engaging in strategic cooperation with Alibaba's Qwen team to, quote, meet the needs of Chinese users. Basically, the deal right now is that if you are releasing an artificial intelligence product for the Chinese market, you have to work with a Chinese AI company. This is why, for example, Apple hasn't released even their basic Apple Intelligence features in the country, because they've been working to finalize that set of partnerships.

Given the excitement around Manus right now, TP Huang captured a lot of the sentiment when they wrote, Alibaba Cloud will need a whole lot more compute.

Speaking of Alibaba, that company has also released a new AI model they're calling R1-Omni, firmly in the line of great, memorable AI model names, which they claim can read human emotions. The team published demos that showed the functionality in interpreting video inputs. In the video, a man in a brown jacket stands in front of a vibrant mural. His facial expression is complex, with wide eyes, slightly open mouth, raised eyebrows, and furrowed brows, revealing surprise and anger.

Speech recognition technology suggests his voice contains words like you, lower your voice, and freaking out, indicating strong emotions and agitation. Overall, he displays an emotional state of confusion, anger, and excitement. While the specific use cases haven't been articulated for this, Bloomberg suggested it could be a way for Alibaba to keep up with OpenAI's GPT-4.5. On launch, OpenAI had said that their new model had, quote, a better understanding of what humans mean and interprets subtle cues or implicit expectations with greater nuance and EQ.

Lastly today, beleaguered Intel has announced a new CEO, renewing hopes, at least among some, that the struggling company could be revived. Three months ago, Pat Gelsinger was fired as CEO after a four-year stint. He was installed at the head of the company in 2021 with a mandate to rationalize the business and turn things around. By the time he was ousted in December, however, it looked as though the once-great U.S. chipmaker was going to be sold off for parts.

A few months went by with various merger and acquisition rumors. There were even reports that the Trump administration was pushing a shotgun arrangement with TSMC, who would take over Intel's chipmaking foundries. The board, though, has now named Lip-Bu Tan as the new CEO. Tan is a 40-year veteran tech investor and had served on Intel's board since 2022. He resigned from his board seat last year, reportedly due to disagreements over how to turn the company around. And when he did resign, that left the board with a sum total of zero members with any experience in the semiconductor industry.

Now at the helm, Tan will be allowed to put his recovery plan into action. Following the appointment, though, news broke that the TSMC takeover plan is still alive. TSMC has pitched NVIDIA, AMD, and Broadcom on taking shares in a joint venture that would operate Intel's foundries. TSMC would take the lead role in operating the business but would not own more than 50% of the joint venture.

This would help ameliorate concerns from the Trump administration about a foreign company owning critical US-based chipmaking facilities. According to Reuters sources, Intel board members have backed the deal and held negotiations with TSMC, while some executives are firmly opposed. We'll have to see if that goes through, but overall, Wall Street likes the deal and likes the new appointment, with Intel stock up 11% in overnight trading. That's going to do it, however, for today's AI Daily Brief Headlines edition. Next up, the main episode.

We talk a lot about agents on this show, but if you've ever thought to yourself, "I don't want to talk about agents anymore, I just want to actually build and deploy something," I'm really excited to share something special with you today. We've partnered with Lindy to offer companies that just want to dive into the deep end of agents a way to get their feet wet, a way to move fast and build something meaningful without breaking the budget.

The first five companies that email me at nlw@bsuper.ai with Lindy in the title will have access to work with Lindy to build an actual functional agent serving their specific needs for under $20,000. Some of the agents you can build include a customer support agent, maybe automating responses on your website.

You could build an SDR for generating or qualifying sales leads, or you could build an agent that's perfectly suited for your internal communications needs, be it note-taking, scheduling, or something else. Not only is Lindy structured to integrate with all of the places that you already keep data and information, it's also a fully extensible platform, which means as you hire more and more agent employees and really build out your digital workforce, Lindy is going to enable those agents to be interoperable and basically be able to work together in a seamless way.

So again, if you are interested in diving in all the way to agents in a matter of weeks, not months, not years, email me, nlw at bsuper.ai, put Lindy in the title, and let's get your first digital employee online. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded.

Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001, centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk.

Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company. Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and prove security in real time.

For a limited time, this audience gets $1,000 off Vanta at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off. Hey listeners, are you tasked with the safe deployment and use of trustworthy AI? KPMG has a first-of-its-kind AI Risk and Controls Guide, which provides a structured approach for organizations to begin identifying AI risks and design controls to mitigate threats.

What makes KPMG's AI Risks and Controls Guide different is that it outlines practical control considerations to help businesses manage risks and accelerate value. To learn more, go to www.kpmg.us slash AI Guide. That's www.kpmg.us slash AI Guide.

Today, we're going to do that thing where we take a bit of contemporary news and use that as a lens to look at a broader set of updates that have happened over the last few weeks. And as I mentioned, we are talking today about the intersection of AI and robotics. Now, the specific catalyst for this conversation is that Google has released a new family of AI models that are specifically designed to drive humanoid robotics, meaning it's a good time to talk about embodied AI.

This is a field that is moving extremely quickly, and a big part of that is driven by the advances in the AI models that actually power the robotics. It's less than six months since Elon Musk unveiled Tesla's Optimus robot at the big splashy Robotaxi event. And while those robots were visually impressive, it came out in the following days that the robots were largely being controlled by remote from behind the scenes. And as much as that was fodder for the Elon haters, it also reflected the fact that embodied AI is really hard.

That's especially true when it comes to AI models that work for generalized tasks. Humanoid robots have so far required specific training for each action, with the AI models largely helping with edge cases and little deviations. For example, the Optimus robots could easily mix a drink during the demo, likely because they were trained to do that. However, they would have had difficulty if a patron asked to shake their hand without a human controlling them. That's the problem that Google DeepMind's new AI model is trying to solve.

Called Gemini Robotics, the new model is built on top of Gemini 2.0, inheriting Gemini's native multimodal functionality, meaning that the model can process visual, text, and audio inputs. In their announcement blog post, DeepMind wrote, To be useful and helpful to people, AI models for robotics need three principal qualities.

They have to be general, meaning they're able to adapt to different situations. They have to be interactive, meaning they can understand and respond quickly to instructions or changes in their environment. And they have to be dexterous, meaning they can do the kind of things people generally do with their hands and fingers, like carefully manipulate objects.

DeepMind has actually built a pair of models to drive different parts of the functionality required for generalized robotics. The first is their advanced vision-language-action (VLA) model, which is functionally similar to other multimodal LLMs but includes physical actions as a new mode of output. The second is called Gemini Robotics ER, short for Embodied Reasoning. That model takes the premise behind reasoning models and applies it to physical environments. As DeepMind put it, the model has "advanced spatial understanding."

Now, as an interesting note, this is similar to the way that the current generation of AI agents are being designed. Agent builders typically use a reasoning model for planning and analysis of the situation and then hand that off to a separate model for execution, meaning that it's not unreasonable to think of embodied AI as agents with eyes and hands. DeepMind says the Google robotics model can, quote, solve a wide variety of tasks out of the box, including tasks it has never seen before in training.

As the model is built on top of an LLM, it has a general understanding of language inputs and can take instruction in natural language. One of the demo videos shows a table with a variety of fruit and containers laid out. The embodied AI receives a voice command and deftly places the banana in the clear container without having any specific training on that task. Google also demonstrated a big step up in fine motor skills, with the embodied AI able to close a Ziploc bag and even make an origami crane.
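To make that planner-and-executor split a bit more concrete, here is a minimal Python sketch of the pattern described above. To be clear, this is not Google's API; every class and function name here (Observation, ReasoningModel.plan, VisionLanguageActionModel.act) is a hypothetical stand-in, and the point is only to show one model deciding what to do while another decides how to move.

```python
# Minimal sketch of the planner/executor split described above.
# None of these classes map to a real Google API; plan() and act()
# are hypothetical stand-ins for illustration only.
from dataclasses import dataclass


@dataclass
class Observation:
    image: bytes      # camera frame
    instruction: str  # natural-language command, e.g. "put the banana in the clear container"


class ReasoningModel:
    """Stands in for an embodied-reasoning model (planning and spatial analysis)."""

    def plan(self, obs: Observation) -> list[str]:
        # A real system would call the model here; we return a canned plan.
        return ["locate banana", "locate clear container", "grasp banana", "place in container"]


class VisionLanguageActionModel:
    """Stands in for a VLA model that maps a plan step plus pixels to motor commands."""

    def act(self, obs: Observation, step: str) -> dict:
        return {"step": step, "joint_targets": [0.0] * 7}  # placeholder action


def run_task(obs: Observation, planner: ReasoningModel, vla: VisionLanguageActionModel) -> None:
    for step in planner.plan(obs):   # reasoning model decides *what* to do
        action = vla.act(obs, step)  # VLA model decides *how* to move
        print(action)


if __name__ == "__main__":
    run_task(Observation(image=b"", instruction="put the banana in the clear container"),
             ReasoningModel(), VisionLanguageActionModel())
```

The appeal of the split is the same as in software agents: the planner can be improved or swapped without retraining the low-level controller, and vice versa.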

The reasoning model, Gemini Robotics ER, is added to help increase the robot's ability to plan for novel task execution. DeepMind writes: "Combining spatial reasoning and Gemini's coding abilities, Gemini Robotics ER can instantiate entirely new capabilities on the fly. For example, when shown a coffee mug, the model can intuit an appropriate two-finger grasp for picking it up by the handle and a safe trajectory for approaching it." Functionality from reasoning LLMs also carries over into the real world.

That means the robots can do things like play tic-tac-toe or complete a word puzzle using Scrabble tiles. The key breakthrough here is that this system of models allows robots to move from a narrow range of specific tasks to much more generalized applications. Keerthana Gopalakrishnan, who works on the embodied AI team at DeepMind, posted, Gemini Robotics is out and is the most advanced VLA in the world. I'm especially blown away by the instruction following results. It's the first time where I've personally felt that building generic embodied intelligence is within reach, like a robot coming to life.

Bloomberg's Mark Gurman pointed out that the implications are for much more than just Google DeepMind. He said, And that's what a robot is.

Now, Google isn't the only one that has been working on this form of embodied AI model. In early February, Figure AI ditched their partnership with OpenAI to use their own models developed in-house. A few weeks later, we got a look at what these models can do. The demo video showed a pair of robots working together to pack away a grocery delivery. The robots had never seen the items before, but were able to reason about where the ketchup bottle should go in the fridge. If one's trying to make direct one-to-one comparisons, some might think that this demo wasn't as impressive as Google's demos from this week.

The robots acted much more slowly, seemed less dexterous, and demonstrated a more limited range of tasks. But on the other hand, Figure AI designs and produces its own humanoid hardware, while Google was demonstrating its software on hardware sourced from other companies. Still, both companies seem to be working on the same basic system design of pairing a reasoning model with an execution model. When they dropped the OpenAI deal, Figure AI CEO Brett Adcock said, "We found that to solve embodied AI at scale in the real world,

you have to vertically integrate robot AI. We can't outsource AI for the same reason we can't outsource our hardware." And Figure AI has begun deploying their robots in real-world settings. They have one pilot program currently underway in the BMW manufacturing plant in South Carolina, and a second undisclosed contract that the company says could potentially allow them to reach 100,000 robots shipped. The company indeed showed a video of robots sorting parcels, making many think that the client is one of the large US shipping companies.

These are both commercial clients, but much of the excitement and appetite, at least from an investor perspective, is what seems to many as the inevitable future of bringing humanoids into the household setting. Figure AI also seems to have demonstrated that humanoid companies are past the speculative phase, at least in terms of their valuations. Last February, during their Series B, the company was valued at a very decent $2.6 billion. But last month, Bloomberg reported that they are in talks to raise their Series C at a valuation of $39.5 billion.

Of course, we are now also living in the world of DeepSeek and Manus, and everyone is wondering what's going on in China. It feels like every day on X, you can see a video of some Chinese-produced robot carrying out some feat of dexterity. Earlier this month, one company called X-Robot went viral with an extremely lifelike female robot with a good voice model behind it.

Now, this video that you're watching here had the sci-fi factor turned all the way up, so who knows how real the product is. Then again, with what we've seen out of Chinese AI in recent months, I certainly wouldn't count it out. One Chinese company that is definitely producing real products is Unitree. They had a huge range of robots in assorted form factors on display at CES in January. You also might have seen the company's latest viral video showing a kung fu robot kicking a stick out of a person's hand. Now, many of the videos from trade shows still have a human operator in control.

That gets us exactly back to why, potentially, this Google model is such important news: Google may have just demonstrated a path to fill in the blanks where Chinese embodied AI is lacking.

Right now, Unitree is offering these G1 units starting at $16,000, but you have to think those prices are going to come down precipitously in the years ahead. Another key player in embodied AI that's worth mentioning in this roundup is NVIDIA. The chipmaker isn't working on robots per se, but they've definitely made some big advancements in the AI used to train them. In January, NVIDIA released their Cosmos World Foundation model. The generative model can be used to create virtual simulations of real-world scenarios for robot training.

Improvements in world models have been one of the big breakthroughs over the past few months, with several startups showing off their own versions of the tech in development.

The idea is that a digital twin of a robot can be placed in a simulation, which allows synthetic training data to be quickly generated. This doesn't necessarily help with the reasoning and generalization problem that Google is working on, but it does allow for big improvements in dexterity and specific movement training. The Cosmos reveal in January also came with some very bullish statements from NVIDIA CEO Jensen Huang. He said the ChatGPT moment for general robotics is just around the corner. He also delivered his keynote address standing in front of a chart showing the AI sector going exponential.

After agentic AI, the wave that we're currently in the middle of, the chart spiked even higher for physical AI, consisting of self-driving cars and general robotics. During the speech, Huang said that self-driving cars would likely be the quote, first multi-trillion dollar robotics industry. And while at this point we haven't seen anything that looks close to a fully capable general purpose humanoid, Huang did mention that he expects NVIDIA's products to power a billion humanoid robots over the coming years.
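For a feel for the digital-twin idea described a moment ago, here is a minimal Python sketch under some loud assumptions: the Simulator class is a toy stand-in invented for illustration, the policy is random, and a real pipeline would use a proper physics simulator rather than a distance check. The only point is the shape of the loop: reset a randomized scene, act, record the outcome, and repeat until you have a pile of synthetic training examples.

```python
# Minimal sketch of the digital-twin idea: run a simulated robot through
# randomized episodes and log (state, action, success) tuples as synthetic
# training data. The Simulator class is a toy stand-in for a real physics engine.
import json
import random


class Simulator:
    """Toy stand-in for a physics simulator hosting the robot's digital twin."""

    def reset(self) -> dict:
        # Randomize the object position so every episode is slightly different.
        return {"object_xy": [random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)]}

    def step(self, state: dict, action: list[float]) -> tuple[dict, bool]:
        # Pretend the gripper moves to the action position; success if it lands close.
        dx = state["object_xy"][0] - action[0]
        dy = state["object_xy"][1] - action[1]
        success = (dx * dx + dy * dy) ** 0.5 < 0.05
        return state, success


def generate_dataset(episodes: int = 1000, path: str = "synthetic_grasps.jsonl") -> None:
    sim = Simulator()
    with open(path, "w") as f:
        for _ in range(episodes):
            state = sim.reset()
            action = [random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)]  # random policy
            _, success = sim.step(state, action)
            f.write(json.dumps({"state": state, "action": action, "success": success}) + "\n")


if __name__ == "__main__":
    generate_dataset(episodes=100)
```

Because every episode is generated in software, a loop like this can run far faster than collecting the same data on physical hardware, which is the core appeal of simulation-based training.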

So far, I've hit a lot of the biggies, but even beyond these companies, VCs are definitely sitting up and paying attention to the potential inflection point we're hitting with embodied AI. Earlier this week, Dexterity Inc. raised $95 million at a $1.65 billion valuation to build robots capable of human-like dexterity. The company's pitch is remarkably similar to the way Google described their criteria for generalized robotics. CEO Samir Menon said that his robots can touch and recognize objects, are aware of and respond appropriately to surroundings,

and will move gracefully and adjust as needed. He added, "The combination of those three is what we engineer and what we believe will drive the future of physical AI." Raviraj Jain, a partner at Lightspeed Venture Partners, said he was investing more money in the company because he believes we're reaching an inflection point for physical AI.

Also, last month, a startup called Apptronik raised $350 million in Series A funding at an undisclosed valuation. The company is a spin-out from the University of Texas and has been working on humanoid robots for over a decade. The round included participation from Google, with DeepMind partnering with the company to provide the AI to drive their robots. In fact, you could see the Apptronik robots putting Google's embodied AI through its paces in the demo videos from this week. The raise was vastly more money than the $28 million the company had raised prior to this round.

And CEO Jeff Cardenas commented that the mega round was necessary because his robots are almost production ready. He said, "What 2025 is about for Apptronik and the humanoid industry is really demonstrating useful work in these applications with these initial early adopters and customers, and then true commercialization and scaling happening in 2026 and beyond." Explaining the Google partnership, Cardenas said it made far more sense than creating their own models, adding, "We believe that right now, Google is at the top of the game in building some of the best models in the world."

So friends, that is a quick update on the state of embodied AI, the intersection of AI and robotics. And that is where we will wrap today's episode. Appreciate you listening as always. And until next time, peace.