
The Missing Data Stack for Physical AI

2025/7/1

MLOps.community

People
Demetrios
Nico West
Topics
Nico West: I think the term "Physical AI" was popularized by Jensen, and it covers products that use AI to analyze or act on the physical world. It includes intelligent robotics, spatial computing, and so on, all of which involve intelligent software interacting with the real world. I think intelligent software interacting with the real world has a wide range of applications, including security applications and sports analytics. Demetrios: I think of physical AI as AI in physical space, AI we can touch, AI that interacts with the real world.


Chapters
This chapter explores the definition of Physical AI, contrasting it with traditional robotics. It highlights the increasing use of AI in real-world applications and the opportunities it presents.
  • Physical AI encompasses AI applications interacting with the physical world, including robotics and spatial computing.
  • The term 'Physical AI', while relatively new, is gaining traction as a more inclusive term than robotics.
  • Many applications of physical AI, such as security systems and sports analytics, were not previously considered under the umbrella of AI or robotics.

Transcript


And it's not time as in, oh, this is annoying to wait for. It's that the world continues to do things independent of how fast or slow you are. If you're a slow person, that doesn't change the speed of the rest of the world, right? And that's the same with a robot. Okay, so I'm Nico West, the CEO of Rerun. And my go-to coffee is pretty simple, it's a filter coffee with milk. But I'm an enjoyer of many kinds of coffee. ♪

I think we should kick this off with an overview of your opinions on physical AI versus robotics. Because I've heard about robotics, and I've also heard from people who are doing robotics that the big letdown is there's not a lot of AI inside of robotics, at least in most robotics companies that you hear about. Sure. So I think...

Physical AI as a term, I at least saw it popularized by Jensen quite recently. So therefore, it's obviously subject to a lot of the problems of hype, right? But I was really happy when that happened because we've been looking for a term that kind of encompasses products that use AI or, you know,

AI in the broadest sense, maybe. You know, from classic, just algorithms that do intelligent stuff, to really deep models and that kind of thing. And that basically applies intelligence to the physical world, either to just analyze it or to do something in it. So I include intelligent, or maybe somewhat autonomous, robotics in that, but also spatial computing and, I don't know, we thought of something like long-tail robotics,

physical intelligence, like, I don't know, security applications. Yeah, there are just a lot of different things, I don't know, sports analytics. So just a very large amount of stuff you might want to do with intelligent software that's somehow interacting with the world. So I personally, or we, put all of those things into the bucket of physical AI.

Like video games too, when you're grabbing all those sensors? Yeah, I think, well, for what Rerun is doing, I think it's very relevant. But maybe I don't put video games into physical AI, even though it's so close that you could talk about it. I think there are also other adjacent things, like generative media, that you could totally put in there. There are so many

similar kinds of patterns in the software and the data and how you build these products, and some aspects of generative media are physical, particularly if it's video and it's trying to be realistic and so on. That's close enough that you could in some cases talk about that too, but I think most people don't mean that when they say physical AI. So I think

I think there's a span of people meaning like the stuff that I said, which is like the broad sense to some people just meaning it's robotics with a cool name. I think both of those happen. Basically, if I'm grasping it correctly, it's AI that's out in the world, in the physical space, almost in a way that we can touch it. It's tangible. Yeah. Like interacting with the real world. I think...

Not on the computer. It's easy for us tech people to forget, but most of the world's GDP takes place in the physical world, and historically software has really not participated in that, other than maybe administering stuff. So I mean, you have software for maybe managing a doctor's appointment, but it's not a robot doctor, right?

You have maybe software for keeping track of the building, like the schedule of construction or something. Or sending the invoices. Exactly. But you don't have software or tech that is just doing all the construction. So I think that's what we have in front of us, that that's really happening right now and becoming possible. And so in that sense, in the broad sense, I think physical AI is set up to transform a

huge, huge part of the economy. So I at least believe that it has the potential, and it looks like it's going to do it, to be one of the biggest changes to the world economy in history. Now, I happen to agree with you, but I also want to raise

a point, which is we've been hearing that same thing from those IoT folks for the past two decades. And I still have not seen IoT totally transform the way we live, right? Maybe there's cool stuff that you have with smart homes, if you're really into that. Or I noticed that certain parking garages have sensors on if there are free parking spots.

None of those I would bucket into life-transformational. Got it, yeah. I'm not going to defend IoT hype. I never understood it personally. But certainly, with anything new, right, it hasn't happened yet, so it could still not happen, right? Yeah. So it comes down to belief, I guess. I come down to thinking, the thing I maybe never understood with IoT was, it sounded like a lot of people who really like tech

talking about, what is it? It's like, yeah, you're connecting everything. But that's not in itself solving a problem. Maybe you use it to solve one, but it's not actually describing something that you could do. But I think performing work in the real world, or automatically understanding what's going on in the real world, that's work. That's how you unlock value for real people. I think that's pretty unambiguous.

So in that sense, that is pretty different. But certainly it's up to if the technology works and if they can be brought to market in an effective way. But I think these categories are very different in those ways. Why now? Why do you think that we are on the precipice of things changing? I think it's mainly the AI part of physical AI, right?

And that's not to say that all the great solutions will depend on it, like where AI will be the most important thing. But the real world is super complex, with this sort of unending complexity. There's a long tail of things that can go wrong. The real world is super messy. It's very hard to build super general products that serve very large markets

when the software is not intelligent enough in that fuzzy way, not able to handle ambiguous, fuzzy situations, or situations where things change. Because with classic tech for the physical world, the way to make an effective product is to constrain the use case a lot,

so that a person can write an algorithm that handles each situation. This would be like what we see these days with the coffee-making robots, I think. Yeah, you just constrain it. Maybe it's a coffee-making robot,

Or something like one cell in a manufacturing line. It's super repeatable, but you just constrain it really, really heavily. And if that thing is valuable enough, you can put all the effort into just making something for that. But a lot of the physical-world things that are out there are much messier than that. So basically you need something that's more flexible, more able to handle ambiguity.

That's what the technology of modern ML and AI is about enabling. So I think that's one. And what that enables then is if you can address a larger market, you can invest more into the hardware. The hardware is also important. It's not only AI, but you need to be able to invest in the hardware as well to make it good, to make it high quality, but also low cost. That comes from scale. Hardware is a scale game.

So when you have scale, then you get lots of side benefits. Think of what happened with mobile phones, right? That generated a huge ecosystem that produces components, which you can then use to create reasonably priced, more niche products. So the mobile phone ecosystem drove the ability to make drones, right? You can make a good, cheap drone because of that.

I'm forgetting the word, but because of that ecosystem, basically. Almost like you have this innovation and there are those secondary and tertiary effects. So I think that's really the big thing, that a couple of things need to come together. First, you need to have a technology that can handle this messiness, which enables you to build hardware products that serve much bigger markets. That enables you to invest more

heavily into those products, which gets you the ability to get that scale, and that kind of gets the scale flywheel going. And particularly for AI, you actually need that flywheel as well for data collection. For really good AI, you need lots of hardware to collect data

to improve the models that then allows you to deploy again and get better data because they're now doing more advanced things, right? So you need that flywheel going. And you also need the scale flywheel that is what leads to good hardware, like effective hardware products at a good price. So to get that ball rolling, you also need hype.

That's actually a really important component: you need to be able to believe in the future and invest deeply into it. I think the ChatGPT, LLM side of AI has provided that. So that started things out, and I think it generated a lot of interest. And then within the field of robotics there have been some big breakthroughs, I guess,

in methods like scalable robotics learning methods, which really has been a dream for a while, as my understanding at least, but not the reality. So with scalable AI, I just mean scalable in the same way that you tend to talk about AI, right? You can throw more data and more compute at it and it gets better.

Yeah. And we take that for granted now with LLMs, but that has not been the case in robotics forever, right? But a couple of years ago, I was aware of the first line of papers that showed those properties, the RT-1, RT-2, RT-X kind of line of papers. I don't know if you know them. Yeah. There may have been something else that really started everything off, but that's, from my perspective, what I saw.

And that also kicked off like, okay, we're seeing scalable methods here too now. So from my perspective, the combination of having seen that with LLMs and then seeing it now also in robotics methods is what started to really get the ball rolling. And now the hype is very, very real in this space and there's a huge, huge amount of investment. And I think that, particularly when hardware is involved, that's actually necessary. So it's a mega long-winded...

answer, I think, to that question. But I think those are the reasons for why it's now. Can you break down the life cycle of how physical AI is trained? Like what models are we using? What are the ways that we're collecting data? Is it all through cameras? Is it through other sensors? And how the platforms look? What do you need to enable if you want to be putting these models out into the world? Because I think

It has a lot of extra complexities since you are deploying to the edge in a way, but I don't know how much of...

edge deployments there are, and whether you can then also offload certain tasks to the cloud. What does that whole thing look like? I feel like I'm not clear on it, and it's of course case-dependent. Yeah, maybe we can just take one specific case and talk through that. Sure. Oh yeah, it's super case-dependent, but these systems are so complex, and you can, I guess, imagine any solution: if you can imagine some setup, someone's doing that.

But maybe super high level. I like to think of like the two major systems that you need to think about as like the online systems and the offline systems. So with online systems, I mean just the things that are running as let's, I'm going to say robot now, but it could be some non-robot thing. But the thing that's running like

when the robot is doing stuff in the world. It's running on the robot. It technically doesn't matter. It could be running like some of the, maybe hit an API that runs a model and back. So I include that. But like mentally you can think of like what's running on the robot. That's understanding the world, planning, making decisions, like picking stuff up, like acting or whatever. So that's the online systems. And then you have your offline systems where you're,

basically you're running stuff on your laptop or workbench or on some data center somewhere. And that's going to be about observability. Like, wait, what's happening right now with my fleet of robots, maybe? It's going to be where you prototype new algorithms and new ideas. Where you run

analytics to understand performance, or just dig into things, just trying to understand the data that you're collecting. And where you collect and curate and transform data through data pipelines into data that's ready for training, and then train and deploy, and all those things. So I put that in the bucket of offline systems. And how many, sorry...

Just in this fictional scenario, how many models typically would be running on device or online? Yeah, so that's a hard question to answer. But I think we can maybe think about, maybe take a little bit of a historical perspective. So if we start thinking about, so running on device, we're talking about the online systems. So classically, everything that was running online was, again, no machine learning or maybe some...

Maybe you learned some classifier that did something or whatnot, but it's mostly handwritten stuff, acting with 3D planning algorithms and so on. But it's all C++ algorithms written by a robotics engineer or something, optimizing the state of doing SLAM, where is the robot, all that kind of stuff.

And so that's like how things were done before. And then, you know, deep learning happened and might start switching out small modules. Like, oh, we actually, our computer vision sort of works a little bit. So maybe we just detect objects, but then it's just running at some frequency, run one, you know, object detector, right? And

everything else is still handwritten. It's just a little object detector, running on, whatever, every fifth camera frame, right, and we're detecting things. But the rest of the pipelines kind of treat that as nothing special, it's just some data, and we write algorithms to fuse that over time and reason about what to do when, and so on. So that's maybe the next step. But I put that into the,

Let's see, when was AlexNet? 2012? Was it '14? I can't remember. Yeah, '12. So maybe into the 2018 kind of era, style-wise. I just said it with a ton of confidence. I'm going to fact-check that right now. I was like, no, no, '12. It's definitely '12. That feels right to me, but it's that range. And then, you know, that works, and you add some more models in there, and

But it's still just modular. You have one model, or you have many models, and maybe they each do single things. Like maybe you have another model that looks at some other input signals or images and outputs an estimate of the motion or something like that. So you just have these small modules. It's more like a library; you can think of each one as just a function that does stuff.
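To make this "modular era" pattern concrete, here is a minimal, hypothetical Python sketch: one learned object detector invoked on every fifth camera frame, with handwritten code fusing its output over time and doing the planning. The `detect_objects`, `Fuser`, and `plan_and_act` names are illustrative placeholders, not code from any specific system discussed here.

```python
from dataclasses import dataclass, field

DETECT_EVERY_N = 5  # run the learned detector only on every 5th camera frame

@dataclass
class Track:
    """Handwritten state for one tracked object (no ML here)."""
    x: float
    y: float
    confidence: float = 1.0

@dataclass
class Fuser:
    """Classic handwritten fusion: decay old tracks, add new detections, prune."""
    tracks: list = field(default_factory=list)

    def update(self, detections):
        for t in self.tracks:
            t.confidence *= 0.9          # older information becomes less trusted
        for (x, y) in detections:
            self.tracks.append(Track(x, y))
        self.tracks = [t for t in self.tracks if t.confidence > 0.1]

def detect_objects(frame):
    """Placeholder for the single learned module (e.g. a CNN object detector)."""
    return []                            # list of (x, y) detections

def plan_and_act(tracks):
    """Placeholder for the classic, non-learned planning and control code."""
    pass

def run_pipeline(camera_frames):
    fuser = Fuser()
    for i, frame in enumerate(camera_frames):
        if i % DETECT_EVERY_N == 0:      # the ML piece runs at a reduced rate
            fuser.update(detect_objects(frame))
        plan_and_act(fuser.tracks)       # handwritten logic runs every frame

run_pipeline(camera_frames=[None] * 20)  # stand-in for a real camera stream
```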

So that's been sort of, there's been a trend of just more and more of those things, right? And that has a lot of problems because in fact, you can't treat them as black boxes because there's a lot of uncertainty and like, you know, ML models don't, you know, they even like really high performing ones, they only work well when...

They're sort of operating on roughly the same kind of data as they were trained on. Yeah. The only way to, like, and that's hard, right? That's a hard problem. How do you know when you're outside of that data and so on? And then you get a lot of these, stitch them together with, like, handwritten algorithms, and it gets a mess, and it's pretty hard to build complicated systems. And I think this is how people try to build, like, self-driving cars, right?

with this approach, and, well, it didn't really work, right? So I guess the trend from there, I mean, the idea of deep learning is, you know, do things end-to-end, and that has been happening more and more. So over time you say, okay, now we have like four modules, can we swap them all out and have it be one neural net that does more things end-to-end? That's, I think, generally been the trend. And that can go quite extreme.

In some of the very end-to-end-focused, humanoid-style projects, you could have two neural nets, or maybe one (they call it one, but it's really like two). And then you might have one

lower-level one that's faster and smaller, focused on fast, low-level, whole-body control. So it's taking in IMU signals and maybe pressure and some other sensors like that, and it has some target, you know, of the pose, where it should be, how the body pose should be. And it's basically doing the control that you might previously have done with more classic optimization-based methods.

And then you have some larger neural net that maybe knows skills at a higher level, like go reach for this thing and so on, and that thing can be slower. And maybe you even have a third level above that which takes text input and plans and things like that, if you're very, very AI-first. But yeah, you could swap out pieces of that with handwritten systems and so on.
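A rough sketch of the two-rate hierarchy described above, assuming a hypothetical `robot` interface: a slower high-level policy that produces a body-pose target, and a faster low-level controller that tracks it from raw proprioception. Function names and rates are illustrative only, not taken from any particular stack.

```python
import time

HIGH_LEVEL_PERIOD_S = 0.5    # slower policy: "reach for this thing"
LOW_LEVEL_PERIOD_S = 0.002   # fast whole-body controller, roughly 500 Hz

def high_level_policy(observation):
    """Hypothetical larger net: observations in, body-pose target out."""
    return {"target_pose": ...}

def low_level_controller(imu, pressure, target):
    """Hypothetical smaller, faster net (or classic controller): joint commands out."""
    return {"joint_torques": ...}

def control_loop(robot):
    """`robot` is a hypothetical interface exposing observe/imu/pressure/apply."""
    target = None
    last_high_level = 0.0
    while True:
        now = time.monotonic()
        # Slow loop: refresh the target a couple of times per second.
        if now - last_high_level >= HIGH_LEVEL_PERIOD_S:
            target = high_level_policy(robot.observe())
            last_high_level = now
        # Fast loop: track the current target from raw proprioception.
        if target is not None:
            robot.apply(low_level_controller(robot.imu(), robot.pressure(), target))
        time.sleep(LOW_LEVEL_PERIOD_S)
```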

But I haven't seen a lot of single-neural-net systems that do everything. I've seen that marketed, but I don't know if it happens in practice. You bring up a great point, which is that in these systems, especially the online ones, you are so constrained by different

things, because you're out in the world. Whether that's having to be hyper-focused on battery, or having to focus on speed: nobody wants a robot that you tell to do something and then 20 minutes later it comes back and is like, actually, I can't do that, I went through and planned it out, and yeah, no, I researched the topic and I can't get to it, right? So

What are some other constraints or things that you need to be cognizant of when you're doing stuff in that realm? I think the most important difference is, yeah, time. And it's not time as in, oh, this is annoying to wait for. It's that the world continues to do things independent of how fast or slow you are. If you're a slow person, that doesn't change the speed of the rest of the world, right? And that's the same with a robot. If it

is doing something and it's like, oh, let me grab this thing, and then that thing has moved, it doesn't matter that you'd calculated, you know, that it was going to do it right. It's not there anymore, right? And so on. And that's very different from even your ChatGPT-style interaction, right? You would love it to be fast because that feels better, but it's still this single stream of, you know, you take the inputs, process them all, and give some output, right?

There's not really a concept of a world evolving around that. So time, yeah, that changes everything, really. You need to be much more sophisticated about how you think about it in everything: how you understand what your software just did, right? You need to keep track of how everything evolves over time, and you maybe have multiple notions of time, like the

compute time, the real-world time, whatever's happening in the real world. Then maybe you have an algorithm that takes a certain amount of CPU time or a certain number of iterations. Maybe you want to keep track of, oh, what time was this sampled at, and when am I making this decision? The decision is a little bit later in time than the data it's based on, because you have to compute stuff, so it's relating to old information. So just dealing with time is the really, really big thing that you get into.
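A small sketch of what "multiple notions of time" can look like in code, with hypothetical names: each sensor reading carries the time it was sampled, and the decision loop separately tracks how long the computation took and how stale the information is by the time it acts.

```python
import time
from dataclasses import dataclass

@dataclass
class StampedSample:
    """A sensor reading tagged with the (monotonic) time it was sampled."""
    value: object
    sample_time: float

def expensive_model(x):
    """Placeholder for whatever perception/planning computation runs here."""
    return "noop"

def decide(sample: StampedSample):
    t_start = time.monotonic()
    action = expensive_model(sample.value)
    t_decision = time.monotonic()

    compute_latency = t_decision - t_start          # time spent computing
    staleness = t_decision - sample.sample_time     # age of the info when we act

    # The world kept moving during `staleness`; a real system would compensate,
    # e.g. by predicting the state forward before executing the action.
    return action, compute_latency, staleness

reading = StampedSample(value="imu packet", sample_time=time.monotonic())
action, latency, staleness = decide(reading)
```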

Sounds really messy on the back end too, when you're trying to create systems and you need to look at all the different ways of time being interpreted. Yeah, it gets messy. So you need to build, I mean, it increases the complexity of the data tools that you need, right? It's pretty different from, oh, I train one image classifier. That's a shockingly simple problem in comparison. And these

robotics models, or whatever other things, they're operating on sequences over time, so even that is more complex. But then they have some internal notion of steps or something, and that in the real world is sort of overlaid on the real time of the real system they're operating with. So time is the really, really big thing, I'd say. Yeah, and then there's obviously whatever resource constraints, battery, and things like that, which are really difficult, but

similar to other things: you have constraints, and maybe they're more difficult on the edge, but it's still the same idea.

Well, yeah. Talk to me about the data side of this, because that feels like, again, it would be very hard to deal with all of this different data that you're getting in different formats. And specifically, all of the video data has got to be very heavy. And then how you're training models with the video data, you might have some time constraints.

or just time data, and so it's more tabular-style. Exactly. I think we had this idea about the online and offline systems. And so on the robot, right, in the online systems, what you'll do is you're trying to record

what happened. Yeah. And so the real world, you have this, like, things happen at different rates. Maybe you have, you know, image, your videos happening every, like, 30 FPS. But maybe you have, like, motion sensors that are going at, like, 1,000 hertz. So very different rates. Sometimes these things are, like, kind of distributed. Like, a robot can be, like, a distributed system. You even have different clocks and stuff. Oh, wow. All this data changing at, like, different rates. And you're also recording what happens. So...

you don't really know the exact shape of the data set beforehand because you're recording what happened. So these things, like the data that you're recording there is super messy. It's kind of like basically logs, right? But it's logs of multimodal data streams. So lots of different types. It could be like 3D information, this different structure, like this data is often structured in like deeply nested structures and so on. And you have maybe audio and video and...

3D sensors, motion of different kinds, internal metrics. So it's really, really messy and really complex and difficult to handle from a data perspective effectively because you have this problem of combining really fast small signals

with large, heavy, you know, big tensors and images and point clouds and stuff like that, that maybe are slower. And storing that together is actually pretty hard. So in classic robotics, or in general, on the system you tend to store data to these very specialized file formats that are very write-optimized. Oh, interesting. They're just good at recording exactly what happened, with minimal operations, to get it onto disk really fast.
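As an illustration of the write-optimized idea (not a real robotics format such as rosbag or MCAP), here is a toy append-only recorder: each record is length-prefixed and timestamped and written with minimal work per message, so streams at wildly different rates can share one file.

```python
import struct
import time

class AppendOnlyLog:
    """Toy write-optimized recorder. Each record on disk is:
    [u32 body length][f64 timestamp][u16 channel-name length][channel name][payload]."""

    def __init__(self, path):
        self.f = open(path, "ab", buffering=1024 * 1024)

    def append(self, channel: str, payload: bytes):
        name = channel.encode()
        body = struct.pack("<dH", time.time(), len(name)) + name + payload
        self.f.write(struct.pack("<I", len(body)) + body)  # one append, no indexing

    def close(self):
        self.f.close()

# Streams at very different rates share the same file:
log = AppendOnlyLog("session_0001.bin")
log.append("imu", struct.pack("<3f", 0.0, 0.0, 9.81))    # ~1000 Hz, tiny records
log.append("camera/front", b"\x00" * (640 * 480 * 3))    # ~30 Hz, big frames
log.close()
```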

That's so you don't disrupt anything that's running on board. So that's step one. Then you want to get the data off the robot, upload it. And depending on the volumes, maybe you upload all of it, or you're selective, like only upload when something happened, that kind of thing. But

Somehow you've got to get it to the more centralized place where you can use it. And that's where you're throwing it into an S3 bucket, or is it still... I would say, just to make it super simple, right, the absolute most simple thing would just be, yeah, you periodically write these logs to file and then you have a little job that uploads them to S3, to some S3 bucket, and then you have them there. So that would be part one. Wait, so that's simple. What is advanced?
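Before getting to the advanced setups, a minimal sketch of the simple path just described, the "little job that uploads them to S3", using boto3; the bucket name and log-directory layout are placeholders.

```python
import pathlib

import boto3

BUCKET = "my-robot-logs"                   # placeholder bucket name
LOG_DIR = pathlib.Path("/var/robot/logs")  # placeholder directory of recordings

def upload_finished_logs():
    s3 = boto3.client("s3")
    for path in sorted(LOG_DIR.glob("*.bin")):
        s3.upload_file(str(path), BUCKET, f"raw/{path.name}")  # push to the bucket
        path.unlink()  # reclaim local disk only after a successful upload

if __name__ == "__main__":
    upload_finished_logs()  # e.g. run periodically from cron or a systemd timer
```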

Well, advanced just of getting it off is like, okay, we're going to be run, like collecting so much data that it doesn't even make sense to upload. Like if you think about a self-driving car, they will collect the data when they dock back somewhere. And then you'll just swap out the SSDs, right? And put in some new SSDs. And you may never upload it. Or if you do, you need to send like trucks of SSDs to AWS, right? Yeah.

where you have your own local... So there you could make choices, right? You only upload it when it's needed. You have some kind of storage architecture where you keep everything on a local data center that you have right at where the data is collected. And you just have it there until you upload the metadata and you go fetch it when it's needed. It can get really, really complex at large scale.

But let's keep it simple. You just write these files and upload them. Let's assume that's possible. So even before that, you have another problem. You want to be able to look at the current state of the robot. Visualization is super, super important. Basically, if you're working on a robot, then you want to be able to live-

visualize all these streams of data. If it has a 3D understanding of the world, you want to see that 3D map, and you want to see the robot walking around in that map, and see what it sees, and see the internal state of different algorithms, and all the camera feeds. And you want to be able to scroll back and forth in time, right, so that when something goes wrong, you can scroll back to what happened. So

So that's something that you need when you build these kinds of systems, just live visualization. And then you want to look at those files that you recorded after the fact, right, and just analyze those. And that's just a per-session kind of observability. So it's a super, super core aspect. Okay. So that's important. And so even before getting to offline systems, this

set of capabilities, just recording what happened to some write-optimized file and then having some visualizer to either look at the files or look at things live, you cannot build these products without those things. Even in classic robotics you need those things, and in classic robotics there's ROS, the Robot Operating System, which would be the most commonly used setup that gives you this

data recording and some visualization capabilities. And there are slightly more modern visualizers built for that scenario, and they're great, they work well for that: RViz, Webviz, XViz, Foxglove. There are a bunch of tools in that vein, they're kind of robotics log visualizers, really important. But they were designed for the pre-ML world,

where that was the main complexity of your product, what ran on the robot. But you need that. Then you can think about what happens offline, when you now want to train and improve your models, right? So you've uploaded this data, and the current state of the world, at least, is that you then need to make that data usable by the kind of systems that you have installed

to do MLOps, right? So ML data pipelines and so on. So before training, you want to have it in, I don't know, TFRecords or, you know, HDF5 files that are optimized, ready to train. And all of those things tend to be very structured, and they're not good at storing this sort of messy, log-style data. On top of that, you also want to run analytics, like

run some statistical job, compute metrics, all that kind of stuff. So basically all the offline data tooling that's out there, if it's like Databricks or Datadog or whatever things, these tools do not understand this kind of physical AI, robotic-style data. They do not know the storage systems, do not know how to read into these log-structured, messy file formats. They want everything to be like a table with columns and so on.
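To illustrate the kind of pipeline step being described, here is a hedged sketch that turns per-session recordings into a training-ready HDF5 file with h5py; `read_session` and the field names are hypothetical stand-ins for whatever a given recording format actually contains.

```python
import h5py
import numpy as np

def read_session(path):
    """Hypothetical loader for one recording: per-frame images and actions."""
    images = np.zeros((100, 480, 640, 3), dtype=np.uint8)   # placeholder data
    actions = np.zeros((100, 7), dtype=np.float32)
    return images, actions

def build_training_file(session_paths, out_path="train.h5"):
    with h5py.File(out_path, "w") as f:
        for i, path in enumerate(session_paths):
            images, actions = read_session(path)
            grp = f.create_group(f"episode_{i:05d}")
            # Chunked, compressed image dataset, ready for a training dataloader.
            grp.create_dataset("images", data=images,
                               chunks=(1, 480, 640, 3), compression="gzip")
            grp.create_dataset("actions", data=actions)

build_training_file(["session_0001.bin"])
```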

They don't know how to handle huge unaligned data. And there's no built-in visualization, which is crucial for debugging. So then teams end up building these very complex data pipelines to try and transform and clean the data and do data set curation.

And because those offline systems don't really understand the source structure of the data, these things get super complex. Sounds miserable. Yeah, super miserable. It's super complex and really brittle. And then you don't have the ability to debug, because you don't have any built-in visualization. If, right before training, suddenly all the data is showing up upside down, like, where did that happen?

You don't even have built-in visualization. I've talked to self-driving companies who started using Rerun and found bugs, like, oh, we were training on something and the orientation of something was flipped for two years during training, and it was giving bad performance, and no one saw it. It was too hard to debug the data pipeline, the state after each step in the data pipeline; it was just too hard to do.

So that kind of stuff. Yeah. That stuff gets really complicated. So you just end up like with these robotics companies in a tough spot, right? They end up with two stacks, like classically, you have your like online data system stuff that was built for like classical robotics, but doesn't understand the kind of data.

And you have your offline systems that are built for large-scale learning and stuff like that, but they don't understand physical data. These don't talk to each other. And yeah, it's just a mess. That's the kind of base state of the world. You created some visualizations, right, or tools to help with the visualization, so that the

physical AI can understand the world and you can see where and how it's understanding the world, and you decided to open source it. Can we talk a bit about everything that you've been open sourcing until now and the inspiration behind that? Sure. I would just like to frame it first. So what Rerun as a company is doing, we're basically trying

to solve the problem I just talked about. We want to build a new, unified kind of data stack that handles both the online and offline scenarios for physical AI, so that you get a consistent, easy-to-use experience with built-in visualization, and much more efficient and easy-to-use querying and things like that, because the data stack understands both of these types of data. Yeah.

Okay, so we started out two and a half, what is it, three years ago roughly, and spent most of the first two and a half years, I'd say, on an open source project. It's called, you know, Rerun, like the company. And that project is focused on like logging and visualizing multimodal data that changes over time. So a broader application than just like, just, but broader than robotics.

So we started out actually focused more on computer vision outside of robotics, and it's kind of expanded to be much broader. That project has SDKs in Python, Rust, and C++. You can think of it like how you'd log text or log a metric, something like that, but you can log

anything, you know, like a tensor or a 3D point cloud. You build up a full 3D scene of things happening, or normal metrics and video, and have everything connected, like cameras moving around, and you hover over an image and it will highlight where that ray shoots out in 3D. So those kinds of things. And it also allows you to

scroll back and forth in time. I've just got to say, this is kind of some Star Wars shit right here. This is what, you know, when they plug into the droids and stuff, this is what I imagine they're seeing on their little computer. I hope so, yeah, I hope so. And it's pretty cool, if I can say so myself. It's a pretty cool application, or framework.
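For reference, logging with the Rerun Python SDK looks roughly like this; a minimal sketch, and the exact archetype and timeline APIs depend on the SDK version. Everything logged under the same timeline can later be scrolled through together in the viewer.

```python
import numpy as np
import rerun as rr

rr.init("robot_session", spawn=True)  # spawn the Rerun viewer alongside this script

for frame in range(300):
    rr.set_time_sequence("frame", frame)  # everything below shares this timeline

    # A 3D point cloud the robot perceives this frame.
    points = np.random.default_rng(frame).uniform(-1.0, 1.0, size=(500, 3))
    rr.log("world/points", rr.Points3D(points))

    # A camera image and a scalar metric, logged on the same timeline.
    rr.log("robot/camera", rr.Image(np.zeros((120, 160, 3), dtype=np.uint8)))
    rr.log("robot/battery", rr.Scalar(1.0 - frame / 1000))
```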

So that's, yeah, we've been building that open source project and it's a pretty extreme thing. So we basically like said, okay, none of the old things work. We rebuilt sort of this whole stack, like a data logging and visualization stack from scratch in Rust. We took a lot of inspiration from like how modern gaming engines are built. So the data model is built around like an entity component system. Nice. It's basically more like composable data model.

And we had this goal, since we talked about the online and offline systems, we wanted the open source project to unify the visualization side of that. So you should be able to use the same visualization framework for your dirty little Python script, where you might otherwise use matplotlib or something (you have a little algorithm and you want to just pop in some data, dump it in, have it show up, and then analyze it, go back in time, that kind of thing),

all the way to your centralized visualization dashboards. I don't know if you've seen the marketing videos from Waymo or something, where they show all the lidars and things updating on a map and all that kind of stuff. Teams use Rerun to build those centralized things. And actually, recently, with our last release, we also let you build data annotation apps, so you can have interactivity, click on things, and it'll respond

back with the data that you clicked on, so you can build data annotators with it. And that's good for these anomalies that sometimes you'll hit, or the edge cases. I mean, yeah, you tend to annotate data, label data; there are many different reasons, it could be whatever, right, you always need to be doing that. Yeah, that's true, there's a lot of that. It could be, oh, here's something weird that happened, but it could also be, yeah, this is just how we annotate data, you know. And it's

yeah, it's basically wherever you want to look at your data, which you should want to do a lot, you want to have a consistent view. You'd ideally like that to look the same wherever it is, whether it's in production or a little script, and people use it to visualize maybe their evaluation runs during training, training pipelines, just a lot of different things. So we knew

the goal was to unify all of that, to be able to do it in the same framework. And that required extreme flexibility and performance and so on. So that's been the goal there. That's a never-ending job, but I think we've come quite far. Pretty good adoption, in both spatial computing and robotics, from, you know, two-person startups, and I think now

Meta and Apple and Unitree and Hugging Face, and I'm forgetting companies, but they use Rerun in open source projects at least. Damn. So it's used, you know, from the smallest to the largest. And that's been really cool to see. And I think it's because we really focused on extreme ease of use and flexibility, when you want to do whatever you as a researcher need to do, and on performance. Yeah.

So that's been that project. That's open source. It's always going to be open source. And it's almost like you went with the visualization aspect on the open source side of Rerun. And then when you thought about building an actual product, how did you think, all right, we're just going to complete the cycle and incorporate Rerun into a greater platform? We think that we kind of needed to reinvent the whole data stack, right?

And so the open source project forced us to do a couple of things. One of them is to really develop a really good data model, because you need a data model that's expressive, but also fit-for-purpose enough so that it's easy to use, and also composable and flexible and extendable and all that stuff. And that's really difficult. And you need to have something like that that can also be performant, together with the right query engine. So those things,

those two parts, we've kind of been forced to build. The query engine, basically: to build a visualizer like this, one that's fast and flexible and allows you to scroll back in time, and with these unsynchronized streams of data I talked about, you basically need to build a small query engine, or a small in-memory database, to make that work well. So we had to develop that as well. Those are the core pieces.

So the query engine is really focused on time alignment and those kinds of robotics needs. I was going to say, that's probably one of the biggest boons that you can give someone: just making sure that all these disparate sources of data can line up. So if there's some kind of an event that I want to look into, I say, what happened there?

How do I get a 360 view of what's going on there, as opposed to, okay, I see that something happened with this sensor, did anything happen with the other sensor, and now I've got to go sift through the data and try and figure out where in time that is for that data source? Yeah. No, it's both hard and really important. So we were kind of forced to work on those technical challenges for the open source. And the commercial

product that we're working on right now, you can kind of think of it as a database that has visualization built in. So it's a storage and indexing engine and a query engine, basically. So this thing is

built for the constraints that we have here. You have source data in varied forms of these write-optimized, robotics-style file formats, so you need to have a plugin system so you can support many different ones. You need to be able to handle these unstructured, recording-style data sets of, say, 100,000 recordings of what happened on a robot. And you also need to understand normal tabular data. So you need to understand both of them.

So like a storage and index engine that can make working with that like fast and unified. I guess a data model that gives a way to consistently interface with this data. So you want to maintain, so you want to visualize any data that you have stored. But you also want to have a query engine that can operate on top of it. So you need this consistent data model to do that.

And then, yeah, the next step above that is the query engine. And there, what you really want is to be able to have this query engine sort of understand the physical AI data model. So what that can mean is like one simple thing is maybe, okay, you have a data pipeline, you have your raw data, and then you run some transformations on it and produce some nice, more structured, easy to work with table. You'd like to not lose all the semantic information. Like what does like the first column mean?

This is a column of 3D point clouds. And you want to know that. Or a column of 3D positions that are part of a point cloud that change over time. That may be one column. Another might be a video. Another might be some sensor reading or something. You want to keep track of what everything means. And if you do, then you can

visualize and debug a table that's five steps into your data pipeline. So that's one thing: you want your query engine to maintain that data model. And the other is you want to be able to do robotics-oriented operations in the query engine. So imagine writing a SQL expression where one part of it is doing time alignment, right?

And you might want to do things like 3D transforms in the SQL expression, because you want to have all your data come out in the reference frame of your robot's gripper or something like that. I mean, maybe that's abstract if you haven't worked with this kind of data, but the ability to push these kinds of operations into the query engine can make working with the data a lot easier.
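Rerun's actual query engine isn't shown here, but as a stand-in illustration of what "time alignment as a query operation" means, pandas' `merge_asof` does the same kind of as-of join: for each 30 Hz camera timestamp, pick the most recent sample from a 1 kHz IMU stream.

```python
import numpy as np
import pandas as pd

# 30 Hz camera timestamps and a ~1 kHz IMU stream, each on its own schedule.
camera = pd.DataFrame({"t": np.arange(0.0, 1.0, 1 / 30)})
t_imu = np.arange(0.0, 1.0, 1 / 1000)
imu = pd.DataFrame({
    "t": t_imu,
    "gyro_z": np.random.default_rng(0).normal(size=t_imu.size),
})

# "Time alignment" as a query operation: for each camera frame, take the most
# recent IMU sample at or before that frame's timestamp (an as-of join).
aligned = pd.merge_asof(camera, imu, on="t", direction="backward")
print(aligned.head())
```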

And then, kind of on top of that, a data catalog that can understand this kind of data set. So it really is the full data stack. That's what our commercial product is, what we're building towards. You talked about these janky pipelines before. Does this eliminate the need to create pipelines, or are you still seeing folks create pipelines, just with much higher quality data?

I think, so first off, this is still in development. We have a couple of, you know, paying design partners working with what we have now, but it's still pretty early. So that's first, to preface it. Yeah. You know, I don't want to say that we have something that we don't yet. Yeah. But I think our goal is for you not to have to have any steps in your pipeline. Wow.

I think in all cases that's not achievable, but you should be able to just record and then build up a series of queries and train off of that directly. I want that to be possible, right? That's not going to be the most efficient or the best way to do things, because, well,

you want to save the intermediate results, and you want to be able to inspect them and do quality control, and not redo all the computation during training; it doesn't really make sense efficiency-wise or structure-wise to do that. But I want it to be possible such that, when you want to iterate really fast, you have very, very few materialized intermediate steps, and then, as you figure out what you want to do, you say, okay, these

parts of the pipeline should be stored. So you can flexibly choose that. So in reality, I think any company is going to have multiple steps in their pipelines, but hopefully they're a lot easier to manage, a lot more efficient to run and build and a lot less complex than they have to be now. So that's kind of the goal.

Is there anything you want to talk about that we haven't hit on? Yeah, there are things. If you think about Rerun, the thing that's most widely deployed is our open source project. And there are massive, you know, Mag7-style companies that have switched over, such that at this point, all their computer vision, for instance, they use Rerun to debug it. And that goes from the

little systems that the researchers build, to how they debug their data, kind of all the way through. So very wide deployment like that. And it's hard to give a specific example there, but that just reduces friction at every point, right? Yeah, and kind of increases productivity. You know, how do you even put a value on looking at your data? It's sort of the core thing that oils the wheels for everything you do.

So in a broader sense it's that, but maybe more specifically there are things like very well-funded self-driving companies finding long, multi-year bugs in their data pipelines that were leading to bad model performance, after adopting Rerun to debug those pipelines. That's another kind of example of that nature. Meta, how are they using it with the Ray-Ban glasses?

So what's public from Meta is the Aria glasses. That's the new ones that are coming out? No, these are the research glasses. They're pure data-capture devices for open research. And yeah, they're basically a pair of glasses with a lot of sensors on them. And so Rerun

is the kind of official visualizer for the data sets that come from there. So there's the Ego4D data set, for instance; they record a bunch of data in the home and things like that. It's also built into the Aria dev toolkit as the main visualizer there. That project gets used for spatial computing things, and now also more commonly in robotics too, to collect robot-style data,

but collected by a human, and then try to retarget that to robotics applications. So it's used like that. You know what I would actually love to ask you about: you mentioned before that there are different schools of thought, or ways that folks are trying to implement successful robotics or physical AI. One is as many sensors as possible. The other is the fewest sensors possible. Yeah.

Are there other vectors that folks play on that you've seen interesting or surprising for you? Interesting or surprising? I don't know. I think the kind of major vectors I think about are definitely like how deterministic is something, like how okay are you with just saying, ah, you know, we learned to train the model and it seems to perform well versus like, no, I need to have like mathematical guarantees about some behaviors. Mm-hmm.

That would be one. Another one. There's like modularity. So the extreme being like, oh, we have one neural net that does everything. We don't have any code. It's just a neural net. But that's the extreme that you can think about. Another being like, no, it's very important to have modules and to test all the modules separately. And so the extent at which you go after...

and value modularity versus performance is, I think, another really, really big one. And then in general, some teams don't even believe in training at all; they think you shouldn't use much machine learning at all. There's certainly a bunch of folks like that. They're like, ah, it's unreliable, it doesn't work. Well, maybe they use it for detection and things like that, but nothing sort of

I don't know, more complex. So you use SLAM to build, you know, 3D maps of the world and you write classic planners that decide how to move and they feel like, oh, all these technologies are proven and you should use that. And for certain applications, that's totally the right way to go. And some people are a bit more purist about it, but that stuff is certainly still around and probably...

the right approach in a lot of more structured environments. You can make that work really well, like, you know, a warehouse or something. Yeah, exactly. If you know what environment you're playing in, then there's probably a lot stronger case to make it the least stochastic, or, what's the word, as non-stochastic as possible. It's nice to know what's going on, right? And you can make things fast and cheaper and so on. There are a lot of benefits, of course.

But taking training seriously comes with a lot of costs. It really increases the complexity of your offline systems. And the more end-to-end you go, there's kind of this trade-off: you can simplify your online systems by going fully end-to-end, but then you get really complex offline systems.

It's engineering, right? There's trade-offs. It's not magic. Those are fascinating trade-offs. That's really cool to think about.