The Power of Time Series Data in IoT | InfluxData's Evan Kaplan | Internet of Things Podcast

2025/1/28
IoT For All Podcast

People
Evan Kaplan
Topics
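Evan Kaplan: Time series data is timestamped information that is critical to telemetry in IoT. Sensors are the foundation of IoT, and they communicate using time series data. The challenges of managing time series data are its high ingest rates, data summarization and transformation, data eviction, and high cardinality. InfluxData addresses these challenges by building an open source database with built-in capabilities, optimized for ingest and response time. Use a time series database from the start to gain the real advantages as the system scales.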

Chapters
This chapter defines time series data as timestamped information crucial for telemetry and autonomy in IoT systems. It highlights the role of sensors in generating time series data and the evolution of databases used to manage it.
  • Time series data is timestamped information.
  • It's fundamental to telemetry and building autonomous systems.
  • Sensors generate time series data.
  • Databases have evolved to handle this data more efficiently.

Transcript

Hey, Evan, welcome to the IoT For All Podcast. Thanks for being here this week.

Hey, thanks for having me, Ryan.

Absolutely. Let's kick this off with a quick introduction, if you don't mind. Maybe a little background on who you are, as well as a little intro to the company.

Where do I start? I'm a native New Yorker who has lived on the West Coast since I went to college at the University of Washington. I lived in Seattle for years, started my own startup early stage, grew that quite big through the bubble, the first internet bubble, and had to shrink it post-bubble.

And then I was able to sell that to SonicWall/Dell, ran a public company called iPass for six years, and then, at a venture firm, met Paul, the founder of Influx, when we were pre-revenue. I joined when we were 20 people, and that's been a little over eight years now.

So for today's conversation, we have a really interesting topic that I think we've touched on briefly in the past, but never really dove into in a way that our audience would benefit from: time series data. A lot of our audience, I'm sure, is a little unfamiliar with what that topic means. So let's kick this off, if you wouldn't mind, by talking about what time series data is for those who are unfamiliar, and how it fits into the IoT landscape.

You know, at its most basic level, it's the process of collecting and reading timestamped information. So information that is indexed, sorted, and captured by what time something happened. And if you look at that, that's a huge set of data out there. And its relevance broadly is about telemetry. The whole process of human-designed systems, systems overall, is collecting telemetry, reading out the telemetry, and iterating on it to improve the system's operational capability.

You go on a journey of autonomy over time, and timestamped data, or this kind of telemetry data, is critical. And where it relates to IoT is that sensors speak time series. That's the language that they speak. Pressure, volume, humidity, light, whatever the measurement is, it's spoken as a timestamped measurement.
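As a rough illustration of "timestamped information" (an editorial sketch, not something shown in the episode), the snippet below models a sensor reading as a value tied to a nanosecond timestamp. The `Point` class, its field names, and the sample values are hypothetical and do not represent InfluxDB's data model or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    """One timestamped sensor reading (hypothetical structure, for illustration only)."""
    time_ns: int        # when it happened, in nanoseconds since the Unix epoch
    measurement: str    # what kind of reading this is
    sensor_id: str      # which sensor reported it
    value: float        # the reading itself

# Readings can arrive out of order; the timestamp is what orders them.
readings = [
    Point(1_700_000_002_000_000_000, "humidity_pct", "s1", 41.5),
    Point(1_700_000_000_000_000_000, "pressure_kpa", "s1", 101.3),
    Point(1_700_000_001_000_000_000, "pressure_kpa", "s1", 101.1),
]

by_time = sorted(readings, key=lambda p: p.time_ns)

def in_range(points, start_ns, end_ns):
    """Return points with start_ns <= time_ns < end_ns."""
    return [p for p in points if start_ns <= p.time_ns < end_ns]

first_two_seconds = in_range(
    by_time, 1_700_000_000_000_000_000, 1_700_000_002_000_000_000
)
```

Almost every question asked of this kind of data ("what happened between A and B?") is a filter on the timestamp, which is why a store can benefit from organizing itself around time.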

What happened? What happened? What happened? What happened? What happened? So time series is just pretty foundational. It's been around for a long time. You know, if you were doing this 25 years ago, you used an Oracle database or an Informix database. If you were doing it 10 years ago, you might have used MySQL or a Cassandra database.

If you're doing it now, you're using this whole category of emergent databases that are really specialized on time series. And as it relates to IoT specifically, there is a class of databases that would not necessarily describe themselves as databases, but as industrial historians, that have been built into IoT systems, particularly industrial ones, for a long period of time. So there's a long history here. What's relevant now is just how important that data is, the ability to collect it, and the specialized databases that are emerging to handle it. You know, IoT is about connecting to things, sensors, usually physical things. It doesn't have to be physical things, but usually it is.

And so IoT is about the instrumentation and operation of the real world. In a world that's currently transfixed on LLMs, which are about the instrumentation and operation of the digital world, IoT is about the instrumentation and operation of the physical world. And so in order to instrument, operate, learn more about, and evolve the physical world, you need to collect all the data about what's happening all the time. That data can be very simple: what was the pressure at time A, and what's the pressure at time B? But it can be incredibly complex: What was the pressure? What was the light? What was the humidity? What was the position of the sun? What was the weather pattern? What was the barometer reading? It can be a really rich set of data around a timestamp.

That describes the real world. The richer the description, the more valuable your picture of the real world is. And IoT is fundamentally that.

What are some of the unique characteristics of time series data in IoT, and why does managing it oftentimes present a challenge?

It's a very unique data type. One is because of how it's collected. Everything's got timestamp data, which means that if you can optimize around that, you get some benefits. Usually, IoT, if you can imagine a broad-based system, whether it's a self-driving car or an energy system, could have thousands to millions of sensors collecting around the world. Your ingest levels are through the roof compared to most data types. Our base resolution is nanosecond resolution, and some customers actually use that. So when you start thinking about the ingest level associated with nanosecond resolution with hundreds of thousands or millions of sensors, you start talking about billions of points per second coming in. So that's ingest.

Two is you're often summarizing that data, because you don't need to keep high-resolution data around for long periods of time. So you're often downsampling and transforming it. And knowing that, if you build a database that's oriented towards doing that, you're in a better position.

Third is, as part of downsampling or changing, you're evicting data quite fast. Most databases aren't really good at evicting data. It's really a side process. It eats a lot of CPU. But time series databases have to be excellent at it because you're evicting data, sometimes, if your retention is short, almost as fast as you're collecting it.

And so you can imagine this kind of small factory of data going at a different speed and pace because it knows what it's actually collecting and how it's working with it.
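The summarizing and evicting described above can be sketched in a few lines. This is a minimal illustration under assumed names (`downsample`, `evict`, and the sample data are hypothetical), using plain Python lists rather than any real database engine; it is not how InfluxDB implements either operation.

```python
from collections import defaultdict
from statistics import mean

def downsample(points, window_s):
    """Average (timestamp_s, value) pairs into fixed windows of window_s seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        buckets[ts - ts % window_s].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]

def evict(points, now_s, retention_s):
    """Keep only points that are no older than retention_s seconds."""
    cutoff = now_s - retention_s
    return [(ts, v) for ts, v in points if ts >= cutoff]

# Hypothetical raw readings: (seconds, value)
raw = [(0, 10.0), (20, 20.0), (40, 30.0), (60, 40.0), (80, 60.0)]

summary = downsample(raw, window_s=60)            # one averaged point per minute
recent = evict(raw, now_s=90, retention_s=30)     # raw data kept for 30 seconds only
```

The usual pattern is to keep the compact `summary` for a long time while the high-resolution `raw` points age out quickly, which is why eviction has to be cheap.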

And then the other thing, the last thing about time series is cardinality. Cardinality is a description of the number of measurements or tags or fields associated with each timestamp. And cardinality can be explosive. So if you have highly descriptive data on a timestamp, databases get choked by cardinality. They start not performing. They start really, they start failing. They start failing unpleasantly. And so time series databases have to be designed to handle cardinality.
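One common way to put a number on cardinality, assumed here for illustration, is to count the distinct series, meaning unique combinations of a measurement and its tag values. The function names and tag names below are hypothetical; the sketch only shows why descriptive tags can make that count grow multiplicatively.

```python
from math import prod

def series_cardinality(points):
    """Count distinct (measurement, tag set) combinations actually observed."""
    return len({(m, frozenset(tags.items())) for m, tags in points})

def worst_case_cardinality(tag_value_counts):
    """Upper bound: the product of the number of distinct values per tag."""
    return prod(tag_value_counts.values())

# Hypothetical observed points: (measurement, tags)
points = [
    ("temperature", {"site": "a", "sensor": "s1"}),
    ("temperature", {"site": "a", "sensor": "s2"}),
    ("temperature", {"site": "b", "sensor": "s1"}),
    ("temperature", {"site": "a", "sensor": "s1"}),  # same series as the first point
]

observed = series_cardinality(points)

# 100 sites x 1000 sensors x 20 firmware versions
upper_bound = worst_case_cardinality({"site": 100, "sensor": 1000, "firmware": 20})
```

Adding one more tag with only 20 possible values multiplies the potential number of series by 20, which is the "explosive" behavior described above.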

There's probably three or four things that are different.

How do you all focus on kind of addressing those challenges that you just mentioned?

One is you build it into the actual open source and the actual database. These capabilities, you build them in, you optimize for them. And so our latest version, which will probably be out by the time you post this discussion, has unlimited cardinality. It has super fast response time. As a time series database, Influx has to be great at three things. That's our focus. This is our attention. We have to be amazing at ingestion. We just have to be able to pull that data in, and it has to be readable right away. It can't be that we're ingesting and then you can read it in five seconds. It has to be ingested and read immediately, because if you're going to build an operational system around this data, you have to be able to read it immediately. So one is we have to be able to ingest and read immediately.

Two is we have to be able to organize that data so it can be handled and managed appropriately. You can imagine the amount of data. Some of it should go to object storage, some of it should stay on disk, some of it should go in memory, some of it should be indexed. So it'd be super great if you take it in and organize it. And the third, maybe even the most important, is we have to be able to query: in some cases below 10 milliseconds for last values, below 50 milliseconds, below 100 milliseconds, and for longer things, one second. Because the idea is you're building an operational platform that is not only collecting the telemetry, but is then acting on it. And acting on it is a function of the queries. So if you query what's the state of this system at this point, and then act, that query has to be super fast. One example I could give you is, you know, we power Tesla Powerwalls. I have an app on my phone. I can update that app every second and it'll tell me how much power is being generated by my house and going to my Powerwall.

Those kind of queries have to happen very fast in order for that app to run correctly.
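To make the "last value" query concrete, here is a minimal sketch of one way such lookups can be kept fast: remember the newest point per series at write time so a read is a single dictionary lookup. The `LastValueCache` class and the series name are hypothetical, and this is an assumption for illustration, not a description of how InfluxDB or the Powerwall app works.

```python
class LastValueCache:
    """Tracks the most recent (time_ns, value) for each series name."""

    def __init__(self):
        self._latest = {}

    def write(self, series, time_ns, value):
        """Record a point, keeping it only if it is not older than the current latest."""
        current = self._latest.get(series)
        if current is None or time_ns >= current[0]:
            self._latest[series] = (time_ns, value)

    def last(self, series):
        """Return the newest (time_ns, value) for the series, or None if unseen."""
        return self._latest.get(series)

cache = LastValueCache()
cache.write("home1/solar_kw", time_ns=2_000_000_000, value=3.4)
cache.write("home1/solar_kw", time_ns=1_000_000_000, value=9.9)  # late, older point
latest = cache.last("home1/solar_kw")
```

Because the work is done on the write path, an app that refreshes every second never has to scan historical data to answer "what is the value right now?".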

Absolutely. Yeah. And I'm sure there's people out there listening that struggle with managing time series data. If you were to give them maybe one piece of advice on what that first step should be in reevaluating their approach, is there something that comes to mind?

It's about architecture and design. And so, you know, most people have started with what they're comfortable with: they're comfortable with Postgres or they're comfortable with Mongo or they're comfortable with Cassandra. And so they build the thing they want and it's kind of operational and it works for them.

But as the system scales, and this is where we get most of our customers, they begin to say, wow, we're starting to run into cardinality, performance, or query performance limits. We need to look at this. It's not just us; other time series vendors do a reasonable job too. And they start to look at time series specific stuff. If I'm sitting talking to a group in, let's say, IoT, industrial IoT, or energy IoT, I'm saying start your process with a time series database, right? It'll perform well at low volumes, but you'll start to see the real benefits as you start to really engage and mechanize your system. And so that's been my primary advice. And there are ways to run it: you can run it in the cloud, you can run it on-prem, you can run it at the edge.

Are there leading applications in IoT that you're seeing rely on time series data? Or is it basically across every type of application use case? Which ones are the most compelling to you?

You know, we think of it as primary operational applications. And this is actually an important discussion because it bridges into the relationship between operational time series and AI. When you build a system, let's say,

Let's use a standard IoT system, maybe a self-driving car. When you build that system, what you're trying to do is go on a journey of intelligence. You're trying to say: I've instrumented the car with, I'm making up a number here, a thousand sensors. I'm running it on the road for a week. I'm collecting all that data. I'm looking at it. I'm evaluating it. And then I'm correcting specific design capabilities of the car to reflect that data. If I do that little loop a billion times, I might have a fully self-driving car handling all the situations that the car can encounter, all of that sort of stuff. What was the angle of the sun? What was the temperature of the road? What was the tire inflation? All that sort of stuff. I can run that loop and I get an increasingly intelligent and increasingly autonomous system, which is what all human-designed systems want to be. It's true of a factory floor where I'm manufacturing something. I keep perfecting it using that telemetry data. Eventually, I want the system to self-perfect.

The role of time series here is that a lot of where the self-perfecting comes in is getting automated by AI.

And so I'm taking this time series data and I'm building training models based on it, right? I'm building real-world training models, whether it's robotics or self-driving cars. And I'm typically doing that in kind of a lakehouse environment, where I can take structured and unstructured data. I can put it together with the time series data and I can basically build a model about how I think the world's going to work. But then I have to operationalize all that.

And so the operation isn't just the collection of the telemetry, it's the actual taking of action. So I build these inferences and then I have to implement them. So when the car sees X, how quickly can it respond to X, right? When the factory floor sees some condition out of sync, how quickly can the factory floor adjust? That's where the operational database, the time series database, comes in, because it's going to query in real time. It's going to have the inference information from the model. It's going to query in real time. It's going to take action.
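The query, infer, act loop described above can be sketched as a single control step. Everything here is hypothetical: `query_latest` stands in for a real-time query against a time series store, `max_safe_kpa` stands in for a limit produced by a trained model, and `actuate` stands in for whatever carries out the action.

```python
def decide(latest_pressure_kpa, max_safe_kpa):
    """Turn the newest reading plus a model-derived limit into an action name."""
    if latest_pressure_kpa > max_safe_kpa:
        return "open_relief_valve"
    return "no_action"

def control_step(query_latest, max_safe_kpa, actuate):
    """One pass of the loop: query the current state, decide, then act if needed."""
    reading = query_latest()                 # real-time query for the latest value
    action = decide(reading, max_safe_kpa)   # apply what the model learned
    if action != "no_action":
        actuate(action)                      # take action on the physical system
    return action

# Usage with stand-ins for the database query and the actuator.
performed = []
result = control_step(lambda: 130.0, max_safe_kpa=120.0, actuate=performed.append)
```

How quickly `query_latest` returns bounds how quickly the system can respond, which is the reason given above for needing very fast queries.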

That's the automation that we're looking at. Most of what happens in time series today is people looking at dashboards or triggers. That's about to change, right? That's going to change because these systems are going to be self-driving. They're going to be growing in intelligence. And so the operational time series is separated from the analytical training and model stuff.

Absolutely. That kind of alludes to my next question, which you've already kind of hit on, especially when talking about AI. With the exponential growth of sensor data that we're seeing as more applications get out into the world, are there other trends that you're seeing in how companies will collect, store, and analyze time series data just in general?

I think that the most important trend is the relationship between the model building, the machine learning, and

the organic time series data. Most everybody we talk to, all of our customers, have figured out these architectures that are breaking out the operational data from the analytical data that drives the models, and are trying to figure out ways to integrate them. Most people are relatively early in their journey of integrating that. But the biggest trend we see is that this requirement is emerging. If we were talking three or four years ago, the primary requirement was: let's monitor this system and report. But now we're seeing, okay, we can actually begin to take steps towards autonomy. And so that's, I think, the most important trend. Obviously we're seeing that in the digital world too, in the form of LLMs; we're seeing it all over. But where it hits the real world, and you know this, because you've been covering this a while, is we're not getting fewer sensors. We're getting cheaper and more sensors, and we're sensorifying the physical world at an incredibly rapid pace.

Are there any additional challenges that you foresee coming out of the fact that we're going to have more sensors and more data being collected from the physical world as we look out into the future?

My techno-optimism aside, I'm sure there are externalities that will be problematic.

But overall, that ability to monitor the physical world more effectively, whether it's health care, whether it's climate, whether it's space, that's a very powerful and important thing. And I think there are probably going to be some negative externalities, but most of what I see is really positive. Particularly, as we talked about earlier before we got on, with Southern California and the fires and climate change, the ability to monitor the physical world is going to be really, really important.

Tell me a little bit about your future outlook. What role will InfluxDB play in addressing these challenges that we're talking about here?

Yeah, I want to be humble facing it. We are a data platform, but all things start with data. So our mission is to be excellent at those three things I mentioned before, the ingest, the organization, and the querying of that data, so it's available. If AI and machine learning and the increasing world of autonomous systems is happening, then we're the picks and shovels of that. If AI is the refinery of all the information, then we're pulling the oil and gas out of the ground. Whatever it is, we're providing the raw material to manufacture that intelligence. And then we're providing the ability to act on it.

That's kind of how we organize the world. So this next version of the product is built with those things in mind. It's deeply integrated with lakehouse environments.

Data has always been talked about in terms of how important it is in our daily lives and in every business's life. But as we start to pull more and more data from the physical world, and AI grows in its capabilities and functions to solve business problems, a company like yours is helping make that easier. And I think being able to do that is a very big role in the future of where we're going. I think that's an important thing to highlight. Where can our listeners learn more about what you all have going on, what you're doing, what applications you support, get in touch, follow up, all that kind of good stuff?

It's the classic. It's at influxdata.com. There's also a bunch of GitHub repos. We have huge contributions to the open source: Apache DataFusion, Flight SQL, all that sort of stuff. There's our site, there's blog posts. It's all the stuff you need to know.

Well, this has been a great conversation, Evan. I really appreciate your time. What I'd love to do is obviously we'll get this out, but probably have you back later on this year just to kind of talk about the growth of the space and what's happening as AI grows, as IoT grows, you know, just more data out there. I would love to kind of keep understanding what y'all's take is from your perspective.

A hundred percent. I really enjoyed it, Ryan. First of all, it's great to meet you, and I really enjoyed the conversation. Congrats on your success.

Thank you. Yeah, we've been having a good time doing it, and hopefully our audience keeps finding value. And obviously the more experts like you we can spotlight, the more value hopefully we can provide. So thanks again for your time.

All right, man. Take care.