Josh Xi, I've been at Lyft for a little bit over five years, a staff data scientist. How do I want to take my coffee? A latte is good. I like it, creamy milk. I recently found this 6% fat milk that's super creamy, almost like half-and-half, but it tastes just so good. Well, hello, hello, everyone.
We are back for another MLOps Community Podcast. I'm your host, Demetrios. And today we talk about time series, machine learning models versus deep neural networks. And Josh has some strong opinions on them, but we would love to hear if you also are opinionated about one or the other. He just told me as soon as we stopped recording that he would love to get comments in case he's missing something. And he wants to know how...
and what folks are doing if they are doing it differently than him. Let's get into this conversation. First off, tell me what you're working on because I find it fascinating and it is deep, deep in the weeds and I love it. So I want to know everything about it.
So I'm in this big org, Core Marketplace, and that's basically the team that manages the supply and demand balance of the Lyft marketplace. So essentially we are a platform team. You have the demand side, the supply side, and you want to achieve the market balance between the two.
So there are different levers to do that, like pricing. Typically, if the demand is too high, you can increase the price to suppress the demand. Or you can offer driver incentives to sort of acquire more drivers. Both can happen in either real time or long term. Long-term pricing is more like coupons: you can send out coupons to attract more riders onto the platform. But all of these models or levers take lots of signals that go into them.
The most basic one will be forecasted demand or forecasted supply. So my team is called Market Signal, and it basically provides all these key signals or features that go into these models. You need to know what's happening now, but you also need to know what's going to happen in the future. So there's lots of forecasting for those levers. And I think I've heard something from folks who
left Lyft about just the sheer number of models that you're running at any given time. And maybe it's not that they're different models, it's just different models in different parts of the world or parts of the US. So it's very similar models, but kind of different, I guess. Same same, but different. Yeah, so it's actually a huge number of models. Lyft is very data-driven, so all of these levers are
running in different regions at the same time. So for my team, one set of problems we're facing is what we call real-time forecasting. Basically, for every geohash, for those who are not aware of geohash, it's a standard zone definition, kind of like a zip code, but with different levels. We're usually looking at geohash 6,
which is like a one-by-one-mile-ish cell in a location. So in every city, you typically have a couple thousand, sometimes even up to 10,000 geohashes. And basically for every geohash, we need to generate a forecast in real time for the next, say, five minutes up to an hour. So you will probably have millions of data points to forecast at the same time.
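To make the geohash-6 granularity concrete, here is a minimal sketch, assuming the third-party pygeohash package; the coordinates and the demand-bucketing framing are illustrative, not Lyft's actual pipeline.

```python
# Minimal sketch: bucketing pickup coordinates into geohash-6 cells.
# Assumes the third-party `pygeohash` package; coordinates are made up.
import pygeohash as pgh

pickups = [
    (37.7749, -122.4194),  # hypothetical downtown San Francisco pickup
    (37.7755, -122.4180),  # a nearby pickup, likely in the same cell
    (37.8044, -122.2712),  # a pickup across the bay in Oakland
]

# Precision 6 gives cells on the order of a kilometer across -- the
# "one by one mile-ish" granularity described above.
cells = {pgh.encode(lat, lon, precision=6) for lat, lon in pickups}
print(cells)  # the distinct geohash-6 strings these pickups fall into
```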
Wow. Okay, so each geohash, which makes sense because...
You want something that is accurate down to the block. Yeah. So typically if you're thinking about pricing or driver incentives, essentially you're trying to reallocate drivers or attract them to a certain busy area. So it needs to have a certain granularity. If you just forecast the whole region, it doesn't really help the drivers know where they should go. So, yeah,
in order to have an actual action you can take to make some sort of impact, you have to have that granularity. So for that reason, we typically start from the geohash 6 level, but it depends on the use case. Some of them we might aggregate to a higher level to fit those needs, because
the bigger the area, the more density you have and potentially the more accuracy you can achieve. And how often are you using features or data outside of the platform, like there's the Super Bowl going on and there's probably going to be a lot of demand around the stadium when the game is about to start, or from two hours before the game and then two hours after the game?
That's a really good question, actually. So we try our best to get external sources. Events is definitely one of them. We do work with some data contractors to get some input. So events, airports, information like flight landings, departures, anything that we think could be helpful. But in reality, there are lots of challenges just to ingest that data. So
taking events as an example, typically we have information about the event start time, which is the ticket time when people normally will go to the event. But the tricky part is the event end time. Imagine you're sitting in the football game and it's, like,
two minutes left on the clock. And those two minutes can sometimes mean just two minutes, or sometimes they can mean half an hour if there's overtime or any sort of uncertainty. It depends on how close the current score is between the two teams, right? So there's uncertainty for that part. For all the international listeners, that is American football, where two minutes is not actually two minutes, unlike football around the rest of the globe, soccer, right? Yeah, I guess you have a little bit of leeway at the end of the game, but not like in American football. But anyway, sorry, I didn't mean to cut you off, I just wanted to... Oh yeah, that's totally fine. Yeah, soccer. But yeah, for soccer, sometimes it might go into overtime, so that might mean another half an hour or longer. So yeah, we've been having trouble trying to figure out what the right event end time is,
and that's definitely a big challenge. The other thing we mentioned is weather data. We've also been doing lots of analysis on how weather impacts the demand. Essentially, I think most people's first instinct is bad weather, more demand, which is maybe generally true. But when we are really looking at the data, like
creating all these precipitation features, temperature features, doing all kinds of combinations on top of them and trying to find correlation with our demand data, it's actually lower than what we expected. No way. One
conjecture here is that it's actually sometimes not really about what the temperature is or how much rain you have. It's more about people's expectation of what's going to happen versus what really happened. So if the forecast says, okay, it's going to rain, and it's pretty accurate, everybody is more likely mentally prepared, or they already have some way to prepare for the heavy rain. But
if something catches them off guard, everyone is like, OK, let me call a taxi or call an Uber or Lyft. So I think that makes a huge difference. And also snow. One of the most interesting findings was back in 2014; that's when I actually first started to look into taxi data versus the weather.
And everyone's like, okay, snow is going to make a huge difference on taxi demand. So in the first half of the winter, we were looking at the data and we saw some correlation, but somehow in the second half, the amount of snow had really poor correlation with the taxi demand. And you know what happened?
Well, that year was a super bad winter. And in the second half of the season, most of the cities started running out of their salt to clean up the streets.
So if they don't clean the streets, nobody can travel; people just give up on traveling or decide to stay at home. So there's really not much correlation between snow precipitation and how much demand you will have. So that's external information that's usually very hard to put into the model. Those kinds of factors definitely affect how we build our forecasting models, or
whether certain models are going to work better than others. I thought you were going to say when it rains, people just stay home, so the demand is lower, which I can imagine a certain subset of folks do, but it is more like what you were saying. If it says it's about to rain, then...
I am more mentally prepared to go out in my rain jacket or with my umbrella. But if it is supposed to be sunny and then it starts raining, it's like, oh no, I didn't bring my umbrella. I didn't bring the... I'm not prepared for this. I need to get home or get to wherever I need to be quick. Yeah, that's totally true. And also, I wonder if the pandemic changed people's travel behavior. So...
Before that, it was sort of mandatory for everybody to go to the office. So even if it's raining, they're more or less like, okay, it's raining, I still need to go to the office. These days, they're like, okay, if it's raining, I'm in a bad mood, I only need to go into the office two days a week, let me pick a different day or something. Yeah, completely. It changes how likely we are to do something.
When we talk about this data acquisition, though, that just seems like a mess, man. It seems like something that is so difficult, A, to get the right data and then B, to clean it or to transform it and then create insights from it, especially for this external data. I imagine you have some kind of pipelines set up, and
you're probably constantly tweaking them, or you're playing around with them to see, can we create better features from this, or something that gives us more insight that we can feed to the model? Am I off base there when I say that? Yeah, yeah, that's definitely something we do a lot. Internally, we have teams looking at the event data to curate it,
but in the end, it's so much labor to just get it right. We focus a lot on the top events, and those are the ones that actually work well in some of our models. But for most of them, it doesn't. So that kind of leads to what I had in my head to talk about today, which is why traditional time series forecasting works
better in reality. A lot of people's concept is like, okay, the model is going to learn the future on its own, so you can plug in as much data as you want and let the model learn what's important and what's not. That usually means you will need lots of training data and lots of computation power,
versus time series type models like autoregression or ARIMA, which try to predict the future by mostly, or almost exclusively, focusing on the history. Because even if the market is not stable and doesn't keep having the same trend, within a short amount of time things tend to repeat on their own. Time series models looking at what happened yesterday, the day before, the same time last week, that actually already covers a lot of what's going to happen in your future. So this type of model has actually been our top choice so far, because they're very interpretable: your future is some sort of weighted average of your history. So based on how much you believe the future is going to change relative to the history, you can relatively easily apply any human intervention to do some sort of adjustment.
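As a rough illustration of the "future as a weighted average of history" idea, here is a minimal sketch, assuming one-minute demand counts for a single cell; the lags, weights, and event multiplier are made up, not the team's actual configuration.

```python
# Minimal sketch: one-step forecast as a weighted average of chosen lags.
# Assumes `history` is a numpy array of 1-minute demand counts for one cell,
# ordered oldest -> newest and at least one week long.
import numpy as np

MIN_PER_DAY = 24 * 60
MIN_PER_WEEK = 7 * MIN_PER_DAY

# lag (in minutes) -> weight; weights sum to 1 so the forecast stays interpretable
lag_weights = {
    5: 0.4,             # a few minutes ago
    MIN_PER_DAY: 0.3,   # same time yesterday
    MIN_PER_WEEK: 0.3,  # same time last week
}

def forecast_next(history: np.ndarray) -> float:
    """One-step-ahead forecast as a weighted average of the chosen lags."""
    return float(sum(w * history[-lag] for lag, w in lag_weights.items()))

def adjusted_forecast(history: np.ndarray, event_multiplier: float = 1.0) -> float:
    """Human intervention is just arithmetic on the output, e.g. scaling up
    the baseline when we know a Super Bowl-style spike is coming."""
    return event_multiplier * forecast_next(history)
```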
Say, based on what we know, okay, last week there was no Super Bowl, but next week there will be a Super Bowl. So you can actually easily adjust the demand in that case. With a DNN, I guess in the case of the Super Bowl we can do the same thing, if we know the DNN has never been trained on a Super Bowl, so you can make an adjustment too. But there are also all other kinds of local events happening, and some of them might be much harder to make adjustments for, and you're not sure whether the DNN has captured them or not when feeding those models. Versus with autoregression, you know for sure, because typically autoregression is just a history of your values; it does not really explicitly put any event information in there. So you can just assume there are no events in the model, and you can easily add certain things on.
There are more advanced models, like decomposition models. You can decompose trend and seasonality, so you know which part you need to adjust if you believe there's a local event happening: it's a spike, it's not in the seasonality, it's not in the trend. So you can take those two components and add on your spike. That makes it much easier to adjust your model, adjust your forecast, and make it a little bit more accurate.
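A minimal sketch of that decompose-then-add-a-spike idea, assuming hourly demand in a pandas Series and statsmodels' seasonal_decompose; the seasonal alignment and the spike size are simplified assumptions, not the production approach.

```python
# Minimal sketch: trend + weekly seasonality forecast with an explicit event spike.
# Assumes `demand` is a pandas Series of hourly demand with a DatetimeIndex
# covering at least two full weeks.
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

def decomposed_forecast(demand: pd.Series, horizon_hours: int,
                        event_spike: float = 0.0) -> pd.Series:
    # Split history into trend + weekly seasonality (period = 168 hours).
    parts = seasonal_decompose(demand, model="additive", period=24 * 7)

    trend = parts.trend.dropna().iloc[-1]          # carry the last trend level forward
    season = parts.seasonal.iloc[-24 * 7:]         # one full weekly cycle (alignment simplified)

    idx = pd.date_range(demand.index[-1] + pd.Timedelta(hours=1),
                        periods=horizon_hours, freq="h")
    base = pd.Series([trend + season.iloc[i % len(season)] for i in range(horizon_hours)],
                     index=idx)

    # The event is not in the trend or the seasonality, so it is added explicitly.
    return base + event_spike
```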
And it almost seems like with those spikes, you can also leverage the history to say, okay, this looks like a spike that we had maybe two weeks ago or last week. So we'll trend in that direction. What I'm really interested in, there's kind of two big questions that I have with DNNs. And then also, I've heard a lot about time series foundational models.
And so, first off, I guess the simple question is, have you played around with time series foundational models? Have you had any success with them? Because I've
generally heard from people in the community that they are not very valuable and they haven't had success with them. So actually, well, I played a little bit with TimeGPT. I know my coworkers have also played a lot with some of them, so I actually learned my firsthand information from them, not myself. But my learning based on my conversations with them
is that it's accurate for some of our use cases. Because we actually have forecasting for both: the real-time spatio-temporal models like what I described earlier, which are super granular in that sense, where every minute you're forecasting the next five minutes, ten minutes, up to an hour, and it's all these thousands of cells, so there's lots of variance and all kinds of things going on, and
you also need it to be fast because every minute you just keep refreshing forecasts. Versus there's another type of forecasting, which is more offline, or short-term, near-term. Every company or team might use different names, but they are looking more at, say, a regional level, like the whole of San Francisco, or broken up into a few sub-regions, looking at hourly or daily signal values for the upcoming week or two. So...
Most of the learning actually is that those types of models work well for the latter use case. For the first use case, it hasn't been that good so far. Right.
I think that's mostly talking about accuracy, but beyond accuracy, another really big issue is the real-time cadence. It's just happening so fast, and those types of models are a little bit too big to spin up quickly and just keep emitting features. And I'm not sure if there's already eng infra designed to leverage those models for
use cases like that. Yeah, especially at the latency that you need, it makes a ton of sense. And that was the question that I wanted to talk about with the DNNs: what kind of infrastructure is needed to serve those types of models, or to try to utilize those models in your use cases? And is that not adding extra headache to the problem when later you find out, like, damn, these don't even perform that much better? Why are we breaking our backs to support the DNNs? Is it just because maybe some folks want to have it on their resume? What is it about? I know it's not that; you all probably wanted to thoroughly test it. But at the end of the day, if it's much harder to support and they're not giving you that much of a lift,
and I'm assuming that it's much harder to support, I would love to hear from you what, in reality, the difference is between supporting these two models. Let's compare two types of models, which is actually something we've been testing a lot. One is autoregression, which I mentioned earlier: your future is a weighted average of your history. The other one, let's see, there's this very famous paper from a Microsoft lab focusing on spatio-temporal DNNs. There are also other models like LSTMs used for time series forecasting too. Although most LSTM papers focus on just one single time series, in our case we're talking about many, like thousands of cells spatially, in a city or something. So you also have to capture the spatial correlation there, which makes the model a bit more complicated. So the Microsoft paper is one of the most well-studied or most used papers in lots of applications and research.
So for training, I think the two make a huge difference. First, autoregression does not use a GPU, just a CPU; a single CPU is enough if you want to do any backtesting or training. And the spatial DNN, you use a GPU. So cost-wise, it can be at least a 100x difference. That's from our first-hand experience. Now, serving them...
So if you want to just serve a DNN model, you do not necessarily need to use a GPU, because your model weights are already pre-calculated. So you can just spin up a machine, take the weights, reconstruct your model, and take your inputs. It's a little bit more computation, but I wouldn't say it costs that much extra, because thinking about it, typically in our case we do forecasts at a cadence of every minute. So every minute you try to predict the next, say, 30 minutes. As long as you can finish your whole forecast in like 30 seconds, it should be good enough to go. So I think just a general machine loading the model weights, reconstructing the model, and taking your input through the calculation, 30 seconds is sufficient. Time series models have many fewer weights, so of course they're much faster. So serving, definitely no issue. That's why there's not much difference from a cost perspective on the inference part. Yeah.
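A minimal sketch of that per-minute serving cadence, with hypothetical load_weights and latest_history stand-ins for a model and feature store; the linear forecast and the budget check are illustrative, not the actual serving stack.

```python
# Minimal sketch: every minute, load precomputed linear weights, forecast the
# next 30 minutes for every cell, and check the pass fits the time budget.
import time
import numpy as np

BUDGET_SECONDS = 30
HORIZON = 30  # minutes ahead

def forecast_all_cells(weights: dict, history: dict) -> dict:
    out = {}
    for cell, w in weights.items():
        lags = history[cell][-len(w):]      # most recent lagged values for this cell
        step = float(w @ lags)              # linear model: dot product of weights and lags
        out[cell] = np.full(HORIZON, step)  # naive: hold the one-step forecast over the horizon
    return out

def serve_once(load_weights, latest_history):
    start = time.monotonic()
    forecasts = forecast_all_cells(load_weights(), latest_history())
    elapsed = time.monotonic() - start
    assert elapsed < BUDGET_SECONDS, "forecast pass blew the per-minute budget"
    return forecasts
```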
How about on the data side when you're training it? What kind of data do you need for each of these? Because I feel like you need a lot of features for the autoregression, but maybe you don't need the features, or you don't need them as clearly, in the deep neural network.
Oh yeah, that's a good point. That's another difference on the training side: the training data. For ARIMA, in our experience, we can just take a couple of weeks of data and build features, which are future values versus historical values. If you're trying to use history like the same time of week from one, two, three weeks ago, then you also need a little bit longer history. For a DNN, it's doable with a small amount of data, but then the question is the weights might not be as good in the beginning, because you have more weights, so you are more likely to run into an under-training situation. So ideally, you want to take a much longer history.
And with autoregression, you can even start without actually having any weights. You can just assume a moving average, with weights like 20% on the last three minutes, 20% on the same time of day from yesterday, 20% from the day, sorry, the week before, or something, just making up your own prior of what the weighted average will be. What you can do then, and this is back to the online difference, is that beyond the cost there's actually a huge advantage with autoregression or classic models: you can refit them. Those are linear models, so you can refit them at super fast speed, because it's just a few weights, right? There are probably 20, 30 historical features you take the average over. So when you have a new observation from the past five minutes, or 10 minutes, or half an hour, depending on what retraining cadence you want, you can just plug it into this linear model, run a refit, and adjust your weights. So you can do online learning and make the model adaptive to whatever is changing in the real-time marketplace. For any sort of change, like a spike picking up, you can refit the model to quickly put more weight on maybe your recent values to catch up with the spike.
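A minimal sketch of that online refit loop, using scikit-learn's SGDRegressor.partial_fit as a stand-in for whatever refit routine is actually used; the feature layout (one row of lagged values per cell-minute) is an assumption.

```python
# Minimal sketch: incrementally refit a small linear model as new minutes arrive.
import numpy as np
from sklearn.linear_model import SGDRegressor

# ~20-30 lagged-demand features, refit every few minutes.
model = SGDRegressor(learning_rate="constant", eta0=0.01)

def refit(model: SGDRegressor, new_lag_rows: np.ndarray, new_targets: np.ndarray) -> SGDRegressor:
    """Nudge the existing weights toward the latest observed minutes.

    partial_fit keeps the current weights, so a spike in the recent
    observations quickly shifts the forecast without a full retrain."""
    model.partial_fit(new_lag_rows, new_targets)
    return model

def predict(model: SGDRegressor, lag_row: np.ndarray) -> float:
    """One-step forecast from a single row of lagged values (after at least one refit)."""
    return float(model.predict(lag_row.reshape(1, -1))[0])
```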
With a DNN, it's a much larger model. You can do retraining too, right? Because that's how we train the model anyway, it's like batch training or something, so you can always take a small sample of data and retrain the model. But the problem comes in with the cost, because retraining the DNN model is actually very expensive too. So similar to the training case, if you want to do retraining online, trying to adapt to any changing situations happening in the marketplace, that will incur a much higher cost. That's actually another shortcoming of DNNs in practice. Because of that cost, you can only retrain less often, which also means you are sacrificing accuracy with fewer retrains. So yeah, in our experience that can also mean a 10x or 100x difference in your training cost.
Well, especially if, like you're saying, you're retraining continuously and constantly just retraining when you're learning new things about the world. So I imagine, like, how often are you retraining? Is that pipeline just set up to be triggered every couple seconds? Are we talking every couple hours, every day?
Every minute right now. It's just a choice, because the machine is basically on standby for our models the whole time. So I know that inside the marketplace org, my team is one of the expensive teams in terms of using machine power. Yeah. But it's worth it, I guess, if you're continuously getting that accuracy; just that 0.3% accuracy lift is gaining you a whole lot of revenue. Yeah. So it makes sense that you want it to be as...
accurate as possible. Yeah, we do some sort of cost-benefit trade-off, more or less: how much we've been running these machines and how much they cost versus the estimated impact of the accuracy. Those are trade-offs we need to look at in our daily work too. Yeah. And so I imagine it's got to be automatically re-triggered for this retraining. How do you then go and
add extra value? Is it that you introduce a completely new model and then that gets thrown into this retraining loop? Or is it that you just are continuously updating the model and adding new data sources? Like where do you plug in to make sure that whatever model is out there is performing the best possible?
With online autoregression models, it has been working well for most of the use cases, like capturing a spike and adjusting faster to the marketplace. The challenge we have learned about so far is actually probably mostly around events. Our models do capture that, like when events start to increase demand,
our model will capture that, but there's a little bit of delay in our model here. It depends on how we set the learning rate in our model relative to the historical data. Basically, you have an old weight and you have a learning rate, and based on what you observe, it's a battle: okay, should I put more weight on my recent spike, or should I just trust my history more? It has been very hard to tune those parameters, because every region is different. Every region typically has its own models, but we haven't really had a good approach to finding that perfect parameter tuning for each model. Ideally, you could just keep doing everything offline, keep testing, and run lots of machines that cost lots of money. So we have some sweet range of parameters that typically works well, and we use them for all the models we are running. On top of that, what we can do is...
you can always observe how the model performed in the last few minutes, right? Once we see the real data, we know what was forecasted and what the error is. So for external adjustment,
instead of having people look at the data and make adjustments, like, say, there's a football game here, so let me add more demand, another approach is actually just looking at the forecast errors and seeing how constant the error has been. And you can add some sort of bias correction or adjustment by setting up heuristic rules: okay, if for, say, the last 10 minutes it's constantly under-forecasting, let's bump up the demand. The adjustment can be a ratio based on how much it was off; if the forecast has been constantly 90%-ish of what the actual is, you can divide it by 0.9, and the opposite if it's over-forecasting.
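A minimal sketch of that ratio-based bias correction; the window size and the guard rails on the ratio are illustrative assumptions, not the team's actual heuristics.

```python
# Minimal sketch: scale new forecasts by the recent forecast-to-actual ratio.
import numpy as np

def bias_corrected(forecast: float,
                   recent_forecasts: np.ndarray,
                   recent_actuals: np.ndarray,
                   window: int = 10,
                   min_ratio: float = 0.5,
                   max_ratio: float = 2.0) -> float:
    ratio = recent_forecasts[-window:].sum() / max(recent_actuals[-window:].sum(), 1e-9)
    ratio = float(np.clip(ratio, min_ratio, max_ratio))  # guard against wild corrections
    # Constant under-forecasting (ratio < 1) bumps the forecast up, and vice versa:
    # e.g. a forecast running at 90% of actuals gets divided by 0.9.
    return forecast / ratio
```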
But if you're thinking about actual human intervention, so far we haven't really tested that out, because the type of problem we are facing is spatially huge, right? You have thousands of cells in a region; it's really hard to look at each one of them. So everything is automated based more on just accuracy. Another direction we're looking into is actually more ensemble models. So
autoregression is just one of the models. It's easy to explain, so that's what we picked. But there are others in the linear model area that you can refit quickly, or you can also basically change the model setup based on your assumption of how the data is spatially distributed. You can basically apply more new models and run them at the same time. As long as they're relatively small, it doesn't really cost that much. So you have different models running at the same time with different performance, and then based on the recent performance, you put more weight on the one that gives you the better forecast. That's another approach to bring up the accuracy.
So it's not like you're taking the average of the five models that you're running, it's that you're taking the model that has been the most accurate. Yeah, and you can also weight them. So the best one gets 80% of the weight on the forecast and the second best gets 20%, so it's like a weighted average of the two best models. That's another approach we are taking. So far, I will say my team has been focusing mostly on using those approaches to get better accuracy.
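A minimal sketch of that recent-performance weighting, using the 80/20 split from the example; the model names and the error numbers are hypothetical.

```python
# Minimal sketch: weight a small ensemble by which models were recently most accurate.
def ensemble_forecast(forecasts: dict, recent_errors: dict) -> float:
    ranked = sorted(recent_errors, key=recent_errors.get)  # lowest recent error first
    weights = {ranked[0]: 0.8, ranked[1]: 0.2}             # top two models only
    return sum(weights[name] * forecasts[name] for name in weights)

# usage: ensemble_forecast({"ar": 102.0, "ridge": 110.0, "ets": 95.0},
#                          {"ar": 4.1, "ridge": 6.3, "ets": 5.0})
# -> 0.8 * 102.0 + 0.2 * 95.0, since "ar" and "ets" had the lowest recent errors
```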
Yeah, just because of the human dimension, we only have a few data scientists on the team. It's hard to just go, okay, let's look at what's happening here today, what's happening there tomorrow. That's a little bit beyond what we can do right now. Yeah, and it does seem that the more quote-unquote simple models, versus the deep neural networks,
are easier to interpret. So another thing we can actually do, which is sort of what we're hoping to achieve this year, is to connect the online models, which are very granular, with the other type of forecasting I mentioned, where people run regional-level or sub-regional forecasting. Instead of looking at the cells, they look at maybe 10 sub-regions in the city, and they're looking at hourly forecasts for the next few days.
From those models, you can get the trend, the seasonality. And typically those models also have more people looking at the impact of events, so they will do some sort of manual adjustment based on the knowledge of, okay, around event time, this is how much of a spike you will see. So we are thinking of taking that as output and then feeding it into the last layer of our real-time forecasting. And once we forecast, right, we have, say...
we are looking at the cell level, and for an event happening in a sub-region, we can aggregate all the cells and see what the total demand is. We forecast it and see how far that is from the offline forecast. Then you can take that difference and apply a multiplier or something, and that's how we can do the adjustment. Yeah, I think it's more of an eng challenge, because we have to have the real-time system talking to the offline system. But the concept is pretty straightforward: take the output of the real-time forecasting, compare that to the offline forecasting with human intervention, and just check the differences between the two.
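A minimal sketch of that reconciliation step, assuming the offline sub-region total already includes the manual event adjustments; function and variable names are hypothetical.

```python
# Minimal sketch: spread the gap between offline and real-time forecasts back
# to the cells as a multiplier.
def reconcile(cell_forecasts: dict, offline_subregion_total: float) -> dict:
    realtime_total = sum(cell_forecasts.values())
    if realtime_total <= 0:
        return cell_forecasts
    # If the offline, event-aware forecast is higher, every cell in the
    # sub-region gets scaled up proportionally (and down if it is lower).
    multiplier = offline_subregion_total / realtime_total
    return {cell: value * multiplier for cell, value in cell_forecasts.items()}
```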
And so is the idea there that you know what it should be because of the seasonality, or you know what it should be because last year at this time it was this, and you're opening the aperture, and by opening the aperture, it's going to give you another feature that you can feed into the model? Yeah, because autoregression typically looks at the last week or so, what happened last week, capturing the average seasonality based on the last week or two of data. The offline models look at longer histories, and they deal with events more explicitly.
So if they believe the event's going to bump up the demand and they already
did their adjustments in their final outcome, our assumption here is that they have a better capture of those special events or seasonality. Because they've seen it before, last year or last quarter or whatever. And I get it. Yeah. And offline models typically do training looking at two years of data, because it's typically just a few time series over just a few sub-regions. So it's not really that much data to train on using a longer history; it's totally doable. And also, for their use cases, they typically run on a weekly or daily basis. They run the forecast once every week, looking at the daily or hourly forecast for the upcoming week or two. So they don't have to keep generating those forecasts in real time, so cost is less of a concern for them.
So their models tend to be a little bit more complicated, dealing with seasonality, external events, and they also look at the regional level. They can also pick out the major events in the region and do adjustments on top of that. So they definitely have it a little bit easier in terms of making adjustments and building more complicated models. Because they've seen it. Yeah, that makes sense. It's not so new, it's not just out of left field, and you have that data. I didn't realize that you get two years of data. Yeah, or even more if you want to. Yeah, so say there's a local sports team and, during their football season, they have a game every Sunday.
And then you know, all right, when it's football season, this is more or less what the demand is going to be like, because we've seen it for the last two or more years. Yeah. And for those, it's more or less embedded in the features, in the time series, already. Sometimes it's more the thing that happens only a few times a year that's very hard for the time series to pick up.
So in that case, human intervention will likely be needed. I think for whoever is working on long-term forecasting, in the last step they focus a lot on those big events that only occur a couple of times a year. Yeah, I can see that. So the other thing that I wanted to ask is, when you are testing out these new models, do you do, like...
side-by-side analysis? Are you doing some kind of champion-challenger release? I think there was something else that I had heard of, but I can't remember how it worked, where you're just giving it dummy data and you're simulating what it would predict and seeing if it would be more accurate than what you have live.
Yeah, for time series forecasting, one of the most common techniques is called backtesting. I wonder if that's what you were thinking. So basically, you can pretend...
you were sitting in the past at some point in time, and from that time on, you run model training and pretend, okay, this is what the model's weights are going to be, and then you start following time like a simulation moving forward and generating forecasts. And because everything has actually already happened in real life, you can see what the model predicted versus what was actually observed in history, and you can calculate the bias or any sort of performance metrics for your accuracy. So that's how we compare models. Every time we're trying to make a tweak to our models or evaluate new models, backtesting is what we do to help us decide whether this model will actually perform better than another one.
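A minimal sketch of that backtesting idea, with fit and predict as stand-ins for whichever model is being evaluated; the choice of origins and error metric is illustrative.

```python
# Minimal sketch: rolling-origin backtest. Pretend to sit at several past
# timestamps, train only on what came before, forecast forward, score against
# what actually happened.
import numpy as np

def backtest(series: np.ndarray, fit, predict, horizon: int, n_origins: int = 10) -> float:
    """Mean absolute error averaged over several simulated forecast origins."""
    errors = []
    origins = np.linspace(len(series) // 2, len(series) - horizon, n_origins, dtype=int)
    for t in origins:
        model = fit(series[:t])               # train only on the "past"
        forecast = predict(model, horizon)    # forecast the next `horizon` steps
        actual = series[t:t + horizon]        # the future that actually happened
        errors.append(np.mean(np.abs(np.asarray(forecast) - actual)))
    return float(np.mean(errors))
```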
How big of a factor is it that these models also have a geospatial aspect to them? Simple answer: it's very difficult. So for a DNN, if it's a single time series, you do long short-term memory and you don't have to worry about how the data is correlated spatially.
So the model is simpler: fewer model weights, faster training, faster forecasting. For spatial models, another thing people actually need to do is deal with the correlation between different spaces, because you can't treat every cell as its own model; then if there are 3,000 cells, you'd have 3,000 models, so you do not do that. So the
more practical solution is treating the whole region as different dimensions in a single input. The spatio-temporal paper from the Microsoft lab that I mentioned earlier, what they actually do is use a convolutional neural network approach. Basically, for every time snapshot, you have, say, what your demand is and what your supply is at different locations, and you can consider that as an image. Your latitude is the X axis in your image, your longitude is the Y axis, and every cell is some sort of value in your image. That's basically how image processing handles its data, so we can use the same approach. And in a CNN, you can apply sets of kernels to basically learn the spatial correlation, like three-by-three cells or five-by-five cells, to get their relationships
across the space, applying different weights so you can learn, okay, maybe when this guy is going up, that guy is going up: when this location has big demand, the nearby location has big demand, or the opposite. So it's learning all these spatial correlations, and you end up having model weights in your neural network to capture that information. But learning those weights well means you need a bigger training data set.
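A minimal sketch of the "city as an image" idea, assuming PyTorch; this is only the core convolution ingredient, not the architecture of the paper being referenced.

```python
# Minimal sketch: each time snapshot is a (channels, lat, lon) grid with
# demand and supply as channels; 3x3 kernels mix each cell with its neighbors,
# which is how spatial correlation gets learned.
import torch
import torch.nn as nn

class SpatialEncoder(nn.Module):
    def __init__(self, in_channels: int = 2, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),  # 3x3 neighborhood mixing
            nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1),            # one forecast value per cell
        )

    def forward(self, snapshot: torch.Tensor) -> torch.Tensor:
        # snapshot: (batch, 2, n_lat_cells, n_lon_cells) -> (batch, 1, n_lat, n_lon)
        return self.net(snapshot)

# usage: SpatialEncoder()(torch.randn(8, 2, 40, 40)).shape -> torch.Size([8, 1, 40, 40])
```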
And one thing we learned is that those correlations can change over time. So you're trying to learn more about the correlation, but if last year's correlation is different from this year's correlation, it does not do well. You can only learn a limited amount of correlation. And this could be, like, some kind of construction that's happening for half of the year, and so you're creating a correlation that is only because of the construction, right? Yeah.
And also maybe, you know, some offices relocate people to another building, or the pandemic, that's a huge hit, right? People all of a sudden travel differently, and people are constantly moving from location to location. The economy, the macro economy, can have an effect, like a strip mall running out of business and changing how people move, which is out of our control. Let's say those are less of an issue, but you can only use a limited amount of data in that sense. I could see how creating patterns for the spatial data...
For example, in my head, I play it out as: if there is a stadium and you have an event happening at that stadium, it's the north and the south side that people are going to be entering. So you would expect that there's some correlation there: when one side is busy, the other side is also busy. Yeah, that can happen sometimes.
That actually reminds me of a very interesting thing we also learned: venues operate differently from time to time. Somebody jumps in, like, okay, last month we did this with these two entrances, right? South, north. That's how you're directing people to depart from the venue. Then it's like, okay, that was horrible, that was a mistake, let's change it, let's make it east, south, plus some other location. And one month they're like, let's do a bus shuttle to move people from this location to another. Next month, you know, the shuttle service is not reliable, people are still complaining. So the venues themselves keep changing too. The same happens a lot at airports: sometimes they use lot A for Uber and Lyft pickups, sometimes they use lot B, and that changes over time too. So that keeps messing up our data unless we know exactly how the venues or the airports are being operated. But sometimes it's hard because there's always an information delay. Which also makes sense if you think about a football game versus a Taylor Swift concert,
and how the crowds at these events operate, and the times of these events. And so maybe you're thinking, okay, generally when there's an event, if you just have it as "event at stadium" and you don't know more information about it, is it a football game or is it a concert, then you're going to get burnt. So you probably have to have more granularity on what type of event it is. Yeah, and that's definitely so true, and sometimes it's hard. And also,
the same singer, sorry, the same venue but with different singers, might attract different crowds. And certain crowds might lean more towards riding an Uber or Lyft, while others might prefer driving themselves. So we also noticed that sometimes it's the same venue, same time of the week, but somehow there's very different demand for us.
That's where your forecast model, I'm guessing, just gets blown out of the water. And so that's what you're talking about, where you need to kind of look at the misses or the negatives instead of looking at what it is doing correctly? Yeah, that's why refitting or bias adjustment is so important, because we know we can't model them correctly. So the best we can do is learn from our mistakes: okay, in the last five minutes we were off by this much, so let's do a refit of the model or let's apply some bias correction.
Because sometimes we look at the errors and we don't even know why we see this big bias in our forecasts. It's just mind-blowing. Yeah. So in reality, simpler models, interestingly, make it a little bit easier, because you can apply adjustments more easily and faster, versus complicated models, DNNs; they're theoretically very interesting, but especially when it comes to data related to human behavior, that tends to come loaded with many other unknown factors. So it's very hard to really forecast them well.
Versus, I would say, if you're dealing with image processing or language, those are more structured, right? People talk following certain grammar. You have some variation, but it does not really fall too far outside the box. That's why it's more predictable, because it's not so outside the box, so it's probably easier to train them well. And also you have so much data, like
with language. For a language to change totally, it would probably take thousands of years, right? You don't all of a sudden start talking differently using different concepts. At least over the last five, ten years, I wouldn't say we talk that much differently. But when it comes to travel, the human behavior is like, oops, five years ago it was a totally different world; you can't use that data to train your model anymore. Yeah.
And the... You say that, but you obviously have not spoken to many Gen Z people, I guess. You don't...
That's true, yeah. No, but if you compare Shakespearean language to our language, yeah, that is a monumental shift, but that also took hundreds of years or a thousand years. I can't remember how long ago that was. And so if you're looking at this, there was something that I wanted to ask you, though, about how often you're trying to root cause
these anomalies that come through. Like you said, sometimes you just don't know: why was there so much bias in this prediction? Why did we not get it right? How much of your day goes into figuring out why you didn't get it right versus just
business as usual? Yeah, so it depends on the use case. For my team, since we are looking at the cell level, like millions of data points, we don't really have that much capacity to look at that level. So what we typically do is have certain metrics at a more regional level: aggregate all the cells together and look at the forecasts versus the actual data.
So we have certain metrics that we trend. If the values are too far off, or if there's some sort of drift in the trend, there's a set of metrics we monitor, and if they trigger a certain alarm, that's when we will spend effort to look into it.
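A minimal sketch of that regional monitoring, with illustrative thresholds rather than Lyft's actual alerting rules.

```python
# Minimal sketch: aggregate cells to the region, compute bias and percentage
# error, and flag when either drifts past a threshold.
import numpy as np

BIAS_THRESHOLD = 0.10   # flag if the regional forecast is off by more than 10%
MAPE_THRESHOLD = 0.25

def regional_alarm(cell_forecasts: np.ndarray, cell_actuals: np.ndarray) -> bool:
    forecast_total = cell_forecasts.sum()
    actual_total = cell_actuals.sum()
    bias = (forecast_total - actual_total) / max(actual_total, 1e-9)
    mape = np.mean(np.abs(cell_forecasts - cell_actuals) / np.maximum(cell_actuals, 1e-9))
    # Only when one of these trips do we spend human time digging into cells.
    return abs(bias) > BIAS_THRESHOLD or mape > MAPE_THRESHOLD
```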
The other times are typically because all of this actually goes into downstream models like pricing, right? Or driver incentives. So they have their performance metrics on their side too; typically, they're meeting certain pricing targets. If somehow their model just starts spitting out prices that are right outside their bounds or something, they will look into that and see if it's triggered by the forecast or by other factors. If they do believe it's something on the forecasting side, they will come back to us and we will look into it. So that's how things are set up for the real-time side, due to the granularity challenge, I think.
Yeah. For offline forecasting, that's not my thing, but there are other cases where they do just regional-level hourly values for the next, say, week or two. They have people sitting down to review region by region, so that's actually a more frequent cadence on their side. Yeah. And they typically are very sensitive to the differences, because their downstream typically helps the company decide, okay, how much money they want to spend next week. Because if there's a big gap in incentives or, sorry, if there's a big gap between supply and demand, they need to decide, okay, how much money they need to spend to acquire drivers or acquire riders. And that's a big chunk of money.
Yeah, there are big decisions being made on that, so you want to make sure those decisions are correct. Yeah, and also I think technically it's more feasible. Typically, you can look at the top regions, or you can look at regions where you know there are going to be big events going on, so you can easily check whether a forecast at that time in that location is accurate. I guess we could do the same thing at the cell level if we want to say, okay, this is the event. But our forecast actually happens every minute for the next 30 minutes or an hour-ish, so the only time we will see that forecast is like 30 minutes before the event. And I don't know if that's feasible if the event happens at midnight; I don't know if we're going to wait until midnight and quickly evaluate the performance. So everything is automated. We are more looking at, okay, as the forecast happens, we observe the actual, and whether the difference is bad or not. If it's bad, then let's do adjustments in the model ASAP. So that's a sort of very different concept.