
Kubernetes, AI Gateways, and the Future of MLOps // Alexa Griffith // #294

2025/3/7

MLOps.community

People
Alexa Griffith
Demetrios
Topics
Alexa Griffith: My career growth in software engineering has benefited from choosing problems with real business value, which has let me contribute to many open source projects, such as Envoy AI Gateway and KServe. In my early experience with Airflow and Kubernetes, I found that Airflow crashed frequently and couldn't handle a large number of concurrent workloads. Over time, many new workflow tools emerged, such as Kubeflow and Argo Workflows. In the AI/ML space, infrastructure engineers and AI/ML engineers differ in their tool and technology preferences, and to serve AI/ML engineers better we work to abstract away the low-level details of Kubernetes. Recently, at KubeCon, I introduced Envoy AI Gateway, a new open source project built on top of Envoy Gateway that adds features for large language models, such as token-based rate limiting. Knative Serving is an open source tool, built on Kubernetes, that helps run serverless applications; KServe is an abstraction layer that simplifies running AI models, with easy-to-use configuration and a unified API. Envoy AI Gateway also provides a unified API that can reach models across different clouds and on-prem deployments. Large language models bring new challenges, such as model size and cold start time, and KServe is working to address them, for example with model caching. Celery was my first open source contribution, and it grew out of my work with Airflow. If you really want to get involved in open source, I recommend starting with simple tasks such as good first issues and participating actively in the community. Contributing to open source at work, tied to real business value, is the best way to learn. When evaluating a project's value, consider how many people will use it, whether it generates revenue, and what purpose it serves, so you don't waste resources on features no one needs. Building relationships with super users helps you discover pain points and improve the product.

Demetrios: Alexa has benefited in her career from choosing problems with real business value, which let her work on many open source projects. I'm curious what the biggest pain points were for Alexa when using Airflow and Kubernetes. In the AI/ML space, infrastructure engineers and AI/ML engineers differ in their tool and technology preferences. Envoy AI Gateway can act as an abstraction layer connecting different model endpoints. Large language models bring new challenges, such as model size and cold start time. How do you see the differences between traditional machine learning use cases and new LLM AI use cases, and how do they come together in a platform?


Transcript


My name's Alexa Griffith. My title is I'm a senior software engineer at Bloomberg. I love whole milk, so I just do regular coffee and milk. I'm a savage. I'm weird. Welcome back to the MLOps Community Podcast. I am your host, Demetrios. And today, there was one gigantic piece of this conversation that I cannot stop thinking about, and that was when Alexa mentioned

how useful it has been for her to have mentors and be on teams that could choose problems with real business value, and how that has translated in her career to working on a whole lot of open source projects, because they were able to properly champion the need for her

to work on these open source projects and tie it back to how it would help the companies that she has been at. I love hearing that. I loved talking with Alexa. Let's get right into it. And for those of you that are listening in on good old Spotify or podcast land, we've got a song for you today, and that is...

Keeping in theme with Japan, because Alexa just got back from there, I want to play you this song by Air. It is called Lost in Kyoto. No, sorry, it is called Alone in Kyoto. And Air, if you haven't heard of them, they're from the 90s, but really, like, they're timeless. They are just timeless.

Let's get into this podcast with Alexa. Wishing you all a very special day. Hope you enjoy.

But tell me about Japan, because I really want to know, you were skiing there? Yeah, it was awesome. It was crazy. There was so much snow, over three meters of snow. I mean, it didn't stop snowing there. I went to Hakuba, actually, which is right outside of Tokyo, but it was amazing. I know Niseko is pretty popular.

But yeah, it was an amazing trip. I think Tokyo is really great too. We went to Seoul as well. So I hopped around for a bit. I was gone about two weeks. So it was a long trip, but it was so amazing. Yeah, I know that you just got back and you are still having fun with a little bit of jet lag, which is always nice. I found a trick for when I do the trips across the pond to go back and visit family and

And if you wear compression socks, then the blood doesn't like coagulate in your feet. And apparently by keeping the blood flowing, it helps in some magical way combat jet lag. Wow, that's amazing to know. Okay. Yeah. I'll keep that in mind for next time for sure. Maybe that's my problem is that I didn't wear those socks.

You should have worn the compression socks. That and a lot of water apparently helps a ton. And then seeing sunlight in the place that you're at as early as possible also helps. That's a good tip. Yeah. So you've been in tech for a while. I think that you've got a really cool story. And actually what I want to start off with is that you've written a lot about your tech journey. And one...

thing that you wrote about was using Airflow in a past job, right? And using Kubernetes and Airflow. And I am fascinated by what to this day, do you remember as being like the biggest pain when it comes to Airflow and using Airflow?

It's funny you ask that because it has been a while. And we were using Airflow, like we were running it ourselves, building it ourselves, deploying it ourselves. And our data scientists were building a lot of DAGs with it. I remember that we also built a few pipelines for the data scientists as well. I was on a data science infrastructure team. So that was part of something that we own. And my first task was

getting logs to show in Airflow so that people could view the logs of their jobs. And basically, we just had logs going into a database. And then I was pulling from that database and showing them on a screen while the task was running. So I think there were a lot of operational things that we had improved. And there are a lot of things that improved within Airflow as well. We had like a fork of it at the time too. What I remember the most was just

it would crash a lot. It couldn't handle a lot of workloads running and spawning at once. Now, this was right when I started, like I said, so maybe there was a resource thing or maybe tuning the resources was also an issue, but I think our scheduler would kind of crash out a lot at what we were trying to do, which also begs the question, like maybe there was a better tool for, you know,

the large fan-out that we were trying to do as well, but I remember that at the time it was an issue. It was really great to start off writing about Airflow for me because, as you mentioned, I have a,

I would say atypical background, but I feel like a lot of people now in tech are a little bit atypical, too. I didn't have a CS degree. I had a chemistry degree. I actually studied computational chemistry and did research in computational chemistry. And so that's kind of how...

I jumped into the world of software engineering, and as a way to learn and digest that information, I started writing about it. And that's kind of how that came about. And it was really great that my first project was Airflow because at the time, so many people were also interested in how other people were using Airflow and the problems we were having, how to best set up a DAG, how to structure it and make the best use of it. So,

It was really great. It was a really great opportunity for me that led into a lot of other things. Yeah, it's interesting that you say like, oh, maybe this wasn't the right tool for the job. But back in the day when you were using it, I think it was...

kind of the only tool for the job. And there wasn't this maturity that you see now. There's all kinds of pipelining tools, whether you're doing ML pipelines or you're doing data pipelines; you have a lot more options. But just four years ago, even, you didn't have those options. And that's what spurred, I think, a lot of people like yourself who had experiences where Airflow was crashing or it wasn't working in the way that they wanted. And then they went

and started their own companies because of that. And so you see now the maturity around the pipeline space.

That's so true. I mean, I remember at some point we started to discuss, okay, now should we look into Kubeflow because Kubeflow was becoming popular? Should we look into Argo workflows? I mean, we used Argo CD at the time, which is a great tool too. I mean, so nice to use. I remember when they implemented Argo CD, it was like, wow, this is amazing that we can manage our deployments and see all of our resources in such a great way. So yeah, Argo was such a hit and still is.

But yeah, I think we started to kind of think about other things that we had invested in. Now I don't know if they're still using it or not. Maybe they have changed. But yeah, it's right. I think even I've been in tech for, I think, five years now or so. Yeah, coming up on five or six. Who's counting? But I think it's sometimes easy to lose that context of how much has changed. So I'm thinking now there are other workflow tools. But then you're right. Yeah.

they weren't as mature as they are now and how quickly that changes. It's crazy. Have you played around much with Argo workflows? Personally, no. We have a team that manages Argo workflows, but I have some for jobs and it's super useful. And from my understanding, pretty easy to use, similar to Argo CD. It has a similar layout from what I understand. But personally, no, I haven't used it so much. Yeah, because I always wonder about

The different pipelining tools that are out there, at the end of the day, you've got these DAGs. And I was just on a conversation earlier this week with a guy who was talking about how basically everything is a graph and everything is a workflow in one way, shape, or form, whether it is a technical workflow or it is a business procedural workflow or flowchart. You can look at it. And he mentioned that usually...

The business procedural flow charts are much more complicated than the technical DAGs because he was like, yeah, and his name's Alex, Alex Malowski. And he was saying how a...

technical DAG, in general, you get like three or four steps. It's like, I want it to do this and then do this and then do that. Going back to the original idea on why I was mentioning this with Argo Workflows is that you have different folks with different backgrounds and needs that will come to pipelining tools with their own

ways and opinions of doing things. And Argo workflows, I think folks who are coming from a DevOps background really tend to gravitate towards that. And then you have the

or the Dagsters and the Prefects and Mages of the world. And folks who are data engineers can understand those or like those a little bit better. And then you have the Kubeflows and the Metaflows and ZenMLs of the world. And folks who are ML engineers kind of vibe with those better. And so you get all this space to play with your pipelining tools that you like. And this is completely...

forgetting about the no-code, low-code pipelining tools. Let's just take those out of the picture. But it's fascinating to think about that and how each background lends itself well to one type of tool or one type of space that you'll get into.

Yeah, that's so true. I think you see that across a few different things, especially in the AI ML world where you have the infrastructure people creating things and then also the AI ML engineers using it. So even, you know, I think they favor or AI ML engineers favor Python. Typically, it seems like infrastructure engineers favor Go or YAML configs and the differences in how you interact with the APIs or the tools. For sure. I've seen that quite a bit.

Have you found ways that you see the interactions between those like two personas do better or worse? Because there is a very much like opinionated side for the infrastructure folks versus the opinions of the ML and even data scientists. That's almost like.

very removed opinions and workflows and what they need to get done. But I wonder if you've noticed things that have helped bridge the gap between these different personas. I think you're exactly right. They are a user of our platform. And

But we're both engineers, so it's an interesting interaction. And I think there's a lot of work now, and it's not exactly always easy, to abstract away Kubernetes and the things like, what's a pod, what's a container, what's, you know... I mean, you can get really deep, like, what's a virtual service? Do they really need to know that? And trying to abstract as much away as we can from them, I think that seems to be the move: to

be able to clearly tell you based on the three different types of errors that you get in your YAML spec, like if there's an error, like what does it actually mean for the person that deployed it who maybe doesn't really know much about Kubernetes? Like what action do they need to take? Because there should be some action they need to take, something wrong with their service they can fix. And trying to bridge that gap is something we've been working on a lot that I found quite interesting too.

And at first my reaction was, what do you mean they don't need to know about Kubernetes? You know, from our side, like, of course they need to; they have a pod, they're running it, they have a container. But I mean, you see it more and more. And in general, Kubernetes has so many abstractions on top of it to build all these tools. I mean, KubeCon is a huge Kubernetes conference, and that's a lot of what it is: a ton of different tools that people have built on top of Kubernetes to make it easier to run

in all these different ways. So I think that seems to be kind of what we're working on: trying to make sure they don't have to see all that stuff. Speaking of KubeCon, you spoke recently, right? At KubeCon and the KubeCon AI day. Yes. Can you talk to me about what your presentation was on?

Yes, recently I gave a keynote at KubeCon for the Envoy AI Gateway. It's a new open source project that the engineers from Bloomberg and Tetrate have been working on together. Again, it's an abstraction layer, basically on top of the Envoy Gateway project, which is a

service proxy that is used in a lot of our inference services to manage things like traffic: it rate-limits requests, logs and monitors traffic flow. So it's super useful for us. We use it, and when

all these Gen-AI models are coming out. There's a lot of different complications that come with them compared to the inference services in the past. Like they're way larger, they use tokens. So...

For those reasons, they have a different set of problems that need to be solved. So no longer do we want to rate limit a service based on a request, but what would be really useful is to be able to rate limit a service based on the number of tokens, because that's the unit of work that we're trying to distinguish against there. So that was a really cool experience for me, because I never talked in front of that many people before.
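
To make the request-versus-token distinction concrete: with plain Envoy Gateway, a rate limit is typically counted per HTTP request, roughly like the sketch below. This is an illustrative config rather than anything from the talk, and the resource names are placeholders; the newer fields Envoy AI Gateway adds to meter tokens instead of requests aren't reproduced here.

```yaml
# Illustrative request-based rate limit with Envoy Gateway; names are placeholders.
# Envoy AI Gateway builds on this idea so the "cost" of a call can be the
# number of LLM tokens consumed rather than one unit per HTTP request.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: chat-model-limits
  namespace: ml-serving
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: chat-model-route   # hypothetical route in front of the model
  rateLimit:
    type: Global
    global:
      rules:
        - limit:
            requests: 100      # counted per request here; a token-based policy
            unit: Minute       # would charge the tokens consumed instead
```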

And it was really cool to introduce a new, a really, really interesting new open source project as well on behalf of, you know, all the engineers that have worked together at Tetrate and Bloomberg. So I'm super excited about that. Yeah, it's always fun, especially when I think I heard a stat that 95% of people would

rank public speaking as a greater fear than dying. Oh my gosh, really? Yeah. Okay, that's funny. Yeah, it is for a lot of people. That's true. And I was nervous, but in an excited way. So talk to me about the difference or how KServe and the proxy that you set up, what is it called? It's Envoy? Envoy AI Gateway. Okay.

Yeah, Envoy AI Gateway, which is different than Envoy, right? There's an Envoy and then there's an Envoy AI Gateway? Yeah, Envoy AI Gateway is the new project that builds on top of Envoy Gateway. It basically just adds some more features that are specific for these LLM Gen AI models. Okay, and how does that interact with KServe? Like where and how does the stack look with those two? Yeah, yeah, for sure. So, I mean, I've kind of,

assume that most people know what Kubernetes is here, but these are all open source projects that are built on top of Kubernetes, which helps you to run your services at scale and to manage them. So Envoy is a service proxy. So every request that comes into your cluster

or clusters, can be routed through Envoy. So this helps specifically with things like traffic control, traffic management, observability. It can do request routing. So things like that about managing the requests and how they move through the system.
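
As a concrete sketch of that kind of request routing: Envoy Gateway is driven by the Kubernetes Gateway API, so a minimal route might look like the example below. The names, namespace, and hostname are hypothetical placeholders, not anything from the episode.

```yaml
# Illustrative Gateway API route as Envoy Gateway would consume it.
# Traffic entering the cluster is matched on host/path and forwarded
# to a backend Service by the proxy.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: chat-model-route
  namespace: ml-serving
spec:
  parentRefs:
    - name: envoy-gateway          # the Gateway resource the proxy is programmed from
  hostnames:
    - models.example.internal
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/chat/completions
      backendRefs:
        - name: chat-model-svc     # Service in front of the inference pods
          port: 8080
```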

But yeah, so Knative is, I'll just mention that also. It's another tool or a building block of things we use. So Knative Serving is the main project within Knative that we use. It's also open source. It's great for helping to run applications with this idea of being serverless, you know, without having to worry too much about the infrastructure. So again, it's a...

It's an abstraction on top of Kubernetes with the goal of just making it easier to set up and run a service that's serverless. By serverless, you basically mean something that can provision resources dynamically. It doesn't really care so much about what server those resources are on. You don't have to specify that. It figures it out for you.

So it makes things easier to run, because if not, you have all these YAML files with all these configs. You can be copy-pasting them and setting up all these resources to make sure everything runs. I really like

running kubectl tree. There's a tool called kubectl tree, and from that, what you can see is a tree under the top resource you have, because in Kubernetes there are all these different resources, like deployments, pods, and a few more depending on what you're running. And it can show you all the different resources that are made,

and you can go into the namespace and see what the configurations are and what they're doing. I find that super helpful for understanding these types of things. But basically, these tools help you to automatically set up all these things so you don't really have to worry about it. So they're all making some assumptions that most everyone needs these configs. And usually, for most tools, if you need something more specific, well, you can add that or change it as well. But you don't need to know and specify everything yourself.

But Knative, that's the abstraction that helps you just get things running in a serverless way. It helps you with auto-scaling, so scaling up and scaling down your services based on things like

usage, and also to scale to zero, which is something we actually use quite a bit because GPU resources are limited. So some services, if they can, use this feature called scale to zero. And you can make a setting like, oh, if I haven't gotten any requests in an hour or two, then scale to zero and let someone else use that GPU resource instead of taking it up without being active.
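
As a rough illustration of the scale-to-zero behavior described here, a Knative Service can set its minimum scale to zero through a revision annotation. This is a minimal sketch with made-up names; the idle window itself is governed by the autoscaler's cluster-level scale-to-zero settings rather than by anything in this manifest.

```yaml
# Minimal illustrative Knative Service; the name, namespace, and image are placeholders.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: sentiment-model
  namespace: ml-serving
spec:
  template:
    metadata:
      annotations:
        # Allow the revision to scale all the way down when idle,
        # freeing its (possibly GPU) resources for other workloads.
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "5"
    spec:
      containers:
        - image: registry.example.com/sentiment-model:latest
          ports:
            - containerPort: 8080
```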

Some can't use this, but when you can, it's quite a useful tool to have. So yeah, all that to say, those are the building blocks of KServe. KServe is another abstraction that simplifies actually running AI models, AI services, so inference services themselves. So basically what it is, is you can have this really short YAML, it's like this big, if you're using a model out of the box or

some AI, like a Llama model or something, they have support for that. And what you can do is you can just easily get it running in a serverless way. So that's what's nice about it, is that you don't have to worry about all these configs. The goal is that you don't have to have this whole team that knows everything about how to run AI models. They can focus on other things like building the platform as well. But yeah, so it makes it a lot easier to run these services. Okay, so basically the...
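
To give a concrete picture of the "really short YAML" being described, a minimal KServe InferenceService can look like this. The spec below is modeled on KServe's public sklearn example rather than anything shown in the episode.

```yaml
# A minimal KServe InferenceService: KServe fills in the serving runtime,
# networking, and autoscaling details from this short spec.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```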

Gateway is on the front end as the traffic comes in. It can dynamically provision. And then you have Knative, which helps provision. So it sees like, oh, Gateway is saying that we need more resources. Let's provision those. And then KServe is built on Knative in that, oh, one of the resources that we need is KServe,

to be able to ping this model. And the model might be a large language model or it might be some kind of a random forest model. It doesn't really matter in that way. Is that what I'm understanding? Yeah, yeah. So KServe specifically, like Knative can run just serverless services no matter what, but KServe is specifically for AI models, ML models, things like that, exactly. So it has a lot of out-of-the-box tooling that's

specific. Just like Envoy AI Gateway is specific to AI models, so is KServe: it has a lot of out-of-the-box tooling and features that are specific to what you need for an AI model. So it can support easy configs for running

a lot of out-of-the-box models, like it easily sets that up. Or you can have a custom predictor is what we call it. You can write your own predictor. So we just basically spin up the service. You get an endpoint, you can hit it. And we have this unified API, which is really useful because all of these different model providers can have different access patterns about how to hit the model. So one really nice feature of KServe is that no matter which model you're using, you can always use the same unified API. And

an extension of this, or Envoy AI Gateway is an extension of this, also has a unified API as one of the first MVP features. So if you're trying to reach a model in Bedrock or you have something on-prem, it shouldn't matter. You'll have one unified API going through the Envoy AI Gateway, and it will, under the hood, direct the traffic with the correct structure to wherever you're using your model.

Oh, nice. Yeah. So that was actually one of the other questions that I had: does this only allow you to use on-prem services or open source models? Or can you also dynamically send some stuff to the OpenAI API? And then maybe you want to send some stuff to Anthropic or whatever. If you do have your own services and your own open source models, you can send it to those too.

Yeah, so Envoy AI Gateway allows you to be able to run cross-cloud, like hybrid cloud. And that's one of the big features and one of the reasons the problem arose as well. It's because all these different cloud providers have different ways of accessing their systems. But yeah, I mean, KServe itself is similar to something like SageMaker or Google Vertex.

In cloud, you can definitely run KServe as well if you want to manage your own inference services. I mean, it's definitely helpful if you don't want to use one of these cloud providers. If you're worried about cost savings and you want to manage something yourself, a lot of people can use it on cloud as well. But, I mean, also there are these products for inference, running inference services that are also very similar, have the same goal. So the...

Envoy AI Gateway is, and just stick with me here because I know I'm a little slow on it. No, no, you're good. First time I'm really digging into it. And I really like these AI gateways. It's not the first time that I've heard of it, but I do find that

It is a problem folks have, especially when you get rate limited so easily. If you're using external services, you want to have some kind of a fallback plan. And so it's almost like Envoy AI Gateway is an abstraction out and you can throw whatever endpoint you need underneath that. So whether you're using a SageMaker endpoint or a Vertex endpoint or a KServe endpoint, you can...

Link those all up to the Envoy AI Gateway and it will figure out where the request needs to go depending on what it is. Yeah, yeah, exactly. Yeah, it has a unified API where you can easily specify, you know, what you need to do and it'll auto-route for you, which is great.
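
To visualize the fan-out being described, here is a purely hypothetical sketch: the kind, apiVersion, and field names below are invented for illustration and are not Envoy AI Gateway's actual schema. The point is only that one unified entry point routes each requested model name to a different backend, whether that's a hosted provider or an in-cluster KServe endpoint.

```yaml
# Hypothetical sketch only -- NOT the real Envoy AI Gateway API.
# It illustrates routing by requested model name to different backends.
apiVersion: example.dev/v1alpha1
kind: ModelRoute
metadata:
  name: unified-llm-route
spec:
  rules:
    - modelName: gpt-4o              # hosted provider A
      backend:
        type: openai
        endpoint: https://api.openai.com/v1
    - modelName: claude-3-sonnet     # hosted provider B
      backend:
        type: anthropic
        endpoint: https://api.anthropic.com
    - modelName: llama-3-8b          # in-cluster KServe InferenceService
      backend:
        type: kserve
        service: llama-3-8b.ml-serving.svc.cluster.local
```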

It's all about just making it easier to run and easier to manage. And yeah, you start to see patterns. Like I said, everything's kind of starting to be built on top of Kubernetes and built on top of these other tools, all with the goal to just make it easier to run and not have to worry about the infrastructure so much.

I mean, there are also a lot of really cool or really interesting problems brought on by these large language models and Gen AI systems as well. The model sizes are so large that downloading the model takes a long time. So one interesting problem that KServe is starting to work on is the model cache: being able to cache models and not have to download them every time a pod starts up, for every pod. So those are little things that will also be super helpful. Like working on GPU utilization as well, of course, because GPUs right now are resource limited. So a lot of these issues around getting enterprise AI up and running, being useful, and being optimized are things that we're definitely working on. Yeah, that cold start, it's like you can wait a while sometimes.
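
One concrete pattern along these lines that KServe supports is serving weights that already live inside the cluster, so pods don't re-download a huge model from object storage on every cold start. The sketch below assumes a pre-populated PersistentVolumeClaim and the Hugging Face serving runtime; all names are placeholders, and this is an illustration rather than the caching work mentioned above.

```yaml
# Illustrative: serve a large model from a pre-populated PVC so each pod
# mounts the weights instead of downloading them at startup.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-3-8b
  namespace: ml-serving
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface      # assumes the Hugging Face serving runtime is installed
      # "pvc://<claim-name>/<path>" mounts an existing volume rather than
      # copying the weights from remote storage on every pod start.
      storageUri: pvc://model-weights/llama-3-8b
      resources:
        limits:
          nvidia.com/gpu: "1"
```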

Yeah, and you can't really base services off of that if you're like, oh, well, yeah, just come back tomorrow and maybe it'll be running. We'll see. Yeah, I mean, it's wild how large models have gotten in such a short amount of time, how much more they're able to do as well. I mean, I think, what was it, Llama 3.1 just came out and it can take over 128,000 tokens. I mean, it also has

a crazy storage size, right? Like it can't fit on one node, so it needs at least two nodes to run, which also makes it more distributed, and you have to tackle those problems as well. So I mean, I think it's great. There are just new problems and things are changing a lot, but it's a very fun space to work in as well. Yeah. Well, now talk to me about Celery. What is that, and...

Celery? Yeah. It's been a while. That was my first open source contribution. It happened because there was a little silly bug that we needed to fix, and it came out of working on Airflow. It was something like... I think Celery is,

I don't know if I'm saying the wrong thing, but basically it's a component that Airflow uses. And I was trying to set up a connection, like a private connection with Google Cloud, and it was something that just wasn't supported yet.

fully, and they just said these options don't allow it, like options A through C don't allow it, and my option started with the letter C or something. So I got my first open source contribution by saying, allow this, you know? Which, I mean, that's how it is, but it was a cool feeling to have my first open source contribution. And now I'm

involved in open source way more. But it's really cool to make an impact and have other people from other companies view your code. That's really cool too. So that was my first step into that. Now that you've had so many issues accepted or PRs merged in the open source realm, and you've done a lot of work as a core contributor, right, what are some things that you would

tell yourself back then or as you were earlier on in the game about contributing to open source? I think what's really nice about all of the open source work I've done is that it had a business value, like a very clear business value. And that makes it a lot easier. And I think I got a bit lucky with that because

Like going to Bloomberg, my goal wasn't to work on open source, but it's a great part of it. You know, it's amazing that we can do that and we use it, uh,

every day here, and everything we're doing, it's features that we actually need, you know? So I think that makes it... I'd say, if you really want to work on open source: I love learning on the job. I think hobbies and side projects are great for learning, but when you learn on the job and you're using it in production and you're deploying it, I mean, that's the best way to do it if you can. But I think if you really just want to get involved in open source and that's a goal of yours, and you can't find a job that does that now, or

for some reason you can't right now, yeah, I think, you know, most communities are super accepting and there are a lot of labels for good first issues. And if it's something you're really interested in, start with a good first issue. And a lot of them also have community meetings, you know, once a month or biweekly. So if you want someone to help you on your PR, it's probably good to go to those and get to know people as well. But I mean, if you're in open source, you're also typically someone who

is, you know, really a supporter of open source and everyone coming together and contributing. So I've found that my experience with people in the open source community is very positive. Going back to what you said when it comes to having a clear business value, what are ways that you've found you can...

demonstrate that or champion for that to make it very clear to whoever you need to that this is the right problem to be working on, that this problem has business value and that you should be spending your time on it.

Again, I feel a bit lucky in the space I'm in because usually it's pretty clear. Like, because Gen AI has come, because LLMs are so large, like, we need this stuff. You know, it's very apparent that we need it. So I think, you know, from...

a resource perspective, it's very apparent from a usability perspective, there's a lot of room for growth in this area. So I feel like that's been a little bit easier, but something I think about quite a lot that I've seen, especially if I think about my first job, because it was a smaller company and there were just a few principal and staff engineers, but they were...

like rock stars. I don't mean like a coder rock star, but they were just great. They're really good. And they dug in, and they're really good at finding business value problems. I think that is a key skill. And I've thought a lot about it; some of it is luck. Sometimes you just fall into it. I think, for example, in my first company, I fell into the best team. I had the best mentor that I could have possibly gotten.

I mean, I feel extremely lucky and grateful for the opportunity that I was given, because I didn't really know coding that well. Like, I did some Python stuff in school, but I got that experience, and because they were so willing to help me, and because they were so good at finding business value problems and then giving them to me to help me grow, I mean, it made a world of difference. I think sometimes, if I had been on another team or at a different place, would it have been the same? But I think seeing them work,

It made me think a lot about this topic. And I don't know exactly what the answer is, but I would like to. Well, the idea of like being able to sniff out key business value problems and having a good eye or nose for that is so important.

It's enlightening as you say it. It's like, yeah, that is a skill right there. Yeah. And I do think something I try to be clear on when we're designing something is, what are the requirements? And if you don't know, then you should find out, to make sure it's worth your time. I think what I have tried to do, and again, I still would like to get better at this skill, of course, but what I've tried to do is at least try to figure out what's not a good business value problem.

I think there are some clear signs that maybe it's not. So how many people are going to use this, versus how many people are building it? Is this generating any money for us, or what purpose does this serve? I mean, it depends on what the product is. But do you need that new button? Have we asked them, is it actually useful? Is that something they would really use? I think getting customer or client or user feedback is super helpful as well, because I think the worst thing you can do is make something that no one cares about.

It's a waste of time. It's a waste of money and it's super demotivating. Yeah. And then you go and try and like justify what you've been doing for the past three or six months. Yeah. Yeah. And the worst thing too is also like working really hard on something no one cares about as well. So I think knowing where to manage your time and your energy is super important and just making sure as much as you can that it's going to be something really useful. Yeah.

I also wonder because there are certain things that maybe, you know, they're very important, but also if you don't properly evangelize them with your stakeholders, then it could still fall flat on your face. We had this guy Stefano on here, like,

a few months ago and he talked about how he built the most incredible ML platform ever with everything that you could ever want. All of the top features that he read in every blog, you know, and then he said, we released it to crickets. None of the data scientists wanted to use it because we didn't properly go out there and evangelize how good it was for them to start using it. So they all kept just doing their own thing in their own ways and

and not coming onto the platform that he spent all this time making absolutely incredible, you know, and having all the features that you would want and you would think are almost like table stakes, or everybody must need them. And he fell flat on his face even after doing that. Yeah. And I think another quote I really like, and this was the motto of the

engineering department at the time when I was there: as simple as possible, as powerful as necessary. So I think one thing to speak to that is to get something out and then iterate on it, so that, you know, it doesn't have to be perfect. And not saying that this was, but just one point that made me think of is that it doesn't need to be so perfect. Just get something out that's an MVP and be able to start iterating on it and get people using it and get feedback. Because I do think a lot of times we think we know best about what people want.

But sometimes it's just not true. You know, especially as engineers making a platform for another engineer, I think we think even more, oh, we know what they would want. But sometimes it's just not true, you know? And like you said, like sometimes people...

need time to adopt it, or maybe it wasn't exactly what they were hoping for. But another point that I think you bring up a lot too is that not only did I have, and still have, really great mentors, but also having someone who champions you and makes sure that your work is out there and known and publicized is very important. I mean,

I do a lot of self-publicization, probably like you do as well. We're very public people. We're posting a lot. And does that help me? Yeah, I think it does a lot, actually. I mean, I think...

I hate to say the loudest person in the room gets heard. I mean, that's not always true, but you need to, your name needs to be known or like the project needs to be known and put out there. Like I'm sure there's some like amazing apps, but if they don't have any marketing, maybe they don't have any users. So if you don't have any users, you can have the best app in the world, but what does it really matter? You know? Yeah. Like first time founders focus on product and second time founders focus on go to market. But,

basically. Yeah. So, I mean, yeah, it's super, super important, the whole marketing part of it. Marketing yourself is super important as well. Like, for example, I create a brag doc for myself. I got this idea... I'm forgetting the tech influencer's name. She was big on Twitter a while ago, and she created these zines, like these little comic-style magazines. It's an idea from her. I have to look up who she is.

But to write a brag doc about yourself. So every quarter, I do it every quarter. I have certain sections about like presentations, like general work. And I summarize all my work every quarter and I give it to my manager or whoever's doing my review. And the feedback I've gotten is that that's also super, super helpful. Just on the topic of evangelizing yourself and your products as well. I think it's really important in tech. Yeah.

I need to do that just so that on the days that I feel down, I can open that up and be like, look, I'm not as much of a failure as I'm making myself out to be. Yours would be pretty awesome, too. You do a lot of stuff. So, yeah, it's good to remind yourself. It's good to keep track, you know, and then it makes writing. Because sometimes you forget, what did I do? But having having a record of it is super nice.

Yeah, I think there is a part of human nature that you get into these slumps and having something like that and going through something like that is very useful. I heard even a story of

John Lennon and the Beatles were in the recording studio and Lennon was down and didn't want to record anything. But this was after a whole week bender. But as it happens, after a week bender, you're kind of depleted and he's not wanting to record anything. And then people started playing him some songs back or reading him lyrics back that he wrote. And he got like, oh yeah, I wrote that? Oh yeah, maybe I do have

maybe I am an okay songwriter, you know? So even the best of us, it can happen to us. And having something like this is quite useful. It's like just to get your spirits high again and then get you back on track. Yeah, for sure. But going back to this motto of, what was it? Only as powerful as necessary and as simple as possible. Yeah, as simple as possible, as powerful as necessary. That is such a great,

motto to live by, because it's so easy to over-engineer stuff. It's so easy to grab the shiniest tools, and before you know it, you're way in over your head. Yeah. And you're like, oh, maybe this is gonna take a little longer than expected because the scope has crept. Exactly. Yeah, yeah, that's super difficult, but at least starting with that principle, I think, helps a lot. Like I said,

get your requirements, understand what you need, and then try to grow from there, rather than trying to solve this huge problem in the beginning. I think that's applicable to a few different scenarios. Yeah, I'm thinking about the requirements and also how you were mentioning things that are red flags when you're starting a project, that can be almost like early warning signs that this might not have as much business value as you think.

And I'm wondering if you have any more. There's like, does the team that's creating it, is it bigger than the actual people that will benefit from it? That is a huge red flag, right? Yeah, so that's what I was like, how many users are wanting this feature? Like how many people have we asked that? Where did it come from? Who actually wants it? Like what's the cost of running it? And that's what I also mean by requirements. Just like, what's the criteria for it? Like, why do we need this? What purpose does it serve?

So yeah, I think if one person says, "Oh, hey, it'd be really nice to have this." I think I have some questions about if it's worth my time, but sometimes it's obvious that it is worth your time. Sometimes it's obvious that we definitely need model caching. That's obvious to everyone. But other times, do we need this button? I don't know. Maybe, maybe not. - Yeah, I wonder about anything related to GPUs and saving,

GPU time or saturating the GPUs more, like getting the most out of the GPUs, that feels like something that is, yeah, we should go and work on it. But again, you should come at it from this eye, almost like a discerning eye to be able to ask questions and recognize from what you're getting if it's actually as valuable as folks are making it out to be. Yeah. Yeah.

Exactly. And there are a lot of different ways to do things. For a good example of this in KServe, we started adding some of the OpenAI protocols, like chat, chat completion. There's more, but we just started with that. And we started supporting a few different types of tasks, like

auto-routing based on what kind of task you want. And then for the GPU thing, there are a few different ways that you can manage GPUs working together. And I know that right now they're working on a solution that uses two of those, because those are maybe the two most popular, the two that are most needed. So I think that approach is also pretty good. You don't need to do everything at once. At least just get something that people can start using. And then it's, I really like using the chat endpoint here; now I would really like to use this one as well. And I think that's super helpful.

How much are you going to the non-technical stakeholders that are on the business side of the house and having conversations with them about what they may want, since, with the advent of LLMs, it opens the aperture of who's using the AI solutions?

Yeah, I mean, that's a good question because I feel like I'm pretty low on the stack as a platform engineer. We discuss a lot with the AI and ML engineers, but we do have product people that we talk with regularly. And I know they are really good about getting all of that different feedback and kind of funneling it to us in a way to create products and features. So for us, I believe that's the way it works. At Bloomberg, it's really cool because even a lot of the management and

product people are super technical. They grow a lot, and Bloomberg is really great at keeping people. I know in a lot of tech, people move around quickly; it's just so common, right? But I'm super impressed that at Bloomberg there are a ton of people who've been here for 10-plus years, you know?

So they've moved up and around in Bloomberg. But yeah, so that's one interesting part about Bloomberg. But as far as how we get it, it's usually through product, and I do like that, you know, streamlined way of doing things. But I do agree. And there's like a whole philosophy around this about going to the users and asking them what they want. And I think that's really helpful, and I think that's the way to go. You should always be doing that. You know, if you're not doing that, it's kind of worrisome, I think. Yeah.

Yeah, again, red flags. That is something you might want to look out for because you're potentially not finding the...

highest levers or the biggest levers that you can pull? Yeah, I think you can also create a personal relationship with, say, the AI or ML engineers that are using our platform, and go sit with them and just be like, hey, if you have someone that you consider like a super user that's really using it a lot, always in the support chat, show me how you're using it. And then maybe they'll be like, oh yeah, by the way, I don't know what this button is. Or, oh yeah, by the way, I always have to do this every time. Maybe you would pick up on things. I think that is also a

really good way to figure out what people's patterns are in using your tool and what the pain points are, because they might not say it in a survey, and they might forget it, or they might not say it in the chat when they're doing something. But by sitting with them and observing them, you could figure it out. Yeah, I am...

Just probably I would say like 90% of the tools that I use, I am never like blown away by the experience. But that doesn't mean that I go and I explain what could be better about the tool to the people that are creating that tool. And I am assuming that that's like the majority of folks, right? Because it takes so long to explain like why this is a pain and

And how big of a pain is it for me? And so then usually I'm like, whatever, I'll do something else with my time. Yeah, you can't. Pick and choose your battles. Yeah. And so you going and sitting with someone,

you almost force them to show you what is painful, because if I had somebody that was right next to me and watching me do my thing, I'm sure I would be very vocal. Yeah, same. Yeah, kind of. It's a bit more intimate in a way, like it makes you feel more comfortable, you know, than just entering something on a survey. It takes more time, but it's more personable. The higher... yeah, higher leverage, 100%. The other thing that I was going to ask you about is,

When you look at the stack and you think about the traditional ML use cases and almost like tabular data, low latency, that type of stuff that you are creating a platform for or you're helping folks serve those use cases versus the new LLM AI world that you're creating a platform for or helping usher in those use cases, where do you see them diverging?

And where, like we talked about the Envoy AI gateway and that's specifically for an LLM use case, right? Yeah.

How do you see those two worlds playing together in a platform? Like, does the platform support all of it and almost have like this sprawl? Does the platform have pillars that depending on what the use case is, you kind of plug into the platform over here or over there? Like, how do you visualize that or look at it? So a few things, I think.

We're trying to make the platform as easy to use as possible. And at the end of the day, you're just deploying a model somewhere. And we try to make sure the user doesn't have to care too much, like I was saying, trying to abstract away a lot of those Kubernetes concepts as much as we can. Try to make sure the user doesn't care too much about what or where, like

Why should they care if it's on-prem or in a cloud? It's just running where there's resources, you know? So for AI models across hybrid environments, we try to create a platform that's very seamless. You don't really know or care where it's running. And that's the whole, you know, point of serverless and being able to run hybrid cloud, just be able to use resources for different things. As far as different products running on Kubernetes,

I would say training would be like a different tab in our UI, you know? And where does it differ? I think it differs in that inference services are long-running, and things like training jobs usually are not. It depends on what you're training and what you're doing. So the way these jobs run, you're still deploying something in Kubernetes, though. So the basics are still the same. I think some of the smaller features are different. Like we have a model registry, where you can...

save your models, pull versions of them just by a click of a button and have different, change different configs slightly, have different artifacts. So I think that's something that is needed in inference that maybe isn't as needed in training or some different features like that. That's kind of where I see them diverging. Did that answer your question? Yeah. And I like this idea of like,

The things that are the same are that whether you have a gigantic model or you have a very small pruned model or a fine-tuned model or a distilled model or a random forest model, it's a model and it's going to be sitting on some kind of Kubernetes service. Like you just want the model. You don't really need to think about

How much traffic can that model handle? What the actual like where it sits, which cloud is it on? Is it on prem? All of that stuff. I really like the idea of like, let's abstract away everything that we can so that if you have a data scientist that is putting a model out there.

they know they can just get that model out there and their job is to create the best model. And then the rest is taken care of for them. - Yeah, exactly. And that's why the unified API is so nice too, 'cause we give you an endpoint and the endpoint will look very similar no matter where you're running it. And what you will use to predict or chat will be so similar in the same format

no matter where it's running. And so that's a really good plus of these tools, I think. Do you distinguish at all between like the models that are

For LLM worlds, you can distill models, or maybe you prune them or do things to make them smaller. Or maybe you have an ensemble of models and you have the gateway and whatnot. And so there are things that you potentially want to do to LLMs

that you wouldn't necessarily do to traditional ML models. Like if you're training those models, like you said, and you have the model registry and maybe there's feature engineering happening in the traditional ML world,

Do you distinguish there, too, or is it just like you were mentioning where these are different tabs on the platform depending on your use case? So for that, what your inference service is running, whether it's running this model or that model, that's just a config, one little part of the small YAML that you're putting into the inference service. And then KServe will make a lot of assumptions underneath the hood so you don't have to write

all these big things that are specific to that model and how it runs, like even what's the port that exposes the metrics. Like, we know TensorFlow is on this port and another model is on this port, so we'll automatically open that port or be able to pull from it if you're getting metrics. So a lot of things like that; you just need to tell us the keyword or key config, one thing, and then we do the rest. But that's in the YAML itself. So that's within the tab of the inference product itself.

So much fun.