
Real LLM Success Stories: How They Actually Work // Alex Strick van Linschoten // #287

2025/1/31

MLOps.community

People
Alex Strick van Linschoten
Demetrios
Topics
Alex Strick van Linschoten: I was inspired by the Evidently AI database and set out to organize all kinds of information about LLMs. I found that many blogs and discussions about LLM applications are scattered across the web, and these are valuable data sources. I wanted to bring this information together to help people understand how LLMs are actually being used in production, so that everyone from large enterprises to small teams can benefit. Demetrios: Evidently AI created a huge database of how ML and AI are being used, and ZenML did something similar, but specifically for LLMs. The database covers use cases from small teams to large enterprises. LLM use cases are extremely varied, and enterprises are working hard to figure out where the value of LLMs lies.


Transcript


My name is Alex Strick van Linschoten. I'm a machine learning engineer at ZenML. And I don't drink any coffee. I drink green tea. Not always, but yeah, a good cup of jasmine green tea or something.

Good people of the universe. Welcome back to the MLOps Community Podcast. Today, we've got a very special episode going over the database of real-world LLM use cases. That's right, there are all kinds of AI use cases that Alex consolidated into one place. And I appreciate him doing that for us.

We get to the conversation about what exactly he learned while putting together this massive database and how he did it. So without further ado, let's get into it. And if you are one of those special people that listens to this episode on a podcast player, I've got some recommendations for your podcast.

listening sessions. In the algorithm on YouTube, you can find such a great gem of Bob Dylan and Van Morrison playing the classic song Crazy Love in front of the Acropolis in Athens, 1989. I can hear her heart beat for a thousand miles, and the heavens open every time she smiles.

And when I come to her, that's where I belong. Yeah, I'm running to her like a river's song. She give me love, love, love, love, crazy love. She give me love, love, love, love, crazy love. She's got a fine sense of humor when I'm feeling low down. And when I come to her when the sun goes down.

Take away my trouble, take away my grief. Take away my heartache, in the night like a thief. She give me love, love, love, love, crazy love. She give me love. I think the only time I've ever seen this done well, what you've done, is when the folks at Evidently AI put together

a gigantic database of different ways that ML and AI are being used. And they took a lot of disparate data sources, from blog posts that have been out there, and folks that they've talked to, and probably some people that are using their open source client. You all did something similar at ZenML: you put together a database, but specifically for LLMs and

people that are using LLMs in production. Can you explain how you went about it? And what a huge undertaking. Sure, yeah. I'm so glad you brought up those Evidently databases. They have two databases, I think, and they were totally an inspiration for us. I was like, yeah, there's all of this stuff going on, and you see

people posting these little blogs occasionally, or just random things, and obviously all of the conversations you're hosting with the MLOps Community as well. It's just this rich source of data, and we're all sort of trying to figure out exactly how this stuff works in production. What's the spectrum, from a mega company down to five or six people just trying to start something new?

So yeah, my background, I used to be a historian, and so I'm kind of, I don't know, a hamster or a squirrel, just hoarding all of these things. I've been keeping a list of all of these links as I went through, and at a certain point you reach the point where it makes sense to put it out there. And obviously, I mean, we can talk about the details, but

there are summaries for each of the posts, which would have been impossible for me to do manually, or at least not without a mega budget. LLMs themselves helped make all of that easier. So yeah, it was a big undertaking in the sense that it's not like someone else had done the work of collecting them together.
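A minimal sketch of what that LLM-assisted summarization pass can look like, assuming the OpenAI Python client; the model name and prompt are illustrative, not the actual ZenML pipeline:

```python
# Sketch of bulk-summarizing case-study posts with an LLM.
# Model name and prompt are illustrative, not ZenML's actual setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_post(title: str, body: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "Summarize this LLMOps case study in 3-4 sentences, "
                    "focusing on the use case, architecture, and results."
                ),
            },
            {"role": "user", "content": f"{title}\n\n{body}"},
        ],
    )
    return response.choices[0].message.content

# Hypothetical usage over a collected list of posts:
# summaries = [summarize_post(p["title"], p["body"]) for p in posts]
```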

But in terms of overall time, it didn't take that long to put together. Yeah, now there's one place. I know, at least for myself, I always try to add any quality blog post that I see out there on the internet to the MLOps Community newsletter, as a hidden gem. And it's true that

it goes out as a hidden gem, and then maybe if you missed that week, you missed it for good. But here you have it, and you can always go back to it and reference it. And I really like what you said, how it tries to cover the gamut: small teams and what they're doing, to large enterprises and what they're doing. And did you notice any types of

repetitive use cases? Because the other thing that's a little bit new here is how varied the use cases are. It's not like traditional ML, where we kind of have it figured out and we know there's going to be some fraud detection, some recommender systems, maybe some loan scoring or some classification. But with LLMs, it's

the Wild Wild West of how you get value for your company. And there are whole task forces at enterprises trying to figure that out right now. Yeah, I mean, variety is definitely the word. There are a ton of different use cases. I think there are kind of two broad categories, and one is probably much bigger. The bigger one is:

Let's go with something that we see everyone else doing. So let's build a chatbot. And the chatbot is either like customer service or it's like chat with your data.

These are, broadly speaking, the two most common. And these can come in different flavors. Some of them have an agentic color to them; others are completely internal; others are customer-facing, and so on. Those are kind of the...

Yeah, some CEO has seen that some other company has done this, or someone internally has built this POC demo on a Streamlit app or whatever, which looks crazy impressive. Let's roll it out, it's already working, right? Little did they know. Those were kind of the most common ones. And then you have a smaller cluster,

which are, yeah, I don't know whether there's any unifying thread, apart from the fact that it's the companies which are really pushing the envelope. Technically, they're really driving innovation; they're figuring stuff out. Think what Copilot was doing a few years back, really carving a path of their own.

Or it's companies doing innovative stuff, perhaps with agents now, or mixing stuff around content generation, or whatever, which you can't really categorize. And probably there aren't a thousand other companies who would want to do this, but it really works for them. Yeah.

And so, yeah. But I mean, it takes a certain kind of company and a certain amount of risk appetite to say: I'm just going to go and do my own thing, even though everyone else is doing chatbots. Yeah, it's easy to do the chatbots, because it's working for the majority of folks, or at least it's better than what we're used to as far as chatbots go. And so, why not? And I like that you

show there are different flavors of chatbots. Maybe it's agentic, maybe it's internal, maybe it's just externally facing; maybe it's folks hacking around on something, or it's a full-on rolled-out project that got buy-in from leadership. And so you see every type of flavor when

you get blog posts about it, and when people write about it, or talk about it, or come on the podcast and share. And I wonder a bit, because there's got to be a lot that is still obfuscated. When I think about a company like an Uber, they have so many use cases that they aren't able to talk about all of them.

And they're not able to show every single way that just random departments are using LLMs and what they're doing with it. And it makes me think a lot about how...

Traditionally we've had the data governance role. And now the AI governance role is just such a beast, because in an enterprise with a thousand-plus people, or, God forbid, 10,000 people, imagine how many different instances of

AI they're running, and how many duplicate licenses they're paying for repetitive workflows. That just, yeah, it gives me anxiety just thinking about it.

Yeah, I mean, certainly, you know, the things that people are putting out, particularly at Uber or wherever, generally speaking, people are putting things out which make them look good or make their teams look good. Even if it's a failure, it's: we caught the failure, right? Or: we had good processes to catch the failure. And yeah, lots of respect for companies that include

details of where they messed something up. One which comes to mind is Weights and Biases, who developed their internal support chatbot. They've been really great at building in public and sharing stuff on how they build their evals and so on. They shared: oh yeah, we got something wrong about how we did our evals, and we needed to spend

several thousand dollars just redoing everything because we made some mistake or whatever. So yeah, it's nice when people put actual money behind it. But it would be great if there were a bit more normalization of sharing failures and paths which didn't work. I guess those are shared internally, and maybe for a mega corporation that's good, that's the way, you know, that's a good thing for them, and maybe it's expected, but

obviously, it would be nice to see all the parts which didn't work out along the way. Were there any other patterns that you saw as you were putting this together? Whether it goes towards

common design patterns, or how the majority of use cases are doing evaluation, that type of thing. Something you noticed after reading so many, where you're like: well, this seems to be the flavor of the day, maybe it is the most useful. Yeah, so, I mean, lots of smaller insights rather than big insights. I guess if there was a big insight, it's that

all of the tried and tested things that we know about, software engineering, DevOps, all of these kinds of principles, all of that stuff is super important. And you'd better get all of that stuff right; otherwise, this magic that you're building on top is not going to work, or it's not going to work reliably. And, you know, we thought a lot about exactly what to call this database. At the moment, we settled on LLMOps, because this is kind of

what the community seems to be settling on. Microsoft is trying really hard to push this term GenAIOps, which is just too many syllables, and no one else is using it. Somebody told me, I may get canceled for saying this, but somebody came on here the other day and said, oh yeah, we're doing GenOps. And I was like, that sounds like you're pretty progressive there, huh?

Is there some... It just made me think: GenOps doesn't sound like what I think you want it to mean. Yeah. We have this term, LLMOps, which

is, I guess, the ways to think about and the ways to do all of this stuff we're doing around Gen AI. And to be honest, most of it is around LLMs. It's still not in the video domain; people are starting to try things with multi-modality, but that's still a little bit future-facing, and same with image generation.

But really, a lot of the stuff which underpins this is MLOps. And even, as I'm sure you're very well aware, there are still a lot of people who say MLOps is not a thing, it's just DevOps. You can go all the way back down. That's why I kind of wanted to say software engineering best practices, which a lot of this is. So that's the one thing: the fundamentals still really matter.

And what were some of those smaller insights? I'm very interested in tracking the extent to which people are actually using this in production. Every year it's like: this is going to be the year of the agent. I hear it's 2025 now. And you put on a great conference recently around this, and some of the use cases there.

And it's still relatively early. I mean, there are some success stories of companies who are doing things, but quite often there's not enough technical detail to know. It's not like you're unleashing customer workflows completely unbounded, like, let the agents have at it. No, everything has been constrained down as much as possible. And so, yeah, it does seem that

we're not quite there with making agents work reliably. And it's unclear to me, not working on a ton of these projects myself, exactly where the bottleneck is. But the places where people are managing

to get this to work are, as I said, really constraining down the specific tasks or specifications for the agents. Klarna had a huge, huge win there, where

they amplified, or supported, their customer service agents. I think they calculated that they would gain, what was it, $30 or $40 million in profit on the basis of this deployment. They reduced customers' time in waiting lines, people didn't come back, and all these kinds of things. It's a mega company. From what you could tell from what they released, it was

a very narrow realm, and they could kind of control it. And, see, they are filing to go public. So...

That can boost their stock price when they IPO. I think that felt to me like one of the most outlandish claims of 2024, when the CEO came out and said: we don't need to hire 700 people because of our AI, or something like that. And it was one of those where you read between the lines and you recognize: oh yeah,

he's just getting ready to go public. He's probably on a roadshow right now. Good on him, let that stock pop on day one. But it's funny you mention Klarna specifically, because of that, and the support

use case, because that feels like the one that is most defined when it comes to agents. All the other ones, from what I've seen, it's: yeah, we're really still trying to figure it out. You can't just say, go be my marketing team, right? Or, go do my marketing for my startup. What you have to do is

go deep, deep, deep down into one specific task, and then try to automate it in a way that is possible, where you know the steps. And so you can say: all right, go and collect all of the keywords that my competitors are

using in their pay-per-click campaigns, and then analyze which ones I'm also interested in bidding on. That type of thing is great for an agent. It's not going to do marketing for me, right? Because the more vague you are, the less you're going to get the outcome that you want. That's probably the hardest thing right now with agents.
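A minimal sketch of that kind of narrowly scoped agent, assuming OpenAI-style function calling; the tool name and schema are invented for illustration, not from any case study in the database:

```python
# Sketch of a narrowly scoped "agent": the model gets exactly one tool
# with a tight schema, rather than open-ended control.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical tool: fetch a competitor's paid-search keywords.
tools = [{
    "type": "function",
    "function": {
        "name": "collect_competitor_keywords",
        "description": "Fetch the paid-search keywords a competitor domain bids on.",
        "parameters": {
            "type": "object",
            "properties": {"domain": {"type": "string"}},
            "required": ["domain"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Which keywords is competitor.example bidding on?"}],
    tools=tools,
)

# The model can only express intent through the one constrained tool call.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```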

And then you see them being used, like the OLX Magic use case that we had at Agents in Production, where they're trying to reinvent the way we do search inside of their app. You don't necessarily need to search for anything specific; you just say, oh, I'm looking for a baby stroller, and it will give you a few options. But it will try to

be a bit more agentic in the way it presents you those options, instead of just giving you the ads or the classifieds of people that are advertising their strollers. You can get the information and then start honing in on it. I'm still not clear, though:

do we want to be using chat for this? The interface of me having to explain exactly what I'm looking for, versus me being able to click around in a recommender system. And so I think what OLX Magic is doing that's cool is they're trying to combine both and say, hey,

all right, we're bringing you these first searches, these first hits, but then we're also adding a recommender system on top of that, so we can learn from where you're clicking and where you're going. And so it's

not throwing out the old just because there's the new, and thinking really creatively about how to layer the two on top of each other. Yeah, I think in the database there's also something which is somewhat underrepresented, maybe because the kinds of people who write these blogs are the technical team or the software engineers, a little bit more on the backend side. But you're totally right:

UX innovation is also super needed, and people need to experiment around. A lot of things which are often presented as a chat interface don't need to be; they can still use LLMs under the hood. It can just be a button. Why make me type out all of this stuff? Exactly. Or interact through voice. So yeah, that's totally something.
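A sketch of that button-not-chat idea: each UI action maps to a fixed prompt template, so the LLM runs under the hood with no free-form chat box. The action names, prompts, and model are illustrative assumptions:

```python
# Sketch: "a button instead of a chat box". Each UI action is a fixed
# prompt template; the user never types a free-form prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical per-button instructions.
ACTIONS = {
    "fix_typos": "Fix spelling and grammar only. Return the corrected text.",
    "shorten": "Rewrite this text at half the length, keeping the meaning.",
}

def run_action(action: str, selected_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": ACTIONS[action]},
            {"role": "user", "content": selected_text},
        ],
    )
    return response.choices[0].message.content

# e.g. wired to a "Shorten" button over the user's text selection:
# shorter = run_action("shorten", selection)
```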

And yeah, I think even if all of the innovation in models stopped right now, we'd still have a bunch of years to figure out new ways of doing this stuff. Yeah, the interface is a really fascinating one for me, because

we've got the pointer, or the cursor. We're used to clicking around, but we've also got different commands, and so thinking about hotkeys is fascinating to me. And I'm constantly referencing a talk from Linus Lee from almost a year ago, from before Notion AI had implemented their

five suggestions of what you can do with Notion AI inside of Notion as you're writing. It's all click-based, right? ClickOps, in a way. And that's really cool. But then, yeah, maybe there's just

a select-all, and then some kind of hotkey so you can add your voice to the writing, or just clean up typos, or whatever it may be: rewrite in a shorter way, or condense this thought, or bring up the box where you can go back and forth with a chat. And so

yeah, all of that... it feels like we're still in the first inning on it. Yeah, yeah. And I mean, that maybe takes us to another kind of lesson which came out of the database, which was

there's, again, perhaps not surprisingly, a lot of people deferring to pre-made, I don't want to say pre-made necessarily, but safe frameworks and platforms around Gen AI, whether it's Bedrock or more specific stuff, where AWS has done

the work of making it super easy for someone to create a chatbot based off company or enterprise data, or something like that.

So yeah, I was kind of surprised to see how many people... I guess, as the saying goes, no one got fired for buying AWS for your enterprise company. But I worry, in light of what we were just talking about with the UX stuff, that if we go too quickly into the world of a pre-made framework with relatively little flexibility, then maybe we don't get to discover all of these

different ways that customers could interact with our stuff. Then we're just getting whatever five or ten things you can get with Bedrock out of the box. That's such a great point. And then it all looks the same. And it's not like we were really excited about that whole experience in the first place, so now we just get more chatbots that we don't enjoy interacting with.

And yeah, I feel like I've seen that before; the pattern has happened already. But it's nice that on the open source side of that, people don't seem to be resting on their laurels too much. I think of LangChain or LlamaIndex, where probably by this point they could just

stop breaking new ground as new technologies come out, and just say: hey, we're going to become the super stable chatbot guys. And so, to their credit, they're still discovering things,

still adding new ways of thinking about LLMs and Gen AI, you know, even acknowledging the many criticisms that people have of those two in particular. They haven't fallen into that same trap, I feel.

One use case that I saw that was fascinating was when Philip from Honeycomb came on here, and he was talking about how they were plugging LLMs into the product itself, trying to have the LLM almost be like a shepherd to help folks

get to doing something inside of the product that they knew would convert them to a paying user. And I think that use case is incredibly awesome, but very under-seen; I don't know many other companies that are trying to innovate in that way. Maybe you saw others that said: oh yeah, we're going to

plug in LLMs as this guide, or as the shepherd, or really as our sales agent inside of the product, a sales engineer, more like it. And from there, they're going to help the user become more proficient at the product faster, so that they become a power user and inevitably buy the product. I mean, I think for the most part,

most companies seemed a little bit wary of entrusting that much agency to LLMs. I mean, somehow chatbots are that thing, right? It's like: we can't give you full access to our support team, but hey, here's this robot which is there 24/7, knock yourself out. The problem comes when

it's often seen as kind of a panacea, or people don't get the details right, and then you see people getting frustrated with a product. I mean, certainly, I'm sure you've tried out random demos and random things in a whole bunch of different places, and quite quickly you realize: oh, it's not actually

doing what this thing is intended to do. So either people are releasing these things and getting middling results, and that's why, in the blogs that people write, it's not so much about, yeah, we made a ton of money out of this,

but more focused on the technical challenges. And then there's a ton where it's just: we built something internal because it's way easier. We're way more comfortable with the risk there, where people can find it useful and take it or leave it. But in the end, they have to work for us, or whatever. You know, it's not like they can... Can't complain to anybody. Right.

Yeah, they're not going to make the stock price go down by writing a horrible Reddit thread. So, you know, as you were saying that, one thing that jumped into my mind is how, with these support bots that we get, you have to think the majority of them are powered by RAG systems in some way, shape, or form. I'm sure you have read

more RAG blogs than you would like to admit. When folks are setting up their RAG systems, they're giving context to... So you've got the chatbot that I, as an end user, am interacting with, and that goes to some kind of a search system, or it's retrieving the stuff that you're asking for, and it's also maybe

trying to come up with a solution. I wonder if people have experimented with adding different signals of what I've been doing inside of the app. So this may get a little,

dare I say the buzzword, multimodal on us. But if I'm clicking around on something, and, some people call them rage clicks, because something isn't working or I'm trying to find something, the last thing I want is to talk to the support bot and have it suggest that I do exactly what I've been doing. Right. Well, then I have to explain

the last 10 minutes. Yeah, exactly. I don't want to have to tell you that I just did these five steps. I would like you to know that and take it into account in the answer. I don't know if you saw anybody doing that, because that feels very cool, but it also feels like it might just not be possible, or not valuable. I mean, for sure it's possible, but I didn't see anything

specifically around what you were doing on the website. But certainly customer support bots were enriched with customer data and the customer's previous context; that, for sure, was something that I could see. And so, I think,

forget the precise examples, but in e-commerce that was a really common thing: the customer's profile, recent things that they bought, their preferences. All of that stuff was pretty standard to pass in.
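A sketch of that enrichment pattern: the customer's profile and recent activity get injected alongside the retrieved documents before the model answers. The field names, retrieval source, and model are hypothetical:

```python
# Sketch of enriching a RAG support bot with customer context.
# Field names and the document list are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, customer: dict, retrieved_docs: list[str]) -> str:
    # Combine profile, recent activity, and retrieved docs into one context block.
    context = (
        f"Customer: {customer['name']}, plan: {customer['plan']}\n"
        f"Recent orders: {', '.join(customer['recent_orders'])}\n\n"
        "Relevant docs:\n" + "\n---\n".join(retrieved_docs)
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Answer the customer's question using the context below.\n\n" + context,
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```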

But it does go a little bit, again, to what we were talking about with UX, where, for RAG systems, you want a little bit more power and flexibility, as you would have with a person: to fast-forward certain bits of the conversation, or slow down at this bit, or, hey, let me send you a photo at this moment. Don't make me pick from six bits of text that apply to my thing.

So yeah, that kind of thing is still... people are still figuring out the bigger picture. I imagine once that's a bit more solidified, you start getting people thinking a bit more about the UX. I'm not sure...

Yeah, I'm not sure how long this stuff takes to percolate through. Yeah. And a lot of times, I can imagine the end user doesn't realize they have the option of sending a screenshot or something.

Because if the end user isn't prompted with, would you prefer to send me a screenshot instead of explaining what you're doing?, then you end up trying to explain it in text, and it's a little clunky. So maybe it's as easy as one of those six follow-up questions being: send a screenshot of your problem. And they go from there. But it's fascinating to think about. I'm always

intrigued by the product journeys that people take, and especially the friction they may hit that you, as the creator of the product, never think about.

Right. Because you know it in and out, and you know: oh yeah, if you want to do that, you just click on this button, and then there's the little dropdown, and you get exactly what you want. But the new user is just clicking around everywhere, trying to find what they're trying to do. And they may not even really know exactly what they want to do. So they're exploring in one way, but they're also trying to figure out whether this tool is going to be useful for them. And yeah,

that's where it would be awesome to have that little buddy that pops up and says: I see you're just randomly clicking around, can I help you find what you need? Or: I've seen you've done these five actions; you know what else is really cool? Here's a hidden trick. And so it suggests ways to become a better user of the product. And that is, yeah, probably very few and far between. Yeah, and you see people experimenting with this. I mean,

OpenAI, and who else has this, Microsoft, with their we-watch-everything-you-do-all-day-all-the-time thing. You see people trying to make that kind of experience work, but it's clear it's very early days. And just thinking about it, one thing we haven't talked about yet is evaluation, and how you then

create your data flywheel and all of these kinds of things. You can imagine that once you get to

24 hours of monitoring the screen and interactions and stuff, evaluation becomes something you need to really, really think about. You probably need LLMs, or multimodal LLMs, in the loop, picking out interesting examples or whatever. But at that point, you're evaluating the sum total of human behavior. Yeah.

Exactly. Speaking of evaluation, did you see common patterns arising there? I'm always interested, because there's a lot of hype around LLMs as a judge, but in actual practice, I'm not sure how many people are really using LLMs as a judge.

I mean, there are certain people who tried it and had mixed results. They ran into a lot of the common failure patterns of using LLMs, like getting them to output number scores, and they found this was super unreliable. So then they had LLMs give qualitative

responses, or they have LLMs flagging certain examples. And you obviously see LLMs being used for synthetic data, maybe to get you over a certain hump in terms of building out your functionality. But yeah, probably a lot less of that.
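A sketch of that qualitative-judge pattern: instead of asking the judge for a numeric score, which the case studies found unreliable, you ask it to pick one label from a fixed set. The label set, prompt, and model are illustrative:

```python
# Sketch of an LLM-as-judge that outputs a categorical label
# rather than a numeric score. Label set and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(question: str, answer: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Label this answer as exactly one of: CORRECT, "
                "PARTIALLY_CORRECT, INCORRECT, REFUSED. Reply with the label only.\n\n"
                f"Question: {question}\nAnswer: {answer}"
            ),
        }],
        temperature=0,  # deterministic-ish labeling
    )
    return response.choices[0].message.content.strip()
```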

Yeah, the common pattern is: someone builds the POC demo, whatever, and at that point evals are not really part of anything. Then they start thinking about, well, how do we present this to a wider audience, whether it's

the public, or internal-facing for the company. And at that point, hopefully, people start thinking about evals: what are the real failure scenarios that we really can't afford to happen here? So you have some very basic stuff come in. Then, depending on the size of the company, maybe they even stop there. But the bigger ones, the ones with a bit more money to invest

in a project like this, which doesn't necessarily impact the bottom line, they're actually gathering the data, iterating on the process, and so on. But more often than not,

people are building these LLM projects, particularly the internal-facing ones, so that they don't fall behind on the technology, almost. They want exposure to it. And it's kind of their way of

just being exposed to what's happening with LLMs. And we did this at ZenML as well. I built a Slack chat support bot a year, year and a half ago, or something like that. And that was mainly just: yeah, we want to understand how people are using this stuff. How hard is it? What are the failure patterns?

And so on. So there are maybe different incentives at play there, versus when you're doing something which is a little bit more profit- and results-driven, where you're more willing to accept certain risks, or to do certain things in a certain way. Well, you said some of my favorite words, which are ROI and profit.

Did you get a lot of insights from all these blogs? Did many of them talk about it? Because, as you mentioned before, it was primarily engineers writing about the engineering problems they're solving. So I would assume they're not necessarily saying: this netted us a huge ROI. But maybe there are some that you saw that actually

took that into account, or talked about how it was or wasn't viable depending on the scale they're looking at. Not really, apart from the few, like Klarna, who actually did put an actual number, a big number, on it. And I think it's also a bit... the ones that did talk about,

you know, a big spike in users, or renewed interest, or a product which was dying or whatever that was revived by the introduction of RAG, or a better LLM-based search system, or something like this. It's really unclear to me whether that's something that would apply for the long term,

because a lot of people go to places to try stuff out, or to try out some new functionality or new technology, and for a lot of stuff it's clear this is in flux and it'll be replaced by something later on. I mean, my favorite thing at the moment in this kind of

area is NotebookLM. Very popular, kind of a cool use case. You know, they just launched this thing where you can participate in the podcast yourself, and if you saw this and tried it out, it's quite cool. It's fun to use. Will we still be using NotebookLM, or playing around with this, in three years? I don't know. Probably not. Hopefully maybe there's

a better kind of meta-tool around podcasting or discussions or study, depending on what your angle is on that stuff. So for a lot of the use cases where people get really good results, and can give specific numbers about the number of customers they have, or

how people's user journeys were improved, I don't know whether this stuff is for the long term. It's just: we released something and we have a lot of users. Yeah. The other thing I think about in this space is all of these agent platforms. Probably half or most of them won't exist five years from now. But some of them have a ton of users and people playing around with them. And

particularly for the smaller use cases, people do interesting things which are really useful for them. But yeah, I have to think that a lot of this stuff

will be merged or amalgamated or turned into something else over time. And when you say agent platforms, do you mean platforms to help folks build agents, like a framework that can help

you build agents, or actual agents that you can go and use? Well, some of them have marketplaces for agents, which seems to be the thing which makes them money. But yeah, it's these GUI interfaces, web interfaces, where you can connect lines: you do this, and if not, go to this agent, and that kind of stuff. They're quite popular, and some of them are making real money. And

maybe it's a little bit like, what's that create-your-own-avatar service, Replika? This kind of service where you can create an AI avatar, and it's personalized and so on, which is not a mass-used thing, but it's popular, and it makes a lot of money. Maybe two or three of those will continue to exist for a long time, but yeah, I

have to feel like a lot of these will just drop off, or people will move on to the next thing. Yeah, that's fascinating, because the one thing that is clear is that you have to have a lot of patience to get something working on these build-your-own-agent platforms. And the debugging is hard, because you don't know if you're

not prompting it correctly, or if you're not asking it to do a narrow enough task, or if the flow that you've set up is the problem. So if you're not willing to spend the time to create that flow, it's quite difficult. However, I have seen there are a lot of

common use cases that come out of the box with those little build-your-own-agent things. I think I signed up for one the other day, and it said: what do you identify as? And I said, let's see what they have for product marketing, or SEO. I think there were all different marketing use cases.

And then it gave me a lot of flows that they had set up, and you just add in simple things like: oh, here's my website, here are the keywords I'm trying to rank for, or whatever. And they make that easier on you, to try and reduce that friction.

Yeah. And I mean, that works great for a well-defined domain, where we already know the things which are important for SEO, and the tasks, or whatever. But a lot of the use cases where people talk about agents are in the area of research, where that's a little bit harder. If you knew what the problem was and how to get to it, you wouldn't need the agent, probably. Yeah.

Yeah. Fascinating, man. So as you were putting together all these different blogs and different sources into the database, did you find any sources that were consistently publishing absolute top-notch material?

I mean, some of the ones which are well-known: the Netflix tech blog; DoorDash has a really good one; Honeycomb produced stuff consistently. Weights and Biases, since they started their support chatbot, I think they've done ten different deep-dive technical blogs. Yeah, there were a lot. Some of them...

And then the rest were just a ton of random blogs from companies where maybe they hadn't written anything previously, or they're new companies. That was a bit... Finding them was

hard. I used this really great embeddings-based search engine called Exa (exa.ai), where you put in some other blog and you say: find other blogs like this. And because it's embeddings-based, you get really great results. Would recommend it.
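The workflow looks roughly like this with Exa's Python SDK (exa_py); the seed URL here is hypothetical, and result fields may vary by SDK version:

```python
# Sketch of the "find other blogs like this" workflow with Exa.
from exa_py import Exa

exa = Exa(api_key="YOUR_EXA_API_KEY")

# Seed with one known case study and ask for semantically similar pages.
results = exa.find_similar(
    "https://netflixtechblog.com/some-llm-case-study",  # hypothetical seed URL
    num_results=10,
)

for r in results.results:
    print(r.title, r.url)
```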

So there were all of the ones that you already know for having great technical teams, and then a ton of ones where you just have to

hope that someone posts them somewhere in your social network, or reshares them, because it's hard to find these case studies. And obviously, the MLOps Community, definitely love all the videos. I hope we've done a service by liberating some of the content out of the videos into text form, by summarizing from the transcripts and stuff, because

I think there were something like 100 or so videos referenced in the database, and maybe you don't have 100 hours to watch all of them; at least you can decide whether you want to watch one based off the summary. Is there any cool stuff that you want to do with data visualization on

this? Because it feels like, with all the topics or all the different filters, you could create some fun data visualizations. Whether it's, okay, chatbots, and you have a whole embedding space that you're looking at, or you're looking at the different use cases, that type of thing. Or are you done with it? Like, all right, I put it out there, now I'm

going to get back to work and keep rocking at ZenML. I mean, we're continuing to maintain it, and people submit use cases and articles and so on. So that's really great. That will only grow. The one thing I really... Where do we submit? There's a link at the top of the database; it's just a form you fill in. Okay.

And we put out the dataset also as a Hugging Face dataset, so if people don't want to scrape our website in order to get all of the data, we've done that for you: just go to Hugging Face.
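Loading it looks roughly like this; the dataset id below is an assumption, so check ZenML's Hugging Face page for the exact slug and split names:

```python
# Sketch of pulling the database from Hugging Face instead of scraping.
# The dataset id "zenml/llmops-database" is an assumption; verify the slug.
from datasets import load_dataset

ds = load_dataset("zenml/llmops-database", split="train")
print(ds[0])  # one case-study record, e.g. summary and source URL
```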

But something I really wanted to do, and basically didn't have the time to implement, is to let people search all of these use cases by tool. You want to see all of the companies that are using LlamaIndex for embeddings, or all the people who are using Qdrant or Pinecone, vector databases or whatever, and then see common use cases, or failure and success patterns, around particular tools. So yeah, it's a bit harder

to implement the extraction of the tools, or at least to automate it in a reliable way. But that would be a useful thing. What I can promise is we're not going to have a

chat-with-your-LLMOps-database functionality on top. Or, you can build it yourself if you want; just download the Hugging Face dataset. That's it, the next MLOps Community hackathon is on. Yeah, we're going to do that one. That's so good. Oh man, well, you've done some awesome stuff with it, and I really appreciate you putting it together, because, like I said, it's this one resource that I can come back to and continue to learn from.

And so I hope that you keep updating it. And anybody out there that is doing anything cool, if you write about it, make sure to submit it to Alex and the ZenML team. This has been awesome. Thank you. Thank you.