
Machine Learning, AI Agents, and Autonomy // Egor Kraev // #282

2025/1/8

MLOps.community

People
Demetrios
Egor Kraev
Topics
Egor Kraev: I think one of the biggest roles of large language models is to act as a bridge between unstructured and structured data. Data science used to require turning everything into vectors or matrices before you could start working; now topic classification can use text descriptions directly, which are far easier to understand and modify. Most of the production applications of LLMs I have seen convert unstructured language data into structured data. In practice, most LLM use cases in production are not a magic agent that does everything, but an extra Lego brick combined with the other bricks. Demetrios: LLMs can turn unstructured, messy data into structured data.


Chapters
Egor Kraev, principal AI scientist at Wise, shares his diverse background, from studying mathematics in Russia and the US to working with nonprofits in Africa and then transitioning to a career in finance and AI. He discusses his journey and his current focus on causal inference and AI applications in Fintech.
  • Egor's diverse background in mathematics, economics, and finance.
  • His experience working with nonprofits in Africa.
  • His transition to a career in AI and Fintech.
  • His current role as Principal AI Scientist at Wise.

Transcript


So, hi, I'm Egor Kraev. I am now Principal AI Scientist at Wise, until recently heading up and building up its AI team. In parallel, I'm fortunate to also be able to work on my startup on causal inference in marketing, code-named CausalTune. Instead of coffee, I prefer green tea, things like Longjing, for example.

There are certain things where once you know how the good stuff tastes, you cannot really drink the bad stuff anymore. Like wine, whiskey, and green tea is certainly one of them.

Welcome back to the MLOps Community Podcast. I'm your host, Demetrios. And today we got into the traditional ML, the AI world of things, and also the AI agents world of things. And Egor has built packages and open source products for each one of those. It was really cool to dig into what exactly he's done with the different ones, be it for

fraud detection or segmentation and A-B testing in emails or LLMs as being just one part of your DAG and how it's really useful to think of an LLM as something

taking unstructured messy data and turning it into structured data. And lastly, the framework Motley Crew that he created around AI agents so that you can use various AI agent tools or frameworks to

If you want to, you don't have to be locked into just LangChain or LangGraph or LlamaIndex or CrewAI. Motley Crew allows you to have a motley crew of agents and leverage what's best about each one of those.

Let's jump into this conversation now. And for those who are on Spotify, I've got a nice little song recommendation for you coming right up, which is from my new favorite playlist, Big Desk Energy. And this song is called Smeds by Steve Cardigan. Wait for it. Here we go.

That saxophone is just too good. All right, folks, I'll catch you on the other side. Have a very happy holidays and a wonderful new year. Let's get into this conversation. I think we spoke.

two or three times in the past year and we didn't record any of those conversations and now I'm glad that we are finally recording this conversation. So before we get into any of the technical stuff which I want to talk about because as I mentioned many times when we spoke I'm a happy user of Wise. I love the product. I know you're leading all kinds of AI initiatives at Wise.

What's up with the pirate stuff? That's what I really want to know. I am one of the founders of the Swiss Pirate Party. What does that even mean? Well, it means whatever people make it mean, of course. For me, what it means is that I think the balance between copyright and public access is way out of line in pretty much all Western countries, or all countries I'm aware of, really.

Because being able to control information that you created is not a God-given right, right? It's a monopoly created by government. And therefore, that monopoly must serve the public good. And a monopoly always destroys value, so it's only worthwhile if it also adds value. And now, for example, the thriving open source products, which we all know, from Linux to countless things,

show that it's not necessary in all cases to have copyright protection for good things to come into being. Sometimes it is, sometimes it isn't. And so what for me, what the pirate parties are striving to do is to shift the balance a little bit. Wow. Okay, so I was way off base when I thought it was like you all were dressing up as pirates, maybe outside of Halloween and having parties.

When I saw that written, I thought, oh man, he likes to wear patches over his eyes. Oh, no, it's the other kind of pirates. It's a classic example of a concept hijack. Yeah, great.

You did well. You had me very misdirected. So the other piece that is cool to talk about before we jump into WISE and everything you're doing there is the work that you've done over the years before you hit WISE. And I know you were in Africa, you were in Ghana for a little bit, right? And yeah, can you explain what you were doing there?

Oh, that was a wonderful story. I was a student in the U.S. and being a young, passionate student, I got involved in the anti-globalization protests against IMF and the World Bank doing bad things or what the organizers said were bad things in all sorts of developing countries.

And then as part of those protests, it was all very nice, very civilized. In fact, I was reminded of Umberto Eco's descriptions in The Name of the Rose about the heretics. It's like a carnival. It was this wonderful, wonderful thing. And then as part of this, some of us were invited to the IMF where the people from the IMF were explaining to us what it is they actually do and they're not actually evil and all that.

And after that, just that very night, there was a party at one of the nonprofits that I was hanging out with. And there I was telling the lady who was organizing the party and who had the nonprofit, like I was telling her, this IMF guy did such a horrible job of explaining what they did.

Like I could have done a better job of explaining what they do, clearer and shorter and anything. And then she said to me, well, there's somebody I'd like you to meet. And then she introduced me to Charles Abugre, who was a big guy in the nonprofit scene. So he had a startup in Ghana doing all sorts of public good stuff.

And then he invited me over for a week, first of all. And then we got along. And then I ended up spending at least half of my PhD there doing all kinds of economic research for that nonprofit. What kind of economic research was it? It was quite mundane, actually. So relationships between inflation and income distribution, inequality, that sort of thing.

But what made me then move on and change careers is the realization that it doesn't really matter what research I was doing. Because the name of the game is that just having somebody who does any kind of credible research moved the nonprofit to a different league, so they were invited to different tables and could take part in different conversations on the back of that.

But the content itself was irrelevant or largely irrelevant as long as it was largely pointing in the right direction that the nonprofit was aligning with. And once I realized that and also got tired of being poor, which is part of working at nonprofits in Africa, I changed careers and went to an investment bank. So you...

You went to the exact opposite side. You were like, you know, that guy from the IMF, that stuff they're doing wasn't so bad. Maybe I should try my hand at finance. Yeah, you know, if you can't beat them, join them. Yeah. Incredible. Well, I want to say a few things that I've jotted down from our conversations, and I would love for you to elaborate on them. Because every time that I talk to you, I feel like you have a ton of hot takes. And you are...

knee deep in so many different areas of AI and ML from traditional ML all the way to LLMs and AI, quote unquote, all the way to AI agents. And so one thing that you said to me that stuck with me was AI is a bridge from unstructured to structured. Can you explain what you mean by that?

Well, I would rather say that one of the biggest roles for large language models is being a bridge between unstructured and structured data. Because if you think about the way we did data science only two years ago, you first had to make everything into a vector or a matrix, and then you could begin working with it.

So when you did topic decomposition, the topics were just areas in vector space, really. And then you had a hell of a job explaining what they even meant, and you had no hope in hell for a normal human to modify them. Whereas now, if you do topic classification, then the topic descriptions are the topics, because the LLM can work with the raw text directly. Yeah.

And at least three quarters of the production applications of LLMs that I've seen are just the LLM converting incoming fluffy language data into something structured. Is it a customer complaint? What is the rate on this contract, the late charges percentage rate? In goes a contract, out goes a number.
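As a rough illustration of that contract-in, number-out pattern, here is a minimal sketch using the OpenAI Python client; the model choice, prompt, and JSON schema are illustrative assumptions, not anything Wise has described:

```python
import json
from openai import OpenAI  # any LLM client with structured output would do

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_late_charge_rate(contract_text: str) -> float | None:
    """In goes a contract, out goes a number: the late-charges percentage rate."""
    prompt = (
        "Read the contract below and answer with JSON of the form "
        '{"late_charge_rate_percent": <number or null>}.\n\n'
        f"CONTRACT:\n{contract_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                      # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # force structured output
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)["late_charge_rate_percent"]
```

The unstructured text never leaves the function; everything downstream only ever sees the number.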

And the vast majority of the applications that I've actually seen are just that. So taking a lot of messy data of the unstructured and then trying to figure out some kind of a system so that that's the input and the output is something structured. Exactly.

And in fact, I think the vast majority of use cases for LLMs that actually work in production are not "in comes the magic LLM, in comes the magic agent, and it does everything." Rather, you take the LLM as an additional Lego brick, in addition to all the other bricks you have, and you combine them. And that's how you get out the value. Okay. Now, do you think that it's easier to explain

or champion for AI now that it feels like it's more simple to grasp like that than when you were doing ML two years ago and you had to try to tell someone, oh, well, we're going to vectorize this and then we're going to matrices and all of that jargon. Hopefully you weren't doing that when you were explaining it to leadership or presenting different use cases. But now you don't really have to do that, right? You can speak at a different level

Maybe, but really the problem is never the technology or even being able to explain the technology. Because if you know what you're doing, then you're able to explain it. But the biggest blockers are always of two kinds. A, organizational structure, but this can be broken down with sufficient will from above, at least. And the other one is the invisible walls in people's heads.

Because there's a fun thing I've observed with a variety of technologies, from bringing FX swaps to Wise Treasury to now LLMs in customer support, and a bunch of others: even if you have a team which has an existing workflow that kind of works for them, and you bring a new technology to them which clearly adds value but which they have never seen before, then from the moment you start trying to educate them about it,

to the moment it becomes just another commonplace thing: a minimum of two years. Wow. It just doesn't matter how hard you try. It just takes time for people's heads to adjust around this thing, along with all the other stuff, around all the other problems they have. Two years.

Yeah, and to build those habits, to build those muscles of I'm going to use this instead of my traditional workflow, that makes sense. So if you budget for two years, then you set your expectations at a realistic level. And if it happens before then, maybe you get lucky. Yes, and generally it won't. You can have the prototype out in two weeks, but for people to actually adjust to the fact that it exists, this is how it works, it actually is good for you, it's not scary, two years.

Yeah. And if it's more than that, it's time to start looking for a new job. Perhaps, perhaps. Thankfully, Wise is actually one of the more agile places. So I have never had good ideas blocked here. Well, there's plenty of ways that you're using ML and AI at Wise from the traditional fraud detection, because it is a financial services company or it's a financial...

FinTech company, I think is what you would categorize it as, right? And for those who don't know, it makes it really easy. And the reason that I love using it is because I can have money in the US and I live in Europe and

And so it's really easy for me to move that money around and not have to pay exorbitant fees from traditional banks. And I can just juxtapose the differences of when I would come to Europe when I was like 19 and 20. And so Wise kind of cut that all out and made it really easy. So coming back to the AI and ML,

that you're using at Wise. Since it is finance, I imagine there's a lot of, A, fraud detection, and then B, maybe some, are you doing loans? I don't think you give loans, do you? No, that we don't. But there are a lot of things. So absolutely, of course, fraud detection, anti-money laundering, the whole wide, wide area of fin crime is probably the oldest because it's so clearly beneficial.

And that's classic ML. So wide tables of data, XGBoost, hyperparameter tuning, the whole thing. And that's also the area that our PR department is not very fond of us telling many details about, understandably, for good reasons.

The other one is treasury. And treasury is what you would call a trading desk in banks, because the flows that people want us to transfer aren't always balanced. And so you have to go to the interbank market in one way or the other and source the necessary currency and then manage the risk of holding currencies. Because one way of looking at Wise is actually as a market maker to the masses.

We always give a bid and ask price for any currency pair. So effectively, we are a kind of market maker, but a very unusual one, because we make markets for the masses rather than for other big financial institutions. And we try to keep our spreads as tight as possible, as opposed to the banks, who try to see how much they can get away with.

Yeah. But so treasury is a lot of machine learning: trading, estimation of flows to make sure we have cash in place. For example, did you know that if you try to withdraw money in Sri Lanka,

or places like that, with currency controls, Wise actually has to send the dollars to our partner bank the day before so they can arrive overnight and wait there safely at the partner bank. So then when somebody wants to withdraw money, Wise can ask the partner bank to please charge us the dollars and give you the local currency. So there is a lot going on under the hood for this quasi-instant experience.

And then my final big area, of course, or my final favorite area, which is by far not the only one, is marketing and causal inference and all those fun and games. Yeah. Well, and you did kind of mention before too, support, right? Absolutely. I think support is comparatively young, because before LLMs happened it was quite hard to do, because so much of the data is text-based.

But now we have a great data science team in place there, and they already have some things in production and more is on the way. So another thing that you said that stuck with me and I want you to elaborate on is LLMs or AI shouldn't necessarily be looked at as the solution. It should be looked at more as one step to get to the solution almost. And the way that I understood it is we should just look at it as another step in a DAG.

Yeah, 100%. I also, I find it very silly when people ask ChatGPT to add 2 and 2 and ChatGPT tells them it's 5. And they're like, oh, AI has failed. The whole thing is like, you should think about this as one big Lego set. And now you have a couple of extra blocks that you couldn't do before. Maybe the blocks can blink or emit sounds or whatever it is they do. But there's just one more addition to your Lego set.

And it gets its power from being combined with all the others. And I'm sure it won't be the last cool thing either. Because I still remember when RNNs arrived 10 years ago, it was exactly the same kind of hype wave of RNNs and neural networks will solve machine learning for us. And then that kind of settles down; for eight years or so there's nothing, and then another big thing comes along. I'm sure there'll be another one. Now, which ways are you seeing...

both LLMs or foundational models plus other traditional ML or just regular heuristics being used together? Well, so the most obvious one is just LLMs being used in old school pipelines.

Or as I mentioned, LLMs just transform data. For example, they give you a score of does this customer email look like a complaint? Like a complaint based on the text, what is the likelihood? And then you add that score to a bunch of other data points you might have and do old school machine learning to classify the email. And at this point, this is the easiest thing to do, the most controllable, the safest. And so also the most production-ready.
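A hedged sketch of that "LLM as one more Lego brick" idea: the LLM-derived complaint score is just one more column next to the ordinary tabular features, and an old-school classifier does the rest (the column names and toy data here are invented for illustration):

```python
import pandas as pd
from xgboost import XGBClassifier  # the "old school" workhorse mentioned earlier

# Hypothetical feature table: ordinary structured columns plus one extra
# column, `llm_complaint_score`, produced by a prompt along the lines of
# "On a scale of 0 to 1, how likely is this email to be a complaint?"
df = pd.DataFrame({
    "account_age_days":    [30, 800, 15, 400],
    "prior_tickets":       [0, 3, 1, 5],
    "llm_complaint_score": [0.1, 0.9, 0.7, 0.2],  # the LLM is just one more brick
    "is_complaint":        [0, 1, 1, 0],          # label from resolved tickets
})

X, y = df.drop(columns=["is_complaint"]), df["is_complaint"]
clf = XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)
print(clf.predict_proba(X)[:, 1])  # complaint probability per email
```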

For the other ones, I guess the fun part about LLMs is that they humanize machine learning in so many ways. So they make this interface between humans and machine learning fluid. For example, in customer support bots, right? Before you had this text block and then you had to classify it. What is the customer asking about? Is it an old school model, vector spaces, yada, yada.

Now you can just ask an LLM, does this message contain enough information to understand what the customer wants? If not, what else should I ask the customer? And there's a little prompt. And then you can ask the customer for more information and they'll give it to you. So you unlock this whole interactive potential that just wasn't there. Yeah.
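The same structured-output trick, used interactively this time: a minimal sketch of a prompt that decides whether a support message is answerable and, if not, proposes the follow-up question. The prompt wording and schema are assumptions, not a description of Wise's support bot.

```python
import json
from openai import OpenAI

client = OpenAI()

def clarify_or_proceed(customer_message: str) -> dict:
    """Returns e.g. {"enough_information": false, "follow_up_question": "..."}."""
    prompt = (
        "You triage customer support messages. Answer with JSON of the form "
        '{"enough_information": true|false, "follow_up_question": <string or null>}.\n\n'
        f"MESSAGE:\n{customer_message}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                      # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)
```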

And I want to talk about causal inference technology and what you're doing there, because you said marketing and all of that fun stuff is really one of your passions. Give me the lay of the land on what you're doing, how you're doing it and what it looks like with ML, AI. I know it's A-B testing, right? But what else is going on there? Well, it's not just A-B testing.

But the trick is estimating causal impacts. And that is hard. It's not like regular machine learning, because in regular machine learning, for example, if you want to predict how much a customer will buy in the next month, then after a month you can see how much they have bought. So you have an observed true value, and then you can rate your prediction, or if you have multiple prediction variants, you can rate which one was closest.

When you choose to send a customer email A versus email B, you really have no way of measuring that impact directly, because you cannot send only email A and only email B to the same customer and compare. So that's quite hard. And then unsurprisingly, people have built models specifically for this case, causal inference models. So for example, Microsoft's EconML library is wonderful.

But then again, they have half a dozen different models, each with its own hyperparameter universe, and absolutely no guidance about which one to use. So what we've done at Wise under my guidance is to find a way to score these. So even though you can't observe individual impacts, it turns out that if you have a whole population, like an A-B test, you can score these models out of sample.

And once you can score out of sample, you can do model selection, hyperparameter tuning, all those wonderful AutoML things. And so now you actually have, even though you can't observe them directly, you have a verified and selected estimate of impact for every single customer.
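To make the idea concrete, here is a minimal sketch using Microsoft's EconML (one of the model families a tool like CausalTune would search over); the data is simulated, and the out-of-sample score is a deliberately simple ERUPT-style estimate rather than CausalTune's actual API:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from econml.dml import LinearDML  # one candidate causal model among several

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 3))                             # customer features
T = rng.integers(0, 2, size=n)                          # randomized email A (0) vs B (1)
Y = X[:, 0] + T * (0.5 + X[:, 1]) + rng.normal(size=n)  # outcome with heterogeneous uplift

X_tr, X_te, T_tr, T_te, Y_tr, Y_te = train_test_split(X, T, Y, random_state=0)

est = LinearDML(model_y=RandomForestRegressor(), model_t=RandomForestClassifier(),
                discrete_treatment=True)
est.fit(Y_tr, T_tr, X=X_tr)

# Per-customer impact estimates on the held-out half of the A/B test.
uplift = est.effect(X_te)                 # estimated effect of B vs A per customer
policy = (uplift > 0).astype(int)         # send B only where it looks beneficial

# Simple out-of-sample score: because the held-out assignment was random 50/50,
# the average outcome of customers whose random treatment happened to match the
# recommended policy estimates the policy's value, and different candidate
# models can be compared on this number for model selection and tuning.
matched = T_te == policy
print("estimated policy value:", Y_te[matched].mean())
```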

And then, once you have that estimate, you can do fun things, right? You can do targeting, first of all. So you can send to the customer the email that is most likely to have them do the action you want them to do. Like click on that link. You can also do segmentation: once you have that impact at customer level, you can now segment much more cleanly.
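One simple way to get those cleaner segments (a sketch, with simulated stand-ins for the per-customer impact estimates and invented feature names): fit a shallow, interpretable model on the estimated impacts rather than slicing the raw A/B test.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Stand-ins for customer features and per-customer impact estimates that would
# come from a causal model as in the previous sketch; simulated so this runs alone.
rng = np.random.default_rng(1)
X_te = rng.normal(size=(1000, 3))
uplift = 0.5 + X_te[:, 1] + 0.1 * rng.normal(size=1000)

# A shallow tree over the *estimated impacts* gives human-readable segments
# instead of noisy slice-and-test segments.
seg = DecisionTreeRegressor(max_depth=2).fit(X_te, uplift)
print(export_text(seg, feature_names=["region_score", "txn_size", "tenure"]))
```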

Because how do people do A-B test segmentation now? They slice the whole A-B test sample in little chunks and try to see if there's significance. But that's very, very noisy. That doesn't work. This way you can. How many customers or people do you need to have on a list in order for this to actually have statistical relevance? So certainly as many as you have for a regular A-B test is enough. Usually I expect it to be even smaller.

Because in an A-B test you treat the whole customer variability as noise, right? You average over it. All you want is an average. Whereas with all these models, you first model the customers' natural variability based on the customer features, what you know about them, and then you model the impact on top of this. So now you actually treat customer variability as part of the signal in the customer behavior.

And so, we haven't tested that extensively, but I expect you actually need smaller sample sizes than for regular A/B tests. And is this what you're doing with Wise Pizza, or is Wise Pizza a little different? Wise Pizza is related. So Wise Pizza is there for finding fun segments. It started with growth analysis. Suppose you have this data set with customer segments:

regions, the device they use, the currencies they transfer from and to, the product they use, any number of dimensions you might want to have for your customers. And so you have a million micro segments. And now you want to find out: my overall, say, revenue per customer went down by 2%, or went up by 10%, from one quarter to the next. And you want to find out which simple explainable segments were driving this.
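Below is a deliberately simplified sketch of the kind of search this involves: score every simple slice by the share of the overall change it accounts for and report the biggest drivers. It illustrates the idea only; it is not the wise-pizza API, and the data is invented.

```python
import itertools
import pandas as pd

# Hypothetical micro-segment table: one row per (region, device) cell,
# with revenue for two consecutive quarters.
df = pd.DataFrame({
    "region": ["EU", "EU", "US", "US", "APAC", "APAC"],
    "device": ["ios", "android", "ios", "android", "ios", "android"],
    "rev_q1": [100, 80, 120, 90, 60, 50],
    "rev_q2": [98, 79, 121, 91, 40, 49],
})
dims = ["region", "device"]
total_change = df["rev_q2"].sum() - df["rev_q1"].sum()

# Score every simple slice (one dimension value, or a pair of values) by the
# share of the overall quarter-on-quarter change it accounts for.
rows = []
for k in (1, 2):
    for combo in itertools.combinations(dims, k):
        grouped = df.groupby(list(combo))[["rev_q1", "rev_q2"]].sum()
        change = grouped["rev_q2"] - grouped["rev_q1"]
        for idx, delta in change.items():
            rows.append({"segment": dict(zip(combo, idx if k > 1 else (idx,))),
                         "share_of_change": delta / total_change})
print(sorted(rows, key=lambda r: -abs(r["share_of_change"]))[:3])
```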

And that's what Wise Pizza does. My growth rate went down by 1% from one quarter to the next: what were the main customer segments that drove it? Explain it in simple terms, in terms of those dimensions. And you could also apply it to CausalTune results, so you could apply it on causal inference results, but you don't have to. Do you always find...

segments that match up and that you understand, oh, it's these people with these different features that drove whatever the question was, whether it's an increase or decrease in revenue? Or is it sometimes just a little bit scattered, or very weighted towards one or two people that don't really have anything in common, I guess?

Well, Wise is, in this sense, fortunate to be a B2C company. So we have a large, large customer base. And so sample size is not a problem. And this thing will always find something, right? Because that's machine learning. That's what machine learning models are built to do. You tell them to find something, they will find it.

And then you could always just split your sample in half and fit it on one half and then look at statistical significance on the other half. And if it's there, then it's really something. That is so cool. And how do you bring that? Because so let's say that you are answering some of these questions and you're finding different segments that are driving the reasons, right?

And you've split it in half and then you've recognized, okay, this seems to be true, right? Then what do you do with that information? Well, that's really where humans step in. In our case, what the rest of the world calls data scientists, Wise calls analysts. And what Wise calls data scientists is more like an ML research engineer in the rest of the world.

And so we have those wonderful analysts who then go in and dig deep into those segments. And now they don't have to go through wandering around in pivot tables looking for something that's out of line. They can clearly see these are the drivers. Okay, now let's look at, say, iPhone users in Asia, what happened there. Or it's really the big tickets that drive the change. So let's look at the big tickets, what's going on there.

But that's, I think, also a bigger philosophical point. I have exactly zero fear to be replaced by algorithms. In fact, the more machine learning can do things that are currently manual labor by data scientists, the more data scientists can use them and therefore add even more value.

So the more labor-replacing stuff like ChatGPT happens, the more jobs there will be for data scientists, not the other way around. Fascinating. And so you're digging around in that data, you're trying to find insights, and then they're presented to leadership, and hopefully you're, whatever, creating a new campaign to try and target that specific segment, or you're running a discount, or you're doing something, some kind of action is taken to help with that.

Exactly. And also, of course, it wouldn't be 2024 if I wasn't currently working on an extended, agentic version of this. Where you run this analysis, you come up with those segments, and then you see whether you can source some more data from wikis or chat or Slack channels or any kind of internal information you have that would help explain it or maybe be relevant. You still would never fully automate it,

but you could go a long way: you would have an interactive tool where, in dialogue with the machine, humans could do a much better job of telling the whole story, not just the numbers. So if I'm understanding you correctly, your analysts are digging through very specific segments of users who are driving some question that you have. And let's use this example of revenue went down

1% in the last quarter. So let's figure out who are the major reasons for that. And you find out that it's iPhone users in Asia or something along those lines. Analysts are digging through that data and then they can augment that data, that structured data with, oh, you know what happened? We raised the prices in Asia for XYZ transfers, or we did something. And

Those analysts would not know that unless they had the agent bringing them extra context. Absolutely. And this is not, well, so I don't want to overpromise. This is not something we have just yet. It's something I'm actively working on right now. So hopefully we'll have something rough out by end of year.

But that's a niche that I don't really see occupied right now. Because "let's ingest all your corporate data and put a chatbot on top" kind of startups are a dime a dozen. Yeah. It's such a natural idea. But combining this with qualitative analysis to do storytelling that's linked to what the data tells you, that's not really something I've seen out there.

And so in a way, it's plugging into all of your internal docs, your internal messaging systems, and it's trying to find information relevant to the data segment. Or how are you interacting with the agent? Is it through a chatbot? Yes, I think so. It's just a chat dialogue.

So again, this is now very much in the design stage, but it's very much just a dialogue. So you have the kind of report with graphs and text that the thing is generating. And then there is a chat panel to one side where you can say, oh no, let's zoom into this segment, or what's going on here, or make changes to the report this way. Because I think it has to be a dialogue, because ultimately humans, they know what story they want to tell.

And this has to be a story that's supported by the data, but potentially there are many stories that could be supported by the data. And ultimately it's a human decision which of those stories is the one you actually want to focus on this time. Yeah, I also like the UX sometimes, when I'm playing around with different agent chatbots, where it will guess

what question I want so I don't have to think as hard. And for me, it's a lot easier to just say, oh yeah, let's see what's going on there. Almost like when you click on a YouTube video because the title entices you. So the agent comes up with like four questions that maybe you might want to know. And I've seen this done where you ask an initial question and then there's the follow-up questions. That's a pretty common theme.

theme in agents these days, or just in chatbots in general. And it feels like it would be very cool if you're moving around in the data and you're asking questions, and then the agent can suggest things like, oh, maybe you want to know about XYZ. And so

I like the way that you're going about that. And it also is very nice to augment the capabilities and augment the tools that an analyst or a data scientist has at their fingertips to be able to tell the story that they want to tell, as you were saying. When you're building out the agents, what have been things that have been difficult? I'm afraid it's the usual, right? Prerequisites, data quality, etc.,

and then convincing people, and then working with engineering lead times for getting things into production. Because a chatbot on its own is only good as a toy. But for example, if you want a chatbot to answer customer queries, you need to have a hard, old-school underlying taxonomy of what the possible customer questions you could handle are.

And you have to distill that taxonomy and then build it and expose it to the agent. And then the actual agent part is easy. In fact, I think that's the general theme in the whole hype around agents. It's kind of fun, but I think using an agent in itself will very shortly be kind of like using a database.

Like, yeah, they're useful, they add value, but it's not a big deal. It's like, oh, we have a database and we have a multi-database application. How wonderful. It's just a pattern. And it's not a very hard pattern. All the stuff around it that you need to make it work is hard. Agents themselves are easy.

The database, for the majority of people, is not the most sexy of technologies. It's kind of old. Maybe it was back in the day, but now it is, like you said, just the database. That's the ultimate success, actually. Yeah. So is the maturity of where it's at. So the thing that I'm wondering about with agents is,

How are you making sure that the right data is not going to the wrong person, or the wrong data is not going to the wrong person? I guess when it comes to, for example, Google Docs, you have a very clear sharing

protocol. And I'm not worried that somebody is going to be able to see all my Google Docs. But with agents, I think it's really easy to not have that role based access. And so the agent can now get into any data that it wants to even in private Slack channels or whatever. And then you're surfacing information that that analyst maybe shouldn't have access to.

Well, that's an excellent point, actually. On one level, we don't really have that problem yet, because we certainly don't have customer-facing agents which directly output LLM responses. All of this is human in the loop. And also for internal tools, most of these things provide drafts which humans can then edit.

But in the long run, actually, the problem you describe is not that difficult, because of the way you store things. Think about how these things work: you ingest snippets from everywhere, and then via some kind of RAG, and I could go on about RAG for a day, I probably will be giving some talks on it soon, you retrieve some of the snippets that might look relevant,

or that look like they might be relevant. And then you give those snippets in the prompt, along with your question. Now there is nothing easier than to attach to those snippets metadata that shows where they came from, and then filter by that. So that's actually not a very hard problem, but it's an important one. So you're doing it within the metadata of the different chunks as opposed to in the database itself.

I'm not sure how else you would do it because there are so many different systems and the permissions are granular and entangled. So the best you can hope for is just flag, okay, this chunk came from here. Is this person entitled to this source? Yes or no? Yeah.
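A minimal sketch of that metadata filter; the data structures and the permission lookup are assumptions, not a description of Wise's internal RAG setup:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str          # e.g. "slack:#treasury-private" or "wiki:pricing-page"

def user_can_read(user_id: str, source: str) -> bool:
    """Placeholder: in practice, ask the source system's own ACL
    (Slack channel membership, wiki page permissions, ...)."""
    allowed = {"demetrios": {"wiki:pricing-page"}}   # toy permission table
    return source in allowed.get(user_id, set())

def retrieve_for_user(user_id: str, query: str, index) -> list[Chunk]:
    """Retrieve as usual, then drop every chunk the user is not entitled to
    before anything reaches the prompt."""
    candidates: list[Chunk] = index.search(query, k=20)   # any vector index
    return [c for c in candidates if user_can_read(user_id, c.source)]
```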

Yeah. Is this person in that channel? If it's a private channel and if not, then make sure not to include it. Exactly. What can they read? Are they allowed to read this wiki page, and so on and so forth? Yeah. Okay. So now tell me about Motley Crew. Motley Crew is fun.

Basically, it's the usual nerd thing. We had a vision with my partner about what agent frameworks should be like. I first started looking around because I assumed there would be something out there already.

And I looked at a bunch of them: LangChain; LlamaIndex, which didn't have much back then; CrewAI; a couple of the more prototypey ones that were around back then. I didn't find one that worked quite the way I wanted it to. And in particular, CrewAI came closest. But then they really wanted to have their own walled garden.

So when I submitted to them a PR that would allow free interoperation with any kind of LangChain, LlamaIndex, or other agents, they ignored the PR. Oh, wow. At that point, I said, no, it's a Lego set, right? That's my favorite metaphor for this whole game. I want to mix and match. And so that was the starting point. And so now Motley Crew's core premise is that you want to be able to mix and match any frameworks at all,

from AutoGen, LlamaIndex, LangChain, CrewAI; they all have their strengths and their weaknesses. So you should be able to use the best tool for the job without trying to pull people into a walled garden. And then as we tried to use it for things, which is the only way to make it good, we also started adding other patterns that I haven't seen anywhere. And my favorite one, for example, is forced validation.

So what normally happens when you use an agent with tools? For example, you have an agent that generates Python code. Then you do want to make sure that the Python code is valid. Then the agent calls a tool which tries, for example, to run the code, and if there's any errors, it gives them back to the agent. And the hope is you tell the agent in the prompt that the agent will keep trying until the code is valid, until the tool says you're good.

However, there is no guarantee it does this. And LLMs are famous for doing strange things sometimes. So basically you put the intent into the prompt and hope. Whereas with forced validation, what you say is that the agent is only allowed to return a result via the tool. And so the agent tries to call the tool with, say, Python code. If the code is fine, the tool returns the code.

And if the code is not fine, it returns the reasons why not to the agent, the agent tries again. And if the agent tries to return directly to the user, the agent gets told, no, you have to return by calling the tool try again. And this way you have guarantees because then if you get anything at all back from the agent, you know it's been validated.
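Here is a minimal, framework-free sketch of the forced validation idea (not the Motley Crew library's actual API): the validating tool is the only exit path, so anything that comes back has already passed the check.

```python
import ast

class InvalidOutput(Exception):
    pass

def python_code_tool(code: str) -> str:
    """The output tool: returns the code only if it parses; otherwise raises
    with the reason, which gets fed back to the agent."""
    try:
        ast.parse(code)
    except SyntaxError as e:
        raise InvalidOutput(f"Code does not parse: {e}")
    return code

def run_with_forced_validation(agent_step, max_tries: int = 5) -> str:
    """`agent_step(feedback)` is any callable that asks an LLM for code.
    The agent may only 'return' by getting its output through the tool."""
    feedback = None
    for _ in range(max_tries):
        candidate = agent_step(feedback)
        try:
            return python_code_tool(candidate)   # the only exit path
        except InvalidOutput as e:
            feedback = str(e)                    # reasons go back to the agent
    raise RuntimeError("Agent failed to produce valid output")

# Toy usage: a fake "agent" that fixes itself after one round of feedback.
attempts = iter(["def f(:  pass", "def f():\n    pass"])
print(run_with_forced_validation(lambda fb: next(attempts)))
```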

And you see this pattern, people reinventing this pattern with LlamaIndex workflows and LangGraph and whatnot. But I haven't really heard it described as a pattern elsewhere, strangely enough. So I like the forced validation description, because it is very clear that the agent is not the one that is giving you the result. It's the tool. And by way of the tool, if you're getting it,

then you know that it's been able to go through the tool and pass what it needs to go through. Exactly. Okay. So, the other idea behind Motley Crew, which is really cool, and it also makes sense given the name:

You're using any framework that you want. So how does that even look? Is it an abstraction above the LangGraphs and the LlamaIndexes and AutoGens? So right now, first of all, we have wrappers for all the common agent types, because every framework has its own agent parent class and we can wrap them all, which is necessary, for example, to make them all support the forced validation pattern,

which we can. Certainly LlamaIndex and LangChain agents we can make support the forced validation pattern as well. They all support the Runnable interface, so you can plug them into LangGraph as well, because LangGraph is actually cool. And so those are the main two bits. So you have wrappers, and those wrappers also inherit from LangChain's Runnable.

Because you can say many things about LangChain which contain the words "flag planting" and similar, not entirely nice words. But some of the things it has are really cool. And LangGraph is certainly one of them. Yeah. Yeah. Excellent. Well, that's fun. And that's fully open source. So anybody can go and play with it right now.

Exactly. And the commitment is to make it truly open source, so it will never be used to try and upsell people into stuff, and we will never deliberately cripple it to make people pay a little more for a paid version. It's meant to be mixed, and that's also why it mixes with everything. Like the next fun thing that we'll have to look at now is Anthropic's Model Context Protocol. Oh, yeah. It looks really cool. So we'll really have to support that. Very cool. Not there yet, but coming soon.

And now what's CausalTune? First of all, can we just take a moment to note that you've been pretty busy creating a lot of stuff over there. We've gone through Wise Pizza. We went through Motley Crew. Now we're going to talk about CausalTune and also just everything in general that you're up to. Hats off to you and everyone in the Wise ML and AI

section of that company. Well, first of all, I've been very fortunate to work at Wise, which is very cool with people doing other things in parallel outside of working hours. And so now I've been able to officially lay down my people leadership responsibilities and work part-time at Wise and devote the rest of my time to bringing up a startup.

And CausalTune is exactly the stuff I've been telling you about, causal inference segmentation. So the idea that in marketing, you can extract more value from an A-B test than just averages. In fact, you can observe the impact on every customer and use this for segmentation by impact, targeting by impact, and all those wonderful things which, until you've seen them done, you wouldn't even believe are possible.

So CausalTune is the causal inference for marketing, but it's also the segmenting piece, because I know there were a few different parts that we talked about within that. Well, CausalTune is a library open sourced by Wise. So we are actually using it at Wise, by the way, successfully, with the numbers to show for it. So we do see a distinct uptick in click-through rates and suchlike.

So CausalTune is just a library for causal inference. So it does two cool things. Cool thing number one, it allows you to estimate customer-level impacts. Cool thing number two, it allows you to estimate outcomes of hypothetical treatments. So suppose you did some kind of assignment using your CausalTune targeting, or just a random test even.

And then all of a sudden your head of marketing comes in and says, oh no, you should have used these rules instead. And then instead of waiting for another month to run another test to test those rules, it's actually possible, having for example a randomized trial result, to compute with a very high degree of precision what the outcome of any rule would have been on that sample.

So if you want to test ideas of simple targeting rules, you don't have to run a new test for each one. You just run a random test first, and then you use that data set and get the outcomes with confidence intervals of any other assignment you could try. And you don't actually need to even be sending emails. You just run the tests on old data? Well, you need to have one test.

You have one test; a fully randomized one will do perfectly. And then using this, you can get the average outcome of any other assignment based on the same customer features that you had in the original test, without running more tests. So why do people, for example, set aside randomized samples when they're targeting? It's just a waste of sample size when you can just do it ex post by math. Yeah. Okay. Yeah.
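A sketch of the general idea behind scoring a hypothetical rule on an existing randomized test (an inverse-propensity estimate with a rough confidence band; the data and the rule are invented, and this is not CausalTune's API):

```python
import numpy as np

def value_of_rule(X, T, Y, rule, p_treat=0.5):
    """Estimate the average outcome a targeting `rule` would have achieved,
    using data from a randomized test (inverse-propensity weighting).
    `rule(X)` returns the treatment the rule would assign to each customer."""
    proposed = rule(X)
    match = (T == proposed)
    weights = match / np.where(T == 1, p_treat, 1 - p_treat)
    est = np.sum(weights * Y) / len(Y)
    se = np.std(weights * Y, ddof=1) / np.sqrt(len(Y))   # rough confidence band
    return est, (est - 1.96 * se, est + 1.96 * se)

# Toy randomized test and a hypothetical "head of marketing" rule.
rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 2))
T = rng.integers(0, 2, size=5000)
Y = 1.0 + T * (0.3 + X[:, 0]) + rng.normal(size=5000)
print(value_of_rule(X, T, Y, rule=lambda X: (X[:, 0] > 0).astype(int)))
```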

And so now what you mentioned, you kind of hinted at it. You're building a startup with this. What does that look like? So very early stages. We hope to have something up in a couple of weeks because it's not just the product itself. So the technology itself has been tested at Wise. It works. It's been open sourced by Wise as well. So there is no obstacle to using it in a startup by anybody, including myself.

But now there's the whole machinery of a SaaS startup, hosting and user authentication and all the bits that go into a functioning system. Once we get those up, we really would love to have more people outside of Wise try this out and see the benefits. Because any kind of targeting where you know something about the users who you target,

this is the technology with which you can squeeze out as much information as possible. We can do the targeting as well as it can be done given the features that you have. I actually think it's pretty close to optimal.

And how have you been using it? Is it just more for if you're running a sale, or if you're running better rates, or... can you give me an example of what you actually do? Because I'm not sure I fully get across the finish line on understanding how that looks with an email that I get from Wise.

So in this particular case, the one that we have just got, where the test is just over and we've seen nice numbers come up: Wise doesn't just offer transfers. It offers many, many nice things. You have assets, you have balances, you have the card, and you can use the card in different ways. And so we have between half a dozen and a dozen emails encouraging users to use a particular aspect of our offering,

what we internally call a product. And then the measure of success is people who actually go ahead and not just click on the email, but actually register and start using that particular aspect of the offering within a certain time window of receiving the email. And then the question is, now that you have eight or ten different encouragements for different facets of the product to choose from, which one do you send to this user?

And that's something for which you can try naive rules, but you'll get much better impact if you do it with this kind of technology. Okay. Okay. Yeah, that makes sense. So it's like, hey, this person generally... We'll use my use case, because I feel like I've probably seen these emails before. And as I mentioned, I use Wise. You're looking at me and you're saying...

generally transfer from American dollars to euros. And I'm using the savings function or the checking function right now. And you know what he might like is the credit card or the card, because I have thought about that. And so I definitely have. And if an email hit me in the right moment, I probably would. Yep.

end up getting it. Exactly. And that's exactly the kind of thing. And the nice thing is that now you can take into account many of the other things. Like different regions might work differently, or people with different average transaction sizes might behave differently. You can take any kind of features into account. Just train your model and it tells you which is the thing that's most likely to produce a positive result. Very cool. Well,

Can we talk a little bit about organizational structures and what thoughts you have on those? Oh, absolutely. I would love to. In fact, the big revelation when I came to Wise was the degree of autonomy that not just a person but a team could have.

And when I first joined Wise Treasury, it was really a revelation to be in an organization where certainly all I saw around me is that nobody told anybody else what to do, not even your lead, which is strange when you hear about it, but it actually works. And so this idea of autonomous organizations is very close to my heart since.

Because it's not just about how people behave, it's not just about intent. It's also about how organizations are structured. For example, if you have a vertical IT organization which everybody has to compete for, you can forget about autonomy. Or if you have a team which is only regarded as a cost center, then this team will penny-pinch to the detriment of the rest of the organization, because that's what cost centers do. And so there is a personal and a structural component to it.

And I've actually been working on a book about this. Let's see how far I get.

It's reasonably far advanced. Hoping to bring it out early next year, but we'll see. And the fun part about the timeliness of this is that I think, and I'm not the only one thinking this, of course, that this wave of AI will transform the way organizations are structured.

Right. If you think about it, a typical firm, sort of like a big firm, is structured around its information flows. And its information flows are hierarchical because, until recently, that's the only way humans knew how to deal with text data. You have middle managers making reports for their managers, who are making reports for their managers. And then many Chinese whispers later, the CEO thinks they know what's going on. Yeah.

But now, if you can shortcut this whole pyramid of Chinese whispers and just have AI look directly at all the raw data in the organization, and if necessary go out and ask questions over chat when it doesn't know enough, and then give you the answer regularly, maybe you don't need the hierarchy.

And so that's a very, very exciting thought and a certain space that I will try to be present in over the next years. You did say something that was a question that came up time and time again when we did the AI Agents in Production conference a few weeks back, which is how do you...

create an agent that understands when it does not have enough information. And that's a really hard problem, I think, like you just said, because AI, and you mentioned it before, like if you give AI and ML a task to find something, it's always going to find something. Whether or not it actually is the thing that you're looking for, it's always going to find something. So it's really hard to

have an agent understand that it does not have sufficient information to answer whatever the question is, or whatever the report it's drawing up. A lot of times that's when it defaults to hallucinating. That is true. But at the same time, I think there are solutions for this. You have to build for this from the start. I mean, the simplest one is consensus.

If you run it several times, does it come up with the same answer or different answers? And if it does come up with different answers, then maybe it doesn't know.
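A tiny sketch of that consensus check; the `ask` callable stands in for any model client called with non-zero temperature:

```python
from collections import Counter
from typing import Callable, Optional

def consensus_answer(ask: Callable[[str], str], question: str,
                     n: int = 5, min_agreement: float = 0.8) -> Optional[str]:
    """Ask the same question several times; only trust the answer if the runs
    mostly agree, otherwise signal "not enough information" by returning None."""
    answers = [ask(question) for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best if count / n >= min_agreement else None

# Toy usage with a deterministic fake "model".
fake = iter(["4", "4", "5", "4", "4"])
print(consensus_answer(lambda q: next(fake), "What is 2 + 2?"))  # -> "4"
```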

And frankly, as far as this kind of technology goes, I'm happy to be a fast follower. I'm 100% sure that smart and well-paid teams at Google, OpenAI, et cetera, are even now working on smarter decoding methods to deal with this,

because it's such an obvious blind spot in LLMs. And I'd rather wait another quarter or two and use that to build products. Yeah. And it was a bit of a tangent to the actual organizational structures conversation that I wanted to have, but you said it and my mind went that way, like a little dog chasing a ball. The organizational structures piece, though, for me is interesting because

A, I can imagine a lot of folks that are listening, including myself, they think, well, how does anything get done? And how do real initiatives, who's crafting the high level vision or the goals if nobody is telling anybody else what to do? Well, I think there is still hierarchy. It's just that the hierarchy is not coercive.

It's just that the point of a lead is not telling people what to do. Instead, it's the following. Firstly, it's telling people the story. Why is the team here? What is it there for in big terms? How does it fit in with the rest of the organization? That's task number one, and this is really enough for smart people to figure out what is most important for them to look at.

And the other half is then clearing out of the way obstacles that the team can't deal with itself. So if there is an organizational problem somewhere else in the organization, some stupid process is stuck, or things like that happen, then just going in there and clearing this up. I think those are the two biggest things. And if a leader at each level does this, and especially the storytelling part, then

the amount of freedom that is unleashed thereby, the amount of creativity, is astounding. And it even has an economic effect. So when I was hiring for the Wise data science team, it was not uncommon for people to turn down higher offers elsewhere, because they came here for the autonomy. And the autonomy in practice means that, if I was on your team, I can say, you know what,

considering what our story is and why we're here on this team, I feel like the best way to help us move the needle is to do this project. And do I propose something and then it gets signed off? Or is it just

I run with it and I say, hey, I've already hacked together a little bit and now I need more resources from XYZ teams. I guess that's how seniority works in autonomous teams because seniority is measured by the amount of people that you're able to bring along with your idea.

So if you're really a junior and you don't really know what's important, then you're generally happy to take guidance. And then as you progress, you first have to convince people around you that what you're proposing makes sense. At the very least, it makes enough sense for you to work on it. But then how you grow to be more senior is when you start convincing people that it's an important enough thing for the whole team to work on it.

And the more people you can engage and bring along with your storytelling, the more senior you are. And then eventually titles adjust. And what about this scenario? I am very excited about working on a project like, let's just say, Wise Pizza. Before Wise Pizza was a thing, I go around trying to rally the troops and

and it falls flat. People don't understand it. People aren't really interested in joining the cause. How long do I have to try and champion for that before it falls flat completely and I give up and I'm on to something new? Is it a matter of days, weeks, months, or is it just something that I put on the back burner and I never truly let the dream die?

Well, first of all, of course, everything I say, needless to say, is my opinion about how an autonomous organization, one centered around autonomy, would really work. And in any particular organization, including Wise, many people would disagree. That's how organizations and humans work. So I would say that since there is no objective standard of truth,

or of usefulness for most people, like maybe marketing people increase revenue, but most people in an organization have no direct measurable impact on their own, then you have to deliver enough value that makes sense to people around you.

But you'll have to have some slack to pursue things which might make no sense to them. The more of a track record you have of delivering things that visibly add value in a way that makes sense to others, the more slack you will gradually get for doing things that might not make sense at once. But it's very much a relationship, it's a human thing. I guess that's the thing about autonomy.

You can think about it this way: my favorite metaphor is that of a machine versus a forest, right? So much advice on scaling an organization centers around making it like a machine. Every role is precisely described, people are replaceable, and then you can scale. Whereas autonomy-centric organizations are more like gardens,

where things grow the way they do and adjust around each other, but really, no two plants are alike, no two humans are alike. And then you have to tell stories to the people around you, and what you're doing must make sense to them. And that's the main criterion. Are you familiar with permaculture and that whole movement in gardening?

Which one? Permaculture. I am not. Oh, what is it about? So it talks about how, and I'm by no means an expert, but from what I understand: if you look at traditional farming and traditional gardening, you'll place all the tomatoes in a line, or maybe you'll have a whole field of corn.

But this is more like, you know what goes really well with tomatoes is basil, because it keeps the fruit flies away. And so you put one tomato plant and then one basil plant and then another tomato plant. And so you're varying things and you put different plants together because they have a nice ecosystem or homeostasis together. And it feels like that is what you're talking about.

It's not only gardening and having each individual plant be an individual that grows in the way that it does, but if you can put two plants together that work very nicely together, you're going to get that combined effect and the outcome of both plants doing better. Oh, thank you for a lovely metaphor. In fact, this is perfect because also in terms of autonomy,

Functional verticals are the kiss of death. So a centralized IT reporting line which overrules product-centric reporting lines is the kiss of death for any kind of autonomy. It's exactly this: you have to mix different specialties in proportions that allow them to do the thing that they have to do, without having any kind of vertical priorities getting in the way of that. Brilliant. ♪