
Agents of Innovation: AI-Powered Product Ideation with Synthetic Consumer Testing // Luca Fiaschi // #306

2025/4/15

MLOps.community

People
Luca Fiaschi
Topics
Luca Fiaschi: I have more than 15 years of leadership experience in AI, data science, and analytics, having held senior roles at companies such as HelloFresh, Stitch Fix, and Rocket Internet. I am currently a partner at PyMC Labs, where I apply Bayesian and causal inference techniques, as well as generative AI, for Fortune 500 companies. At HelloFresh, I led the team that built the data platform and implemented recommendation engines and forecasting models, which greatly advanced the business. Accurate forecasting is critical in fresh-food e-commerce, because a wrong forecast can seriously damage the business. At Stitch Fix, the challenge was inventory management, since most of the inventory is in transit most of the time, which demands very accurate inventory forecasts. I find Bayesian models well suited to forecasting in high-stakes scenarios because of their interpretability and well-calibrated confidence intervals. I am currently working on using AI to augment data analytics and data science workflows, addressing the difficulty of scaling data teams and handling stakeholder follow-ups. We are building a graph-based application with multiple agents responsible for data quality control, insight analysis, model building, and forecasting. We have also developed virtual consumer models called "synthetic consumers" for rapid product concept testing and user research. We can fine-tune LLMs on a company's past user research data to create more accurate synthetic consumer models. Finding high-value use cases requires a deep understanding of the business model, identifying the variables with the greatest impact on revenue and profit, and relying on the team to uncover new opportunities. I encourage data teams to proactively seek out new business opportunities and to treat this as a key performance indicator. Which use cases to focus on depends on the company's stage and strategic priorities. I typically build a deep understanding of the business model by talking to company executives, analyzing data, studying consumer research reports, and using external tools such as OpenAI's Deep Research.


Transcript


Luca Fiaschi. I'm a partner at PyMC Labs. How do I take my coffee? I just drink espresso. I'm Italian, hardcore. Espresso is my way to go. Letters make words, words make sentences, and sentences make paragraphs. Welcome back to another MLOps Community Podcast. We're getting into it today, talking about how...

GenAI can help the world of traditional ML, with Luca. We also go deep on leading data teams at the end. Hope you enjoy. And a huge shout out to folks that have been joining the MLOps community, because I've got the best music recommendations now from you all since I set up

one of the triggers to ask for music recommendations. It is so cool to see some of these suggestions that I'm getting back, and I'm going to play one for you today. This recommendation that I'm going to play for you today is by the band Boards of Canada. So this is like phone a friend, basically. I get to phone a friend, and man, you've been doing a lot of cool stuff.

in your career. We should probably just go over a bit of your journey at HelloFresh, HelloFresh, HelloFresh, and Stitch Fix. Both of those are not hard words to say, unless you are me today. And so...

Which one came first? HelloFresh came first and even before that I used to work with this large VC in Europe called Rocket Internet.

It really moved me out of my academic track into the startup world. And Rocket was a lot of fun. I built some of the largest e-commerce companies in the world outside of the US. And it literally was empowering young people like me to solve big, big problems,

putting a lot of money behind them and behind bold ideas. And there I got to work with companies like Zalando, Delivery Hero, a lot of these pretty... Huge names in Europe. Yeah. And I got to work with them when they were relatively small, like 20 people in the room, and also when they became the huge giants that they are today. So HelloFresh was one of the companies of the Rocket Internet group. And at that time, the

The founder asked me to move to the U.S. and build a data team for the U.S. component of the business. It was one of the few Rocket companies operating in the U.S. as well. And when I arrived in the U.S., it was like four people working in data. And when I left, 35. So within a four-year tenure there, it was a tremendous ride. It was also the years of the pandemic, so the business was booming.

And we did a lot of crazy and interesting stuff, implementing a great data platform, great analytics processes, recommendation engines, forecasting models, and so on and so forth. Yeah. The interesting thing about this is that the ML...

aspect of the business is integral to whether or not that business succeeds or fails, because it is such a hard thing to do when you have fresh produce and you need to know: how much of this am I going to need tomorrow? How much am I going to need next week? And if you get that wrong, that can really damage the business. And if you consistently get it wrong, you don't have a business.

Yeah, I can tell you stories of missed forecasts and people having to run at all hours around the area with the company credit card.

Wow. Trying to buy as much lime as possible to fulfill our clients' demand. It is really interesting how small mistakes in your prediction can have a big impact on the company bottom line. And definitely the forecasting aspects at HelloFresh were extremely interesting and complex to actually solve, because in the box you ship so many ingredients. So think about Amazon as a complex business model,

but you ship mostly items that are not perishable and in quantities that are pretty well defined. At HelloFresh, I remember, every box contains on average three recipes, and every recipe has like seven, eight perishable items that you need to package accurately and ship to the users. And then you add the recommender systems in with that and...

So I imagine the recommender system chops that you had from there translated nicely to Stitch Fix, because that's another similar thing. Now you don't have to worry about perishables as much, I guess, because clothes only go out of style, they don't go bad. Right. And some clothes never go out of style. So if you're lucky, or if you're just oblivious to fashion like I am, then you're good.

That's right. I think Stitch Fix is an interesting, different set of problems. Actually, Stitch Fix has a very interesting business model where when you ask, where is your inventory? 50% of your inventory is most of the time at FedEx. It's always in transit between the user or the final customers and the fulfillment centers.

And you want to keep it like that, because you want to be

as efficient as possible with the fulfillment space. And so Stitch Fix had this very interesting business model where you need to be very accurate also on your forecast, to always have available inventory and relevant inventory to present to your customers. Otherwise, you can make the best recommendation, but if the inventory is not there, you can't fulfill it. And so there is this

very tight and nice interrelationship between the two. Sometimes the forecast or the recommendation engine can be super accurate, but the problem is that the items you retrieve are out of stock. So the interplay between the two is extremely interesting and complex. You have gone down the rabbit hole and I know that you've talked a lot about Bayesian theory and working with Bayesian algorithms. Can you tell me a bit more about that?

Yeah, so Bayesian algorithms are something that I came to sideways, meaning

I have a background in ML engineering, and I have a background in things like, you know, traditional ML and deep learning and so on. However, a lot of the problem that I try to solve is actually convincing stakeholders of the reliability of the forecasts we can put up for them. And it's super complicated, because the stakeholders ask you: what's the logic of the model?

What is the prediction interval for the model?

What features is it based on? And you can do that with traditional ML, and there are techniques for doing these explanatory variables and so on, but they're intrinsically hard to explain, and the confidence intervals you get out are sometimes not well calibrated. And so Bayesian models are a solution for that, because they give you two things out of the box, which is interpretability

and confidence intervals, plus the ability to easily add constraints to your forecast. For example, if you know that the output needs to be positive, you can constrain that very easily in a Bayesian model. So for things like high-stakes scenarios where you need to invest: HelloFresh spent $800 million in marketing budgets across Europe,

across 40, 50 media channels, every year, or at least when I was there. So that's a high-stakes scenario. So you really want to understand how the model works, what the causal relationships between the variables are, not just the statistical relationships. And the Bayesian model is perfect for doing something like that. When you talk to the finance stakeholders, the CMO and the CFO, you can really motivate why you actually put up

a certain forecast. And yeah, therefore we solved those problems with those tools, because they were the right tools to solve that specific problem.
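(For illustration, not from the episode: a minimal PyMC sketch of the kind of constrained, interpretable Bayesian model Luca describes. The data and variable names are invented; a real marketing-mix model would be far richer.)

```python
# Illustrative only: a tiny interpretable model with a positivity constraint.
import numpy as np
import pymc as pm
import arviz as az

# Toy weekly data: marketing spend (in $M) and new-customer signups.
spend = np.array([1.2, 0.8, 1.5, 2.0, 1.1, 0.9, 1.7, 2.3])
signups = np.array([310, 240, 380, 460, 300, 255, 410, 500])

with pm.Model() as constrained_model:
    # HalfNormal priors keep the baseline and the spend effect positive,
    # encoding the business constraint that spend cannot reduce signups.
    baseline = pm.HalfNormal("baseline", sigma=100)
    effect = pm.HalfNormal("effect_per_million", sigma=200)
    noise = pm.HalfNormal("noise", sigma=50)

    expected = baseline + effect * spend
    pm.Normal("observed_signups", mu=expected, sigma=noise, observed=signups)

    idata = pm.sample(1000, tune=1000, chains=2, random_seed=42)

# Posterior summaries double as the explanation stakeholders ask for:
# interpretable coefficients with credible intervals.
print(az.summary(idata, var_names=["baseline", "effect_per_million"]))
```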

And you've kind of taken that and run with it, because now you're still doing stuff with it, right?

Right. So that's exactly what I want to solve right now. With the models we built, very sophisticated models for marketing allocation that have been published through the years. PyMC Labs is the company that has been developing and supporting the open-source library PyMC, a statistical library that's very much used in the industry and is perfect for building these types of models. Now, the key idea there is that in my career I've always hit this problem that you're always short of very, very skilled people to hire.

And the other problem is, especially when you're running analytics teams, what really kills you is not the delivery of the insights, but it's all the follow-up that comes from stakeholders. So how do you solve that specific problem, augmenting your data analytics workflow and data scientist workflow with AI? So the idea is that you can take these models and talk to them.

chat to them using LLMs. And basically you can do two things. One is to simplify the process of building these models by having a swarm of agents. Some are modelers that actually put together the relationships between the variables that you need and write the code that you need. Others are

quality control agents that control the quality of your incoming data. And these accelerate the production of these models. And the other one is, once the model has been built, well, you can ask it questions and make ad hoc scenario analyses. So if you make this specific forecast,

what could be a scenario that you haven't thought about? Please answer right now the question: what happens if I drop my marketing budget by $100 million in Google tomorrow?

The CMO, in the original process scope, this question wasn't really there. You thought about it afterwards. No problem. I can give you the answer right away without having to involve my analysts, because I know how to talk to the model and I don't need specialized expertise to rerun that code again. So I think that to me, it's a way of...

solving this very hard problem I've always had in my career, which is that it's so complicated to scale up data teams effectively, and really augmenting the workflow of your data scientists and analysts using these agentic and GenAI technologies.
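(An illustrative aside, not PyMC Labs' code: one way to let an LLM "talk to" a fitted model is to wrap scenario questions in a plain function that an agent framework exposes as a tool. The posterior numbers and function name below are made up.)

```python
# Illustrative only: a what-if query over posterior samples, packaged so an
# LLM agent could call it with arguments taken from a natural-language question.
import numpy as np

# Pretend these are posterior samples from a fitted marketing-mix model:
# incremental signups per $1M of spend in each channel.
rng = np.random.default_rng(0)
posterior_effect = {
    "google": rng.normal(250, 40, size=2000),
    "meta": rng.normal(180, 60, size=2000),
}

def simulate_budget_change(channel: str, delta_millions: float) -> dict:
    """Expected impact on signups of changing a channel's budget,
    with an 80% credible interval taken from the posterior samples."""
    impact = posterior_effect[channel] * delta_millions
    lo, hi = np.percentile(impact, [10, 90])
    return {
        "expected_change": float(impact.mean()),
        "credible_interval_80": (float(lo), float(hi)),
    }

# The LLM would fill in the arguments from a question such as
# "what happens if I cut Google spend by $100M?"
print(simulate_budget_change("google", -100))
```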

Yeah, I love this because you are combining the new world Gen AI capabilities with the traditional ML capabilities. And each one has their purpose and has their value. And so there's going to be times where you need to use traditional ML. And so being able to augment the capabilities and augment your understanding of how this works and your understanding

benefits from them are such an incredible superpower to have. So if I'm understanding you correctly, you're saying something like on the front end, when you're building the model, you're getting help from LLMs. Correct. And

Are you also getting features suggested to use? You mentioned making sure that there is clean data, or that the data that comes in is good data. How are you doing that? Because that seems like a bit of an impossible job. These are very hard jobs, but these LLMs are surprisingly good. Maybe I should also preface this. So Bayesian models are particularly well

tailored towards this problem because, with Bayesian models, you don't work in a scenario of very large datasets with millions of features. Bayesian models are really, really good where you have relatively small datasets. You have several features, but maybe in the area of 30, 40 features. It's a small-data application, a very highly tailored problem

in a very high-stakes scenario, but relying on relatively small datasets. And the reason is that when you have small datasets, the priors of these Bayesian models help you fill in the missing gaps, given that you don't have enough data to draw full inference. So they're particularly tailored to that. And that allows you to put

this data in the context of the LLMs, often. And so the LLM, by having access to this data in its context, realizes, for example, things like: hey, there are missing values here. Hey, there is a trend in the data that's a little bit strange. And then when you actually prompt the LLM appropriately, by telling it, hey, these are probably the types of analysis that you need to actually

do for quality control and so on and so forth, it often comes out with some interesting insights about the data that allow you to catch specific problems early. It's not perfect, the technology is certainly still developing, but there is a series of quality control checks that you can prime the LLM with, and it's going to come up even with follow-ups and further checks, and it can execute those pretty quickly on this type of dataset,

and hand it to the next LLM, to the next agent, for the next steps of the analysis. Yeah, and how are you operationalizing this? Is it some DAG where you have the LLM cleaning the data as one of the steps of the DAG? Right, so right now we are setting this up as a LangGraph application, and there is a full graph application

behind this. One step is a quality control agent, another step is an insight agent that, taking the data, can actually draw some interesting plots of it, just to explain what is in the data, the main trends and the main insights, the relationships within the variables. Then there is a modeling agent downstream that actually builds the model on top of

it, and there are forecasting agents that allow you to make inference with the model about the future, and scenario planning agents that allow you to create scenario plans, optimized configurations for your variables, for your forecasts, and so on and so forth. And even, something which is a big problem for analytics teams and a time-consuming thing, which is creating PDFs and decks out of the insights of the models of your analysis.
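(A rough sketch of how such an agent pipeline might be wired with LangGraph, assuming the `langgraph` package; the node functions are stubs standing in for LLM-backed agents, and the wiring is my illustration, not the actual application.)

```python
# Illustrative wiring only; each node would call an LLM-backed agent in practice.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class PipelineState(TypedDict, total=False):
    raw_data: list
    quality_report: str
    insights: str
    model: str
    forecast: str

def quality_control(state: PipelineState) -> PipelineState:
    # Would check nulls, scale, strange trends in the incoming data.
    return {"quality_report": "no nulls, no out-of-scale spend"}

def insight_agent(state: PipelineState) -> PipelineState:
    # Would plot and describe the main trends and relationships.
    return {"insights": "spend and signups trend together"}

def modeling_agent(state: PipelineState) -> PipelineState:
    # Would write and fit the Bayesian model code.
    return {"model": "bayesian MMM fitted"}

def forecasting_agent(state: PipelineState) -> PipelineState:
    # Would run inference about the future with the fitted model.
    return {"forecast": "Q3 signups up ~8%"}

graph = StateGraph(PipelineState)
graph.add_node("quality_control", quality_control)
graph.add_node("insights", insight_agent)
graph.add_node("modeling", modeling_agent)
graph.add_node("forecasting", forecasting_agent)
graph.set_entry_point("quality_control")
graph.add_edge("quality_control", "insights")
graph.add_edge("insights", "modeling")
graph.add_edge("modeling", "forecasting")
graph.add_edge("forecasting", END)

app = graph.compile()
print(app.invoke({"raw_data": [1, 2, 3]}))
```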

They are even building right now a PowerPoint agent that actually creates a deck with the final recommendations for the stakeholders, so that they can have it at hand and read it. And how are you confident that

the data you're getting after it's gone through all of these different steps is high quality data? So at this stage, of course, there are some automated checks you can make. For example, you can check for null values. You can check for things that are out of scale. For example, in the MMM example, if you see that you're spending in a marketing channel, you know,

$100, $200 million in a week, that is probably a little bit too much. So there is something that's certainly out of scale, and the LLM will note that down.
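(A small sketch of the kind of automated sanity checks described here, using pandas; the thresholds and column names are assumptions, not real client numbers.)

```python
# Illustrative checks: nulls and out-of-scale values flagged before the data
# reaches the modeling step.
import pandas as pd

weekly_spend = pd.DataFrame({
    "week": ["2024-01-01", "2024-01-08", "2024-01-15"],
    "channel": ["google", "google", "google"],
    "spend_usd": [1_200_000, None, 150_000_000],  # one null, one implausible value
})

issues = []
if weekly_spend["spend_usd"].isna().any():
    issues.append("missing spend values")
# Assume a single channel should never spend more than $20M in one week.
if (weekly_spend["spend_usd"] > 20_000_000).any():
    issues.append("spend out of plausible weekly range")

print(issues or "data looks sane")
```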

Otherwise, you still rely on a workflow where the human is in the loop. It doesn't need to be a human who has specialized technical expertise in terms of coding, because the LLM does the coding for you, but it needs to be a human that has good enough business context

so that they can understand whether the data is the right data, and whether the output of the entire workflow is something that's sensible and that you can make a decision on reliably. That's fascinating to think about. The most important human to be in this loop is less of a data scientist and it's more of the

one who understands the business context so they can flag something when it looks a little off. Right. So in fact, who is the target customer of an application like this? It's like busy analytics and data science teams that want to augment the workflow of their analysts and data scientists without having them to write the code necessarily from scratch for some of these models or come up with semi-automated analysis.

And the other one is really business stakeholders that are technical enough to understand what's right and what's wrong, but have also like a deep business context that they can help them guide the analysis and guide the results of the models. Yeah, it reminds me of the...

SQL analyst agent that Prosus put into production recently. And they were talking about how they have this bell curve, in a way, where you have the very advanced SQL queries that you need to ask and you need to spend a lot of time digging into the data. And then on the other side of that, you have the do-it-yourself

And everything is going to be written by an LLM. So there's like a spectrum of how much LLM you're using. And on one side it is

only LLM, self-serve, and on the other side, it is no LLM because it's far too complex. And what they were mentioning was the majority of the lift is in the middle where it is something, like you said, it's augmenting those who already understand or have enough experience to be able to get the value from it, but they aren't

on either side of this spectrum where it's highly complex and you can't use an LLM or you don't know anything about it and you're relying 100% on an LLM. That's exactly right. And it's a great example that you make because the idea comes a little bit from that. And it's a development of the idea of SQL agents because when I look at that idea, I thought, oh, wow, that's great. But

It stops at descriptive statistics, which is the 101 of analytics. What if, instead of just doing descriptive statistics and getting SQL out and nice plots, you could actually do predictive analytics, make forecasts, and close the gap between those advanced statistical models and the delivery that you need to give to the business, in a very fast way?

The work that companies like Databricks are doing was inspirational, for example agents like Genie that do text-to-SQL, completely integrated into the Databricks ecosystem and data lake. However, our idea was: you shouldn't stop there. You should actually go the last mile now. Help automating, or supporting and augmenting, the entire analytics and data science process.

And so now there's a lot of agents that you have working and you also have the human in the loop being able to oversee the agents. How are you doing evals, if at all? And that feels like it could get very...

sticky very quickly, or it could just become a ton of work if you have all of these agents and you're trying to make sure that they are producing high quality data. Yeah, we're still learning about what's the appropriate way of doing evals. We have some reference workflows, so we have some reference data and workflows, and we do know that

the agent applications need to come to certain conclusions. And we have a way of verifying that the agent applications, you know, don't get stuck, or that they get results and parameters for the models that are aligned with reference parameters for the models, because you can verify that even by generating synthetic data, for example. And so we have a set of those that are in place. We have

and other telemetry that allows us to check what the queries of the users are and see whether the user gets stuck. But finding the precise way and the more systematic way of evaluating this type of applications is something that I don't think we have solved yet. And I don't know if the industry has entirely solved it. It's still kind of an open debate. Yeah, 100%. And you mentioned to me before, too, that...

You're leveraging this for product development and almost getting user research done. Can you talk to me about that use case? Yeah, it's a different use case from what I talked about, but it's kind of the same idea in a bit of a different way. So the idea here is that you have an entity like a Bayesian model, a machine learning model, and you want to talk

to it like it was your friend and ask: how have you been built, build me a forecast, and so on and so forth. Now, another problem that companies often have is that they don't know their customers that well, and product managers, business stakeholders, marketing people, designers, they wish they could talk to their customers and users every day to learn and get more insights about what they are doing, how they're using the product or why they're using the product.

And so another application we are building is some virtual representation of your consumer that we call synthetic consumers, which you can ask any type of question to. And the typical applications would be, hey, you are, I don't know, a CPG company, for example, and you are developing a new type of, I don't know, like, let's say a new type of toothpaste.

And you want to know whether that resonates with the user base and the consumer base. Well, the only way you have to do it right now, you need to do very expensive panel interviews with tons of people,

consumer research, and it's very time consuming and expensive, while you could have a swarm of LLMs that are primed and prompted to behave like real people, and you can just show them pictures and ask them questions: would you buy this product at a certain price point, what are the features of this product that you like? And there is research that has shown that,

within certain constraints, this can actually be representative of the real population, especially if you build these synthetic consumers in the right way, using the right type of data and so on and so forth. And we are still in the early stages of building this application for a Fortune 500 client. But that's the next type of thing. And I think in the longer term...

this could become really an interesting technology. So imagine that you're a product manager and you're building a new website. Well, you probably want an operator, a synthetic consumer that browses through your website, tries to do specific actions, and then you want to ask questions. Oh, how did you like the button up there? Or did you find this confusing, this kind of user workflow? And why? And

And I think this could be extremely powerful to design better UX and really close the loop between, you know, product development and real users. Yeah, you can take that a step further too. And anyone that's developing software tools or infrastructure tools,

You can get synthetic data, or just have an LLM use your software tool and tell you what it thinks about the API and your specs, where it's confusing. And so that is a huge promise. I wonder how in line it is with real humans and how many insights, I guess at the end of the day,

Even if it's not totally in line with what real humans would be doing, if you can gather insights, that's what you need. And you can at least get a few iterations before you bring something to market or before you come out of stealth or whatever. And so this is a fascinating way of doing it. Now, the main question that I have is how are you making sure that the LLMs are properly

set up to be these consumer profiles? Are you prompting them? Are they fine-tuned to a certain persona? What does that look like? And also, what kind of models are you using? Yeah, that's a very good question. I want to answer your first insight first, that this can be fascinating. So

The idea actually comes from a very personal experience. I'm a little bit of a timid person. I kind of wish I could speak freely with and get connections with a lot of different people. And sometimes the discussions I have with LLMs are extremely deep. So that's why I wish I could talk to anything. I could talk to, you know,

a scientist or I could talk to a famous scientist like Richard Feynman or I could talk to a user of my final applications. I have no problem asking deep questions to them. That's why I'm so excited about building stuff like this. Now, the second part of your question is how you're actually building it and making sure that they're aligned with the entity you want to represent.

meaning the user of this application. So at the moment, we're exploring variations of prompting, and that brings you, let's say, 60, 70% there. The typical way you would do this is by really prompting the LLM with things like: hey, you are a Black woman living in Brooklyn, you have a tech job, you bought items in this specific product category recently, and so on.

And that brings you up to a certain level.
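(A minimal sketch of persona prompting for a "synthetic consumer", assuming an OpenAI-style client with `OPENAI_API_KEY` set; the persona and the toothpaste concept are invented for illustration.)

```python
# Illustrative persona prompt for a synthetic consumer.
from openai import OpenAI

persona = (
    "You are a 34-year-old consumer living in Brooklyn with a tech job. "
    "You bought oral-care products in the last three months and you care "
    "about price and sustainability. Answer as this person, in the first person."
)
concept = (
    "New toothpaste concept: whitening formula, fully recyclable tube, $6.99. "
    "Would you buy it at that price? What do you like or dislike about it?"
)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": concept},
    ],
)
print(response.choices[0].message.content)
```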

What you can also do, and we are experimenting with that, is take these companies' past consumer research and surveys that were answered in the past and do supervised fine-tuning. Basically, they already had massive datasets of products they've shown to consumer research users, and they collected the demographics of those users and specific behavioral traits.

And you can use those to do supervised fine-tuning and then, you know, create other LLMs that are specific for that. Of course, you base these on open-source models to do supervised fine-tuning, like the Llama models, for example. Now, it's really complex and there are a lot of nuances to it. For example, you know that...

most of these LLMs have specific political biases. For example, it has been shown that American LLMs are a little bit left-leaning in general. And

What you can do, and we are also exploring this, is techniques to remove these biases, to make sure that the base LLM you start with is actually impartial. And you can do that with techniques like ablation, for example, which is a technique to remove specific biases from LLMs by removing and killing specific neurons within the network. So we are still at the beginning of it.

The basic application brings you 60, 70% there, but it struggles especially when you go into fine-grained subsegments of users. It gets the average and the bulk of the American population right, but when you look at and slice your data into subsegments of users, it can be pretty far off. So we are looking into these techniques, like supervised fine-tuning and ablation, to refine the model outputs, especially on subsegments.
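(A sketch of how past consumer-research answers could be turned into a chat-format supervised fine-tuning dataset, as described above; the field names and file layout are assumptions.)

```python
# Illustrative: build a JSONL SFT dataset that open-weights models
# (e.g. a Llama variant) can be tuned on with standard tooling.
import json

past_surveys = [
    {"demographics": "female, 29, urban, mid income",
     "concept": "recyclable-tube whitening toothpaste at $6.99",
     "answer": "I'd try it once, but $6.99 feels high for toothpaste."},
    {"demographics": "male, 52, suburban, high income",
     "concept": "recyclable-tube whitening toothpaste at $6.99",
     "answer": "Price is fine; I mostly care whether the whitening works."},
]

with open("synthetic_consumer_sft.jsonl", "w") as f:
    for row in past_surveys:
        example = {"messages": [
            {"role": "system",
             "content": "You are a consumer with this profile: "
                        f"{row['demographics']}. Answer as this person."},
            {"role": "user",
             "content": f"Concept test: {row['concept']}. Would you buy it?"},
            {"role": "assistant", "content": row["answer"]},
        ]}
        f.write(json.dumps(example) + "\n")
```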

Well, the idea of having someone go through your website too and looking at heat maps and all of that. One of the first things that I did when I was at a startup that kind of gave me a bit more freedom to play around with the product was we had a tool installed on our product called Full Story. And that would record everyone's sessions in the product.

And I would just sit there. I was mesmerized by how people were using the product, because you can really see where folks get caught up and where their snags are. And Full Story, I remember, had this...

I don't know if it was a feature. It had an alert when someone would do what they called rage clicks. And that was where if someone wouldn't get what they wanted on the first click, and then they would click like four or five more times because they wanted that. But for some reason, the button wasn't working or the page wasn't loading, whatever it may be. And so it's incredible to...

have that almost before you do anything, or before you actually have real users, and you don't have to have real users go through the pain and that rage. You gather the insights, and then you can talk with the LLMs to gather those insights and say: what are some of the key things that I'm missing? Here's my write-up, maybe, or here's some things that I'm seeing. What am I missing?

That's right. If you gather that just from data, you need to do a lot of guesswork. And that's the reason why UX research exists and qualitative surveys exist. It's because quantitative data brings you probably 80% there.

but you still want to understand human behavior also from a qualitative background. And these techniques, I hope in the long term, will be a way of bringing UX research to the next level.

And so are you trying to operationalize this, or is it something that is much more handcrafted for each use case that you encounter with the companies? Yeah. So at the moment we are basically a consulting company. So we have clients interested in building specific applications, which is for product innovation

in the CPG space, so we are developing this with that angle in mind. I do think that, unlike other consulting companies, we are a little bit different at PyMC Labs, because we have a strong belief in innovation and open source, and that's the reason why we're called Labs: because we see ourselves internally as a group of researchers that wants to solve interesting problems.

And everybody has fantastic backgrounds, both in academia and in the industry. And so we think that some of these applications we are developing

right now, using concrete problems that come from our clients in the industry, can in the longer term be strategic for us to develop, and they're going to be released as open source where we can, of course. And, you know, they're going to constitute a body of work that's going to remain for everyone.

Now, when you look at other use cases and ways that you can merge the Bayesian world with the LLM or just language model world, have you seen other stuff that you want to attack? Maybe you have put time into trying to make it work. Maybe it's an idea that's floating around in your head. Yeah, I think this area of probabilistic deep learning and probabilistic neural network

is really interesting, and it's another area I want to talk about.

So we talked about the idea of augmenting the Bayesian workflows through LLMs. We talked about the idea of even running simulations of agents that behave like users. And then what you can do, as I mentioned, is add some Bayesian priors to these simulations later on. But intrinsically, there is also another angle, which is that your fundamental deep learning model can become a probabilistic model if you add

probability distributions on the weights, for example, and you sample from the neural networks.
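(A toy numpy sketch of that idea: put distributions over the weights of a tiny network and sample forward passes to get a predictive distribution instead of a point estimate. Real probabilistic deep learning frameworks such as TensorFlow Probability handle the actual inference; this only shows the sampling mechanics, with made-up parameters.)

```python
# Illustrative only: distributions over weights, sampled forward passes,
# predictive mean and interval.
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(-1, 1, 20)

# "Posterior" over the weights of a one-hidden-unit network.
w1 = rng.normal(1.0, 0.2, size=500)
b1 = rng.normal(0.0, 0.1, size=500)
w2 = rng.normal(2.0, 0.3, size=500)

# Each weight sample is a different function; together they give a
# predictive distribution rather than a single point estimate.
preds = w2[:, None] * np.tanh(w1[:, None] * x[None, :] + b1[:, None])

mean = preds.mean(axis=0)
lo, hi = np.percentile(preds, [5, 95], axis=0)
print(f"prediction at x={x[15]:.2f}: {mean[15]:.2f} "
      f"(90% interval {lo[15]:.2f} to {hi[15]:.2f})")
```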

That's the reason why the techniques you use in Bayesian modeling, like building computational graphs and sampling on GPUs, are actually, fundamentally, the same computational techniques that you use in deep learning. There are even libraries like TensorFlow Probability, which is an adaptation of TensorFlow to the problem of building Bayesian models.

Probabilistic deep learning is extremely interesting. It's very hard to solve computationally, and there is still a lot of research to be done on whether the probability distributions you get from these deep learning models are really calibrated, correct probability distributions you can rely on.

So lots of research to be done. There is a body of research that comes from ETH Zurich that came out a couple of weeks ago in a very nice summary paper. So I think that's where I want to go next, because it actually addresses very key problems, business problems, especially in the area of...

high-stakes predictions, like for self-driving cars, for example, where you need to know the confidence of your prediction for whether there is a pedestrian there, or things like financial forecasts for high-stakes scenarios, for example hedge fund trading, or things like medical predictions, like knowing whether a person has

cancer, for example. So these are extremely interesting topics that we are probably also going to do some work on from our side in the future. And I missed how exactly that corresponds. Or did you say that that's just somewhere where you want to start focusing your attention? Yeah, so I think it's the bleeding edge of research today.

I would always divide, let's say, the world into categories. There is the bleeding edge of research, not yet ready for industrial applications. Then there are things that are ripe for building applications on top of. And then there is the state of the art that everybody uses and keeps building on.

This is of course just my opinion, but I would classify probabilistic deep learning as the bleeding edge of research, almost at the brink of becoming something that you can implement in industrial applications. And of course, you know, people are going to shoot me down and say, oh, there are already some applications built on top of probabilistic deep learning. I'm not a deep expert in that topic, but that's kind of my high-level assessment of the field at the moment.

It's so cool, man. It's really nice that you're so deep on this stuff and you're thinking about it and then thinking back to how it can be tied into the companies that you're working with and how can we bring business value with this, right? And so one thing that I... Speaking of asking users for feedback, it's so funny that you mentioned this because I literally, this morning before we hopped on this podcast, I...

I spent probably an hour and a half emailing folks that signed up for the ML Ops community and that have written me because one thing that I asked for just to find out if they're human or not is what's your favorite song?

And so on that first email, when you join the MLOps community, it says: Hey, this is the MLOps community. I'm Demetrios here. Just so I know, what's your favorite song? And so people write me back with favorite artists and all that. And by the way, it is an incredible way of finding new music. I have such cool playlists now of stuff I had never heard of. But what I was doing for an hour and a half today was writing everyone back saying: Hey,

great suggestions, finding new music. This is awesome. And also asking for what's going to be the most valuable thing that this community can offer you. Because I always want to know what else can we be doing in the community to make it more valuable to people? And so that type of stuff I need, I need the LLM. I want to talk to the LLM also. And then also, um,

just the reason I was bringing this up is at the end of last year, I sent out a bunch of emails and messages to close friends asking them, what's something that we can do more of in the community in 2025 that will be valuable for you, right? And

One that has stuck with me, I can't remember who told me this, but somebody said, I really appreciate it when you talk about bringing the ML and AI field to the business side and tying it to business metrics. I think I'm paraphrasing, obviously, but tying it to business metrics. And so I feel like you're someone who sits in a unique position because of the...

area that you're in right now, being able to work with a lot of these Fortune 500 companies, but also all the stuff you've done in the past with these hyperscalers. Do you have ways that you can sniff out high value use cases, or basically bridge that gap between the machine learning and AI side of the house and the business metrics, and tie it back to them? That isn't just an LLM creating a PowerPoint for you.

That's a very, very, very good question. I think that I don't have the magic answer to that, but there are two principles that I use that really guided me in my career. So the first principle is try to understand very deeply the business model and what moves the needle percentage-wise. Meaning...

in these specific business models, which you can always model as a graph, if you change this variable by X percent, let's say marketing spend, how much does the revenue or profitability of the company change percentage-wise? If you find those

variables of very, very high elasticity, meaning a small change here changes your top line and bottom line quite a lot, then you've found an area where you want to dig deeper. And often, what I do when I join a new company is an analysis of their business model. I try to find those areas of high elasticity, really decomposing their business model into kind of a graph of variables.
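(An illustrative sketch of that elasticity check: model the business as a small function of its drivers, bump each driver by 1%, and see how much profit moves. The formula and numbers are invented.)

```python
# Illustrative elasticity scan over a made-up business-model function.
def annual_profit(spend_m, conversion, ltv, cogs_rate):
    clicks = spend_m * 200_000            # assumed: 200k paid clicks per $1M
    customers = clicks * conversion
    gross = customers * ltv * (1 - cogs_rate)
    return gross - spend_m * 1e6

base = dict(spend_m=5, conversion=0.03, ltv=400, cogs_rate=0.55)
p0 = annual_profit(**base)

for driver in base:
    bumped = dict(base, **{driver: base[driver] * 1.01})
    elasticity = (annual_profit(**bumped) - p0) / p0 / 0.01
    print(f"{driver:12s} elasticity ~ {elasticity:+.2f}")
# The drivers with the largest elasticities are where a data team digs first.
```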

The second one is that, and this is more like a leadership principle, you don't need to come up as a leader with all the answers, all the time. You can rely on your team. And often, if you think about your portfolio of teams as a head of data, you have the data platform teams, your analytics teams, you have machine learning teams, and so on and so forth. Your analytics teams, it's the gold nugget

finder team. So they work closely with stakeholders. They often uncover needs, insights, opportunities that your data platform teams and your data scientists, the machine learning engineers wouldn't see otherwise. And you can use them really to evangelize the approach that your team has and to uncover new interesting problems. And so often, or actually,

A trick that I learned and that I use in all my teams is that data teams should have business goals. And a business goal that I often give, a KPI that I often give to my analytics teams, is: by the end of the quarter, you need to find $10, $20 million of new opportunities. And so I give the problem back to them and I ask them to solve the problem in this way. And it's a core KPI that they have. They've got to hit it.

And it really doesn't matter whether it's 10 million or 20 million, or if they achieve, you know, only X percent of it. But it puts them on the offensive, going to their stakeholders and, instead of just getting the problem from them, helping them think through their problems and finding new opportunities. So...

This makes me think almost in a different way that I normally will look at it. But if you are looking almost at a pipeline of use cases or potential use cases, you have the analysts that are out there like Sherlock Holmes trying to uncover new use cases and then figure out how much that is going to

make for the business or save the business. And then once that is properly scoped and there's an idea there, then they can go and hand it off to the respective teams or they can go and champion for it. And then it's up to the leader to make the decision if it's worth pursuing or not. And then you go and you say, okay, well, this is actually...

going to be implemented by the machine learning team or it's going to be something that the data platform or machine learning platform team has to go and implement because we see that if we can

and increase the velocity of getting a machine learning model to production by 2% or 10%, then that's going to save us or make us X amount of money. Or if we can bring down the fraud by 0.3%, then that's going to save us X amount of money, whatever it may be. That is a really cool way of thinking about it as like the analysts are out there searching through the business, just like,

lifting up rugs and trying to find dirt.

Or they get it from the stakeholders. For example, completely new problems we haven't thought about. I mean, in hindsight now, at HelloFresh you always had this problem of getting amazing pictures of food and creatives of food. And, you know, there is an entire photo op team that's dedicated to that. Well, the analysts could have noticed that specific angles, specific types of pictures, got tons more engagement. And now,

go figure out with your ML team how you actually scale that process up generatively. So here it's a new business problem. It was uncovered by an analyst, a good observation. As a head of data, it's not a team that I work with that often, the photo op team for food. So that insight is brought back to me

by my analysts, and I say, wait a minute, there is something to dig into further here, and maybe I go make the connections with my data science teams and let's try to automate that process and get value out of it. So that's the way it actually works. The value of the analytics teams is to be decentralized in your data portfolio and to really capture this goldmine of insights that comes across the business from everyone, in a way.

Yeah, and then figuring out, I imagine there's got to be some tough calls that are made that potentially, like you were saying, the technology isn't there yet. I know that there was someone who I was talking to a few weeks ago that is embedded in a finance team, and there's over 42 people on this finance team. And the majority of the stuff that takes this finance team ages to do and what they are constantly being flooded with

are PDFs from banks. And they've tried so hard to figure out how to ingest those PDFs so that LLMs can do the bulk of the work. But it has been a tedious process and they haven't been able to get there with technology to ingest the PDFs and allow the LLM to fill in the PDFs for them. And so there it's almost like the...

is it worth it to spend X amount more time? I guess if it's a gigantic business problem, like you were saying back to the first part of your answer on how important is this, if we spend a year on it and we get 1% savings, if it's a million dollar company, that's probably not good. But if it's a billion dollar company, that's going to be well worth it.

That's exactly right. There are principles that you can also apply here depending on the phases your company is in. So, for example, if you are in a growth stage company, it's rarely the right thing to focus your team on cost-saving opportunities and automation opportunities.

Why? Because your business is growing really, really fast. And typically, when it grows really, really fast, think about the opportunity cost of solving a growth problem, like how do I get to the next $10 million of revenue, versus solving a relatively small-scale automation problem, like how do I get four people to spend 10 hours less on this specific task? It's rarely worth the money.

However, if you are already a business at scale where growth is really hard, then actually focusing on cost-saving opportunities is important. For example, if you are at Amex and you have thousands and thousands of agents doing phone calls, and that's a big chunk of your costs, then of course working on automation of client success processes is extremely important. So you need to think wisely about

where your company is at, where the strategy of the company is at. And of course, these gold nuggets that your team identifies need to be contextualized within the broader picture. Now, you mentioned at the first part of that answer a while back how you will get very intimate with the business model.

What are some things that you do there? Is it just you're reading their S1s or 10Ks if it's a public company? And if not, are you going to the CFO and saying, hey, how does this thing work? Or do you have other strategies of making sure that, again, you're able to uncover rocks that seem like pebbles, but really they're boulders?

Wow. That's such a... I don't know exactly if I have a good recipe for doing this. Definitely, talking to people is extremely important. I mean, if you're sitting in the company as a head of data, at the VP or executive level,

You, of course, need to work very closely with your peers, or the CEO, to understand the business model and how the company works, in a way. As a head of data, you also have the superpower of being able to look at the data very closely. And what I typically do is look at concepts like LTV, which is, you know, a concept that has historically

been built and studied thoroughly. But understanding where the value is created in your business model, and how that accrues to revenue, is the first step. If you really understand that, and you can model it, even with a Bayesian model for example, then you've understood your business model. And then I look at consumer research. For example, I look at

NPS surveys that we did with our users and what problems they surface within our product, to identify other areas of opportunity. Often, talking to your CPO is extremely insightful. CPOs are obsessed with using the products themselves, and they often uncover very interesting insights and problems.

So these are the tricks that I use to really get familiar and acquainted with the business model. More lately,

I also use Deep Research from OpenAI a lot. I'm a power user of that, I think. Often, if I'm working with a company I've never worked with before, I ask Deep Research to give me an overview of the industry, the trends, and how this company differentiates from the competition, what their strategy is.

And that actually is a good way, from an external point of view, to familiarize yourself with the new business model and with the new companies. I've talked even to VCs that are using Deep Research to think about investment areas and opportunities. Actually, now that you say that, it is so cool to think about how you can use deep research just to get an understanding of what the competition is doing also, and what their most valuable

products are, or services that they may be selling, and then it can inspire you in different ways potentially, or it can show you where, oh, maybe there's a product line or an offering that we can incorporate into our

company also. So I've just been using deep research and I use the Gemini version, but I've just been using it to like research stuff that I want to buy. I didn't think about researching that kind of thing. You are...

much further ahead of me. And I appreciate that little tip. So now whenever I talk to a company, I'm going to use deep research and get a full report before I talk to them. That's right. Very, very, very important. And it's like, yeah, if you're ever comparing...

what shoes you want to buy or what, you know, like I went from watching a ton of YouTube videos to now asking it, hey, I want to buy a new car. What's the best hybrid or what has the highest score? What cars do people like the most? You know, that kind of thing on those bigger ones, just so I can know, like, where's the signal? Where's the noise?

and it does a pretty good job of it. Yeah, I think I've done it for shoes, cars, and my buddy told me about it too. He told me he was doing it for watches for exercise, you know, like Garmins. Because it's great for anything where there's a million different models and each one has its own little things. However, I have tried it for GPUs.

And it did not work. I was like, I want to know... Yeah, like reserving GPUs. Basically, if I want to pay for some GPUs, I want to know how much, what are the pricing models, what are the different value props that each GPU provider has? And there are managed GPU services and then there are non-managed ones. And so which are the ones that do this or that? I can't really get...

a good handle on it. And the whole reason I wanted to do that was because we're creating this like GPU buyer's guide. And so we're trying to put everything that folks would want to know when they are trying to buy GPUs or just like rent GPUs even. What do you want to know? You're on the market for some GPUs. What do you want to know? And so we're putting that all together. I tried to do it with Gemini and I couldn't get a good... Because I...

What is the reason, do you think, that it didn't get to the right answer? I think it's super confusing. And so, I don't know, maybe I shouldn't underestimate the LLM like that. But first of all, where I think deep research fell down was that

It wasn't able to find all of the different providers, which you would think, like with Google, it would be able to. But there's a million GPU providers out there, and I know of a lot of them, and I know even more now after doing this research. And the report that it put together did not have half of them. It also had a bunch that weren't really what I was looking for. So it was...

I don't want to say they were scam pages, but they weren't high quality pages. And then the value props and the pricing, it had no idea. It really didn't get those well. And so I don't know if it was hallucinated or what, especially on the pricing. And I want to go as deep as: when somebody is on the market for a GPU, they're probably on the market for, or they could be convinced,

to use a TPU, or AWS's Inferentia or Trainium. And so I also want to incorporate those. But when you say GPU to deep research, it's only looking at GPUs. It's not looking at, oh, TPUs and Inferentia and all of that. Of course, I could prompt it better by just saying, now also look at Inferentia. But it might be my prompts. It might just be that, you know,

But really, I can invite you to the Notion space where we're trying to do all of it, in case you have any feedback. Have you ever been on the market for GPUs? I feel like you have. No, I haven't. I haven't. Oh, all right. Then maybe not the best. Yeah. We use cloud resources off the shelf as well. We haven't shopped ourselves for GPUs. But it's so fascinating. I actually never thought about using it for shopping, which is an obvious application in retrospect. I'm like, yeah.

And second, that on such a specialized product it cannot really find a good summary, which is interesting. Yeah, it's weird, because maybe there are just too many providers. And so the context, or I don't know what they're doing behind the scenes with the deep research agents that they've got, but it wasn't really working, at least when I did it

a month ago. So maybe it has changed since then. And again, I was using Gemini's deep research, so maybe OpenAI's Deep Research is significantly better than the Gemini one? Is it? Yeah. Oh, really? Way better. So while Gemini is good for a quick summary, yeah,

the one from OpenAI looks like there is some thought put into it. There is a good thinking and structuring process that comes out of it. It's like the difference between, you know, a senior analyst that just compiles sources and maybe a consulting project manager or partner that actually thinks through the storyline. That's worth paying attention to. Do you also get a report after the fact? Yeah.

You can ask for different formats. They give you a full report. You can even tell it, hey, I want a podcast out of it, or a presentation style. Whatever format you want to digest the information in.

That's interesting. Well, if you want to ask Deep Research on the OpenAI side and send me the report it gives you, I would be happy, because I don't pay the 200 bucks a month. Send me the prompt. You can send me the prompt that you used for Gemini, I'll try it on the OpenAI side, and I can send back the link to the results so you can see them. Nice.