This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. When it comes to generative AI, I think sometimes we just overlook some of the most important things, right? It's like we just want to hit that big red easy button and have it spit out hours of work. And we're just like, yay, good, we're done. But
There's that one most important part. It's the data. Do you trust where your data is coming from? Is it reliable? What happens if it's wrong? And well, why should you even care?
We're going to be talking about that and hopefully answering a lot of those questions today on Everyday AI. What's going on, y'all? My name is Jordan Wilson. I'm the host of Everyday AI. This thing, it's for you. This is your daily live stream podcast and free daily newsletter, helping everyday people not just learn
and keep up, but how you can leverage it and become the smartest person in AI at your company. So if that sounds like you and what you're trying to do, this is your new home. If it's your first time here, uh, please make sure, if you listen on the podcast, to check out your show notes. In there, you will see a website, uh,
youreverydayai.com. But before we get started, have to first give a quick shout out to our partners at Microsoft. So why should you listen to the Work Lab podcast from Microsoft? Because it's the place to find research-backed
insights to guide your org's AI transformation. Tune in now to learn how shifting your mindset can help you grasp the full potential of AI. That's W-O-R-K-L-A-B, no spaces, available wherever you get your podcasts. All right, so thanks to our partners at Microsoft. And as a reminder, if you haven't already, make sure you go sign up for our free daily newsletter on our website. We're going to be recapping today's conversation as well as going over the AI news. Yeah,
Technically pre-recorded one here that we're debuting live, but a lot's happening in the world with AI. Everything happening at CES. We got some OpenAI rumors swirling. So we'll have that all in today's newsletter. All right, but enough.
chit-chat. Let's build some more trustworthy AI. You don't have to hear me ramble on any longer. I have a great guest today lined up for you all. So please help me welcome to the show. There we go. Barr Moses, the co-founder and CEO of Monte Carlo. Barr, thank you so much for joining the Everyday AI Show. Thanks for having me, Jordan. It's a pleasure. All right. Let's do this. Well, first, for people who don't know, what is Monte Carlo?
What is Monte Carlo? Great question. So Monte Carlo's mission is to help accelerate the adoption of data and AI by reducing what we call data downtime, which is basically periods of time when data is wrong or inaccurate. You can't trust it. I don't know if this has ever happened to you, but you wake up on a Monday morning and you see that one of your data products is wrong. Like you were staring at a report, the number's off, something's wrong. You're like, wow, like, why is it off?
And oftentimes, it's not only really difficult to catch the issue, it's actually also really hard to understand what's the root cause and to resolve it. So Monte Carlo helps solve all of that. We're fortunate to work with some of the world's best data teams, ranging from companies like Fox, Roche, Credit Karma, and many, many others. It's probably the part of my job that I love the most, getting to work with amazing customers on some of their hardest problems.
So in a nutshell, right, a company comes to you before, what happens after they come to you, right? Like if everything goes right, they just better understand their data and how it works with AI. Like what's the end result? Great question. So I would say, you know, there's lots of people today like data analysts, data scientists, data engineers, machine learning engineers building what we call data products.
A data product can be a generative AI application, or it could be a report that your CMO is looking at every day, or it could be a pricing recommendation algorithm. It could really be a variety of data products. And those data products are often wrong, or based on wrong data. And the biggest issue is that oftentimes data teams are the last to know about that.
And so, you know, the very first kind of table-stakes, fundamental thing that we do, or that we help organizations with, is be the first to know about data issues. So no longer the days where data teams are sort of caught by surprise by data issues and sort of hearing about it from someone else. Like, that's the worst thing that can happen, that you didn't sort of catch that.
Never happened to me. You know, I'm sort of asking for a friend, if you will. I'm kidding. But, you know, that's sort of like fundamentally the very first thing. And that's really the thing that Monte Carlo kind of set out to really solve, you know, when we founded the company five years ago. And that's been sort of the, you know, I want to say like the first frontier. I think since then, what's become even more apparent is that that's only like the first half and, in a sense, maybe even the easier half
of the work. Actually, the really big challenge, sort of where I think the AI reliability industry is heading, is not only knowing that the issue existed, but also answering why. And should I care? What should I do with this information?
Because oftentimes data teams are just inundated with alerts: this is broken, this is off, you know, this data is late, this data never arrived, this field, you know, looks a little bit off, this number is missing. But in those instances, like the hard thing is actually to answer,
I have all these systems working together, but what is actually the root cause? Is it something that went wrong in the data? Is it that the job wasn't completed? Is it that there was a change in the code? Those questions are really, really hard to answer. And so a lot of the things that Monte Carlo does, not only Monte Carlo, just observability more broadly. So
sort of the field of data observability, if you will, is about answering or helping data teams answer the question of: something went wrong. Should I care? And if so, why, and how do I resolve that? And that honestly is a lot of what observability actually sort of started out with. So observability, we didn't make it up in data. We actually sort of, you know,
borrowed the concept from engineering teams, from software. So observability in software engineering is very well understood, with organizations like Datadog, obviously. Who doesn't have Datadog today, or something like Datadog? Every single engineering team has something like Datadog and relies on a solution like Datadog to make sure that the software that they're building is
reliable and can be trusted and sort of up and running, if you will. And so data teams, in my opinion, should be doing the same. It's a little bit of a new area, observability, if you will. You know, I think Gartner sort of forecasts that over 60% of organizations will have data observability in some shape or form in the next five years, but it's a new area. So, you know, it's only recently been defined.
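To make the idea of data downtime and observability a bit more concrete, here is a minimal sketch of the kind of automated check being described. The table, thresholds, and sample data below are hypothetical stand-ins, and a platform like Monte Carlo infers thresholds from history rather than having someone hard-code them, but the underlying questions are the same: did fresh data arrive, and did roughly the expected amount arrive?

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Toy table so the script runs end to end; in practice this would be a
# warehouse table that a pipeline loads on a schedule.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, loaded_at TEXT)")
stale = (datetime.now(timezone.utc) - timedelta(hours=30)).isoformat()
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, stale) for i in range(50)])

def check_freshness(conn, table, max_age_hours=24):
    """Flag 'data downtime' when the newest row is older than the expected cadence."""
    (latest,) = conn.execute(f"SELECT MAX(loaded_at) FROM {table}").fetchone()
    age = datetime.now(timezone.utc) - datetime.fromisoformat(latest)
    return age <= timedelta(hours=max_age_hours), age

def check_volume(conn, table, expected_rows=1000, tolerance=0.5):
    """Flag when the row count deviates too far from what normally arrives."""
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return abs(count - expected_rows) <= tolerance * expected_rows, count

fresh_ok, age = check_freshness(conn, "orders")
volume_ok, rows = check_volume(conn, "orders")
print(f"freshness ok={fresh_ok} (age={age}), volume ok={volume_ok} (rows={rows})")
# Both checks fail here: the data is 30 hours old and far below the usual volume.
```

The point of automating checks like these is exactly what gets described above: the data team sees the alert before anyone downstream opens a broken report.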
I mean, just in the first like three minutes there, I think you answered like my first five questions. I want to hit rewind just a little bit on the like, why should we care, right? And I think at least in my viewpoint, and maybe this is for smaller and medium-sized businesses, but they often don't even really take the time to fully understand how their data even works.
So they're like, OK, well, we know we need RAG, right? We know we need to bring in our own data, you know, to work alongside, you know, a backend API, right? But why does it ultimately matter whether they get their data right if they are using, you know, a different, you know, Claude, Anthropic, Gemini, et cetera?
It's a good question. And let me just take us back like 10, 15 years ago when honestly, maybe it didn't really matter. Like it just didn't, you know, we weren't really using data so much. Definitely didn't have any generative AI. And so we could kind of get away with like data being wrong most of the time. Worst case, someone just told you and you had to go ahead and fix it. Like no big deal. You moved on with your life. Right. But I think a lot of things have changed since then. There have been these various, you know,
sort of eras, if you will. And I think the first era was where a lot more people started using data. And so you can no longer get away with like looking at the data only once a quarter. Now you have like millions of users, you know, pressing, ordering an Uber. And so you can't have the time of when your car is coming be wrong, and you can't have the price be wrong. Like, for example, if I see that the Uber car is going to be arriving in 30 minutes, I'm not going to be waiting for 30 minutes.
I'm going to be signing off and going to a different platform, right? And so, yeah, it matters that the time for when the Uber is going to arrive, that data needs to be accurate, because otherwise you're going to lose your users. So that was sort of the first wave, where just a lot more people were using data and data products became a lot more important. Then there was a second wave of generative AI, which is now sort of happening more and more.
And actually, interestingly, you know, we recently did a survey among about 200 or so data leaders. And we basically asked, you know, how many of you are sort of deploying generative AI or building generative AI in production? Can you guess the answer? How many are doing that today? I'm guessing a small amount.
Actually, interestingly, 100% of them said that. Oh, okay. Literally, yeah, I was surprised too. Every single person on the survey, and these are all, like, you know, data leaders from credible companies, 100% of data leaders are currently building something with generative AI. Now, the second question was, how many of them actually trust the data that they're going to be using?
Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on Gen AI. Hey, this is Jordan Wilson, host of this very podcast.
Companies like Adobe, Microsoft, and NVIDIA have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use Gen AI. So whether you're looking for ChatGPT training for thousands,
or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com slash partner to get in contact with our team, or you can just click on the partner section of our website. We'll help you stop running in those AI circles and help get your team ahead and build a straight path to ROI on Gen AI.
Yeah. And that's interesting. And it's a good point. And I was kind of shocked by that, right? Because a lot of the studies I read say that even in enterprise companies, you know, I think the latest study is only 5% of companies have a generative AI solution top to bottom,
right? Fully implemented, right? But I guess it's got to start with the data first and trickle everywhere else. Is this maybe also, right, speaking of shifts in generative AI, one thing that I'm personally always dorking out about is kind of this shift from the, uh, attention economy to the intention, uh,
economy, right? Like to be able to better understand, you know, what users on the internet are going to do before they even know they're going to make a decision. And that starts with data, right? I think we've been hearing for 5, 10, 15 years, oh, data is the new gold, but is it even more, like increasingly more and more important because of generative AI?
Yeah, 100%. And so going back to that survey, only one out of three leaders feels confident in their data that's feeding their generative AI model. So like most of us, two thirds of us don't have confidence in the data that we're using. And so, you know, to your question, why does it matter more in this new world or in generative AI?
I'll explain that with one example and one sort of more theoretical example. The first sort of, you know, real-life example, if you will: this was a couple of months ago, and it went viral somewhere. Someone Googled, you know, what should I do if the cheese is slipping off my pizza? Right.
And Google was like, oh, no problem. Just use organic super glue to like, I don't know if you saw that, to like put it back on pizza. Like that went viral. And you're like, okay, well, if you're Google, maybe you can get away with that kind of answer, right? Like, sure, I'm going to continue to use Google tomorrow, right? But most of us...
can't afford that. We don't have the luxury of spitting out such false or, you know, clearly misinformed answers. And so for most enterprises, the reliability of the data that you provide is actually intertwined with your brand and your reputation and the impact on the top line and the revenue that you're generating. And so that's sort of, you know, one example
to kind of bring that to life. More broadly, though, you know, if you think about what companies are sort of tasked to do now, we're seeing, you know, every single data leader needs to do something with gen AI now. How are they doing that? Because today, every single one of us has access to the latest and greatest LLM, right? Like, we can all switch between them, we can all use them. And so in a sense, we all have access to models built by, you know,
Thousands of amazing PhDs, right? Like we can all do that. So what's my competitive advantage, right? How am I going to build a better data product than my competition or what's my long-term moat?
And what I believe in, what I'm hearing from my customers is that the moat is actually the data that you have because it's no longer simply connecting to an API and actually building a generative AI product. The power of building a highly personalized generative AI product is based on the ability to use first party sort of enterprise data.
So I can actually build a way better product if I know something about you, Jordan. And if I know about your background and I know your habits, I can actually build something that's personalized for you. And the data that I have is something that arguably no one else has.
And so I think for leaders thinking about what are they going to build or how are they going to use generative AI, the data that you have is the moat. That's actually how you gain competitive advantage and build data products. And so if you believe that's true, then the quality and the reliability of the data that you're using is of utmost importance. Because if the data that you have is inaccurate, then what's the point of the moat that you have? Yeah.
And I think this makes a lot more sense and is resonating for those that work at larger enterprises, right? And they already have a data warehouse or data lake. They're using Amazon S3. I don't know, right? But for maybe those medium-sized companies that don't have their data game strong, right? But they have it in a lot of different platforms and
places, right? Maybe they have some, you know, floating around in different places in Google or, you know, in their CRM, et cetera. How do these, you know, smaller and medium-sized organizations, how can they take advantage of that? Because what you said there is true, data is the moat, but how do these smaller and medium-sized organizations start to actually pool all of this data together so they can, you know, leverage generative AI with it?
Yeah, I mean, I'd start by saying no data is better than bad data. So if you have shit data, I'm actually not convinced that you should be using it. I actually think it might be better to make sure that you have data that is reliable and trusted. You know, I think just to give you an example on Monte Carlo, Monte Carlo as a company, we're about 200 or so employees. We build generative AI products, and it's of utmost importance that the data that we have is accurate.
And so I think even if you are a small organization, you know, the bar is not lower. In fact, I think it's higher. In fact, I find that enterprises, you know, large enterprises really struggle with getting their data together, really struggle with having a source of truth. Like if I have multiple copies of the data,
For example, I mean, even answering questions like, you know, how many customers do we have, or large organizations trying to figure out sales compensation. It's really freaking complicated to do that because the answer that
I get from my finance team is different from what my sales team is saying, is different from what my marketing team is saying. And so every different team is looking at different sets of data. And so getting an answer is really, really difficult. So I actually think medium-sized and smaller organizations have an advantage. In fact, you know, I actually think smaller teams move faster.
There's probably some proof of that. And so as a smaller team, you know, you're probably small but mighty. And so make use of the data that you have and you have the advantage of being able to move faster and actually innovate faster because larger organizations now are way, way slower and obviously way more risk averse.
So I think smaller organizations have the benefit of being able to try a lot of things, experiment, move quickly, and double down on some of the experiments that are working. And by the way, that's sort of the large majority of what we see companies do, both small and large.
basically have like this mandate to go experiment in the organization and, like, have lots of teams, you know, try out different things and sort of build different applications. And, you know, companies understand that they will come up with a centralized strategy only later.
All right. So I do want to talk about some of these use cases, but we're going to take a quick 20 second break and have to shout out one more time our partners at Microsoft. So why should you listen to the Work Lab podcast from Microsoft? Because it tackles your work.
Tune in to Work Lab. That's W-O-R-K-L-A-B, no spaces, available wherever you get your podcasts. All right. So, Barr, I do want to jump into it a little bit here, because we've been talking about, like, some of the issues with getting good data, having reliable data that you can trust. So what happens when it does get together? Maybe could you walk us through a use case or two, just for those that maybe are just getting their data feet wet, so to speak, so they can see
hey, how do these, when good data and good gen AI comes together, here's good use cases. Yeah, absolutely. And it's been really fun sort of hearing kind of, you know, a variety of use cases and sort of innovation. I'm really excited. And honestly, like the hype around this is so big, but I think even if it materializes only 10% of the way, that's enough to be such a disruption for us.
and for future generations. So I'm really excited. I'll give a specific use case, actually one that we use at Monte Carlo. So one of the challenges that we have is oftentimes, this is sort of very meta, but oftentimes when we work with data teams, they actually don't know the state of their data, and they certainly don't know why their data might go wrong and what might go wrong there. And so if you need
to set up sort of coverage with data quality monitors, you don't always know how to get started. Especially if you are a less technical user, that might be a lot harder. And so what we do is actually sort of build data quality monitor recommendations, where we actually sort of profile the data of that specific customer. We use Anthropic's Claude 3.5 Sonnet.
And one of the advantages of working with LLMs is that they have a really strong semantic understanding. And so with a combination of profiling the data and the metadata, and a bunch of other contextual things that we bring together, we can use that to actually help define what monitors you should be setting up.
So I'll give sort of a really, hopefully an easily understandable example. You know, we work with sports organizations, for example. And so if you take, like, a baseball organization, for example, and you think about, like, pitch types. You know, baseball, and actually sports in general, collect a ton of data about, you know, different athletes, different players, and a lot of sort of data about the games themselves, a ton of statistics and analysis.
For anyone who's seen Moneyball and others, one of the types of data you might collect is the type of pitch and also the speed of the pitch. And so, for example, using analysis, you can actually determine that if you have a fastball, that should always be over 80 miles per hour.
And if it's under 80 miles per hour, then maybe there's a problem. It's not really a fastball, right? And so that's sort of the recommendation that we can make using generative AI or using LLMs to say, hey, you should set up this data quality monitor. And we can do a lot more to kind of help users actually make sense of their data in order to drive what sort of data quality monitors they need.
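To make that concrete, here is a minimal sketch of what such an auto-recommended data quality monitor might reduce to once it is defined. The pitch data and the 80 miles per hour threshold come straight from the example above; the table and column names are hypothetical, and in the workflow being described the rule is proposed by the LLM after profiling the data, not written by hand.

```python
import pandas as pd

# Hypothetical pitch data; in practice this would be queried from the
# customer's warehouse rather than built inline.
pitches = pd.DataFrame(
    {
        "pitch_type": ["fastball", "curveball", "fastball", "fastball"],
        "speed_mph": [94.1, 78.3, 72.5, 88.0],  # a 72.5 mph "fastball" is suspicious
    }
)

# The recommended monitor: fastballs should always be over 80 mph.
violations = pitches[(pitches["pitch_type"] == "fastball") & (pitches["speed_mph"] < 80)]

if not violations.empty:
    # A real observability setup would raise an alert here instead of printing.
    print(f"{len(violations)} suspicious fastball row(s):")
    print(violations)
```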
And that's a good example. And I think it really illustrates a point. So, you know, because we can all relate to, you know, oh, classifying a pitch, right? And, you know, you never really know what it is until, you know, you see it or, you know, you're watching on TV. But maybe could you walk us through one, maybe one more example of, you know, how people
use good data, knowing that you can rely on it, and how that can really make a difference. Yeah.
Yeah, for sure. So another example of, you know, I think the generative AI use case that's really cool is something that Credit Karma from Intuit does. So, you know, for folks who don't know, you know, Credit Karma is sort of a financial assistant that's based on AI and so can make recommendations for you on how to best manage your finances. And so, like I mentioned before, you know, any organization has access to the latest and greatest services.
you know, the OpenAI API or others. What Credit Karma has that no other organization has is information specifically about their users. And, you know, they serve hundreds of millions of users, and they can tell, you know, you have this specific credit score and you've had this Honda car for the last 10 years and you're going to be selling it at this time and you have this kind of history, and
all that information can be used to help make specific recommendations for you about your specific financial situation. Now, the downside is, you know, we want to make sure that we're not surfacing to you the wrong credit score. So you, Jordan, should be able to access only your credit score and not my credit score, for example. And also the financial recommendations that are being made to you should be based on your data and your data alone.
And so I think the power here is that, you know,
Credit Karma actually builds RAG pipelines. And so they use LLMs and actually enrich them with data that they have about their users in order to build these highly personalized assistants, if you will. And so, you know, I think being able to actually build such a personalized product that's also based on reliable, accurate data results in a really, really strong outcome for customers.
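As a rough illustration of the pattern being described (not Credit Karma's actual implementation), here is a minimal RAG-style sketch: retrieve only the records that belong to the requesting user, then pass them to an LLM as context. The records, fields, and the call_llm helper are all hypothetical stand-ins.

```python
from typing import Dict, List

# Hypothetical first-party data, keyed by user; this is the "moat" data
# that an off-the-shelf model has no access to on its own.
USER_RECORDS: Dict[str, List[str]] = {
    "user_123": [
        "Credit score: 742 as of last month.",
        "Auto loan on a 2015 Honda, 10 months remaining.",
    ],
    "user_456": ["Credit score: 610.", "Two credit cards near their limits."],
}

def retrieve(user_id: str) -> List[str]:
    """Return only the requesting user's records, never anyone else's,
    which is the access-control point called out above."""
    return USER_RECORDS.get(user_id, [])

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g., an OpenAI or Anthropic client)."""
    return "[LLM answer grounded in the retrieved context above]"

def recommend(user_id: str, question: str) -> str:
    context = "\n".join(retrieve(user_id))
    prompt = (
        "You are a financial assistant. Answer using ONLY this user's data:\n"
        f"{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(recommend("user_123", "Should I pay off my auto loan early?"))
```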
That's sort of one kind of example from the financial world. There's a lot more examples where companies make good use of LLMs and gen AI for efficiency internally, which I've seen. So, you know, the Credit Karma and Intuit one is more an example of an external data product that gives you a really strong ability to make an impact on your customers externally.
If you think about seeing value internally as well, lots of organizations, the most basic example is where organizations see an increase in engineering productivity. So where you have a sort of coding assistant, that's like the most basic, I think, that most organizations are seeing today. And I think that often helps for
more sort of junior, less experienced engineers, but also senior engineers as well. So if you have a largely sort of junior or new organization, you'll realize even more benefits in that. But I think, you know, some numbers: you can significantly increase the ratio of the number of coders to the code that's being reviewed with LLMs. Another example is, you know, in the
pharmaceutical or sort of medical space, but also in insurance, there's a lot of compliance reports that are being shared. And those oftentimes can take six to 12 to 18 months to generate. And those include, you know, a lot of internal data, but also sort of status and protocols. And, you know, a lot of sort of rote, manual report writing. Generative AI can significantly reduce the time spent on that. So if you sort of use the data that you have off the shelf and actually train it on past reports, it can actually generate really good examples, or at least drafts to start out with.
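As a rough sketch of that drafting workflow, you can show an LLM a couple of past reports plus this period's data and ask for a first draft that a human then reviews. The past report excerpts, the metrics, and the call_llm helper below are all hypothetical stand-ins.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return "[draft report for human review]"

# Hypothetical past reports used as style and structure examples.
past_reports = [
    "Q1 safety report: 2 deviations logged, both resolved within protocol...",
    "Q2 safety report: 0 deviations logged; monitoring continued per protocol...",
]
# Hypothetical current-period data pulled from internal systems.
current_data = {"deviations": 1, "resolved": 1, "protocol": "SOP-14 rev 3"}

prompt = (
    "Draft a quarterly compliance report in the same style as these examples.\n\n"
    + "\n\n".join(f"Example {i + 1}:\n{r}" for i, r in enumerate(past_reports))
    + f"\n\nThis quarter's data: {current_data}\n"
    + "Flag anything uncertain for human review rather than guessing."
)
draft = call_llm(prompt)  # a starting draft, not the final filed report
print(draft)
```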
So those are kind of examples of how folks are really gaining internal efficiencies. There's also some clever ways specifically around structured and unstructured data. So oftentimes folks find that, I would say in general, kind of the whole unstructured data stack is very new and is just emerging. Like, I think this is very early days for unstructured data. One of the things that is hard is: how do you monitor and observe unstructured data to make sure that unstructured data is reliable? Like, that's really, you know, I want to say very, very early days of that. Something that Monte Carlo, obviously, we think a lot about, but also our customers obviously think a lot about. One good example of how you might use LLMs to better observe unstructured data is we work with an insurance company that has customer support chats. And if you think about a
customer support conversation, that's largely unstructured data. And you can use LLMs to actually structure that specific support chat and give it a score based on what is the reading of the tone and the conversation and the resolution and understanding whether that support conversation went well or not. And basically assign it a score between zero to 10 on how it went.
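Here is a minimal sketch of that structuring step, with call_llm as a hypothetical stand-in for a real LLM client: the model turns a raw chat transcript into a 0-to-10 score, and a downstream check catches impossible values so they never flow into reports unnoticed.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a deliberately bad value here
    to show the validation step doing its job."""
    return "12"

def score_chat(transcript: str) -> int:
    prompt = (
        "Rate this support conversation from 0 (terrible) to 10 (excellent), "
        "considering tone and whether the issue was resolved. "
        "Reply with the number only.\n\n" + transcript
    )
    raw = call_llm(prompt).strip()
    score = int(raw)  # raises ValueError if the model doesn't return a number
    if not 0 <= score <= 10:
        raise ValueError(f"LLM returned an out-of-range score: {score}")
    return score

chat = "Customer: My claim is stuck.\nAgent: I've escalated it, sorry for the delay."
try:
    print(score_chat(chat))
except ValueError as err:
    print(f"Monitor alert: {err}")  # this is the observability hook
```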
And one of the sort of use cases then is that you can observe that structured data. Like, let's say the LLM gave it a score of 12. Like, what does a score of 12 mean on a scale between zero and 10? Right. And so in those instances, you can actually make sure that
that data is reliable. So there's a lot of kind of clever ways in which folks are using LLMs to structure the unstructured, if you will. Yeah. Yeah. I love that. And that's something that, you know, I'm a huge proponent of, especially for smaller, medium-sized businesses. It's like, yeah, use LLMs to turn that unstructured data into structured data that you can actually use. So I would be remiss if I didn't ask you this, though, because, you know, it's been a growing trend, at least in 2024, you know, using synthetic
data. What's your thoughts on that? I love what you said. Having no data is better than having bad data. Is using synthetic data better than using bad data? Where do you stand on that? And is this going to be a big play in the future?
Yeah, I mean, I think, I don't remember who, but I think there was a former scientist at OpenAI who said, like, we are at peak data right now. Like, we've, you know, sort of maxed out on the data that there is. And, you know, we now need to turn to synthetic data in order to kind of make advancements. And so I think it's definitely an interesting time for synthetic data. And I think there's going to be a rise for that in terms of, you know, how we're going to train LLMs and, you know, how we're going to
sort of reach even better performance, if you will. But I do think that there's obviously no replacement for, you know, real-world data that enterprises need to use. And I see most of the attention and time spent there. So it's actually interesting. There's this return to some of the maybe more unsexy things, like data governance is sort of rearing its head now. I haven't heard,
you know, that word in a couple of years. And now there's, like, you know, a return of data governance. And so I actually think a lot of, like, you know, what was old is new now, if you will. And perhaps synthetic data is in that camp as well.
All right. So we've covered a lot in today's conversation, Barr. We started with the reliability of data and how observability worked. We gave some examples and talked about the future of data as well. But as we wrap up, what's maybe the one most important thing that you think our listeners need to know, especially those making medium and long-term decisions on how their companies play in the AI space?
What's the one most important thing they need to know about reliability of their data? I would say, you know, your generative AI product is only as good as your data. And so...
Excuse my language, but if your data is shit, then your generative AI is going to be shit. And so getting that in order is the first thing. And actually, that's a really tall order. It's actually really freaking hard to do that. But I think there's no other way. I do think that we are seeing more and more organizations actually seeing ROI on their generative AI products. And so it's high time that any organization starts to invest. If you haven't already, then you're already too late.
Love to hear it. Yeah, that's a great warning call to all of you people that are somehow still sitting on the fence in 2025. I don't get it, but there's a lot of you out there. So, Barr, thank you so much for joining the Everyday AI Show and taking time out of your day to help us all better understand data. We really appreciate it.
It's been fun. Thanks for inviting me. Good luck to us all. All right. And hey, y'all, that was a lot to take in. Yeah, just so much. A data avalanche of great information. If you missed something, maybe you're on the elliptical and looked away. Don't worry. We're going to be recapping it all
on our website, youreverydayai.com. Sign up for the free daily newsletter where you will find a lot more insights and complimentary, supplementary info to go along with today's conversation, as well as everything you need to know to be the smartest person in AI at your company. Thank you for joining us. Please join us tomorrow and every day for more Everyday AI. Thanks, y'all.
And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.