
EP 475: AI Without Mistakes: How Good Data Makes It Happen

2025/3/5

Everyday AI Podcast – An AI and ChatGPT Podcast

#artificial intelligence and machine learning #generative ai #data privacy #machine learning theory #educational #ai product innovation #entrepreneurial decision making #financial influencer insights

People
Barr Moses
Jordan Wilson
An experienced digital strategist and host of the Everyday AI podcast, focused on helping everyday people advance their careers with AI.
Topics
@Jordan Wilson: I think that in generative AI we sometimes overlook some of the most important things, namely the reliability of data. We need to pay attention to the reliability and accuracy of our data sources, and to what we do when the data goes wrong.

@Barr Moses: Monte Carlo's mission is to accelerate the adoption of data and AI by reducing data downtime, the periods when data is wrong or inaccurate. Data products (including generative AI applications, reports, and more) are often based on bad data, and data teams are usually the last to know about the problem. Data observability is not just about knowing a problem exists; it is about understanding its cause, its importance, and how to fix it. Small and medium-sized businesses often don't fully understand how their data works, which undermines their generative AI efforts. Data accuracy matters more and more as data usage grows and generative AI spreads, because bad data drives users away and damages brand reputation. Most enterprises lack confidence in the data feeding their generative AI models, and data reliability is closely tied to brand reputation and revenue. Enterprise data is the competitive advantage for building personalized generative AI products, so data quality and reliability are critical. Even for small businesses, data quality is essential and the bar should not be lowered; small businesses have a speed advantage in how they work with data. Combining high-quality data with generative AI produces valuable applications, such as recommending data quality monitors to improve data quality. Generative AI can be used to analyze sports data, for example spotting anomalies in baseball pitch types and speeds, making data quality monitoring more efficient. Credit Karma uses its user data to build personalized financial products, and pairing generative AI with reliable data has improved user experience and product quality. Many organizations use generative AI to improve internal efficiency, for example coding assistants and compliance report generation. Generative AI can also process unstructured data, such as customer support chat logs, structuring and analyzing them. While synthetic data has potential for training LLMs and improving performance, it cannot replace the real-world data enterprises need; data governance is becoming increasingly important. The quality of a generative AI product depends on the quality of its data, and ensuring data reliability is the essential first step.

Deep Dive

Data Reliability: The Core Competitive Advantage in the Generative AI Era

In the generative AI era, it is easy to be dazzled by flashy technology and overlook the foundation that supports it all: data. I recently had an in-depth conversation with Barr Moses, co-founder and CEO of Monte Carlo, who clearly laid out the critical role data reliability plays in generative AI applications, and how data observability can be used to build more reliable, more competitive AI products.

Barr pointed out that Monte Carlo's mission is to reduce "data downtime": the periods when wrong or inaccurate data keeps systems from working properly. Many data products, including generative AI applications and everyday reports, depend on high-quality data. In practice, though, data teams are often the last to discover data problems. That hurts not only business efficiency but also brand reputation and user experience.

We discussed the importance of data observability. It is not just about detecting problems; what matters more is understanding their cause, their severity, and how to resolve them. Barr stressed that data observability, like monitoring systems in software engineering (Datadog, for example), helps data teams catch and resolve data issues promptly and keep data reliable.
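
To make this concrete, here is a minimal sketch of the kind of checks an observability layer runs. This is not Monte Carlo's implementation: the `loaded_at` column, the thresholds, and the SQLite connection are illustrative assumptions (real tools typically learn thresholds from each table's history rather than hard-coding them).

```python
import datetime
import sqlite3  # stand-in for a real warehouse connection

# Hand-set, illustrative thresholds.
FRESHNESS_LIMIT = datetime.timedelta(hours=6)
MIN_EXPECTED_ROWS = 1_000

def check_table_health(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return human-readable 'data downtime' alerts for one table."""
    alerts = []

    # Freshness: has new data arrived recently?
    (last_loaded,) = conn.execute(f"SELECT MAX(loaded_at) FROM {table}").fetchone()
    if last_loaded is None:
        return [f"{table}: table is empty"]
    age = datetime.datetime.utcnow() - datetime.datetime.fromisoformat(last_loaded)
    if age > FRESHNESS_LIMIT:
        alerts.append(f"{table}: no new rows for {age} (stale)")

    # Volume: did today's load arrive at a plausible size?
    (rows_today,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE loaded_at >= date('now')"
    ).fetchone()
    if rows_today < MIN_EXPECTED_ROWS:
        alerts.append(f"{table}: only {rows_today} rows today, expected >= {MIN_EXPECTED_ROWS}")

    return alerts
```

The point of running checks like these continuously is exactly the one Barr makes: the data team, not a downstream stakeholder, should be the first to know.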

For small and medium-sized businesses, fully understanding how their own data works is essential. Many companies do not realize how much data quality affects the results of their generative AI applications. Barr argues that in an era of surging data usage and booming generative AI, data accuracy matters more than ever: wrong data drives users away and ultimately damages brand reputation.

Data: Your Moat

Barr emphasized that the unique data a company owns is its core competitive advantage in generative AI. Everyone can access the latest LLMs, but only high-quality, reliable first-party data lets you build truly personalized generative AI products and gain a real edge. That is why data quality and reliability are paramount.
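
As a sketch of what "data as moat" looks like in practice: the same public LLM everyone can call, but with the prompt grounded in first-party records nobody else has. The `crm` store and `llm` client below are hypothetical placeholders, not any particular company's stack.

```python
def answer_with_first_party_data(user_id: str, question: str, crm, llm) -> str:
    """Ground a generic LLM in proprietary customer data (RAG-style)."""
    # Retrieve only THIS user's records -- as the episode notes, surfacing
    # someone else's data (e.g. the wrong credit score) is a hard failure.
    profile = crm.get_profile(user_id)           # hypothetical first-party store
    events = crm.recent_events(user_id, limit=5)

    context = "\n".join([f"Profile: {profile}"] + [f"Event: {e}" for e in events])
    prompt = (
        "Answer using ONLY the customer data below. "
        "If the data is insufficient, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return llm.complete(prompt)  # any hosted model everyone has access to
```

The model is a commodity in this sketch; the retrieval step is where the moat lives, which is also why it only works if the retrieved data is accurate.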

She noted that even small companies should not lower the bar on data quality. Small teams often move faster and innovate more readily: they can try different approaches quickly and iterate rapidly. That lets them make better use of the data they have and build competitive AI products fast.

Real-World Use Cases

We walked through several real-world use cases that show what high-quality data combined with generative AI can do:

  • Sports analytics: By analyzing baseball pitch data, generative AI can flag anomalies, such as a "fastball" slower than expected, making data quality monitoring more efficient (see the first sketch after this list).
  • Personalized financial products: Credit Karma builds personalized financial products on its user data; pairing generative AI with reliable data has markedly improved user experience and product quality.
  • Internal efficiency: Many companies use generative AI to raise internal efficiency, for example coding assistants that speed up development, or automatically generated compliance reports that save substantial time and labor.
  • Unstructured data: Generative AI can process unstructured data such as customer support chat logs, structuring them and analyzing sentiment to improve customer service (see the second sketch below).
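
For the sports example, the recommended monitor can reduce to a rule as simple as this. The 80 mph fastball threshold comes straight from the episode; the field names and sample rows are illustrative assumptions.

```python
def pitch_anomalies(pitches: list[dict]) -> list[dict]:
    """Flag rows where the labeled pitch type contradicts the measured speed."""
    return [
        p for p in pitches
        if p["pitch_type"] == "fastball" and p["speed_mph"] < 80
    ]

pitches = [
    {"pitch_type": "fastball",  "speed_mph": 94.1},
    {"pitch_type": "fastball",  "speed_mph": 68.0},  # mislabel or sensor error
    {"pitch_type": "curveball", "speed_mph": 77.5},
]
print(pitch_anomalies(pitches))  # flags only the 68 mph "fastball"
```

The generative AI contribution is not the check itself but the recommendation: an LLM with semantic understanding of the profiled data can notice that fastballs cluster above 80 mph and propose this monitor.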
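
For the unstructured-data example, here is a sketch of the two-step pattern described in the episode: an LLM turns a raw support chat into a 0-10 score, and an ordinary validation step then treats that output like any other data source, catching the "score of 12" case Barr mentions. `llm_complete` is a hypothetical stand-in for whatever model call you use.

```python
def score_support_chat(chat_text: str, llm_complete) -> int:
    """Structure an unstructured chat into a 0-10 tone/resolution score."""
    raw = llm_complete(
        "Rate this customer support conversation from 0 to 10 for tone "
        "and resolution. Reply with a single integer only.\n\n" + chat_text
    )
    score = int(raw.strip())
    # Observe the structured output: a 12 on a 0-10 scale is itself a
    # data quality incident that should be surfaced, not stored.
    if not 0 <= score <= 10:
        raise ValueError(f"LLM returned {score}, outside the 0-10 scale")
    return score
```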

Synthetic Data and Data Governance

We also discussed the role of synthetic data in generative AI. While synthetic data has potential for training LLMs and improving performance, it cannot fully replace real-world data. Barr believes we are in a "peak data" era and will lean more on synthetic data going forward, but real, reliable data remains the core. She also noted that data governance is making a comeback, and that it is essential for ensuring data quality and reliability.

Conclusion

The quality of a generative AI product ultimately comes down to the quality of its data. Ensuring data reliability is the first step toward building a successful AI product. Companies of every size should take data quality seriously and adopt data observability to build more reliable, more competitive AI products. Ignoring data reliability is like building a tower on sand: sooner or later it collapses.

Chapters
This chapter explores the critical role of reliable data in generative AI. It emphasizes the importance of data trustworthiness and highlights the challenges of identifying and resolving data issues. The discussion also touches on the field of data observability and its importance in ensuring data reliability.
  • Data reliability is paramount for generative AI success.
  • Data downtime, periods of inaccurate data, significantly impacts AI applications.
  • Data observability helps identify and resolve data issues promptly.
  • Understanding the root cause of data problems is crucial for effective solutions.

Transcript


This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. When it comes to generative AI, I think sometimes we just overlook some of the most important things, right? We just want to hit that big red easy button and have it spit out hours of work. And we're just like, yay, good, we're done. But

There's that one most important part. It's the data. Do you trust where your data is coming from? Is it reliable? What happens if it's wrong? And well, why should you even care?

We're going to be talking about that and hopefully answering a lot of those questions today on Everyday AI. What's going on, y'all? My name is Jordan Wilson. I'm the host of Everyday AI. This thing, it's for you. This is your daily live stream podcast and free daily newsletter, helping everyday people not just learn

and keep up, but how you can leverage it and become the smartest person in AI at your company. So if that sounds like you and what you're trying to do, this is your new home. If it's your first time, please make sure, if you listen on the podcast, to check out your show notes. In there, you will see a website,

youreverydayai.com. But before we get started, have to first give a quick shout out to our partners at Microsoft. So why should you listen to the Work Lab podcast from Microsoft? Because it's the place to find research-backed

insights to guide your org's AI transformation. Tune in now to learn how shifting your mindset can help you grasp the full potential of AI. That's W-O-R-K-L-A-B, no spaces, available wherever you get your podcasts. All right, so thanks to our partners at Microsoft. And as a reminder, if you haven't already, make sure you go sign up for our free daily newsletter on our website. We're going to be recapping today's conversation as well as going over the AI news. Yeah,

Technically pre-recorded one here that we're debuting live, but a lot's happening in the world with AI. Everything happening at CES. We got some open AI rumors swirling. So we'll have that all in today's newsletter. All right, but enough.

chit-chat. Let's build some more trustworthy AI. You don't have to hear me ramble on any longer. I have a great guest today lined up for you all. So please help me welcome to the show. There we go. Barr Moses, the co-founder and CEO of Monte Carlo. Barr, thank you so much for joining the Everyday AI Show. Thanks for having me, Jordan. It's a pleasure. All right. Let's do this. Well, first, for people who don't know, what is Monte Carlo?

What is Monte Carlo? Great question. So Monte Carlo's mission is to help accelerate the adoption of data and AI by reducing what we call data downtime, which is basically periods of time when data is wrong or inaccurate. You can't trust it. I don't know if this has ever happened to you, but you wake up on a Monday morning and you see that one of your data products is wrong. Like you were staring at a report, the number's off, something's wrong. You're like, wow, like, why is it off?

And oftentimes, it's not only really difficult to catch the issue, it's actually also really hard to understand what's the root cause and to resolve it. So Monte Carlo helps solve all of that. We're fortunate to work with some of the world's best data teams, ranging from companies like Fox, Roche, Credit Karma, and many, many others. It's probably the part of my job that I love the most, getting to work with amazing customers on some of their hardest problems.

So in a nutshell, right, a company comes to you before, what happens after they come to you, right? Like if everything goes right, they just better understand their data and how it works with AI. Like what's the end result? Great question. So I would say, you know, there's lots of people today like data analysts, data scientists, data engineers, machine learning engineers building what we call data products.

A data product can be a generative AI application, or it could be a report that your CMO is looking at every day, or it could be a pricing recommendation algorithm. It could really be a variety of data products. And those data products are often wrong, based on wrong data. And the biggest issue is that oftentimes data teams are the last to know about that.

And so, you know, the very sort of the very first kind of table stakes fundamental thing that we do or that we help organizations is be the first to know about data issues. So no longer the days where data teams are sort of caught by surprise by data issues and sort of hearing about it from someone else. Like, that's the worst thing that can happen, that you didn't sort of catch that.

Never happened to me. You know, I'm sort of asking for a friend, if you will. I'm kidding. But, you know, that's sort of fundamentally the very first thing. And that's really the thing that Monte Carlo set out to solve, you know, when we founded the company five years ago. And that's been, I want to say, the first frontier. I think since then, what's become even more apparent is that that's only the first half and, in a sense, maybe even the easier half

of the work. Actually, the really big challenge, sort of where I think the AI reliability industry is heading, is not only knowing that the issue existed, but also answering why. And should I care? What should I do with this information?

Because oftentimes data teams are just inundated with alerts like this is broken, this is off, you know, this data is late, this data has never arrived, this field is, you know, looks a little bit off, this number is missing. But in those instances, like the hard thing is actually to answer,

I have all these systems working together, but what is actually the root cause? Is it something that went wrong in the data? Is it that the job wasn't completed? Is it that there was a change in the code? Those questions are really, really hard to answer. And so a lot of the things that Monte Carlo does, not only Monte Carlo, just observability more broadly. So

sort of the field of data observability, if you will, is about answering or helping data teams answer the question of: something went wrong. Should I care? And if so, why, and how do I resolve that? And that honestly is a lot of what observability actually started out with. So observability, we didn't make it up in data. We actually, you know,

borrowed the concept from engineering teams, from software. So observability in software engineering is very well understood, with organizations like Datadog, obviously. Who doesn't have Datadog today, or something like Datadog? Every single engineering team has something like Datadog and relies on a solution like Datadog to make sure that the software that they're building is

reliable and can be trusted and sort of up and running, if you will. And so data teams, in my opinion, should be doing the same. It's a little bit of a new area, observability, if you will. You know, I think Gartner forecasts that over 60% of organizations will have data observability in some shape or form in the next five years, but it's a new area. So, you know, it's only recently been defined.

I mean, just in the first like three minutes there, I think you answered like my first five questions. I want to hit rewind just a little bit on the like, why should we care, right? And I think at least in my viewpoint, and maybe this is for smaller and medium-sized businesses, but they often don't even really take the time to fully understand how their data even works.

So they're like, OK, well, we know we need RAG, right? We know we need to bring in our own data, you know, to work alongside, you know, a backend API, right? But why does it ultimately matter whether they get their data right if they are using, you know, Claude, Anthropic, Gemini, et cetera?

It's a good question. And let me just take us back like 10, 15 years ago when honestly, maybe it didn't really matter. Like it just didn't, you know, we weren't really using data so much. Definitely didn't have any generative AI. And so we could kind of get away with data being wrong most of the time. Worst case, someone just told you and you had to go ahead and fix it. Like no big deal. You moved on with your life. Right. But I think a lot of things have changed since. There's been these various,

sort of eras, if you will. And I think the first era was where a lot more people started using data. And so you can no longer get away with like looking at the data only once a quarter. Now you have like millions of users, you know, pressing, ordering an Uber. And so you can't get the time of when your car is coming to be wrong or you can't get the price to be wrong. Like, for example, if I see that the Uber car is going to be arriving in 30 minutes, I'm not going to be waiting for 30 minutes.

I'm going to be signing off and going to a different platform, right? And so, yeah, it matters. The time for when the Uber is going to arrive, that data needs to be accurate, because otherwise you're going to lose your users. So that was sort of the first wave, where a lot more people were using data and data products became a lot more important. Then there was a second wave of generative AI, which is sort of happening now.

And actually, interestingly, you know, we recently did a survey among about 200 or so data leaders. And we basically asked, you know, how many of you are sort of deploying generative AI or building generative AI in production? Can you guess the answer? How many are doing that today? I'm guessing a small amount.

Actually, interestingly, 100% of them said that. Oh, okay. Literally, yeah, I was surprised too. Every single person on the survey, and these are all, you know, data leaders from credible companies, 100% of data leaders are currently building something with generative AI. Now, the second question was, how many of them actually trust the data that they're going to be using?

Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on Gen AI. Hey, this is Jordan Wilson, host of this very podcast.

Companies like Adobe, Microsoft, and NVIDIA have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use Gen AI. So whether you're looking for ChatGPT training for thousands,

or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com slash partner to get in contact with our team, or you can just click on the partner section of our website. We'll help you stop running in those AI circles and help get your team ahead and build a straight path to ROI on Gen AI.

Yeah. And that's interesting. And it's a good point. And I was kind of shocked by that, right? Because a lot of the studies I read say that even in enterprise companies, you know, I think the latest study is only 5% of companies have a generative AI solution top to bottom.

right? Fully implemented, right? But I guess it's got to start with the data first and trickle everywhere else. Speaking of shifts in generative AI, one thing that I'm personally always dorking out about is this shift from the attention economy to the intention

economy, right? Like to be able to better understand, you know, what users on the internet are going to do before they even know they're going to make a decision. And that starts with data, right? I think we've been hearing for 5, 10, 15 years, oh, data is the new gold, but is it even more, like increasingly more and more important because of generative AI?

Yeah, 100%. And so going back to that survey, only one out of three leaders feels confident in their data that's feeding their generative AI model. So like most of us, two thirds of us don't have confidence in the data that we're using. And so, you know, to your question, why does it matter more in this new world or in generative AI?

I'll explain that with one real-life example and one more theoretical example. The first, real-life example, if you will: this was a couple of months ago, and it went viral. Someone Googled, you know, what should I do if the cheese is slipping off my pizza? Right.

And Google was like, oh, no problem. Just use organic super glue to like, I don't know if you saw that, to like put it back on pizza. Like that went viral. And you're like, okay, well, if you're Google, maybe you can get away with that kind of answer, right? Like, sure, I'm going to continue to use Google tomorrow, right? But most of us...

can't afford that. We don't have the luxury of spitting out such clearly misinformed answers. And so for most enterprises, the reliability of the data that you provide is actually intertwined with your brand and your reputation, and the impact on the top line and the revenue that you're generating. And so that's sort of, you know, one example

to kind of bring that to life. More broadly, though, you know, if you think about what companies are tasked to do now, we're seeing, you know, every single data leader needs to do something with gen AI now. How are they doing that? Because today, every single one of us has access to the latest and greatest LLM model, right? Like, we can all switch between them, we can all use them. And so in a sense, we all have access to models built by, you know,

Thousands of amazing PhDs, right? Like we can all do that. So what's my competitive advantage, right? How am I going to build a better data product than my competition or what's my long-term moat?

And what I believe in, what I'm hearing from my customers is that the moat is actually the data that you have because it's no longer simply connecting to an API and actually building a generative AI product. The power of building a highly personalized generative AI product is based on the ability to use first party sort of enterprise data.

So I can actually build a way better product if I know something about you, Jordan. And if I know about your background and I know your habits, I can actually build something that's personalized for you. And the data that I have is something that arguably no one else has.

And so I think for leaders thinking about what are they going to build or how are they going to use generative AI, the data that you have is the moat. That's actually how you gain competitive advantage and build data products. And so if you believe that's true, then the quality and the reliability of the data that you're using is of utmost importance. Because if the data that you have is inaccurate, then what's the point of the moat that you have? Yeah.

And I think this makes a lot more sense and is resonating for those that work at larger enterprises, right? And they already have a data warehouse or data lake. They're using Amazon S3. I don't know, right? But for maybe those medium-sized companies that don't have their data game strong, right? But they have it in a lot of different platforms.

places, right? Maybe they have some, you know, floating around in different places in Google or, you know, in their CRM, et cetera. How do these, you know, smaller and medium-sized organizations, how can they take advantage of that? Because what you said there is true, data is the moat, but how do these smaller and medium-sized organizations start to actually pool all of this data together so they can, you know, leverage generative AI with it?

Yeah, I mean, I'd start by saying no data is better than bad data. So if you have shit data, I'm actually not convinced that you should be using it. I actually think it might be better to make sure that you have data that is reliable and trusted. You know, I think just to give you an example on Monte Carlo, Monte Carlo as a company, we're about 200 or so employees. We build generative AI products, and it's of utmost importance that the data that we have is accurate.

And so I think even if you are a small organization, you know, the bar is not lower. In fact, I think it's higher. In fact, I find that enterprises, you know, large enterprises really struggle with getting their data together, really struggle with having a source of truth. Like if I have multiple copies of the data,

For example, I mean, even answering questions like, you know, how many customers do we have, or large organizations trying to figure out sales compensation. It's really freaking complicated to do that, because the answer that

I get from my finance team is different from what my sales team is saying, is different from what my marketing team is saying. And so every different team is looking at different sets of data. And so getting an answer is really, really difficult. So I actually think medium-sized and smaller organizations have an advantage. In fact, you know, I actually think smaller teams move faster. Like,

There's probably some proof of that. And so as a smaller team, you know, you're probably small but mighty. And so make use of the data that you have and you have the advantage of being able to move faster and actually innovate faster because larger organizations now are way, way slower and obviously way more risk averse.

So I think smaller organizations have the benefit of being able to try a lot of things, experiment, move quickly, and double down on some of the experiments that are working. And by the way, that's sort of the large majority of what we see companies do, both small and large.

basically have this mandate to go experiment in the organization, and have lots of teams, you know, try out different things and build different applications. And, you know, companies understand that they will come up with a centralized strategy only later.

All right. So I do want to talk about some of these use cases, but we're going to take a quick 20 second break and have to shout out one more time our partners at Microsoft. So why should you listen to the Work Lab podcast from Microsoft? Because it tackles your work.

Thank you.

Work Lab. That's W-O-R-K-L-A-B, no spaces, available wherever you get your podcasts. All right. So Barr, I do want to jump into it a little bit here, because we've been talking about some of the issues with getting good data, having reliable data that you can trust. So what happens when it does come together? Maybe could you walk us through a use case or two, just for those that maybe are just getting their data feet wet, so to speak, so they can see,

hey, when good data and good gen AI come together, here are some good use cases. Yeah, absolutely. And it's been really fun hearing, you know, a variety of use cases and innovation. I'm really excited. And honestly, the hype around this is so big, but I think even if it materializes only 10% of the way, that's enough to be such a disruption for us

and for future generations. So I'm really excited. I'll give a specific use case, actually one that we use at Monte Carlo. So one of the challenges that we have, and this is sort of very meta, is oftentimes when we work with data teams, they actually don't know the state of their data, and they certainly don't know why their data might go wrong and what might go wrong there. And so if you need data

to set up sort of coverage with data quality monitors, you don't always know how to get started. Especially if you are a less technical user, that might be a lot harder. And so what we do is actually build data quality monitor recommendations, where we actually profile the data of that specific customer. We use Anthropic's Claude 3.5 Sonnet.

And one of the advantages of working with LLMs is that they have a really strong semantic understanding. And so, with a combination of profiling the data and the metadata and a bunch of other contextual things that we bring together, we can use that to help define what monitors you should be setting up.
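
(Editor's sketch: a rough illustration of the flow Barr describes, profiling a table and asking an LLM to propose monitors. The prompt, the profile format, and the `llm_complete` client are hypothetical, not Monte Carlo's actual pipeline.)

```python
import json

def recommend_monitors(profile: dict, llm_complete) -> list[dict]:
    """Ask an LLM to suggest data quality monitors from a table profile.

    `profile` holds summary statistics and metadata (never raw rows);
    `llm_complete` stands in for a model call such as Claude 3.5 Sonnet.
    """
    prompt = (
        "You are a data quality assistant. Given this table profile, "
        "suggest monitors as a JSON list of objects with keys "
        "'column', 'rule', and 'reason'.\n\n" + json.dumps(profile)
    )
    return json.loads(llm_complete(prompt))

# The kind of profile an observability tool might assemble:
profile = {
    "table": "pitches",
    "columns": {
        "pitch_type": {"distinct_values": ["fastball", "curveball", "slider"]},
        "speed_mph": {"min": 58.0, "max": 102.4, "null_rate": 0.001},
    },
}
```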

So I'll give a really, hopefully easily understandable example. You know, we work with sports organizations, for example. And so if you take, like, a baseball organization, for example, and you think about pitch types. You know, baseball, and sports in general, collect a ton of data about different athletes, different players, and a lot of data about the games themselves, a ton of statistics and analysis.

For anyone who's seen Moneyball and others, one of the types of data you might collect is the type of pitch and also the speed of the pitch. And so, for example, using analysis, you can actually determine that if you have a fastball, it should always be over 80 miles per hour.

And if it's under 80 miles per hour, then maybe there's a problem. It's not really a fastball, right? And so that's sort of the recommendation that we can make using generative AI or using LLMs to say, hey, you should set up this data quality monitor. And we can do a lot more to kind of help users actually make sense of their data in order to drive what sort of data quality monitors they need.

And that's a good example. And I think it really illustrates the point, you know, because we can all relate to, oh, classifying a pitch, right? And, you know, you never really know what it is until you see it, or, you know, you're watching on TV. But maybe could you walk us through maybe one more example of, you know, how people

use good data and, you know, knowing that you can rely on it, and how that can really make a difference. Yeah.

Yeah, for sure. So another example of, you know, I think the generative AI use case that's really cool is something that Credit Karma from Intuit does. So, you know, for folks who don't know, you know, Credit Karma is sort of a financial assistant that's based on AI and so can make recommendations for you on how to best manage your finances. And so, like I mentioned before, you know, any organization has access to the latest and greatest services.

you know, OpenAI APIs or others. What Credit Karma has that no other organization has is information specifically about their users. And, you know, they serve hundreds of millions of users, and they can tell, you know, you have this specific credit score, and you've had this Honda car for the last 10 years, and you're going to be selling it at this time, and you have this kind of history, and

all that information can be used to help make specific recommendations for you about your specific financial situation. Now, the downside is, you know, we want to make sure that we're not surfacing to you the wrong credit score. So you, Jordan, should be able to access only your credit score and not my credit score, for example. And also the financial recommendations that are being made to you should be based on your data and your data alone.

And so I think the power, you know, um,

Credit Karma actually builds RAG pipelines. And so they use LLMs and enrich them with data that they have about their users in order to build these highly personalized assistants, if you will. And so, you know, I think being able to actually build such a personalized product, that's also based on reliable, accurate data, results in a really, really strong outcome for customers.

That's sort of one example from the financial world. There's a lot more examples where companies make good use of LLMs and gen AI for efficiency internally, which I've seen. So, you know, the Credit Karma and Intuit one is more an example of an external data product that gives you a really strong ability to make an impact on your customers externally.

If you think about seeing value internally as well, lots of organizations, the most basic example is where organizations see an increase in engineering productivity. So where you have a sort of coding assistant, that's like the most basic, I think, that most organizations are seeing today. And I think that often helps for

more junior and less experienced engineers, but also senior engineers as well. So if you have a largely junior or new organization, you'll realize even more benefits there. But I think, you know, in terms of numbers, you can significantly increase the ratio of coders to code that's being reviewed with LLMs. Another example is, you know, in the

pharmaceutical or medical space, but also in insurance, there's a lot of compliance reports that are being shared. And those oftentimes can take six to 12 to 18 months to generate. And those include, you know, a lot of internal data, but also statuses and protocols. And, you know, a lot of rote, manual

report writing. Generative AI can significantly reduce the time spent on that. So if you use the data that you have off the shelf and actually train it on past reports, it can actually generate really good examples, or at least drafts to start out with.

So those are examples of how folks are really gaining internal efficiencies. There's also some clever ways specifically around structured and unstructured data. I would say, in general, kind of the whole

unstructured data stack is very new and is just emerging. Like, I think this is very early days for unstructured data. One of the things that are hard is: how do you monitor and observe unstructured data to make sure that unstructured data is reliable? That's really, I want to say, very, very early days. It's something that Monte Carlo obviously thinks a lot about,

but also our customers obviously think a lot about. One good example of how you might use LLMs to better observe unstructured data: we work with an insurance company that has customer support chats. And if you think about a

customer support conversation, that's largely unstructured data. And you can use LLMs to actually structure that specific support chat and give it a score based on a reading of the tone and the conversation and the resolution, understanding whether that support conversation went well or not, and basically assign it a score between zero and 10 on how it went.

And one of the use cases then is that you can observe that structured data. Like, let's say the LLM gave it a score of 12. What does a score of 12 mean on a scale between zero and 10? Right. And so in those instances, you can actually make sure that

that data is reliable. So there's a lot of clever ways in which folks are using LLMs to structure the unstructured, if you will. Yeah. Yeah. I love that. And that's something that, you know, I'm a huge proponent of, especially for smaller, medium-sized businesses. It's like, yeah, use LLMs to turn that unstructured data into structured data that you can actually use. So I would be remiss if I didn't ask you this, though, because, you know, it's been a growing trend, at least in 2024, you know, using synthetic

data. What's your thoughts on that? I love what you said. Having no data is better than having bad data. Is using synthetic data better than using bad data? Where do you stand on that? And is this going to be a big play in the future?

Yeah, I mean, I don't remember who, but I think there was a former scientist at OpenAI who said, like, we are at peak data right now. Like, we've, you know, sort of maxed out on the data that there is. And, you know, we now need to turn to synthetic data in order to make advancements. And so I think it's definitely an interesting time for synthetic data. And I think there's going to be a rise in that, in terms of, you know, how we're going to train LLMs and, you know, how we're going to

sort of reach even better performance, if you will. But I do think that there's obviously no replacement for, you know, real-world data that enterprises need to use. And I see most of the attention and time spent there. So it's actually interesting. There's this return to some of the maybe more unsexy things. Like, data governance is sort of rearing its head now. I hadn't heard

that word in a couple of years. And now there's, like, you know, a return of data governance. And so I actually think a lot of what was old is new now, if you will. And perhaps synthetic data is in that camp as well.

All right. So we've covered a lot in today's conversation, Barr. We started with the reliability of data and how observability worked. We gave some examples and talked about the future of data as well. But as we wrap up, what's maybe the one most important thing that you think our listeners need to know, especially those making medium and long-term decisions on how their companies play in the AI space?

What's the one most important thing they need to know about reliability of their data? I would say, you know, your generative AI product is only as good as your data. And so...

Excuse my language, but if your data is shit, then your generative AI is going to be shit. And so getting that in order is the first thing. And actually, that's a really tall order. It's actually really freaking hard to do that. But I think there's no other way. I do think that we are seeing more and more organizations actually seeing ROI on their generative AI products. And so it's high time any organization starts. If you haven't already invested, then you're already too late.

Love to hear it. Yeah, that's a great warning call to all of you people that are somehow still sitting on the fence in 2025. I don't get it, but there's a lot of you out there. So, Barr, thank you so much for joining the Everyday AI Show and taking time out of your day to help us all better understand data. We really appreciate it.

It's been fun. Thanks for having me. Good luck to us all. All right. And hey, y'all, that was a lot to take in. Yeah, just so much. A data avalanche of great information. If you missed something, maybe you were on the elliptical and looked away. Don't worry. We're going to be recapping it all

on our website, youreverydayai.com. Sign up for the free daily newsletter where you will find a lot more insights and complimentary, supplementary info to go along with today's conversation, as well as everything you need to know to be the smartest person in AI at your company. Thank you for joining us. Please join us tomorrow and every day for more Everyday AI. Thanks, y'all.

And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
