
Ep 541: AI & Trust: When 98% accuracy won't cut it and how Sage can fix it

June 6, 2025

Everyday AI Podcast – An AI and ChatGPT Podcast

Tags: artificial intelligence and machine learning, AI market trends, AI privacy concerns, generative AI, AI chatbot impact, AI research, cybersecurity, consulting and professional services

People
Aaron Harris
Jordan Wilson
An experienced digital strategy expert and host of the Everyday AI podcast, focused on helping everyday people advance their careers with AI.
Topics
@Jordan Wilson: In finance, even 98% accuracy can lead to catastrophic consequences. Raising people's trust in AI is therefore critical, and Sage is helping make that happen through its solutions. @Aaron Harris: As Sage's CTO, I see the CFO's job as building trust and ensuring the accuracy of financial reports. For customers to trust AI, we have to design products in a trustworthy way and put real weight on human review. We built our own machine learning infrastructure, along with safety mechanisms and controls, to ensure the AI is reliable and secure. We're also partnering with the AICPA to use their professional content to train our models, a sign that the accounting industry is embracing AI.


Chapters
The podcast starts by discussing the critical need for accuracy in finance, where even small errors can have significant consequences. The high standards of accuracy required in financial reporting are highlighted, emphasizing the importance of trust in AI systems used for financial tasks.
  • In finance, even 98% accuracy is considered a failure.
  • CFOs and finance teams require extremely high accuracy in financial reports.
  • A single error can severely damage credibility and trust.

Transcript

This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. What if good isn't good enough, right? I think it's something that business leaders are constantly thinking about when it comes to AI.

They're like, hey, if we get this right most of the time, let's go ahead and roll this out to the entire organization. And sometimes that might be okay, right? If you're doing strategy, creative work, content production. But what about your books? What about your finances? Sometimes being 90 or 95% correct

could be bad. It could be a recipe for disaster when it comes to your business's AI plan in 2025 and beyond. That's why today I'm excited to talk a little bit about trust in AI and how, well, if you're watching our video, our live stream, you see I am at the Sage Future Conference here in Atlanta and how Sage is really helping increase everyone's ability to trust their

their AI. All right, I'm excited for this conversation. I hope you are too. What's going on, y'all? My name is Jordan Wilson. I'm the host of Everyday AI. This is your daily live stream podcast and free daily newsletter, helping everyday business leaders like you and me not just learn what's happening in the world of AI, but how we can actually leverage it to grow our companies and our careers.

So, make sure, if you haven't already, go to our website at youreverydayai.com. We're going to be recapping today's conversation and a whole lot more. But like I said, you can probably see a little different setup here. I am in Atlanta at the Sage Future Conference, and I'm excited to welcome our guest for today, Aaron Harris, the CTO of Sage. Aaron, thank you so much for joining the Everyday AI Show. Thanks for having me. Really excited to be here.

Yeah, on the road in your hometown, actually. But for those of our audience that maybe don't know Sage, tell us what Sage is. Sure, yeah. So Sage is a global software company that focuses on accounting, HR, payroll, manufacturing, sort of all of the things that a finance and accounting team needs to run the back office of the business. So we're actually a British company. We're headquartered in Newcastle, England. But we've got offices all over the world. And our headquarters here in North America is obviously

here in Atlanta. It's a company that's been around for a while. We've been building accounting software for more than 40 years now. The company that I co-founded that sort of got acquired in, we started more than 25 years ago. So we've been doing this for a while. We're not as well known in the U.S. because the U.S. is a huge market and there's sort of more players in this market. But if you go to the U.K. or Spain or France or Germany, like some of those countries, we're a bit of a household name within the accounting industry.

Yeah. And let's just kind of skip to the end. When it comes to the intersection of AI and accounting, why is sometimes being 90 or 95% accurate, why does that not work? Yeah. So, I mean, there's so many ways to address that question. I think the first way to start is that, look at the CFO in a company and the finance team, that CFO trades on trust.

Their job is to create confidence, not only within the business, but with stakeholders, whether it's investors, creditors, that you can rely on the accuracy of the financial reports that they're issuing. Internal stakeholders can rely on the forecasts and the budgets that are being provided. And the minute that CFO puts something out that has a mistake in it,

they're going to lose that credibility. They lose that trust. And so the bar is incredibly high. One of the things that's kind of interesting in the mindset of a CFO and a finance team, if I'm a penny off in the basic equations of accounting, they will hunt that penny down for days until they find it. They may not ever give up until they find that penny. So 99%, yeah, that's not going to cut it, right? It's got to be perfect.

Yeah, and not only that, right? And I'm sure many of our audience can relate to this. Large language models by themselves, right? So if you're using something like ChatGPT,

Not always the best at math, right? Yeah, no, not at all. Not at all. Yeah. In fact, large language models are trained. I mean, this is what makes them so amazing. They're trained to be creative. We don't want creative accounting and we certainly don't want them doing math, right? So in the way that we build AI, this is something that I often tell audiences. One of the first rules of building trusted AI is not to use AI when traditional development will work better.
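In practice, the rule "don't use AI when traditional development will work better" often takes the form of tool calling: the model decides that arithmetic is needed, but deterministic code produces the number. A minimal sketch of the pattern, with illustrative names that are not Sage's actual implementation:

```python
import ast
import operator

# Deterministic evaluator for basic arithmetic: the "calculator"
# the model is handed instead of doing math itself.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculate(expression: str):
    """Safely evaluate +, -, *, / arithmetic via the AST, no eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("only basic arithmetic is allowed")
    return walk(ast.parse(expression, mode="eval"))

def handle_tool_call(tool_name: str, argument: str) -> str:
    # The model emits a *tool call*; deterministic code emits the answer.
    tools = {"calculator": calculate}
    return str(tools[tool_name](argument))

print(handle_tool_call("calculator", "12.5 * 4"))  # prints 50.0
```

The division of labor is the point: the language model is only trusted to route the request, never to generate the digits.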

So if you want AI to do math, we're going to give AI a calculator to do that math with. And I want to get more into this piece of trust, but I saw your keynote and when you were showing some of the things on screen, I'm like, wait, I need that, especially when it came to Sage Copilot. So can you explain a little bit, for maybe our audience that doesn't use Sage, what your AI offerings are?

What the heck? Like, how can you do that with such high accuracy and, you know, help people kind of close the books, you know, much faster? You know, I think you said originally it was like two to three weeks and now down to like two to three days. Yeah. Yeah. And we're not done. We want to get rid of it entirely. Right. Right. It's a relic.

It's archaic. We want to get rid of this thing. So we think about this in waves, in the way that we build and deploy AI. And the first wave we call task-based AI. This is AI that you don't necessarily see at work. So the first thing that we built in the world of accounts payable automation was AI to read and categorize an invoice. And my first conversations with the data science team was, okay,

There's a bunch of models, right, from big players that are built just to do this. Why are we not using one of those? And, you know, what they had to convince me of was that those models just weren't good enough, right? They were about 80%, 75, 80% accurate. But in addition to that,

you know, that wasn't an even, that wasn't evenly deployed accuracy, they really struggled to find the total on an invoice, right? They might be 30 or 40% accurate on that. Turns out it's kind of hard to find. And so ultimately,

we had to go build our own models, models plural, right? We've got five models, just, you know, some of them are looking for the total. Some of them are checking the work of the other models to make sure that, yeah, you really did find the total. So, you know, it ended up being dozens of models. Now, you know, the limitation of this approach is,

is that you can't really interact with that AI. And that AI has to be very, very carefully orchestrated, scripted. It does what it's told to do exactly the way it's told to do it.
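Sage's actual extraction models aren't public, but the pattern Aaron describes here (several specialized extractors, with checkers validating their answers and disagreements routed to a human) can be sketched with toy heuristics standing in for the real models:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Toy stand-ins for the specialized extraction models: each one
# proposes a total from raw invoice text using a different heuristic.
def extractor_keyword(text: str) -> Optional[float]:
    m = re.search(r"total[:\s]+\$?([\d,]+\.\d{2})", text, re.IGNORECASE)
    return float(m.group(1).replace(",", "")) if m else None

def extractor_last_amount(text: str) -> Optional[float]:
    amounts = re.findall(r"\$?([\d,]+\.\d{2})", text)
    return float(amounts[-1].replace(",", "")) if amounts else None

@dataclass
class Extraction:
    total: Optional[float]
    needs_human_review: bool

def extract_total(text: str) -> Extraction:
    """Run every extractor; a 'checker' step accepts the answer only
    when the candidates agree. Disagreement routes to a human."""
    candidates = {f(text) for f in (extractor_keyword, extractor_last_amount)}
    candidates.discard(None)
    if len(candidates) == 1:
        return Extraction(total=candidates.pop(), needs_human_review=False)
    return Extraction(total=None, needs_human_review=True)

invoice = "Widgets 2 x $450.00\nShipping $25.00\nTotal: $925.00"
print(extract_total(invoice))  # Extraction(total=925.0, needs_human_review=False)
```

The design choice worth noting is the fallback: when the models disagree, the system does not guess, it escalates, which is exactly the human-review posture discussed throughout this conversation.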

We can't really get to the next level of automation until you can interact with AI. And so that's the big breakthrough with large language models that sit behind Sage Copilot. Like now you can be directive of the way AI works and sort of how it gets its job done. And so that's, you know, it's a huge breakthrough and it's really a change of psychology, if you will, in the way you design the product. When we are in that task-based phase, right?

A lot of customers don't even realize how much AI is actually operating behind the scenes. They don't realize that until they're confronted with a conversational interface. And so that totally changes the way you design. You've got to design for confidence now.

Yeah. And how did we get to the point where you have a product like Sage Copilot that can accurately, you know, take advantage of, you know, the powers of generative AI yet work in a more almost

deterministic way, right? But, you know, I saw that, you know, there's been billions of predictions, millions of documents that you've used to get here. So without, you know, going into it all, cause I'm sure we could talk about this part for hours, how did we get to the point where, yes, you can feel confident as one of the global leaders in the space to say, yeah, you can go use AI for some of your most important financial tasks.

Yeah. So I think there's two parts to that question or two answers to that question.

But the first, and I guess I will get back to this, is how do you design the product? You have to be credible and believable with your customers. And so if I stood in front of our customers today and said, don't worry, it's 100% accurate. We've got it. They wouldn't believe it at all. So you've got to adapt the experience. You've got to design around this understanding that, hey, we're going to get that large language model to be incredibly accurate.

But what we're going to do is we're going to over-index on understanding, okay, have we met that level? And if not, how do we engage a human properly to review the work? And so that is such a huge, huge part of it.

But the other part of it is that, again, the off-the-shelf models, as good as they are, as magical as they are, as powerful as they are, they're not quite good enough for what we needed. We need a large language model that knows in depth, with expertise, how our products work. It needs to be an expert at our APIs. So in the process of completing a task, it's probably going to write some code.

on the fly, right, to use an API. And so it's not going to work if it hallucinates in the process of building that API request. And so what we found was these off-the-shelf models are pretty brilliant.

But two problems. First, as brilliant as they are, they still make mistakes. But second, they're incredibly expensive. So if we wanted to go about not just operating these models, but sort of get into the world of building one of these gigantic large language models, I'd never get the budget to do that. What? $100 million? A lot of money to train those. Trillion parameters cost a lot. Yeah. And so fast forward two years.

And the efficiency and the capability of fine-tuning these models has increased rapidly. So costs have come down, efficiency has increased, but also the tools available to companies like us have gotten better and better and better. And so we took, so GPT, as you mentioned, trillions of parameters. GPT-4, we think 2 trillion probably.

We started with a model that's 7 billion parameters. And we fine-tuned it from there. And when you're fine-tuning, you can sort of slough off the stuff that you don't want it to do. We train our model to not accept toxic prompts. We train it to be pleasant in the way that it interacts. And then if the conversation is not about accounting...

then we don't want to have that conversation. So we can get it down to a 7 billion parameter model and we can fine tune it to be really, really, really good.

at these accounting tasks we wanted to do. You know, it's interesting because I'm sure there's a lot of people in our audience, specifically, you know, CFOs, people who work in finance, that maybe their first or one of their first interactions with, you know, AI, they probably saw a result and they're like, I'm never going to touch it again, right? Because...

Some of the earlier, right? Even if you say something like GPT-4 or some of these earlier trillion parameter models, they couldn't do basic math, right? So I think even a lot of people that I talked to, they kind of wrote off, hey, we're not going to use AI in this department anymore. But it sounds like the...

One of the models that you have powering your Sage Copilot. I mean, it sounds like it has like a Ph.D. and a CPA, right? Like talk a little bit more about how you were able to address that trust issue by essentially going through and training this seven billion parameter model to become an expert. Yeah. And I want to talk about that first experience, too. But...

So what we've done is we've taken that base model and then we've trained in all the product documentation and sort of loads of material around that product documentation about best practices and how products work, all the developer code. We've trained it on accounting textbooks and accounting exams. We've trained it on content that helps it to sort of understand and speak in the vernacular of accountants and financial analysts.

And one of the things that's super exciting that we announced today is we're partnering with the AICPA, which is the industry association that accredits CPAs. They're now going to make their professional content available to us to train into the model. Now, it's a proof of concept. We're going to be conservative as we are when it comes to AI. So I can't sit here and say, like, this is exactly what we're predicting.

But I think it's an incredible signal that the accounting industry, which early on, the headlines were saying accountants aren't going to exist anymore. I think it's incredibly interesting that the accounting industry is not just sort of embracing AI, they're contributing to the development of AI models. But I want to kind of come back to that first experience.

Because it's so critical. You're absolutely right. If a CFO uses our co-pilot and their first experience says it makes a big mistake, we won't get another chance. I grew up in Silicon Valley, probably like a lot of people in your audience. And the mantra in Silicon Valley has always been move fast and break things. Mm-hmm.

When you're building this kind of AI in this industry, we've got to have a completely different culture. So I talk about, it's kind of pithy, but accept humility, embrace responsibility. We have to have a different mindset. We can't rush AI to our customers. If they have that bad experience, they won't come back. Yeah.

So you've talked a little bit on how you were able to increase accuracy, right, by creating and fine tuning your own model, trained on everything that anyone in finance, CPA, et cetera, really cares about.

But what about on the back end? What about observability, traceability? How does Sage Copilot and some of the things that you announced today, specifically kind of this trust label, how does that address it? Yeah. So one of the things that we had to do very early is we had to build our own infrastructure for machine learning.

So I kind of like to compare this to the early days of software as a service. So if you go back to the very earliest days, all of us, Salesforce included, and the other pioneers that are kind of still around,

We had to set out an objective for our developers that we would release a new version of the code on a weekly basis. We would upgrade every customer to that next version of the code automatically so everybody would always be on the same version of the code. And we had to do this without disruption.

This sounds normal now. Like 25 years ago, that was not normal. Like that was an incredibly provocative thing to say. So if you fast forward, when we started building AI, I had to give the engineers a different mission that was even more provocative, I believe. It's like, I don't want a weekly release. I want you to automate the training of this AI and detect when it's improved enough and then automatically update the version. Um,

But it gets worse, Mr. Developer. Like, you need to be able to do this on a customer by customer basis, right? So we're going to have some big models that are trained, you know, from the collective. But a lot of what we do, we need to train from the individual customer on a customer by customer by customer basis. So you've got to build this infrastructure that can automate all that, but do it in a safe way.

So we built all the automation. This is why we've got tens of thousands of models in production today. But what we also did was we built in all of these safety mechanisms, all of these controls. So we've got systems that detect model drift and launch a process to get a data scientist involved. We've got a couple of safety mechanisms that detect hallucination. This is where we've actually gotten some of our patents on this. So we call this whole thing the Sage AI Factory.

And if you see me talk to our customers or analysts, partners about AI, I'm invariably going to talk about the Sage AI factory because I think it's so important to understand behind the scenes, how does the factory work? How does this stuff get built? And how do I know that you're taking steps to make sure that it's safe? Yeah.
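The internals of the Sage AI Factory aren't public, but a common way to implement the kind of drift detection Aaron mentions is to compare the distribution of live predictions against a training-time baseline, for example with the Population Stability Index, and open a review ticket when it crosses a threshold. A rough sketch, with an illustrative threshold and data:

```python
import math
from typing import List

def psi(baseline: List[float], live: List[float], bins: int = 10) -> float:
    """Population Stability Index between two score distributions.
    Rule of thumb: < 0.1 stable, 0.1-0.25 watch, > 0.25 drifted."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0
    def histogram(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[max(i, 0)] += 1
        # Smooth empty buckets so the log below is always defined.
        return [(c or 0.5) / len(xs) for c in counts]
    b, l = histogram(baseline), histogram(live)
    return sum((li - bi) * math.log(li / bi) for bi, li in zip(b, l))

def check_drift(baseline, live, threshold: float = 0.25) -> bool:
    """True -> launch the process that gets a data scientist involved."""
    return psi(baseline, live) > threshold

stable = [0.1 * i for i in range(100)]
shifted = [0.1 * i + 6.0 for i in range(100)]
print(check_drift(stable, stable))   # False: same distribution
print(check_drift(stable, shifted))  # True: the distribution moved
```

In a production pipeline the alert would page a human rather than print, matching the "detect, then involve a data scientist" flow described above.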

I'm curious, how many total organizations you have using the AI co-pilot features inside Sage right now? So we have tens of thousands using co-pilot in various places around the world. We started small. We started with small businesses that have simple accounting needs. We started with small accounting firms that tend to serve those businesses. And we started on the early capabilities.

And over time, we've expanded it to more products, you know, more countries. But we've also sort of gone into now more sophisticated businesses. So we launched early access for Sage Intacct about six or eight months ago. And I guess the thing that I would want to reinforce here is we're being very deliberate about how we open that to more customers. We're just, we're so...

We're so careful to not have that first bad experience. And so we're going to be very deliberate in the way we roll this out to more customers over time. Yeah, it seems like a super strategic rollout and obviously makes sense when trust is paramount, right? And you can't, like you said, you can't have someone have that first bad experience, right?

where something goes wrong because the stakes are so high, you know? So I'm curious, throughout the iterations of Sage Copilot, and then, you know, obviously another wave coming with what was announced here at Sage Future, what were maybe some of the initial

obstacles that you were able to overcome? And then maybe what do you think is next, right? In terms of not like, hey, what's the exact product roadmap, which I know you guys did lay that out a little bit, but in terms of trust, in terms of reliability, what have you guys already been able to overcome and then what's next to overcome? Yeah. So if you'll forgive me, I'm going to tell a story. Please do. And I promise it's setting up for the answer. So I've been talking about

Sage Intacct, and I was one of the co-founders there. So 25 years ago, we just started the product. We just launched the product. The biggest obstacle to getting customers to buy the product was that they were not willing to put their data in the cloud.

And we sort of scratch our heads at that now, but you have to imagine how new this was at the time. And we would use all the arguments you'd expect. We can spend more on security than you can. Our livelihood depends on keeping your data safe. You trust a bank with your money. Why not trust us with your data? It just wasn't working. And so what we ultimately ended up doing, and I think it's pretty clever, we put a button in the product that said, see my data.

And when a user clicked that button, we would pop up a window that had a webcam in the data center pointing at the server that had their data on it. Now, we only had one server at that time with customer data. So, you know, this was pretty easy to do. And this was actually pretty awesome. It actually kind of worked. Thankfully, we reached a point where it was less needed, and we turned it off. Now, the reason why we turned it off, and I promise again, true story,

We had a technician in the data center who was doing some cabling and doing some maintenance, sort of bending over and moving around a bit. Probably should have been wearing a better belt. And a sales rep chose at that point, in front of a prospect, to click "see my data." And they got the full transparency. The full transparency. They saw the data and then some. And so this is kind of like an old story about building trust and how transparency plays into that.

Yeah. Today, it's still true. If you talk to companies today and they're evaluating your software and they know that AI is a big part of the offering, they want to know that you're going to keep your data safe, right? They're going to want to know that they can trust the technology. And the problem with AI is the industry and the technology is moving so fast that

the regulatory environment around it sort of hasn't kept up. So we can't rely on a lot of sort of external signals, right, that you can trust the AI. We do use some, and I'll get into that. So what we determined was, well, we need to put a button in the product

We're going to put it in each AI feature where a user can click it. And what's going to happen is instead of that webcam, we pop up what we call the Sage Trust Label. And in that trust label, this is kind of like a nutrition label, we're going to be super transparent about, okay, what are the models that we use to build this? Are we training our own models? If so, how do we use your data and the training of those models?

What are the steps we're taking to keep your data safe? What are the safeguards in place to defend against issues of bias or other ethical concerns?

And we're putting it in a nice, easy to read format. And if they want to learn more, we give them a button where they can click and go out to read all of our AI commitments, kind of solving the same problem, really. So, yeah, we announced that today. We're encouraging other vendors in the industry to kind of follow suit. We know we can't wait for the regulatory bodies to catch up.
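The actual schema of the Sage Trust Label hasn't been published; the sketch below just illustrates the "nutrition label" idea as a small, machine-readable disclosure attached to an AI feature. Every field name and value here is hypothetical:

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

# Hypothetical schema: the real Sage Trust Label's fields are not
# public. The point is a per-feature, easy-to-read disclosure.
@dataclass
class TrustLabel:
    feature: str
    models_used: List[str]
    trains_on_customer_data: bool
    customer_data_handling: str
    safeguards: List[str] = field(default_factory=list)
    learn_more_url: str = "https://example.com/ai-commitments"  # placeholder

label = TrustLabel(
    feature="Invoice capture",
    models_used=["in-house extraction ensemble"],
    trains_on_customer_data=True,
    customer_data_handling="Per-customer models; data never shared across tenants.",
    safeguards=["human review below confidence threshold", "drift monitoring"],
)
print(json.dumps(asdict(label), indent=2))
```

Rendering the same structure in the UI and exposing it as data would let both end users and procurement teams answer the trust questions above without getting the CTO on the phone.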

So we've got to plow ahead with something we think simplifies this and sort of signals to the customer, here's why you can trust us. What do you think is going to be...

Or maybe this is it, right? Because I think every single company that's trying to put out responsible, trustworthy AI products, there's always that hurdle, right? With a trust label, is this something that you think is going to be not like the last hurdle because there's always innovation, there's always new capabilities, but is this going to be one of those hurdles that after you get over it, you're like, wow, this made quite a difference for me.

for people in finance to be able to trust AI with their data? I think it's going to help a lot. I mean, I think the first thing that it's going to do is it's going to get the CTO off the phone. So when you get a customer who's evaluating your software, and we have millions of customers, by the way, right? We've got 4,000 people in sales. If we get a sophisticated customer that's got a question about AI, like...

They're going to ask me to get on the phone and talk to that customer and explain it. If we've got this trust label, it just makes it easy for people to understand and evaluate. So I think that's critical. I don't think it's going to go away. I mean, I think it's going to evolve and change. Is this the moment where we solve the issue for good? Is there going to be a point where I can turn off that button the way I turned off the webcam? I don't know, but I guess...

everything is happening so fast with AI. It's progressing so fast. And I think we all need to be a bit honest, right? That there's going to continue to be lots of reasons to not trust AI, right? It's broader than just the accounting field. So I think we've got to have this mindset that this isn't the end.

This is one step in the journey, and we're going to have to continue evaluating and looking at, okay, how are people feeling about trust in AI, and what are the new things that are causing them to not trust it? And we're going to have to just keep adapting as we go. So, Aaron, we've talked about a lot in today's conversation. So everything from trust and transparency to even how Sage is not just...

building their own models, but how they're showing their work to customers on how it's being used and how it's being implemented. But, you know, as we wrap up, what do you think is the one most important takeaway for those people, whether they're, you know, a CPA, whether there's a CFO at a huge organization, what's the one biggest takeaway that you want people to know from, you know, Sage future here when it comes to trust in AI?

So I think the one thing is the future is the future that we build. And sort of taking on the responsibility to build that future is pretty serious. And so that's why I'm having this big conversation with you. What we've learned through all of our conversations is the biggest signal of trust is the company behind the AI, right? I'm not a sophisticated person, maybe. I don't know how to evaluate this, but hey, that's a brand.

That I trust. And it's got to change your mindset, right? You've got to be transparent. You've got to be credible. You've got to be willing to admit that, hey, AI is not foolproof. It's going to have problems. And then you've got to make these commitments that you publish and you've got to stand behind them.

I think that was such an insightful look into not just what's happening here at Sage and their co-pilot and everything happening at Sage Future, but also just the industry as a whole. I think it's important for people to hear, yes, authenticity and transparency are just as important as productivity gains and everything else that you get from AI. So, Aaron, thank you so much for taking time out of the very busy Sage Future conference to join us. We really appreciate it.

Thanks for giving me a chance to talk about my favorite topic. I love it. I love it. So there was a lot happening in this conversation. Maybe you missed a golden nugget that Aaron just dropped on us. Don't worry. We're going to be recapping it all in our newsletter, as well as everything else that was announced here

at Sage Future. And it's not just if you're a CFO or a CPA; even if you are a small or medium-sized business owner, there's a lot to know from what was just announced. It's all going to be in our newsletter. So thank you for tuning in. Please join us back tomorrow and every day for more Everyday AI. Thanks, y'all.

And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
