We're sunsetting PodQuest on 2025-07-28. Thank you for your support!

Can AI Agents Finally Fix Customer Support?

2024/12/18

AI + a16z

AI Deep Dive AI Insights AI Chapters Transcript

People

Jesse Zhang

Kimberly Tan

Topics

Jesse Zhang: 我认为随着时间的推移,AI智能体将越来越依赖自然语言,因为这是大型语言模型(LLM)的训练方式。理想情况下,一个超级智能的AI智能体就像一个熟练的员工,可以学习、理解反馈并更新自身。它能够处理各种信息,并根据用户的反馈不断改进。我们构建AI智能体的目标是让它像一个熟练的员工一样工作,而不是依赖于复杂的决策树。 Decagon专注于为客户服务构建AI智能体,灵感来源于我们自身与客服电话沟通时的糟糕体验。我们关注的是围绕AI智能体的工具建设,以便人们可以更好地构建、配置和管理这些智能体,而非将其视为黑盒。我们的品牌建立在为AI智能体提供周全的工具和支持之上,使其不成为一个黑盒。从消费者产品转向企业软件,是因为企业软件的问题更具体,有实际的客户、需求和预算,更容易优化和解决问题。与传统的决策树相比,大型语言模型(LLM)在客户支持方面具有更高的灵活性和个性化能力,能够处理更复杂的问题。LLM的灵活性使其能够更好地个性化客户支持,提高解决问题的效率和客户满意度。LLM赋能的AI智能体能够实时提取数据、执行操作以及进行多步骤推理,从而处理更复杂的用户问题。LLM的出现使得AI智能体在处理客户支持方面取得了显著进步。 Decagon定义的AI智能体是一个协同工作的LLM系统,能够串联多个LLM调用,甚至递归调用,以提供更好的用户体验。AI智能体能否从演示走向实际应用,关键在于其应用场景的特性,而非技术堆栈。AI智能体应用场景的ROI必须可量化,例如通过解决问题的百分比来衡量,才能说服客户付费使用。AI智能体应用场景需要具有增量性,即使不能完美解决所有问题,也能带来显著价值。 AI智能体与人类交互的体验取决于客户的选择,有些客户希望将其拟人化,有些则希望明确其AI身份。Decagon通过整合用户和业务逻辑上下文,实现AI智能体的个性化,从而提升用户体验。企业客户在部署AI智能体时,最关注的是安全防护措施,例如规则、监督模型和对恶意行为的检测。AI智能体的安全防护措施包括规则设定、监督模型和恶意行为检测等。Decagon的核心理念是赋能用户自主构建和管理AI智能体,包括自定义安全防护措施。Decagon专注于提供工具和基础设施,让用户能够自主构建和管理AI智能体,并自定义安全防护措施。Decagon致力于提供易于使用的工具,即使是非技术背景的用户也能轻松构建和管理AI智能体。企业可以通过优化知识库结构和API设计,更好地支持AI智能体的应用。企业可以通过改进知识库结构和API设计,提高AI智能体的效率和准确性。未来AI智能体的交互方式将越来越自然,更像与人类的对话。未来AI智能体的交互方式将越来越自然,更像与人类的对话,而不是依赖复杂的决策树。构建一个真正可投入生产的AI智能体比简单的GPT封装要复杂得多,需要解决许多实际问题。Decagon销售的是软件,LLM只是其组件之一,客户购买的是软件的整体功能,包括监控、报告、反馈等。将AI智能体投入生产环境需要解决许多问题,例如幻觉、恶意攻击、延迟和语气等。许多企业选择Decagon是因为他们不想自己处理AI智能体部署过程中遇到的复杂问题。通过将敏感操作放在确定性系统中,可以有效降低AI智能体面临的安全风险。企业通常会进行安全测试,例如红队测试,来评估AI智能体的安全性。Decagon鼓励客户进行安全测试,例如红队测试,以识别和解决潜在的安全漏洞。未来可能会出现针对AI智能体的新的安全认证标准。作为一家应用AI公司,Decagon需要在保持产品路线图可预测性的同时,及时跟进最新的技术发展。Decagon的软件开发工作与传统的软件开发类似,主要挑战在于及时评估和选择合适的LLM模型。Decagon会定期评估新的LLM模型,并根据评估结果进行切换。Decagon更关注LLM模型的指令遵循能力,而非推理能力。Decagon内部的评估基础设施对于快速迭代和确保AI智能体性能至关重要。Decagon内部的评估基础设施对于快速迭代至关重要,因为它可以帮助团队快速评估模型变化的影响。多模态对于AI智能体很重要,但其普及取决于技术和市场需求。在构建了完善的工具和逻辑之后,添加新的模态(例如语音)对Decagon来说并不困难。多模态AI智能体的普及取决于客户的接受程度和技术成熟度。从文本开始是合理的,因为文本更容易被客户接受和监控。语音AI智能体面临的技术挑战高于文本AI智能体,例如延迟和自然度。 Decagon在早期阶段就得到了很多客户的积极回应,这出乎意料。客户对Decagon的AI智能体解决方案表现出极大的兴趣,这与该解决方案的时机和应用场景密切相关。企业在采用AI智能体时,更关注的是其价值和客户满意度,而非幻觉问题。AI智能体的定价不应该基于传统的每用户许可模式,而应该基于其工作产出,例如每次对话或每次解决问题。传统的每用户许可模式不适用于AI智能体,因为AI智能体的价值不取决于用户数量,而是取决于其工作产出。AI智能体的定价应该基于其工作产出,例如每次对话或每次解决问题。Decagon采用按对话付费的模式,因为它比按解决问题付费更简单、更可预测。按对话付费的模式比按解决问题付费更简单、更可预测,也更能避免一些潜在的激励问题。未来,AI智能体在工作场所中的应用将显著增加对AI管理人员的需求。AI管理人员需要具备观察、解释和构建AI逻辑的能力。一些对错误容忍度极低的行业,AI智能体可能更多地扮演辅助角色而非完全自主的角色。 Kimberly Tan: 如果一个想法看起来很明显,但没有一个明确的解决方案,那么就意味着这个问题实际上还没有得到解决。Decagon在早期阶段就获得了大量客户的关注,这表明市场对AI原生客户支持解决方案的需求巨大。AI智能体的采用率取决于其ROI是否清晰可衡量。企业在采用AI智能体时,更关注的是其价值和客户满意度,而非幻觉问题。 Derrick Harris: (节目主持人的角色,没有表达核心观点)

Deep Dive

Key Insights

Why are AI agents gaining popularity in customer support?

AI agents offer higher personalization, flexibility, and the ability to handle complex workflows, which improves customer satisfaction and resolves more inquiries compared to traditional chatbots or decision trees.

What is the difference between a chatbot and an AI agent?

Chatbots rely on predefined decision trees and simple NLP, often leading to frustrating experiences. AI agents, on the other hand, use LLMs to handle complex inquiries, adapt to different situations, and provide personalized support by chaining multiple LLM calls and integrating business logic.

Why do most customers prefer a per-conversation pricing model over per-resolution?

Per-conversation pricing offers simplicity and predictability, as defining what constitutes a resolution can be ambiguous and lead to misaligned incentives. Per-resolution pricing could encourage deflecting difficult cases, which customers dislike.

What challenges do incumbents face when adopting AI agents?

Incumbents struggle because AI agents cannibalize their traditional seat-based pricing models. They also have less risk tolerance due to their large customer base, making it harder for them to iterate quickly and improve products compared to startups.

What are the key skills needed for an AI supervisor in the future workplace?

AI supervisors will need skills in observability (understanding how AI makes decisions) and decision-making (providing feedback and building new logic). They will also need to monitor AI performance and ensure it aligns with business goals.

How do AI agents handle security concerns in enterprise settings?

AI agents use deterministic APIs for sensitive tasks, reducing the risk of non-deterministic outputs. Enterprises often conduct red teaming to stress-test the system, ensuring it can handle potential attacks or misuse.

What is the role of personalization in AI agents for customer support?

Personalization involves tailoring responses to both the user and the specific business logic of the customer. This requires context about the user and access to business systems, enabling the agent to provide a more accurate and relevant experience.

Why is the customer support use case well-suited for AI agents?

Customer support has quantifiable ROI (e.g., percentage of inquiries resolved) and allows for incremental adoption, meaning agents don’t need to be perfect from the start. This makes it easier for businesses to adopt and scale AI solutions.

What are the technical challenges of implementing voice-based AI agents?

Voice agents require lower latency and more natural interaction, which makes them technically more challenging to implement than text-based agents. They also need to handle interruptions and respond in real-time, which adds complexity.

How does Decagon manage the rapid evolution of LLMs?

Decagon evaluates new models whenever they are released, using internal eval infrastructure to ensure they don’t break existing workflows. They focus on instruction-following intelligence, which benefits their use case, even as models improve in other areas like reasoning.

Shownotes Transcript

Translations:

中文

How do you actually build an agent? Our view is that over time, it'll become more and more like natural language based because that is how agents think or how this basically what LMS are trained on. And in the limit, right, if you have like a fully just like super intelligent agent, it would basically be like a human where you can show it stuff, you can explain it stuff, give it feedback, and it just kind of updates in its mind. Like if you just think about having a really competent human on your team, they arrive.

You teach them some stuff, they start doing work, and then you just give it feedback and you can show new things. You can show it like new documentation or show it new like charts or whatever. In the limit, it kind of moves towards that where things are a lot more conversational and things are more natural language based. And people aren't just using these stopgaps of like building gigantic, complex decision trees that sort of capture what you want, but can break apart pretty easily.

Good day and welcome to the A16Z AI Podcast. I'm Derek Harris. And joining me for today's episode are Decagon co-founder and CEO Jesse Tsang, along with A16Z partner Kimberly Tan. Kimberly leads the discussion with Jesse, who shares his experiences so far with building Decagon as both a company and a product.

If you're not familiar, Decagon is a startup supplying businesses with AI agents to assist in customer support. These are neither chatbots nor single API call LLM wrappers, but rather advanced, tunable agents personalized to a company's specific needs and able to handle complex workflows.

In addition to explaining why they started Decagon and how it's architected to handle different LLMs and customer environments, Jesse also touches on the benefits of a per-conversation business model and how AI agents will change the required skill sets of the people in charge of customer support. It's also worth noting that Kimberly recently wrote a blog post titled "RIP to RPA: The Rise of Intelligent Automation," which we briefly discuss in the episode.

It's a great starting point to understand where and how this type of automation is taking off for business processes. And we'll post a link to that in the show notes.

As a reminder, please note that the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. For more details, please see a16z.com slash disclosures.

So a quick background on me, born and raised in Boulder. Grew up doing a lot of math contests, stuff like that. Studied CS at Harvard, started a company afterwards that was also backed by A16Z. We eventually got acquired by Niantic. And then here we are building Decagon.

What we do, we're building AI agents for customer service. When we first got started, it was for us, we wanted to build in something that was like very, very relatable for ourselves. And so, of course, no one needs to kind of be taught like what AI agents for customer service can do, right? We've all been on the phone on hold with airlines or hotels or whatever. And so that's kind of where the idea originated. And we just talked to a bunch of customers to see like specifically what we should build.

I think for us in particular, the thing that stood out is that as we learned more about AI agents, we started to really think a lot about what would the future look like when there are a lot of AI agents. Like, I think everyone believes that there's going to be a lot of AI agents that come up. And so for us, an interesting thing would be what would the humans that work around the AI agents do? Like, what tooling would they have? What sort of control or visibility would they have into the agents that they're working with or managing?

And so that's really what we built the company around. I think that's the thing that's made us special so far is that we have all this tooling around these AI agents for the people that we work with to build them and configure them and just make it not really a black box. So that's kind of where we've created our brand.

What inspired you? Because your last company was a consumer-based video company, correct? Yeah. What was the move to get into enterprise software? Great question. I think in terms of topics, when founders think about topics, it's generally going to be pretty topic agnostic because actually when you approach a new space, you generally have like, you're pretty naive. And so there's some advantage into having a fresh perspective on things.

And so when we were ideating, it was pretty much just like no topics off limits. I think what's a very common pattern that also includes myself of like more quantitative people is after you've tried a consumer product, you gravitate a lot more towards enterprise software because the problems are a lot more concrete. Like you have actual customers with actual needs and budgets and stuff like that that you can optimize for and solve problems.

Whereas a consumer is also very exciting, but it's a lot more intuition based than like running experiments. And I think like for myself personally, it's just that's a better fit.

And maybe just to start out, what are the most common categories of support that Decagon deals with today? And talk a little bit more about how you actually are leveraging LLMs to solve that problem and what's possible now that maybe wasn't before. Sure. So if you think about automation before, you would have maybe decision trees. You can do some simple NLP of figuring out which path to go down in the decision tree.

But we've all used chatbots. That's a pretty frustrating experience. You usually don't have a question that can be fully solved by a decision tree. And so you end up getting shoveled down a path that's sort of related to what you're asking, but not really.

Nowadays, you have LLMs. And so the magical part of LLMs, as we've all used ChatGPT, is that they're very flexible and they can kind of adapt to a lot of different situations. And they just have like a baseline intelligence around them. And so when you apply that to support or support inquiries or people that are questions that customers have, you're able to just be a lot more personalized. So that's number one, right? The personalization factor goes way up.

And that unlocks higher stats across the board. You're able to resolve more things. People are happier. Customer satisfaction is higher.

And so then the natural next step is like, okay, well, if you have this intelligence, then you should be able to do more of the things that a human can do. And what a human can do is they can pull data for you real time. They can take actions. They can reason through like multiple steps. If you're coming in with a pretty complicated question, that's like, okay, I want to do like this and that. And maybe the AI is only prepared for the first thing. LLM is smart enough to recognize that there's like two questions here. And like, first, let me resolve the first question and then I'll help you with the second one.

It was basically impossible before LLMs came out. And so that's why we're seeing now today there's like a step function in terms of all the things that technology can do because of LLMs.

And how are you defining AI agent in this context? Because people use the term agent quite broadly. And I'm curious in the context of Decagon, like what does it actually mean? So I'll say agent is more or less a system of LLMs that are working together, right? So you have one LLM call, you basically send a prompt through, you get a response back. With an agent, you want to be able to chain multiple of those together and maybe even recursively where you have one LLM call that maybe decides what to do with the message. And then that kind of leads to other calls that

pull in more data and like can take actions and iterate on what the user has said and maybe ask follow-up questions right so agent for us you can kind of think almost like it's a web of lm calls or api calls or like other logic that all works together to produce uh

a better experience. On that note, maybe if we talk a little bit more about just the actual agent infrastructure that you've built, I think one thing that is really interesting is that there's a lot of demos out there for AI agents all the time, but very few who I think are truly working in production. And it's very hard to know just from the outside, like what is real and what's not. So in your opinion, like what are AI agents today very good at doing and where does there still need to be technical breakthroughs in order to get them to be robust and reliable?

So my take on that is actually slightly different in that the differentiator between whether AI agent is just a demo versus quote unquote actually works is not so much like the tech stack, because I think most people are probably going to be using roughly the same techniques. I think once you're further along your company journey, like us, we've been around for like over a year. You have created things that are very specific to your use case.

But at the end of the day, people have access to the same models. People have access to the same techniques. I think the biggest differentiator for something working or not is actually the shape of the use case. It's hard to know this when you're first starting. But looking back, you can kind of reflect. There's two properties I would say are very important for something to evolve past the demo.

The first is that the use case you're solving, the ROI has to be very quantifiable. And that's super important because if that's not the case, then it's very hard to convince people to actually use you and spend money on you. And so in our case, the quantifiable metric is what percentage of support inquiries are you resolving?

And because there's a hard number there, people can justify like, oh, okay, well, if you're resolving more, let me map that to what I'm currently spending and the time this currently takes. And so if you have that, and the other metric for us is customer satisfaction. So

Because it's really easy to quantify the ROI, people actually adopt it. The second piece is that the use case has to be incremental. So if you basically need an agent to be super human level and solve like near 100% of the use case off the bat, that's also very difficult. Because as we all know, LMs are non-deterministic. You have to be able to have like some sort of fallback. And

Luckily, support has this nice property that you can always escalate to an agent. And even if you're like solving half the things, that's hugely valuable for people. So I think the support use case just has that property that makes it nice for an AI agent. I think there's a lot of other fields where people can create an impressive demo and you don't even have to squint that hard to see why AI agents would be useful. But yeah, maybe it has to be like perfect off the bat. And if that's the case, that

No one's really willing to try it or even use it because the ramifications of it not being perfect are kind of serious. So like security or something, right? Like people run Sims and it's pretty classic idea of like, oh, it'd be cool if LLMs could read this. But it's hard for me to imagine anyone just being like, okay, AI agent, like go, go do that. And I'll trust you to do it because if it makes like one mistake, you're kind of screwed. How clear is it, I guess, that I'm interacting with?

an AI agent versus interacting with a human versus like, is there an attempt to make it seem natural? Is it just like, this is actually, it's pretty clear you're interacting with an LLM and proceed accordingly. That's generally up to our customers to decide actually. And we see a pretty high variance. Like on one end of the spectrum, you have people that really try to personify their agents. And so there's like a human avatar, there's a human name. It's just responding naturally. On the other end of the spectrum, it's like it calls itself an AI. It basically makes it really clear. I think different companies work with have different stances on this. Like

oftentimes if you're in a regulated industry, you have to make it clear. I think now what's really cool is that you're starting to see a behavior shift in the customers. And because like a lot of our customers, they get a ton of social media posts about like, holy crap, this is the first chat experience I've ever tried that actually feels real or like this is magical. That's great for them because now they're they're they're

Their customers are learning that, like, hey, if it's AI experience, it could actually be better than a human. And in the past, that was not the case. Because in the past, like, probably all of us have been on the phone and it's just like, all right, like, AJ, AJ, AJ, right? You mentioned a couple of times this idea of, like, personalization, both in terms of

Everyone uses the same technical infrastructure under the hood, but it's about personalizing for support. Some of your customers want different types of personalization. Can you talk more about that and like what exactly it is that you do do such that you're able to get the personalization that causes people online to say like, oh, shit, this is the best support experience I've ever had for us.

there's the personalization that comes from molding to a user. So you need to have context of the user itself, right? So that's like additional context you need. And then two, you need to have the context of the business logic of our customer. If you combine those two together,

you have a pretty good experience. Obviously, it sounds pretty easy. It's pretty hard to actually get all the context you need. And so that's mostly what we build is like, how can you build the right primitives so that when someone deploys us, they can pretty easily decide like, okay,

this is the business logic we want. First, you need to do these four steps. And if the third step fails, you have to go to a fifth step, that sort of thing, where you want to be able to really easily teach the AI that, at the same time giving it access to, OK, here are the account details of the user. If you need to fetch more things, you can hit these APIs. That's the sort of layer that sits on top of the models that-- it's kind of like an orchestration layer, I guess.

that makes the agent real. It sounds like in that case, and you need a lot of access to business systems, you need a lot of information about the user, you probably need a lot of information about how the customer actually likes to interact with their users as well. And I imagine it's pretty sensitive data. So can you talk more about like,

What are the things that enterprise customers typically need assurances around when it comes to actually deploying AI agents? And how have you thought about the best way to handle that, knowing that your solution does provide a better experience, but also is new for a lot of people experiencing agents for the first time? Yeah, so this kind of comes down to guardrails. And over time, because we've done a lot of these implementations, it has become clear what types of guardrails that people care about.

For example, the simplest kind is there might just be some rules that you always have to follow. You know, if you're working with a final services company, you can't really give financial advice because that's regulated. So you kind of have to tune that into the agent and make sure that it never does that. And so oftentimes what you can do is you have a supervisor model or some sort of system set up that runs these checks before the results go out. Another type of guardrail you might have is if someone's coming in and just like

trying to mess with you. And they see that this is a generative system, and they're trying to get you to be like, OK, what's my balance? OK, multiply that by 10, that sort of thing. You want to be able to check for that as well. So there's a lot of types of these that we've found over the, I guess, months to slash year that we've been deploying these. And for each one, you can classify, OK, you need this type of guardrail for it, and then

As you build more and more, the system becomes more and more solidified. And how unique is each guardrail to each customer or each industry? And how do you think about building that at scale as you bring on more and more customers across a wide variety of use cases? This kind of comes back to our core thesis, which is in a few years, agents are going to be pervasive.

So the thing that really matters is giving people the tools and like empowering almost the next generation of jobs, I guess, which is like agent supervisors, giving them the tools to build the agents and also add their own guardrails. Because we're not going to be the ones that defines the guardrails for them. Every customer understands their guardrails the best and their business logic the best. So our job is really to be the best at building the tooling and the infrastructure for them to business.

build agents. And so that's why we keep talking a lot about like, hey, your agents shouldn't be a black box. You should have the control over how to construct these guardrails and construct the rules and construct the logic that you want to do. And so I think that's probably the one thing that set us apart so far where we've just invest a lot into this tooling and like we've come up with a lot of creative ways for people that you probably don't even have super technical backgrounds.

probably don't have the deepest understanding of how AI models work, but you can still download what's in their brain and what they want AI to do into the agent. So I think that's going to become more and more important over the next couple of years. And if people are evaluating tools like this, I think that should be one of the top criteria, no matter which type of agent you're evaluating, because you want to feel like that as time goes on, you have the ability to make it better and better.

Are there things that customers or businesses can do to prepare their systems or their practices for any sort of automation, but probably like the sort of agent in particular in terms of how they design their data systems or how they design their software architecture and business logic to be able to enable this? Because I feel like a lot of AI things, we come at it, it's very new. But then once you get into this existing legacy system, like all things, you're dealing with a lot of spaghetti and duct tape and that sort of thing. In terms of if someone was building from scratch right now,

There are a lot of best practices that will make your life easier, right? So the way you construct a knowledge base, we've written about this where there are some things you can do to make it really easy for AI to ingest it and increase its accuracy. And part of that comes down to having really modular chunks of your knowledge base rather than just having big articles that have a bunch of answers in them.

Right? So that's one tactical thing that people can do. When you're setting up your APIs, you can make them agent-friendly and set up the permissions in a way and set up the outputs in a way that makes it easy for the agent to ingest that and not have to do that much computation afterwards to find the answers. So there's stuff like that.

But I wouldn't say there's anything that's like, you have to do this in order to use agents. That sounds like better documentation. Always a good thing. And then, yeah, so like information organization, basically. It sounds like if you're trying to teach people to basically be able to prompt your agent to act in a way that has the most fidelity to the

like their customer specifically or their use case specifically. There's a lot of experimentation or I would say like new ground to be broken on just the UI and UX of how someone does that because it's so different from traditional software.

I'm curious, like, how have you guys thought about that? Like what the UI UX looks like in an agent first world? And then how do you think it actually changes in the next couple of years? Yeah, I mean, I will not have claimed that we've solved this. I think we found like maybe a local optimum that works pretty well for our current customers. But this is like an ongoing field of research, both for us and a bunch of other people. And the core problem comes down to similar to what we've been saying, right, is you have agent.

How can you, number one, see exactly what it's doing and how it's making decisions? And then two, use that to decide what

updates to make to it and like what the feedback to the AI should be. And so those are where the UI elements actually come together. And especially the second piece, right? It's like, how do you actually build an agent? Our view is that over time, it'll become more and more natural language based because that is how agents think or how this basically how like what LMS are trained on in the limit, right? If you have a fully just like super intelligent agent,

It would basically be like a human, where you can show it stuff, you can explain it stuff, give it feedback, and it just kind of updates in its mind. If you just think about having a really competent human on your team, it's like they arrive, you teach them some stuff, they start doing work, and then you just give it feedback. And you can show it new things. You can show it new documentation or show it new charts or whatever. So I think in the limit, it kind of moves towards that, where

Things are a lot more conversational, and things are more natural language-based. And people aren't just using these stopgaps of building gigantic, complex decision trees that sort of capture what you want but can break apart pretty easily. We had to do that in the past because that's all we had, right? We didn't have LMs. But now, as the agents get better and better, the UX and the UI is going to be more conversational.

A year ago, which is about when Decagon, or a little over a year ago, which is about when Decagon got started, it was very common for people to say that, you know, a lot of the use cases that are very good and very practical for LLMs were also just going to be what people called like GPT wrappers, meaning companies could just make one API call to a foundation model and solve their support challenge immediately. But clearly we're seeing with companies opting to use something like Decagon versus doing that. That hasn't seemed to be the case thus far. And I was

wondering if you could explain why that is. Like, what was it about building this in-house that was actually more complicated than people expected? And like, what did people get wrong about this whole notion? There's nothing wrong with being a GBT rapper.

You basically say that Vercel is like an AWS wrapper or stuff like that, right? I guess when people say the term, it's usually meant in a derogatory way. And I guess my view on that would be, I think if you're building an agent, by definition, you're going to be leveraging LLMs as tools, right? So you're kind of building on top of things just like...

you would normally build on top of AWS or GCP or stuff like that. I think if we really run into trouble is where the software that you're building on top of the LLM is just not thick enough or not complex enough for someone to feel like, OK, there's actually a differentiation here. But for us, I think looking back,

The thing we're selling is mostly the software. And we're basically just like a normal software company. And we're using LLMs as one of the components and one of the tools of the software. But when people pay for a product like this, they mostly want the software, right? They want to be able to have the tools to monitor and just reporting on the AI. They want to be able to deep dive into every conversation the AI is having. And they want to be able to give it feedback and build it and stuff like that, right? So that's where a lot of the...

sort of software comes from. And even with the agent itself, what people run into is like, it's pretty cool to make a demo, but if you're trying to make this production ready and actually customer facing, you have to squash the super long tail of like, yeah, protecting against hallucinations, protecting against bad actors that come in, like trying to mess with you.

We were really nailing the latency and the tone and stuff like that. And so we've talked to many teams where they've done some experiments themselves and built the initial version. And then they're like, OK, yeah, it's pretty clear. We don't want to be the ones that build this long tail. And we also don't want to be the ones that are constantly building in new logic for the CX team.

I'm like a customer team. And so it's like, okay, it kind of makes sense to go with someone. You mentioned a little bit, there's just like long tail of different things you have to quash, bad actors, et cetera. I'm sure a lot of folks who are listening who think about using AI agents are sort of worried about, you know, when you start introducing LLMs into the picture, there are new vectors for security attacks. Or when you introduce agents into the picture, there may be new security risks.

How do you guys both think about that as well as think about just like in general best practices when it comes to dealing with agents and ensuring that you still have top tier enterprise security?

There's some obvious things you can do on the security side. And so those are some of the things I mentioned, right? Like you just want protections in place. At the core, what you can do-- what people are scared about around LMs is that they're not deterministic. But the nice thing is that you can actually put most of the deterministic-- like most of the sort of sensitive and complex stuff behind a deterministic wall, where when it calls out to an API, that's where the computation happens. And so you're not really leaving that to the LM. And that basically squashes a lot of the core issues.

But then you still have situations where like, yeah, you have bad actors that come in or like people are trying to get it to hallucinate and things like that. And so what we've seen is that in all the big customers we work with, their security teams will basically come in and like red team our product essentially, where they just spend several weeks just like hammering it with all the different things that they can think of to try to break it.

And we're probably going to see that more and more as agents become more pervasive, because that's one of the best ways to actually gain confidence in does this work or not? It's like you just red team it and throw a ton of stuff at it. And that's why I like, yeah, I know there's services now that are like they're startups trying to build red teaming tools or like ability for people to do this themselves.

But I think that's a cool thing that we've seen so far. And so a lot of the companies we work with, like during the probably like the late stage of the sales cycle, they just have their own security team or they contract with some external team and they're just power testing it.

And for us to partner, we have to do good on that. So that's what it comes down to. MARK MANDEL: Is that something you encourage from your customers? Because I know when we talk about AI policy, one of the big things we talk about is the application layer and putting onus on the user of the LLM and the person running the application, as opposed to the model itself being this dangerous thing. It's like, yes, red team and figure out what use cases and what attacks and what vulnerabilities you have specifically to protect against versus just relying on whatever open AI or whoever put in place.

For sure. I also think that there will probably be new certifications that come up because like, you know how everyone's like SOC 2 and HIPAA and stuff like that for different industries. And then most of the time when you sell normal tasks, like people ask for pen tests, like we always have to provide our pen tests. That's going to be something similar for AI agents where there's probably some new thing that someone will coin a name for it. But it's like a test for the agent, like robust.

One thing that is interesting is people are very excited, obviously, about all the new model breakthroughs and tech breakthroughs coming out of all the large labs. And as an applied AI company, you're not doing the research yourself, obviously. You're leveraging the research and building a lot of software around it to deliver it to an end customer. But you are building on top of very quickly shifting sands underneath. And I'm curious, as an applied AI company, how do you manage both being able to

predict your own product roadmap and build for what users want while also staying abreast of what all the new tech changes are and how it affects your company. And just more broadly, what do you think is the right strategy for an applied AI company that might be facing similar situations? Well, you have different parts of the stack, right? So you have the LLMs, which are kind of, if you just think about the application layer, the LLMs are at the bottom. You might have like tooling in between that helps you manage LLMs or do your evals or whatever. Yeah.

And then the thing at the top is mostly what we build, which is, again, it's kind of like just standard SaaS. So most of the work we do is actually not too different from normal software, except we obviously have this extra research component of LMs are changing so fast. What can we use them for? What are they good at? Which model should we use for this task? That's a big one where you have open AI pushing new things, Anthropic pushing new things. Gemini is getting better now.

So you have to have your own evals for setting up what people are good at so you can use the right model in the right situation. Sometimes you want to fine tune, and then it's a question of when do you fine tune? When is it worth it? So those are probably the set of researchy questions that are mostly related to the LLMs that you can do. But at least so far, it hasn't felt like the sands are shifting that quickly since we're not that reliant on the middle layer right now. So it's mostly the LLMs that are changing.

they're not changing that frequently and even when they do change it's mostly an upgrade so 3.5 sonic had an update a couple months ago at this point and it's like okay well

Should we just swap it out and use that one instead of the old one? You just run a bunch of uvalves. And when you do swap it out, you just stop thinking about it because now you're on the new model. And 01 came out, and it's a similar situation. What do you use it for? In our case, it's a little bit slow for most of our customer-facing use cases. So we can use it on some more back-end things.

That's more or less what it comes down to for us. We just have to have good systems in place to do the research around the models. How often are you evaluating new models and swapping them out? We'll evaluate them pretty much anytime a new one comes out. You just have to be sure that even if...

It's a more intelligent model. It doesn't somehow break some things that your use case is built around. And that can happen. The model can overall be more intelligent, but maybe in some edge case, it's bad at choosing A or B in one of your workflows. And so that's what the evals are for. I think overall, the type of intelligence that we care a lot about for us is-- I would describe it more as instruction following, where we want the models to get better and better at instruction following. And if that's the case, it just strictly benefits us. So that's great.

It seems like a lot of the research recently has been around more of like reasoning type intelligence, getting better at coding, getting better at math, stuff like that. That's helpful for us too, but it's not as helpful as the first type.

And one really interesting thing that you brought up a couple of times that I also think is pretty unique to Decagon is you've built a lot of eval infrastructure internally to make sure that you know exactly how each model performs against the set of tests that you provide it. Can you talk more about that? Like, how core is that internal eval infrastructure? And how exactly does it give both you and your customers confidence? Because some of it is also customer facing that the agents are performing the way you would like.

I think it's very important because otherwise it's very difficult for us to iterate quickly. Because if you feel like every change you're going to make has a big chance of ruining something, then you're just not going to make changes that quickly. But if you have the eval set up, then, all right, we have this big change, we have this model change, or we have this...

new thing that that's been created let's just run it against all the evals and if they're good then you can feel like okay we improve things or we can you know ship this without being too concerned right so in our space the interesting thing is that the evals need input from the customers because the customers like our customers are the ones that decide if something is correct or not and there are obviously high level things we can check for but oftentimes it's

them coming with a specific use case and this is the right answer or it has to do this, it has to have this tone, it has to say this. And that's where the eval is based on. And so we have to make sure we have the robust system. We just started building this ourselves at the start. It hasn't really been that hard to maintain. And so we know that there are kind of eval companies out there and we've kind of explored a few of them. And maybe at some point we will see if it makes sense to adopt them. But the eval system isn't like a huge pain point for us anymore.

You know, one popular topic today is multimodality and the idea that AI agents should be able to interact across all forms that humans do today, whether it's text, video, speech, etc. I mean, I know that Decagon primarily started out as being text-based. So I'm curious from your perspective, like, how important is multimodality for AI agents? And what's the time horizon? What do you think it becomes fully mainstream or even expected?

It's important in the sense that if you're thinking about it from a company perspective, it's not that much harder to add a new modality. I mean, it's not trivial, but at the core, like if you solve for the other things, like all the things I mentioned, right, like the tooling to actually build AI and monitor it and have the logic there, then adding a new modality isn't the hardest thing.

So it makes a lot of sense for us to have all the modalities and expands our market. We're basically modality agnostic. We have our own agents for every single modality. And the limiting factor in general is one, our customer is ready to adopt a new modality.

I think starting with text makes a lot of sense because that's what people are more aggressively adopting, and it's just lower risk for them. And it's easier for them to monitor, easier for them to rationalize. The other big one is voice, obviously. I think there's still room to grow in the market for people to be more comfortable with voice. I think now we're seeing early movers actually adopt voice agents, which is exciting.

And then the other piece is obviously on the tech side. So I think most people would agree that the bar is just higher for voice, right? If you're on a phone call with someone, you need the latency to be super crisp. If you interrupt them, they have to respond really naturally. Because the latency is lower, you have to be more clever about the way you're doing computation. If you're on a chat and it takes five, eight seconds to respond, you barely even notice it. It feels very natural.

If it takes five to eight seconds before replying to you on a phone call, then that feels a bit odd. So there's more technical challenges, I would say, in voice. So as those technical challenges get solved and the market becomes more interested in adopting voice,

That's what's going to unlock a new modality like that. Before we move on, because I want to talk a little bit more about just like what the business model of AI agents looks like. Are there any last things that took you by surprise when you were either building AI agents for the first time or when you were chatting with customers about either systems they were using, data they were handling, concerns that they had? And what are just like any like non-intuitive or surprising things that Decagon had to do in order to be able to best serve enterprise customers? Right.

I think the big surprising thing was when we were first starting, how willing people were to chat with us because we're just two people. I mean, we both had started companies before and so we had known a lot more people, but still it's everyone, people who have started companies, it's very relatable, right? You're trying to get intro conversations and if what you're talking about is not that interesting to people, it's just a pretty lukewarm conversation. When we started talking about this use case, it was

I would say it's pretty surprising how excited people were to talk about it because it's such an obvious idea. And you would think that, okay, because it's obvious idea, there's people doing it or there's solutions or people would have thought of some solution already. But I think the timing was good. It was just a big use case. People really care about this. And for the

reasons I mentioned before, the use case is very well suited for adopting AI agents and like pushing them into production because you can do it incrementally, you can track the ROI. I think that was pleasantly surprising. But obviously, that's I mean, there's still a lot of stuff to do after that. Like you have to work with the customers, you have to build the product, you have to figure out what direction to take. But I think the early days, that was a bit surprising.

Kimberly, I mean, I might be remiss not to mention that you wrote this RIP to RPA blog post, which gets into like a lot of automation type tasks and startups. Is that something you see across some of these automation tasks or just things that like the solutions have not been great? So people are always on the lookout for a better way to do it? Yeah, I definitely think so. I would say a couple of things about this. The first is that if

If an idea seems obvious to people and there's no clear company who's solving it that everyone points to and says, oh, you should just use that, then that means that the problem actually hasn't been solved. And it is, in some sense, like a wide open opportunity for companies to go build it. Because, you know, we've been investors with Decagon since the beginning. We saw them

go through the idea maze. And when they landed on support and started chatting with customers, it was very clear that all the customers were desperate to have some sort of AI native support solution. And it was very common. This is the question I asked a little bit before about like, it was very common for people to believe that this was just going to be a GPT wrapper. And the level of interest that Decagon got from customers in the very early days led us to believe quite early on that a lot of these problems are just a lot more complicated than people expect.

So I think we do see this across industries, whether it is customer service, whether it is maybe more niche automations in specific vertical markets. I think one thing that is underrated is sort of what Jesse said earlier, knowing that there's clear ROI for the automation task that you're doing. Because if you're going to ask somebody to adopt an AI agent, they are in some sense taking a leap of faith because this is a very unfamiliar territory for a lot of people.

And it's much easier to get an AI agent adopted if you are automating a very specific flow that is either clearly revenue generating or was a bottleneck in the business before to get new demand. Or it was like a major cost center that scaled linearly with customer growth or revenue growth or something like that. And to be able to take a problem like that and actually make it much more productized such that it can scale in the way traditional software scales, I think is very compelling. Maybe.

Maybe one last question on this topic before we move on is, you know, I remember one thing, Jesse, when you and I were talking in the past was we always thought that when enterprises adopted software or adopted AI agents, hallucinations would be the biggest challenge that they faced or hallucinations would be the biggest thing they were worried about. I remember one thing that you told me was that's actually tends to not be the case. I'm curious if you could elaborate on that and like what it is about hallucinations that is either misunderstood in the public and what it is that people actually care a little bit more about.

I think people do care about hallucinations, but they care a lot more about the value that can be provided. And so pretty much every enterprise we work with cares about the same things, like literally the same things. It's what percentage of conversations can you resolve? How happy are my customers?

And then hallucinations might kind of be lumped into the third category, which is like, what's the accuracy? Generally, when you're evaluated, the first two matter. And let's say hypothetically, you are talking to a new enterprise and you just like completely knock it out of the park on the first two. There's going to be so much buy-in from the leadership and from just everyone in the company that like, holy crap, this will not only transform our customer base, it's

It's like the customer experience is different. Every customer now has their own personal concierge in their pocket. They can ping us anytime. We're giving them good answers. They're actually happy, any language, 24-7. So that's like one piece. And you're saving a ton of money. So there's a ton of buy-in. And there's a lot of tailwinds into getting something done. Hallucinations obviously has to be solved, but it's not really like the top thing on their mind, right? So the way you kind of address hallucinations is the things I mentioned before. Like people will test you.

There'll be probably a sort of proof of concept period where you're actually running real conversations and they have agents on their team monitoring stuff and checking for accuracy. And if that's good, then generally you're in the clear. And as I mentioned before, there's a bunch of hard protections you can put against the sensitive stuff. Like you don't have to make the sensitive stuff generative. So it's a talking point for most deals where it's not a

unimportant topic and you'll go through that process, but it's never really the focus for any of the conversations. And now switching over to the business model of AI agents. One big topic of conversation today, as I'm sure you know, is how to actually price them. Historically, a lot of SaaS software were sold per seat since you were selling workflow software specifically for individual workers to increase their productivity.

But AI agents are not tied to individual worker productivity. So a lot of people think, probably rightfully so, that seat-based doesn't actually make as much sense going forward. I'm curious how you thought about that dilemma in the early days and how you guys decided to price Decagon. And then also where you think the future of software pricing is headed more broadly as AI agents become more commonplace. Our view on this is that in the past, software is based per seat because it's roughly the

scale based on the number of people that can take advantage of the software. With most AI agents,

the value that you're providing doesn't really scale in terms of the number of people that are maintaining it. It's just like the amount of work output, right? And this kind of goes in line with what I was saying before, where if the ROI is very measurable, then it's very clear what level of work output you're seeing. Our view on this is, OK, proceed definitely doesn't make sense. You're probably going to be pricing based on the work output, right? So it's kind of like the pricing that you want to provide has to be a model where the more work you do, the more that gets paid.

So for us, there's two obvious ways to do that. There's like you can pay per conversation or you can pay per resolution, like a conversation that the AI actually resolves. I think one fun learning for us has been that most people have opted into the per conversation model. The reason is that per resolution, the main benefit is you're paying for what the AI is doing. But then the immediate thing that happens next is what is a resolution?

And first of all, no one wants to get into that because then it's like, OK, well, if someone came in and they're really upset and you sent them away, why are we paying you for that? So that's a weird situation. And then it makes the incentives a bit odd for the AI vendor because then it's like, OK, well, we get paid per resolution. So why don't we just resolve as many as possible and just deflect people away when there's

a lot of cases where it's kind of a toss up and the better experience would have been to escalate and like customers don't like that. Right. So it just creates a lot more simplicity and predictability on the per conversation model. And how persistent would you say you believe the pricing to go going forward? Because, you know, right now you're getting comped to when you say ROI, often it's like ROI on probably some kind of labor spend that was historically used or something like that. As agents get more and more common, do you think you'll be compared to labor long term and that that's the appropriate benchmark or?

or not? And if not, how do you think about long-term pricing to the value beyond the labor cost? I think it will probably be mostly incurring the labor costs because that's what is exciting about agents, right? Is that you have

all this spend in the past that was going towards services, that size of the spend is probably like 10 to 100x the software spend. So a lot of that's going to move towards software. And so when it does, the natural benchmark is, of course, the labor. And for our customers, ROI, again, it's very clear, right? If you're saving like X million in labor costs, then it makes sense to adopt a solution like this. But it'll probably be somewhere in the middle, right? Because there will be other agents that come out

Even if they're not as good, they set prices in this kind of the classic SaaS sort of situation where you're competing for business. What do you think the future of current SaaS incumbents is in the world of AI? Either given that their products are maybe not architected to be AI native or the way they price is seat-based and therefore they just aren't really adjusted to an outcomes-first pricing model. Yeah, it's a little bit tricky for incumbents if they're trying to launch agents because it just canalizes their seat-based model, right? Yeah.

If you don't need that many agents anymore, then it's kind of tricky if the new thing you're pushing just eats up your current revenue. So that's one thing with incumbents. But it's also hard to say. The incumbents always have the power of, hey, we have distribution, right? The product doesn't have to be as good, but people don't want to go through the effort of adopting a new vendor if it's like 80% as good. So number one, if you're a company like us, you have to make sure that you're 3x as good as the incumbent offerings.

And then, two, the issue, it's like the classic incumbent versus startup thing. Incumbents have less risk tolerance, naturally, because they have a ton of customers. And if they're iterating quickly and something doesn't go well, that's a big loss for them. Whereas, younger companies can always iterate a lot faster. And then the iteration process just inherently leads to better product. And so that's the cycle. And for us, we always want to pride ourselves on shipping speed, quality of the products,

just how hardcore our team is in terms of delivering things. And so that's how we've been winning our current deals. I love for

I'd love for you to make any predictions on the future of AI in the workplace, either like how it'll change staffing needs or capabilities or how human employees and AI agents will have to interact or different types of like best practices or norms that you think will become commonplace in the workforce as AI agents become more prevalent. Yeah. Number one thing is.

We have pretty high conviction that the amount of time people spend going forward in the workplace on building and managing agents, kind of like the AI supervisor type role is going to shoot through the roof. Even if your title is not officially like AI supervisor, it's like whatever you were doing in the past, a lot of that time is now going to be on managing the agents because the agents give you so much leverage.

So we've seen that with many of our deployments as well, is that the people on the team that were leading the team, they're spending a lot of their time monitoring the AI, checking to make sure that nothing needs to be improved, or making changes, and monitoring how is it going? What are the overall stats looking like? Is there a specific area we need to be focused on? Is there a gap in the knowledge base that could help the AI just be better? And can the AI fill that in for me? There's just all this stuff that comes with working with agents.

that the amount of people's work hours that go towards working with agents is going to go straight up. And that's our core thesis for the company, right? As I mentioned. And so that's why our whole product is built around giving people tooling and visibility and explainability and control over the AI agents. And in a year, I think this is going to be a huge thing. Makes sense. What do you think are the capabilities that an AI supervisor needs going forward? Like what is that skill set?

MARK MIRCHANDANI: There's two sides of it. There's the observability, explainability piece of can you just very quickly grok what the AI is doing, how it's making decisions?

Second side of it is the decision making or like the sort of not decision making, like the building part of how do you give a feedback? How do you build new logic? Those I think those are the two sides of the coin. And is there any type of work that you think either in the medium to maybe long term AI agents will not be able to handle and it's actually incumbent upon humans to still be able to manage and do properly? I think it will mostly come down to the point I was making earlier around like how perfect something needs to be. I think there's a lot of jobs where like the

bar for error is like super low. And so what will usually happen in those cases is that any AI tooling ends up being more of a co-pilot rather than like a full agent. Maybe in like the more sensitive industries like healthcare or security or whatever, where you have to be like almost perfect. Yeah, then the agents, I think, are going to be less autonomous, which is not to say that they won't be useful. Like I think the style is going to be a bit different. Whereas in a space like ours, you're really just

You're kind of deploying these agents to be autonomous and for them to complete the whole job. There you have it. Another episode in the books. If you found this interesting and or informative, please do rate the podcast and share it far and wide. We should be publishing one more episode this month before retooling for the new year. So thanks for listening. And depending on when you hear this, happy holidays.

Can AI Agents Finally Fix Customer Support? 44:12 Share