Behind the Numbers: Next-Gen AI: From Assistants, to Autonomous Agents, and Beyond

April 18, 2025

Behind the Numbers: an EMARKETER Podcast

People: Dan Van Dyke, Jacob Bourne

Topics
Marcus Johnson: I'm curious, what is an AI agent? Dan Van Dyke: I recently learned that "AI agent" is actually a very broad concept, covering everything from simple chatbots to tools that can carry out tasks autonomously. A true AI agent can take action on its own, based on a predefined task, and use tools. Jacob Bourne: I think the biggest difference between an AI agent and an AI chatbot is autonomy. A chatbot has to be prompted step by step, while an AI agent can carry out tasks without explicit instructions to accomplish the goal you're after. Dan Van Dyke: The definition of an AI agent is still evolving; for now it mainly comes down to autonomy and access to tools. Jacob Bourne: There's a gap between where AI agents actually are today and the industry's end goal; today's AI agents are just incremental steps toward that goal. Dan Van Dyke: I see "agentic AI" as a broader umbrella term that covers true AI agents as well as technologies with similar capabilities. Jacob Bourne: The future direction for AI agents is multi-agent systems, with multiple agents working together toward a shared goal. Beyond that, future agents will be able to anticipate users' needs and complete tasks without much prompting. Dan Van Dyke: AI agents are easy to prototype quickly, but getting them into production to meet real needs is very hard and requires a lot of evaluation and iteration. Jacob Bourne: The risk of an AI agent making mistakes is higher than for a chatbot, because agents may do things like complete transactions online. Dan Van Dyke: Building and using AI agents correctly requires some technical ability, but the learning curve is coming down as the tools improve. Jacob Bourne: Adoption of AI agents is currently low, partly because people define them differently; many so-called AI agents are really just AI assistants.


Transcript


This episode is made possible by Kinective Media by United Airlines. Kinective Media by United Airlines is redefining traveler media with a world-first omnichannel network, from in-flight to online and in-app experiences, with best-in-class tech helping brands engage travelers where it matters most. Are you ready to make an impact? Of course you are. Discover more at kinectivemedia.com. That's Kinective with a K.

Hey gang, it's Friday, April 18th. Somehow. Dan, Jacob and listeners, welcome to Behind the Numbers, an EMARKETER video podcast made possible by Kinective Media by United Airlines. I'm Marcus and today we'll be discussing what's ahead for AI. Join me for that conversation. We have two people. Let's meet them right now. We start with our VP of Gen AI based in New York. It's Dan Van Dyke.

Thanks for having me, Marcus. Yes, sir. Of course. We're also joined by our technology analyst on the other coast down in California in the Bay. It's Jacob Bourne. Thanks for having me, Marcus. Absolutely. Absolutely. All right. So today's fact.

Right before we hit record, Dan said, I've got one for you. So I wasted three hours today on my one. But what I said to Dan is we're going to compete. So Jacob's going to ref. Dan, you can go first. Okay. So today I learned that sloths, my favorite animals, will only use the bathroom, I'm going to say, broadly, once a week.

And they'll come down from their trees and they excrete one third of their body weight, one third. And they take the time to dig a hole and then they bury it and they risk their lives in the process. And nobody knows why. So that is my fact of the day. Interesting. Slow metabolism, I guess. Yeah, I guess so.

Marcus, I think this is the second time that sloths have come up in the past few days. It is, really? Yeah. I think, yeah, I think before I was talking about how they can hold their breath for 40 minutes.

What? I don't know why you would need that. I mean, maybe for deep sea diving in the Great Barrier Reef. Is that true? For 40 minutes. Yeah. Yeah. Shocking amount. I think it's one of the most, if not, yeah, it must be. Well, maybe it is the most. How did this come about, Dan? Oh, I was just on Reddit. Okay. That's how it happens. All right, cool. That is a good one. I've got one for you.

Jacob, don't be swayed, but I did invite you on the show. Okay, so Denmark holds the Guinness World Record for the oldest continuous use of their national flag since 1625. They've been using the same flag. So I went down a rabbit hole of flags because I'm cool. So a lot of flags look very similar, I found out, which is another fact about flags. Chad and Romania's are literally identical. Mm.

Romania's flag came first by a hundred years. All right, Chad. So you stole theirs. Senegal and Mali have the same one, but Senegal's has a little star in the middle. Indonesia and Monaco both have two horizontal stripes, red over white, but their dimensions differ. New Zealand and Australia are the same, but the stars are different colors. Venezuela, Ecuador, and Colombia are yellow, blue, red horizontal bars, but have different emblems in the middle. Two more for you. Luxembourg and the Netherlands are red, white, blue lines, but the blues are slightly different shades. And Slovenia, Russia, and Slovakia are all white, blue, and red horizontal bars, but with different coats of arms. Two reactions. One, that's like 42 facts. And two, remarkable that you could say all that without stuttering. I'm very in awe. I practiced. Who is the winner? Jacob.

Come on, Jacob. You know, I mean, the sloth is always... Anything about animals is memorable. Ah, you make good points. Of course he was. But this has made me think of something for the first time, which is that, well, we don't really think outside the box much with flags, do we? Yeah, not at all. Not a whole lot of creativity there. Not at all, no. And they're all grouped together, so I guess...

Regions change, but, you know, the flags stay very, very, or quite, similar based on whereabouts the countries are that are coming up with them. But yeah, not a ton of creativity. Yeah, Dan absolutely wins, of course. It wasn't even close. Anyway, so, to the real topic: the dawn of AI agents, and also the AI-native company.

Alright. Everyone's talking about AI agents, barely anyone knows what they are, writes Isabelle Bousquette of The Wall Street Journal. She notes that AI agents are broadly understood to be systems that can take some action on behalf of humans, like buying groceries or making restaurant reservations. But in some cases, the question of what constitutes an action is blurry. Dan, I'll start with you. What is an AI agent?

Yeah, so I actually had an education on this recently. I was talking to a vendor of one of those AI native companies, getting a demo from them. And I used the term agents wrong. And the person on the other end of the phone or Zoom

politely sort of told me that really there's a whole spectrum, from, you know, chatbots like ChatGPT, to workflows that are rigidly orchestrated but a little bit more robust than a chatbot, to what fits into the, you know, term agent in the classical sense. So agent means an AI-based tool that can take action based on a predefined task, with autonomy, and use tools. So that's kind of what defines an agent. But Jacob, you've actually written on the subject, so I'm curious if that gels with your definition. It does. I mean, I think that the

Well, first of all, it's a buzzword at this point. And so your story, Dan, is kind of relevant because these are technical terms that become commercialized and become part of the consumer marketplace, and then it takes on new meaning. But I think kind of just distinguishing between Gen AI chatbots or Gen AI tools and agents, I think it's really about the level of autonomy. With a chatbot, you have to prompt it for every small task. With AI agents,

It can take an action without that need for step-by-step prompting. So it can kind of do things in the background that you didn't necessarily tell it to, but it's all geared towards the goal that you want, essentially. Okay. So, I mean, you said wrong. You said you said it wrong, Dan. I mean, different?

Maybe. It feels like if you ask 100 people, you get 101 responses, even if you are talking technical terms. One from Tom Coshow, Senior Director Analyst at Gartner, says: does the AI make a decision, and does the AI agent take action? Software needs to reason itself and make decisions based on contextual knowledge to be a true agent. And there's another quote here from Robert Blumofe, CTO at Akamai Technologies, who said many use cases today resemble assistive agents rather than autonomous agents, requiring direction from a human user before taking action and narrowly focused on individual use cases. He does say it's a bit of an oxymoron, an assistive agent, since agents are supposed to do it for you. But what do you think of those

variations of definitions? I think it reflects the fact that the goalposts are shifting for what constitutes an agent. For now, it's kind of like, what's next? And the threshold is defined by the level of autonomy and the access to tools, but the capabilities of the baseline, so what you can get within ChatGPT,

really resemble a lot of the characteristics that you were describing, Marcus, in that ChatGPT can decide to search the web based off the request that you ask it. It can invoke different tools like image generation. Does that constitute an agent? And so as nice as it would be to be able to come up with a crisp and specific definition for what constitutes an agent,

It is a murky term and the definitions are changing over time. Yeah. Jacob, there are levels to this, right? And I'm surprised. I mean, autonomous driving is referenced quite a lot; there are the six levels, from zero to five, of autonomous cars.

I'm surprised that agents don't have something similar, because Miss Lin, Belle Lin, who writes for the Journal, was saying AI agents can perform simple tasks like ordering office supplies. Eventually, you know, some enterprises want to get them to handle financial transactions and hiring new workers. But that's quite a variation in difficulty. Yeah, I mean, I think that's a great analogy you're making there, or a comparison anyway, with the autonomous vehicles. I think the difference here is that autonomous vehicles are doing a very specific task:

drive your car, right? With AI in general, I mean, it's potentially anything, right? Anything that a human could do, at least that's the vision with artificial general intelligence. And I think what this really highlights here is there's a bit of a disconnect between the vision for the AI sector, AI companies building this, and where the technology currently is at.

So the vision is boundless automation, essentially. Artificial general intelligence that can do anything a human can do. I think that's the vision, but it's far from getting there. And so a lot of these terms become sort of incremental steps towards what the ultimate goal is. If you think about the initial agents that launched, like OpenAI's Tasks, for example,

very limited automation, very limited capabilities, but we're still calling them agents. And I think it's just, they're really incremental steps towards where we'll eventually get, which is that you have AI tools or, I mean, agents that can really, you know,

handle very complex tasks that a human would do. And really, it means people giving up a lot of that sort of micro decision making to the AI that's really operating fully in the background. It's quite ironic, actually. Zoe Weinberg, venture capital investor, was saying it's ironic to see a term that started out describing human agency

being used to talk about its opposite: technology that operates with little to no human oversight. Dan, we were talking before this recording about this quote from Erin Griffith for the New York Times. She says, after AI agents comes agentic AI. How are they different?

I don't know if I agree that it's what comes after. Oh, interesting. Okay. At least according to my definition. And we've already touched on how subjective and non-uniform those are. The way that I define agentic AI is sort of an umbrella term that encompasses...

both agents in the true sense of the word. So Jacob, you were talking about Tasks. OpenAI has also released Operator, which can browse the internet on a user's behalf and do things like attempt to book a flight, or Deep Research, which can write a research report by browsing tons of sources. So those are true agents, but there's a middle ground that's sort of like above what ChatGPT can do,

but below the capabilities of a true agent. So an example of that would be I'm building a lot of workflows to assist our research team in gathering content from Feedly, curating it, and writing what are known as research blog posts, which is like an internal tool. And although it's tons of large language models strung together, and although there's a high degree of

prompting and complexity in this workflow, I wouldn't call it an agent. I would say that it fits within the realm of agentic AI. But to your question of, like, what's next? I would say multi-agent workflows are the thing that's next. And what that means is, like, think Deep Research, which can write reports, meets, you know,

writes a report that ends up triggering 10 Operators to go out and accomplish 10 different tasks, all in service of a user's request. It's like starting to build toward an organization, all working in unison towards a common goal.
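To make Dan's multi-agent picture a bit more concrete, here is a minimal, purely illustrative Python sketch of that pattern: a planner breaks a request into subtasks and each subtask is handed to its own worker agent, all in service of one user request. The plan and call_model functions are hypothetical stand-ins, not any vendor's actual API.

# Minimal multi-agent workflow sketch: a planner produces subtasks,
# worker agents handle them in parallel, all serving one user request.
from concurrent.futures import ThreadPoolExecutor

def call_model(role: str, prompt: str) -> str:
    # Placeholder for a real model or agent call (hypothetical).
    return f"[{role}] handled: {prompt}"

def plan(request: str) -> list[str]:
    # A real planner would ask a model to decompose the request;
    # here we fake a fixed three-part decomposition.
    return [f"subtask {i + 1} of '{request}'" for i in range(3)]

def run_workflow(request: str) -> list[str]:
    subtasks = plan(request)
    # Dispatch each subtask to its own worker agent, run in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda task: call_model("worker", task), subtasks))

if __name__ == "__main__":
    for line in run_workflow("research report on AI agents"):
        print(line)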

Yeah, I 100% agree with that. I think it's really about these different AIs, or different types of AI with different skills in themselves, coming together to accomplish more. I think the next step is also AI agents that can anticipate users' needs so you hardly need to do any prompting. It knows what you're going to need in the future and is already working on it in the background. I think, though, for all

daily purposes, ultimately, and already we're seeing that these terms get used interchangeably. And so I imagine that, again, the deeper meaning or the technical meaning probably will get lost eventually. Yeah. Yeah. I liked it. You said, Dan, this kind of umbrella term. Dr. Andrew Ng, a prominent AI researcher, was saying there's a gray zone and that agentic is an umbrella term encompassing tech that wasn't strictly an agent,

but had agent-like qualities. We talked about some of these agents, at least one. I think, Dan, you mentioned OpenAI, maybe Jacob, OpenAI Tasks. Who else do we have? What are some other examples of kind of popular AI agents at the moment?

I mean, there's all kinds from most of the tech giants, the leading AI companies. I mean, Amazon has its Bedrock Agents through the cloud. Google has its Vertex AI Agent Builder. Also, Google has Agentspace, which just announced that its agent now has coding capabilities, so autonomous coding. Okay.

Dan mentioned Operator from OpenAI. Oracle has a clinical AI agent for healthcare. NVIDIA has agentic AI blueprints, which allow organizations to create their own custom agents.

Microsoft, Salesforce has Agentforce, and the list goes on and on. There are also more industry-specific agent platforms as well. So are they interoperable, Jacob? Can they speak to each other? I mean, Dan was talking about a multi-agent world. Is that within the umbrella of Google, within the umbrella of the ecosystem of Amazon, or do they talk to each other across companies? Well, I think that's part of the vision. And I think they're working towards interoperability, but I wouldn't say that we're quite there yet.

Two recent steps that have brought us closer are, and I guess recent is a stretch for this one. First, the introduction of MCP. So MCP stands for Model Context Protocol, which was released by Anthropic, the creator of Claude. And Model Context Protocol is simply a way for agents to be able to access tools. So think, you know, accessing GitHub repos or accessing...

databases or Zapier for automations, it's a simple way that is very elegant and is becoming sort of the mainstream accepted standard to connect AI to all these other things that exist on the open internet or even local files on a user's computer if they give it access. And then secondly, the new A2A

Protocol, which was released by Google, aims to complement the capabilities of MCP by allowing agents to communicate with agents in a common sort of language. And so that vision of interoperability is starting to come a little bit more into focus.
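As a rough illustration of what MCP standardizes: it is a JSON-RPC-based protocol in which a client first asks a server what tools it exposes and then invokes one by name with arguments. The Python sketch below only shows the approximate shape of a tool-call request; the tool name and arguments are invented examples, and a real integration would go through an MCP client or server library rather than hand-built messages.

import json

# Approximate shape of an MCP tool-call request (JSON-RPC 2.0).
# "query_database" and its arguments are hypothetical examples.
tool_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_database",
        "arguments": {"sql": "SELECT count(*) FROM orders"},
    },
}

print(json.dumps(tool_call, indent=2))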

But the reality is quite fragmented, as Jacob, you were painting a picture of, where everybody wants to be the de facto home for agents. And I think we're inevitably headed towards consolidation as a provider starts to emerge at the forefront. But for the moment, it's just getting increasingly crowded, competitive, and fragmented. Yeah. So...

a lot of options out there in terms of different agents that you can choose. And OpenAI, the artificial intelligence company, released a platform that lets companies create their own AI bots for completing tasks such as customer service and financial analysis. Belle Lin of the Wall Street Journal is noting this. Dan, we talked before, I think it was last week, and you said to me that part of this conversation,

maybe a part that's not being discussed as much, is that AI agents are hard to build right. What did you mean by that? I mean, they're very easy to build, period. So you could spin together a prototype with...

a couple of hours if you're technical enough and it'll be really impressive. Once you try to push that into production to fulfill a real need that you have in an organization or build something that would be client facing is where you start to encounter difficulties. And that's why the eval process is actually the most crucial

part in measuring the efficacy of agents. And where a lot of people will get hung up is they'll realize that, for a particular task, what they really need is 95% accuracy to meet the baseline that they have with people, and an AI agent maybe right out of the box will get to 80% accuracy, but that last 15% is actually 80% of the effort. And so what

I was describing in some of the workflows that I've strung together assisting our research team actually turned into a very protracted process of figuring out evaluations, pushing new iterations out to the research team, having them come back to me and realize, oh,

I didn't ask for this feature and it's actually crucial. And then doing that again and again. And it's through no fault of the research team. It's just you don't know what to ask for until you're actually deploying these in the real world and seeing where they fail. And so it's a much more...

difficult process than it looks like on its face to take something from a very promising POC to something that's actually in production and starting to create value, which is not to say it's impossible. In fact, the thing that I had been describing for the research team,

They're really positive about it. It's really useful right now. But I'm already looking into new capabilities that would make it even more useful. So it's definitely a journey and easy to get sucked up in the hype and think that it's going to be fast.
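A bare-bones sketch of the eval loop Dan is describing: run the agent over a small labeled test set, measure accuracy, and compare it against the human baseline you actually need to hit. Here run_agent and the test cases are hypothetical placeholders, and the 95% threshold simply mirrors the baseline figure mentioned above.

# Toy eval harness: score an agent against labeled cases and a target accuracy.
def run_agent(task: str) -> str:
    # Placeholder for the agent under test (hypothetical).
    return "approve" if "under policy" in task else "escalate"

# Tiny labeled test set; in practice you would want far more cases.
cases = [
    ("refund under policy, receipt attached", "approve"),
    ("refund over policy limit", "escalate"),
    ("duplicate charge, refund under policy", "approve"),
]

HUMAN_BASELINE = 0.95  # the accuracy people already achieve on this task

correct = sum(run_agent(task) == expected for task, expected in cases)
accuracy = correct / len(cases)
print(f"accuracy: {accuracy:.0%} (target: {HUMAN_BASELINE:.0%})")
if accuracy < HUMAN_BASELINE:
    print("Not ready for production; keep iterating on prompts, tools, and evals.")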

It is, yeah. Yeah, just to add to that, I mean, I think we all know about the, you know, the issue with chatbots hallucinating. It's well documented, with lots of examples. But, you know, the risk there is that, okay, you have something problematic in a chatbot, an output that's, you know, erroneous or problematic in some other way. But when you have AI agents that are potentially making transactions online,

If they get stuff wrong, then the stakes are a bit higher. And so I think that makes it difficult on, I mean, on the technical level in terms of putting in safeguards to reduce the likelihood of that happening, but also just deploying it commercially, knowing that there's that risk there, I think makes it difficult. Yeah. Yeah.

Yeah, this is not something that happens overnight. And Greg Shoemaker, Deco Group Senior VP of Ops and AI, had a good quote saying, companies should approach agents less as a tech deployment and more as the development of digital workers that need to be onboarded and trained. Dan, you mentioned a word which I thought was interesting, which was, I think you said something to the effect of it, that kind of technical ability. Evals, maybe.

With just the fact that you have to have some kind of a technical understanding of how these things work. I'm wondering if that's part of the problem is that this is hard. OpenAI was even saying to use its AI agent building platform, enterprise developers still need to have a comprehensive technical background. So how proficient do you have to be with AI to be able to build one of these agents and build it right to your point, Dan?

Well, I've been covering the AI space for maybe eight or so years, but primarily from a financial services lens as formerly the head of financial services research within eMarketer.

And recently, I would say two and a half years ago with the advent of ChatGPT, if I'm getting that timeline right, I started to focus more and more of my workday, now 100%, on AI and starting to build POCs and applications that

have transitioned into it becoming my full-time focus. And so over the course of that amount of time, say two and a half years, I've gotten to the point where now I feel proficient enough that yes, I could build a POC. Yes, I could do evals that would help get something into production. And in fact, I've done those things. But it did take years. And that time is spent figuring out things like

How do you set up a GitHub account? And what is the importance of not hardcoding environment variables into repos that you push into production? All these arcane terms that really have real world consequences if you're talking about an application that you're building and putting out into the world, that would otherwise become a mess of spaghetti, get quickly attacked by hackers, and you become a cautionary tale.
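To make the hardcoding point concrete: the usual practice is to keep credentials out of the repository entirely and read them from the environment at run time (for example from a shell variable or a git-ignored .env file). A minimal sketch, with a made-up variable name:

import os

# Read the secret from the environment instead of committing it to the repo.
# "MY_SERVICE_API_KEY" is a hypothetical name; set it in your shell or a
# git-ignored .env file, never in source control.
api_key = os.environ.get("MY_SERVICE_API_KEY")
if not api_key:
    raise RuntimeError("MY_SERVICE_API_KEY is not set")

print("Loaded API key of length", len(api_key))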

I think it is well put that there is a learning curve that you still have to overcome, but that learning curve is rapidly dropping as tools like Claude 3.7 become more effective helpers. That's led to the emergence of something called vibe coding, which is somebody like me just describing, "Here's what I want.

I'm proficient enough that I can describe, here's the platform I want you to use. Here's what I want you to avoid. I can kind of guide it every now and then, but it's just a lot of like, I get errors and then I'm saying, help me fix this error and I'm going to feed you documentation, which helps. But I don't want to overstate it. It's quite often frustrating, mind-numbing work and hopefully increasingly less so in time.

But it's a great example of how someone in-house can learn it. You know, you don't have to hire externally someone who studied it and is a, you know, got a PhD in it and had been at a company for 20 years. Someone in-house, actually someone in-house has an understanding of internal processes and what the company needs and a relationship with those people at the company as well. And so there's an argument to be made that maybe that is better perhaps, Jacob. Right. Yeah. Just to note too that, I mean, I think

things are changing. I mean, just yesterday, Google Cloud announced its new no-code agent designer, which was launched specifically to tackle this problem of how can non-technical people, you know, take advantage of developing their own agents. And so I think this is something we're going to see more of to meet that need. So let's move to this, gents, uh,

AI agent adoption, it seems as though it's been extremely limited so far. I have one data point from Mr. Coshow, who I mentioned earlier, from Gartner. He was saying just

6% of 3,400 people in a recent Gartner webinar on the subject said that their company had deployed agents, just 6%. There's an argument to be made, look, you've deployed one. There's also an argument to be made about, yes, but did you deploy it well? How advanced is it? Dan, you were saying you can do it, but they're hard to do right. Dan, I'll start with you for this one. What...

What do you see the next couple of months looking like for agents, for agent deployment? I'd say, yeah, we're only in April, so maybe I should just say 2025, because in a few months, as I was saying before the show, it's Christmas.

I think by the end of the year, you'll probably get into the low tens up to 20% adoption if you re-ran that same study, if I had to guess. And that will be as a result of more companies releasing agentic platforms so that the developer workforce who is eager to build these tools

can go out and build on permissioned, secure platforms that they're already using. And additionally, you'll start to see a trickle of folks internally. I'm thinking about very advanced AI users that we have within eMarketer, like Henry Powderly, for instance, going out and starting to pick up skills and build their own tools.

So I think we'll see a convergence as both groups start to build more agents. And I'm excited to see that continue to grow into 2026. Yeah. I agree with Dan's forecast there. And Marcus, I think that that 6% number is,

It does seem low, especially since there was other data that indicated adoption was higher. And I think the issue here goes back to what we're saying about what constitutes an AI agent.

And the lower number points to the fact that I think the adoption of true agents is very low. But I think there is definitely a lot more adoption of AI assistants that are getting called agents, which pushes the data up a bit in terms of adoption. And we're going to continue to see this in terms of

okay, are you actually using an agent or not? But as the technology gets better and we achieve a higher level of automation, then I think it'll become more clear over time. Yeah, Dan mentioned Henry, Henry Powderly. He was on with Gadjo Sevilla, who

both talked about using AI at work. The two-part episode or series, if you will. I think it was March 31st, April 4th, both those episodes came out. So check those out. That's what we have time for, for today's episode, unfortunately. Thank you so much to my guests for hanging out with me today. Thank you first to Jacob.

Thanks for having me today, Marcus. Appreciate it. Yes, sir. Thank you, of course, to Dan. Thank you. Absolutely. Thank you to the whole editing crew, Victoria, John, Lance, and Danny. Stuart runs the team, and Sophie does our social media. Thanks to everyone for listening in to Behind the Numbers, an EMARKETER video podcast made possible by Kinective Media by United Airlines. We'll be back on Monday, happiest of weekends. ♪