Cognitive architectures are blueprints for building intelligent and autonomous systems, essentially designing the 'minds' of AI agents. They provide guardrails or frameworks to control agents, improving their memory and capabilities, and preventing them from becoming too general or unreliable.
Multi-agent systems involve several AI agents working together, each with a specific role, similar to a cross-functional project team. They are expected to grow because they can handle more complex tasks and deliver better results by combining specialized agents, making them more practical and scalable for enterprises.
Multimodal abilities allow AI agents to perceive and interact with their environment using multiple senses like video, audio, and images. This enhances their ability to perform tasks more like humans, opening up new use cases, especially for accessibility and enterprise assistance.
Agent-oriented LLMs will be purpose-built for autonomous activities, prioritizing multi-step reasoning, long-term memory, and context retention. Unlike traditional LLMs designed for broad tasks, these models will be tailored to enhance agent performance, potentially using a mix of models for different tasks within an agent.
Agents will accelerate AGI discussions as they demonstrate increasingly autonomous behavior, especially when using advanced models like OpenAI's O3, which has surpassed human benchmarks in certain tasks. However, 2025 is not expected to be the year of AGI, but agents will blur the lines and reignite debates about AI's human-like capabilities.
2026 is expected to be a pivotal year as enterprises will likely deploy agents at scale, integrating them into their workforce. The learnings and developments from 2025 will set the stage for broader adoption, leading to significant advancements in how work and life are transformed by AI agents.
Today on the AI Daily Brief, part two of 25 agent predictions for 2025. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes.
Hello, friends. Here we are back with part two of our 25 agent predictions for 2025. It's not strictly required that you listen to part one first. However, I would recommend it. Once again, we are joined by Nufar Gaspar, the director of AI Everywhere and Gen AI for Intel Design. Nufar brings the perspective of someone who has built AI products inside Intel, helped with broader AI transformation, and thinks about these issues professionally and personally all the time.
In the second part, we talk about technology as well as financial trends and close out with a big vision for where this is all headed.
All right, and we are back once again for part two of this conversation around 25 predictions for AI agents in 2025. We've talked about all sorts of things, a lot of ground setting in part one. And now we're digging into some of the more kind of discrete and specific technology predictions. Kicking off with number 14, new custom cognitive architectures will enable better and safer agents.
So what do you mean by this? Yeah, so let's start by defining what a cognitive architecture is. It's basically a fancy term for a blueprint for building intelligent and autonomous systems.
And you can think about it as designing the minds of the agents. So maybe some of you heard about agents over a year ago, when AutoGPT and BabyAGI were the tools that everyone discussed, and they never took flight. And the reason for that is that they were too general and unconstrained, and thereby they had unreliable performance.
And with the newest generation of agents, there was also an introduction of new custom cognitive architectures by many individuals and companies. And those provided a lot of guardrails, or what are sometimes referred to as scaffolding and frameworks, for controlling these agents. And thereby, with improved memory and improved capabilities, those kept the agents much more focused on what they are trying to do and prevented them from flying off the rails.
And because they were so successful in 2024 at bridging the gap from being too loose to getting actual results, there are so many labs and companies working to improve them even further. This will probably continue in 2025, and we will get even better results with that.
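To make the idea of guardrails and scaffolding concrete, here is a minimal hypothetical sketch of a cognitive architecture as a constrained loop around an LLM. The tool allow-list, the step budget, and the `call_llm` helper are all illustrative assumptions, not any specific framework's API:

```python
# Hypothetical sketch: scaffolding that keeps an agent "on the rails".
# The architecture, not the model, enforces the action space and stop condition.

ALLOWED_TOOLS = {"search", "read_file", "write_draft"}  # constrained action space
MAX_STEPS = 10                                          # hard budget, no runaway loops

def call_llm(prompt: str) -> dict:
    """Placeholder for an LLM call returning e.g. {"tool": "search"} or {"tool": "finish"}."""
    raise NotImplementedError

def run(goal: str) -> list[str]:
    memory: list[str] = []                    # explicit, bounded working memory
    for _ in range(MAX_STEPS):
        action = call_llm(f"Goal: {goal}\nRecent history: {memory[-5:]}")
        if action["tool"] == "finish":
            break
        if action["tool"] not in ALLOWED_TOOLS:
            memory.append(f"rejected disallowed tool: {action['tool']}")
            continue                          # guardrail: refuse, don't crash
        memory.append(f"ran {action['tool']}")
    return memory
```

Compared with the early AutoGPT-style loops, the constraint is exactly the point: the agent can only act within a space the designer has reasoned about.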
I actually want to bring in your 15th prediction because I think it's related and we can discuss a little bit. The development of new tools, frameworks, and conventions for agent development and management.
Right. So, you know, up until now, we've often been using the same tools for a new technology. And with the rise of agents, we do need to have more dedicated tools for agent development. They should be explicitly designed for building agents in order to streamline and speed up the process of building these agents.
Some of the focus areas will be on application development, so we will for sure see more and more frameworks. We already have LangGraph and other open source capabilities, but more and more libraries and frameworks will probably emerge to help developers build the backend, or the backbone, of these agents
in order to make them more reliable, and also to orchestrate between agents within the same system, or to orchestrate the relationships between agents talking to one another. And we'll talk more about it in the next predictions. The other area where there will be a lot of focus, in my opinion, is observability and the ability to test these agents. These tools will help developers be much more confident about whatever they're building, debug their agents, anticipate whatever needs to be improved, and manage costs, which are currently not that predictable. And whenever we want to really understand, govern, or provide visibility to our customers about what the agent actually did, that observability will become even more critical as part of the development building blocks that we will have.
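As one illustration of the kind of framework being described, here is a minimal sketch using LangGraph's StateGraph, the open source library mentioned above. The node logic is a placeholder for real LLM and tool calls, and the API details may evolve:

```python
# Minimal LangGraph sketch: the agent's backbone is an explicit graph,
# which is also what makes testing and observability tractable.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    task: str
    plan: str
    result: str

def plan_node(state: State) -> dict:
    return {"plan": f"steps for: {state['task']}"}    # placeholder for an LLM call

def execute_node(state: State) -> dict:
    return {"result": f"executed: {state['plan']}"}   # placeholder for tools/LLM

graph = StateGraph(State)
graph.add_node("plan", plan_node)
graph.add_node("execute", execute_node)
graph.set_entry_point("plan")
graph.add_edge("plan", "execute")
graph.add_edge("execute", END)
app = graph.compile()

print(app.invoke({"task": "summarize a report", "plan": "", "result": ""}))
```

Because every step is a named node with typed state, a tracing layer can record exactly what the agent did at each hop, which is the observability point made above.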
So my guess is that for a lot of our listeners who come from the enterprise or business world, a lot of those words you just said would sound like total Greek. How much understanding do companies that are thinking about exploring and piloting agents for their enterprises need to have about all of this?
Okay, so of course I'm a bit biased because I've been among these Greek people who build AI capabilities for literally all of my career. So I am constantly thinking about these things.
And I do think that the organizations that want to have a tailored set of capabilities, because improving the outcome by even a fraction of a percent has bottom-line implications, will for sure have teams that are experts in building agents or utilizing AI. And they need to understand, because they're in a position where they're not fighting for the 80%. They're fighting for the additional 20%.
So if you are from a company that utilizes AI to create a competitive advantage that is very unique, you will probably have to have people that understand that. For the early stages of agents, you will probably be able to utilize out-of-the-box capabilities, and you don't have to go down that specific rabbit hole. I perceive that some of the listeners, even if they're currently not at this point, might want to get there eventually,
maybe later in 2025 or the years to follow. Yeah, so that's where I land on this. I think that there's going to be plenty to experiment with next year that is very point and click. There will be some amount of...
In fact, we're seeing a lot of agent companies do the forward-deployed engineer thing, where they're actually embedding a developer inside companies to help customize agents for their particular data set and their particular environment. Sierra is doing this, and others are as well. And so there will be a lot of support, I think, for those initial pilots and deployments.
And so I don't think that a lack of understanding of this should be an a priori barrier to digging in. However, I also think that the more there is some amount of institutional understanding around these topics, and in particular an ability to assess,
or at least have the right support to figure out and assess where the current agents that are being tested or deployed sit relative to new capabilities that are coming online and what's likely to happen in the future, the better organizations will be able to make good strategic decisions. I think that the challenge is,
that this is going to be such a fast evolving landscape of solutions that it's not really going to be as clean as, you know, we piloted an agent, we liked it. And so we deployed it and then cool, we've got our agent figured out. It's very likely that that's a process that's going to be, you know, continuously reinterpreting and retrying things as capabilities improve.
And this competition, you know, expands the boundaries of what's possible. So building a learning organization that can actually understand this on a deeper level is going to be, I think, pretty essential. Yeah. And even if you just buy, the ability to define the right requirements for the vendor will probably have you at least talk the talk to some extent. Number 16, growth in the number and practicality of multi-agent systems.
Okay, so this is an exciting one. Again, you don't have to be scared about the technological aspect, but just a brief explanation of what a multi-agent system is. These are systems where we have several AI agents working together to accomplish a goal.
And typically each agent has a specific role. They will often act just like a cross-functional project team. So that's the best analogy that there is. And in many cases, people who are building these agents will really give each agent like a title that really seems like a job title.
If you want a concrete example, for a coding task you might have one agent that writes the code, another agent that tests the code, another that debugs it, and so on. And eventually, the overall code functionality can be even better by having a well-defined set of AI agents working together, if they're built properly.
But it's not easy to build a multi-agent system, because this is where you have to really have a good understanding of the agents. With frameworks and other capabilities that enable you to build multi-agent systems, though, they will probably become much more prevalent during 2025 and beyond.
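To make the coding example above concrete, here is a minimal hypothetical sketch of a write/test/debug loop. The `call_llm` helper is a placeholder for any chat-completion API, and the role prompts are illustrative rather than any framework's interface:

```python
# Hypothetical sketch: role-specialized agents collaborating on one coding task.

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for a chat-completion call to any LLM provider."""
    raise NotImplementedError

def multi_agent_code_task(spec: str, max_rounds: int = 3) -> str:
    # "Coder" agent drafts an initial solution.
    code = call_llm("You are a coder. Write code satisfying the spec.", spec)
    for _ in range(max_rounds):
        # "Tester" agent reviews; replies starting with PASS if no issues found.
        report = call_llm("You are a tester. Find bugs; reply PASS if none.",
                          f"Spec:\n{spec}\n\nCode:\n{code}")
        if report.strip().startswith("PASS"):
            return code
        # "Debugger" agent fixes the reported issues.
        code = call_llm("You are a debugger. Fix the reported issues.",
                        f"Code:\n{code}\n\nTest report:\n{report}")
    return code  # best effort after max_rounds
```

Each role gets a narrow job and a narrow prompt, which is the same reason the cross-functional team analogy works.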
And because of the analogy to real teams working together, and because we're already seeing some very promising results for multi-agent systems, there will be more industry confidence and we will see more and more of them in 2025. This is a really interesting area.
I wouldn't be surprised if, when the dust settles, it's only really when multi-agent systems become the norm that enterprises really start to see value, or at least big scalable value. The reason being that right now we have sort of a correlation between how specialized an agent is and how likely it is to perform well, but that limits you to a very discrete set of tasks, ones that tend to be very narrow, where you can kind of deploy these things right away.
Multi-agent systems are going to be where you can get more customization and ask for more complex things. And so I think when people are really imagining in their mind's eye all that agents could do, they're probably in many cases actually imagining multi-agent systems, even though that won't necessarily be where we begin the year. Yes.
And also, like humans, if you try to get an agent to do too many things at once, it will get confused. And thereby multi-agent systems, sometimes even for the smaller use cases, if we're able to nail them, will probably get us to better, more accurate results. Okay, number 17, more focus on multimodal abilities of agents.
Okay, also a very exciting one in my opinion, because when we're talking about AI agents, we're talking about things that will have to perform tasks and have good sensing and understanding, almost like humans. And in order to do that, these agents will have to have multimodal perception of the environment, whether they will be processing video, audio, images,
whether they will be controlling the computer and so on. All of these amount to something that is very exciting. The most exciting thing that I've seen recently is Google's Project Astra. I've seen some demos and some testimonials of people who use that, and it's a great example of where you have a model that is able to perceive the environment using video and interact with you and literally be like your eyes and ears in a real environment.
And I think even more exciting are the possibilities for people with some kind of disability to have these agents work for them. I know that we're very focused on the enterprise, but this is a consumer use case that I'm very excited about. And even for the enterprise, you can think of having a much more robust assistant that has all of these sensors working simultaneously to help you. Yeah, I think that...
This is one of the areas that's been really notable to me. Even in the last couple of weeks, we got an update to Project Astra, and we also got, as part of OpenAI's 12 Days of Shipmas, advanced voice mode with vision. And I think that we are still underestimating how different it will be when the normal way that we interact with AI involves it having the same visual and auditory context
for the world around us that we have. It's very hard, I think, for most people, myself included, to break out of thinking about it as a thing that exists in a computer that you write to, or maybe speak to. But I think that we're going to see a gradual shift over time that opens up not just totally new use cases, but a fairly fundamentally different understanding of what these tools actually do for us.
All right, number 18, more academic and open source brainpower will be devoted to agentic research, which should further accelerate development. Right. So I mentioned that in the previous conversation, but I've been working in AI for many years now, and I'm still amazed by what happened over the last two years.
And I think what created all of these capabilities, beyond some specific technological improvements, is the fact that so many smart people all over the world are literally focusing on one domain or one problem. And I believe that agents will enjoy the same thing. With so much hype and attention, we will just be able to get so much more.
And with so much brainpower coming from all directions, whether open source or academia or industry, the exponential curve will continue, and we will probably all be both excited and scared, and utilizing all of these technologies much more because of all of that.
You know, it's kind of ironic, but interesting. I actually think the fact that pre-training as a scaling methodology seems to be plateauing or at least running into some limits will only increase how much of that energy and brainpower goes to agents and applications and expressions instead of just thinking about raw capabilities enhancement of the underlying LLMs.
It was interesting. So on the Dwarkesh podcast, I don't know, a while ago now, maybe three months, six months, something like that, Francois Chollet basically said that he thought that OpenAI had actually set back AGI, which is fascinating. And his argument was that once ChatGPT hit, everyone just switched to thinking about and focusing on LLM architectures and not doing anything else.
And now that we're running into some limits in terms of getting kind of to the next level of capabilities, although who knows if that's actually true given O3, I think that there's going to be just even more fertile realms of experimentation on different ways to pull capabilities out of the tools that we have. Yeah, but I'm not sure whether it's the slowdown or the natural progression towards inference time reasoning. Yeah.
You know, the cynics will say that because they can't give us good enough results in scaling, the hyperscalers have all shifted to agents. But I'm not sure. Maybe it's because, like you and I, they're seeing the potential of agents. That's why they're so excited and working on it as much. And maybe they have some good stuff in store for us in the, let's call them, regular LLMs,
because they are all claiming, maybe aside from Ilya Sutskever, that we're not there yet in terms of a complete slowdown in scaling. So it seems like a marketing discussion as much as a technological discussion. Yeah, I agree. So speaking of this, number 19, new interfaces, standards, and protocols will emerge. An agent computer interface. Yes.
Right. So, you know, we were all very excited when Anthropic first introduced computer use. Everyone rushed in to experiment with it, and it really sounded like the true beginning of something major. And then everyone quickly realized that it's much more cumbersome, expensive, and not very accurate.
And I'm not sure whether this is the right approach. Like, do we want agents to control the computer like humans do? Or in fact, because agents will be doing so much work on the computer, there will be a new need for an interface for these agents to control a computer.
And moreover, because there will be so many agents working together, there will be a need for new APIs and new protocols for agent-to-agent communication, as well as perhaps being much more literal about how we write stuff, because agents can't read between the lines like humans often do.
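As a hypothetical illustration of being "much more literal": an error payload designed for an agent consumer rather than a human reader. The field names here are illustrative, not an established standard:

```python
# Hypothetical sketch: a machine-readable error an agent can act on without
# parsing prose. Every decision-relevant fact is a structured field.
error = {
    "code": "RATE_LIMITED",            # stable identifier the agent can match on
    "retryable": True,                 # no "please try again later" to interpret
    "retry_after_seconds": 30,
    "detail": "Request quota exceeded for this API key.",
    "remediation": {"action": "wait_and_retry", "max_attempts": 3},
}
```

A human-oriented message buries the same facts in a sentence; an agent-oriented convention surfaces them as fields the agent can branch on.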
Maybe your error messages have to be machine-readable versus human-readable, and so many other things. So that will also, I believe, be a huge focus. And an interesting part will be whether all of these different players will be able to get to an agreement between them, or whether we'll get to a point where everyone blocks one another with different protocols rather than being open and letting other companies' agents operate on your data. And I'm not sure whether
all of these websites will let agents call in and do stuff on them, or whether we'll see an economy of blocking each other, where essentially they're telling you, the end user, that if you want to do this action, you have to use our agent, because we will block your agent from doing that on our data or our tools. My guess is it proceeds sort of similarly to how most
versions of this have, which is initial balkanization and attempts to capture value that ultimately lose out to the open protocols and standards that underlie things, because there are just too many efficiencies to be had. If it's anything like the way that the internet has developed in other areas, I definitely think that a big part of the next few years is going to be those sorts of balkanization battles playing out.
let's hope that the open approach will win, for the sake of all of us, because it will be a better economy in my opinion. Number 20, a lot of investment in creating agent-oriented benchmarks. Okay.
So how do you measure an agent's performance? Is it only whether it arrived at its final destination? Sometimes we don't even know the final destination, so it's very hard to measure that. And we have seen some recent emerging benchmarks that are trying to be more open-ended, like agents are, and try to
pose a set of evaluation questions that require the multi-step reasoning and open-ended thinking that an agent will have. Two concrete examples: SWE-bench, the Software Engineering Benchmark, which gives an agent or an AI multiple human-like software engineering tasks and measures how well it performs on them. And there is also an interesting benchmark of research engineering, where the agent needs to basically
do the AI research that a human expert would do. So these are two interesting benchmarks that are emerging. And I believe that we will see more and more, because the existing methods for evaluating LLMs are not suitable for agents. They often look only at the bottom line and are not really indicative of how well the agent performed, especially if you want to open the black box and see the multiple reasoning steps that the agent took in order to get to the result.
So we will see more of those, and rightfully so, because as we talked about before, there will be so many competing offerings. And aside from maybe experimenting ourselves, it's going to be very difficult to assess how well they're doing if we only use the existing benchmarks. Yeah, I completely agree with this. I think a highly functional set of benchmarks is going to be necessary. Again, just thinking about it strictly from the standpoint of the enterprise: as we are thinking about how to recommend agent X versus agent Y for some specific purpose that we've determined with an enterprise is a great place to start experimentation, the types of things that would be valuable for us to know are exactly the types of things that you were just mentioning, which there currently aren't benchmarks for. So for example,
How many times in the process of completing the task at hand is the agent likely to need guidance from humans? A one on that metric is very different from a five. The value proposition is totally different based on that. Whatever that score is called, it's a score that I would like to see as it relates to making decisions around agents.
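As a hypothetical sketch of the kind of benchmark score being described, here is scoring over an agent's full trace rather than just its final answer. The trace format and metric names are illustrative assumptions:

```python
# Hypothetical sketch: grade intermediate behavior, not only the bottom line.
from dataclasses import dataclass

@dataclass
class StepRecord:
    kind: str          # e.g. "reasoning", "tool_call", "human_handoff"
    succeeded: bool

def score_run(steps: list[StepRecord], task_completed: bool) -> dict:
    handoffs = sum(1 for s in steps if s.kind == "human_handoff")
    failed = sum(1 for s in steps if not s.succeeded)
    return {
        "task_completed": task_completed,
        "human_handoffs": handoffs,              # a 1 vs. a 5 changes the value prop
        "step_failure_rate": failed / max(len(steps), 1),
        "total_steps": len(steps),               # rough proxy for cost
    }
```

A benchmark that reports something like `human_handoffs` alongside task success would answer exactly the enterprise question raised here.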
So I think that you're right that there's going to be a lot of exploration here. And it won't just be pure technical benchmarks. I think those will be highly functional and related to actual usage as well. Yes, for sure. Number 21, the emergence of agent-oriented LLMs to serve as underlying models.
Right. Again, maybe something a little bit more controversial, but this is my opinion, and feel free to weigh in with yours. I believe that unlike traditional LLMs, which are very much designed for broad natural language tasks, or sometimes images, videos, and so on, the LLMs that are more oriented towards agents will be purpose-built for powering the autonomous activities that agents will need to do.
And, you know, OpenAI's O1 and now O3 and the like are a good step in this direction of having LLMs that are more geared toward agent reasoning. We can and probably will see more of these models created and used. Some concrete examples:
They might not be better on the general benchmarks, because they don't have to be smart at everything the way we benchmark our existing OpenAI and other models; we want them to be more suitable for agents. So maybe they will prioritize multi-step reasoning. Maybe they will prioritize long-term memory, or maybe they will be very smart about retaining very good context,
or enabling the agents to be more thoughtful in the way they plan and the way they make decisions. And while we see these LLMs specializing, we might even see a mix and match, where even one single agent will use different models for the different steps of performing its tasks. So maybe it will use the O3 model for the initial planning, and then it will use a smaller model for
doing its ongoing tasks as part of the overall flow. And I believe that eventually we'll see a very hybrid approach, where some of the models that are used are smaller, cheaper, faster, and some of them are smarter, and the best engineering practices will be around finding the right models and using the ones that are tailored not only for the overall agent concept, but even for your specific vertical. We might even see those emerging.
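As a hypothetical sketch of that mix-and-match approach, here is an agent that routes each phase of its work to a different model. The model names and the `complete` helper are placeholders, not a real provider's API:

```python
# Hypothetical sketch: different models for different steps of one agent's task.

MODEL_BY_PHASE = {
    "plan": "large-reasoning-model",   # expensive, strong multi-step reasoning
    "execute": "small-fast-model",     # cheap and fast for routine steps
    "review": "mid-size-model",        # sanity check before returning
}

def complete(model: str, prompt: str) -> str:
    """Placeholder for a call to whichever provider hosts `model`."""
    raise NotImplementedError

def run_agent(task: str) -> str:
    plan = complete(MODEL_BY_PHASE["plan"], f"Plan the steps for: {task}")
    result = complete(MODEL_BY_PHASE["execute"], f"Carry out this plan:\n{plan}")
    return complete(MODEL_BY_PHASE["review"], f"Check and summarize:\n{result}")
```

The engineering practice described, picking the cheapest model that is good enough for each step, falls out of making the routing table explicit.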
Yep. I don't have much to add. I think this is absolutely going to happen, and I think we're going to get more sophisticated around what gets better performance. There are going to be cost incentives to do this experimentation, if nothing else, right? The highest state-of-the-art intelligence is still very expensive.
There's a lot of reasons to try to bring more value out of other models and other approaches. And so I think we're just going to see tons and tons of this sort of customization.
Vanta automates compliance for ISO 27001, SOC 2, GDPR, and leading AI frameworks like ISO 42001 and NIST AI risk management framework, saving you time and money while helping you build customer trust. Plus, you can streamline security reviews by automating questionnaires and demonstrating your security posture with a customer-facing trust center all powered by Vanta AI.
Over 8,000 global companies like Langchain, Leela AI, and Factory AI use Vanta to demonstrate AI trust and prove security in real time. Learn more at vanta.com slash nlw. That's vanta.com slash nlw.
If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.
That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.
If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at bsuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. All right. And now we get to our last section, investment and media hype. Number 22, this is probably your safest prediction. Significant VC dollars will be invested in agentic companies.
Yes, so probably everyone who wants some funding, or who wants to take good care of their stock, will have to say agent. And this can also be a fun drinking game: in each earnings call, how many times will each CEO say agent?
And you mentioned in the previous conversation the Y Combinator team saying that vertical agents will be 10x bigger than SaaS. That created a lot of headlines, and we know that other VCs are also already on board with the technology trend. And this for sure will continue in 2025, and thereby there will be many newly founded companies.
startups and companies, but also many companies that will add an agentic offering or pivot towards an agentic offering. Some of it rightfully so; for some of it, the natural evolution and progression of things will probably not have them in the history books as the companies that yielded a lot of value from that. I think that this is absolutely true. It's already happening. Certainly, you know, this has been a major theme with venture recently.
There are a couple of things that I think are interesting to watch for that should tell us how this is evolving. One is exactly what you just called out: the extent to which AI gets supplanted or supplemented by agent mentions in earnings calls and things like that. That'll be very telling. But two,
one of the things that I think will happen is that lots and lots of companies and startups will, not for the sake of funding, but just because they realize that an agent can do something discrete and unique for them, accidentally start building agents on top of, as part of, or as a replacement for their existing offering. We've had this process ourselves. So Superintelligent delivers AI support enablement applications
as a team, as a self-serve platform, and now increasingly as an agentic offering. And that wasn't a money chasing thing. It was because we realized there are things that we could do with agents to scale ourselves that we couldn't do any other way. And I think lots and lots of companies are going to stumble into experiments next year where building with agents actually unlocks totally new possibilities that they haven't seen before. So this might be one of those rare VC themes where there's enough there there to justify all of the excitement and the capital that flows in. Yeah, I believe there is, but you have to be smart about what you're building and for what reasons. Yeah.
Yeah, and much more so than speaking to the investors, I think the reminder for me when it comes to builders is that it's often, if not always, a bad choice to just chase the trends and what VCs are looking for, rather than making the right decision for whatever company you're trying to build and whatever problem you're trying to solve. However, I would caution against
explicitly not looking at agents because you believe they're overhyped and just sort of a VC thing. I think there are going to be lots of opportunities to build there that are going to be really fun and meaningful.
Until the summer, of course, because number 23, come summer, there will be a media debate about whether agents were overhyped and whether development is slowing down. So, you know, the existing challenges might not be resolved. And we mentioned many of them as we discussed in these two episodes. But new challenges will probably emerge and reality will meet the currently probably overhyped and overinflated media expectations.
And the media will probably also be the ones, during the summer when the news cycle subsides, who will take it upon themselves to deflate the bubble and tell us all how agents were mostly hype and are not delivering on their promise. And
what we predict is that come fall, we will meet reality. And the reality, at least in my opinion, is that agents will continue to yield a lot of value. The bottom line here, from my perspective, is that
while the news cycles will come and go, and we will see many headlines saying that agents are not what they promised to be, they will be. The only caveat is that it might take us slightly longer than anticipated, and it might be a little bit harder than anticipated, but the value is there and will continue to be there, at least in my opinion. Yeah. So in the summer of 2023, the version of this was
that ChatGPT had its first down month in June of 2023. And that was the context for all of these pieces. Then in this year, of course, it was the Goldman Sachs, too much money, too little value, and the Sequoia $600 billion question posts that created the whole discussion. And so there does seem to be a trend where summers generate kind of a FUD cycle around AI. Yeah.
Interestingly, part of why this one's going to be extra funny when it happens is that
agents have actually been the most hyped thing since ChatGPT launched. If you go back to April 2023, when the AI Daily Brief was just starting, the thing that everyone was talking about was AutoGPT and BabyAGI and all of that; it was agents even then. And so it'll be very funny to see that we actually have a discrete set of perhaps very specific, single-purpose agents deployed, and yet the
narrative might be that it's disappointing. But I agree with both the likelihood that there will be that anti-hype cycle and also the reality that it is ultimately incorrect. Yeah, let's play this tape back once we're there to prove that we predicted it. Number 24, agents will be intertwined with and will accelerate AGI discussions.
Okay, so when I created this prediction, it was before the last day of OpenAI's 12 Days of Shipmas last week. And for those of you who might have already gone into vacation hibernation, OpenAI literally shocked us yet again last week when they announced the O3 model, because they said that it surpassed the human level on the ARC benchmark.
And the ARC benchmark is a benchmark that was created specifically to evaluate an AI system's ability to generalize and solve problems, to prove that it's AGI-worthy. And up until last week, the best performing model scored very low, I think low 20s or low 30s, I don't remember the number, but OpenAI with their O3 model has surpassed human ability.
And I think even more so, the discussions around "are we there yet" will reignite in early 2025. And, you know, with all of these agentic discussions, we need to ask ourselves what's the relationship? Because if agents demonstrate increasingly
autonomous behavior, and they're utilizing O3 in the background, which has already surpassed some human benchmarks toward AGI, the lines will become really, really blurred, and the debate will probably go further about whether we are there yet with AGI. And I think during the year, as we see more and more impressive use cases of agents come to fruition, some of these discussions might become even more relevant.
However, I have to say, first of all, that bottom line, I don't think that 2025 will be the year of AGI, even with agents and all this intertwined relationship. And I also don't think that it really matters. I think, like I said before, what matters is the outcome or the results.
and agents will yield good results in 2025 and will have a lot of potential for human-like abilities in many, many different tasks. But I'm not sure whether it will move the needle as much, or whether it will matter beyond some financial angles and some specific companies that have the incentive to say AGI is here. Absolutely. I think AGI ultimately matters insofar as it's deployable
to change the way things actually happen, right? And so I think that that's why it will get caught up or connected to the agent conversation is that agents are going to be a lot of where the next frontier of the state of the art goes to get deployed when it comes to AI.
For what it's worth, Francois Chollet, again, who was the creator, the progenitor of the ARC Prize, tweeted about whether this meant that O3 was AGI. And what he said was, while the new model is very impressive and represents a big
milestone on the way towards AGI, I don't believe this is AGI. There's still a fair number of very easy ARC-AGI-1 tasks that O3 can't solve, and we have early indications that ARC-AGI-2 will remain extremely challenging for O3. This shows that it's still feasible to create unsaturated, interesting benchmarks that are easy for humans, yet impossible for AI, without involving specialist knowledge.
We will have AGI when creating such evals becomes outright impossible. So even though there's a huge discussion right now, at least the guy behind that particular benchmark doesn't think we're there yet. But I do think that you're right to call out. Certainly, this has been the big discussion over the last few days. We're recording this on Monday, December 23rd, and it's been pretty much all anyone's been talking about for the weekend. But actually, when push comes to shove, you see this happen over and over again on Twitter slash X.
that, you know, someone will start with a debate around whether this is AGI, and then it'll quickly get to, well, it doesn't really so much matter. What matters more is, you know, does this mean software developers are cooked? Does this mean, you know, different job roles are totally going to change? And so I think that it's going to be all about that practice that really matters. And again, that's why, you know, agents are going to be such a big part of the story. However,
According to number 25, there will be an even bigger part of the story a little bit down the line. So 2026 will be even bigger for agents than 2025.
Right. So I think it came across multiple times during this conversation that this is where we will see the beginning of the exponent. And of course, if everything that we just discussed happens, 2025 will be an amazing year, with a massive leap forward in agents and in humanity's progression overall.
But it's just the beginning. And I believe that 2026, and probably a few years after, will be the years where many of these learnings and developments pay off, coming out of what you called the pilot year, the year where more and more people get their hands on agents.
And this is where we will yield the big promise of Gen AI. And that's why I'm so excited. You asked me at the beginning why I'm so excited about agents. It's what will happen in 2025, 2026 and beyond that will get us all to be amazed about how work and life were before this era.
Yeah, so I agree with this, and I'd go a step farther. I think that 2026 will be the first year that enterprises meaningfully and regularly have agents deployed
just in the normal course of their workforce, right? A hybrid human-agent workforce will increasingly be, well, not the norm, but more and more it'll be normal to see that as part of certain functions. I think it'll be highly focused on particular functions to begin with, but I think it'll be fairly normal in 2026 to have agents deployed at scale across certain functions.
And so the implications of that are that you have to use 2025 to figure out which those functions are, how you integrate agents with your systems, and how you build the new systems around them that you need. And that's going to take a ton of work and experimentation. Obviously, this is what Superintelligent is positioning itself to help people with. This is why we're doing these readiness audits. It's why we're supporting agent deployment. It's why we're helping companies build systems for ongoing AI transformation.
2025 is going to be an incredibly important inflection year that is really going to push enterprises to build the systems that allow them to actually take advantage of this in 2026 and beyond. And I think that the implications of that are that you really will start to see, especially in 2026 and beyond,
a clear breakout of companies that have built these systems and have the capability, you know, who have gone through AI transformation and who have these systems set up to continue AI transformation. They will start to break out from the pack in very meaningful ways, in a way that hasn't even happened yet. So I think it's going to be very, very exciting.
This year will be very fun because the stakes will be high, but still there's lots and lots of room to do things that don't work and to, you know, wander down paths that don't lead anywhere. That won't be the case for very much longer. It's going to be a fun year for sure.
All right. Well, Nufar, thank you so much for hanging out. This is a super fun conversation. We don't have anything quite yet to announce, but for anyone who did like this, keep an eye closely tuned or an ear, I guess, closely tuned to this as we might have some interesting announcements coming up. But hope that you have a very fun and non-agentic holiday, everyone. And we'll see you in 2025.