We're sunsetting PodQuest on 2025-07-28. Thank you for your support!

8 Ways Agents Will Improve This Year

2025/1/17

The AI Daily Brief (Formerly The AI Breakdown): Artificial Intelligence News and Analysis

AI Deep Dive AI Chapters Transcript

People

Ahsan Khaliq

Jared Kaplan

主

主持人

专注于电动车和能源领域的播客主持人和内容创作者。

Topics

Ahsan Khaliq：我发现了 Google Gemini 的一个新功能，它能够同时处理两种视觉输入，例如图片和视频，这在以前的多模态大型语言模型中是做不到的。这为医疗、工程和质量控制等领域带来了许多新的应用可能性。 Jared Kaplan：我认为 AI 智能体将在今年在四个方面得到显著改进：首先，它们将更擅长使用工具，能够处理更复杂的任务，并在出错时寻求用户反馈；其次，它们将更好地理解上下文，能够根据业务逻辑、行业背景和法规环境等进行调整；第三，它们将改进代码辅助功能，能够理解代码错误、进行调试和运行代码；最后，它们需要提高安全性，以应对提示注入等安全挑战。主持人：除了 Jared Kaplan 提到的四个方面，我还认为 AI 智能体将在以下四个方面得到改进：首先，企业将努力改进数据质量，使其更易于 AI 智能体使用；其次，多智能体系统和编排技术将得到发展，从而实现更复杂的任务处理；第三，围绕 AI 智能体的可观察性、评估和基础设施的工具将得到改进；最后，企业将更加关注 AI 智能体的投资回报率 (ROI) 的追踪和衡量。Google 将 Gmail 和 Google Docs 中的 AI 功能免费提供，这可能是 AI 竞争的一部分，旨在吸引更多付费用户。Meta 公司高管非常关注打败 OpenAI，并将 GPT-4 视为主要竞争对手。AI 头像技术虽然不可避免，但其普及还需要时间。

Deep Dive

Chapters

Google's Gemini AI shows unexpected ability to process two visual inputs simultaneously, unlike other LLMs. This opens possibilities for various applications, from student learning to medical diagnosis, but raises questions about Google's awareness of this feature.

Gemini can process visual and video inputs concurrently.
This was discovered by researchers using AnyChat.
Potential applications span various fields, including medicine and engineering.
Questions remain on whether Google was aware of this capability.

Shownotes Transcript

Translations:

中文

Today on the AI Daily Brief, four, no, actually eight ways agents will get even better this year. Before that in the headlines, a surprising new capability of Gemini AI that it's not even clear they knew about. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. ♪

Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. Sometimes advancements in Gen AI hit you over the head. More often though, every week sees some small but significant change that would be easy to miss if you weren't really paying attention. An example of that comes this week from Google. AI researchers have discovered a novel feature of Google's Gemini, which is the ability to see two things at once.

Until now, multimodal LLMs have been only able to accept one visual input at a time. For example, either looking at a picture or watching a video. Researchers developing an experimental application called AnyChat have discovered that Gemini can do both at the same time. Ahsan Khaliq, machine learning lead at Gradio and the creator of AnyChat, said even Gemini's paid service can't do this yet.

you can now have a real conversation with AI while it processes both your live video feed and any images you want to share. The previously unavailable feature could be a result of Gemini's unique architecture. Unlike OpenAI's GPT-4.0, Gemini was trained to be natively multimodal rather than having additional input modes added later. In terms of the new and improved use cases that this opens up, on the low stakes end, VentureBeat noted that students could share a video of a problem along with a picture of a textbook, or artists could share a live stream of work in progress along with reference pictures.

For higher stakes usage, they wrote, Imagine a medical professional showing an AI both live patient symptoms and historical diagnostic scans at the same time. Engineers could compare real-time equipment performance against technical schematics, receiving instant feedback. Quality control teams could match production line output against reference standards with unprecedented accuracy and efficiency.

Now, the release of this feature through a third-party tool begs the question of whether Google were aware of Gemini's capacity to perform in this way. It's totally possible that they decided to keep the feature locked away due to the high resource usage associated with this type of processing. Then again, it might also be a sign that small teams of curious devs continue to discover things that the large research labs overlook, even about the emergent features of their own models.

The one thing that did come directly from Google is that the company has announced that they're making AI in both Gmail and Google Docs free. This is definitely part and parcel of the AI race and the war for premium users. It used to be that if you wanted to use Google AI features inside Gmail, Docs, Sheets, Meet, basically the Workspace suite, it was going to cost you $20 per month. Basically, if you're already paying for Workspace, all of that's going to be bundled for free.

At the same time, however, the base level price of all workspace plans is increasing. Basically, companies are going to now have to pay about $2 more per month per user for workspace, but all the AI stuff will come natively.

Now, I think this is a pretty interesting play. One of the gripes that I've frequently heard from our enterprise partners at Super is the grumbling about how much more it costs to buy co-pilot and AI subscriptions on top of existing Microsoft service. Now, of course, price is a fast-moving target and AI has real costs, but this is definitely a big move and one that could force other companies' hands. Staying in big tech, but moving over to meta for a moment, internal messages have revealed that meta executives were extraordinarily focused on beating OpenAI.

These internal discussions have been unsealed as part of the Sarah Silverman-led lawsuit against Meta. They suggest, not surprisingly, that the company had their sights firmly on GPT-4 as their major competition. In an October 2023 message, Meta's VP of Generative AI, Ahmad Al-Dhali, said, Honestly, our goal needs to be GPT-4. We have 64,000 GPUs coming. We need to learn how to build Frontier and win this race. And interestingly, although Meta was competing in the open-source field, they didn't seem too worried about rival open-source labs.

For example, in one message, Aldale said, Mistral is peanuts for us. We should be able to do better. It was clear even as we were watching that in between Lama 2 and Lama 3, Zuckerberg and Meta shifted their attention from being the best open source model to being a world-class, state-of-the-art model in general.

Now, a lot of the framing of this article is all about how obsessed they were and uses a lot of words that suggest that pejoratively. But that sort of aggressive focus is the only way that companies, especially big companies, are ever going to be able to stay in a race against a startup like OpenAI. One person's obsession is another company's focus.

Now, there's a whole other dimension to this battle around the use of the external LibGin dataset, which contains pirated versions of copyrighted works and was billed as the largest free library in history. LibGin has been sued multiple times and was ordered to shut down. Aldali discussed clearing the path to use the dataset by contacting publishers, but it's not clear that he obtained all the relevant clearances. In one message, he asked, do we have the right datasets in there? Is there any reason you wanted to use but couldn't for some stupid reason? Of course, as that lawsuit goes on, I'm sure we will hear a lot more about that.

Lastly today, flipping over to the startup side of the world, a big new round for AI avatar startup Synthesia, who has raised a fresh $180 million at a $2.1 billion valuation. That's about double their valuation from 18 months ago. AI avatars are to me one of the more interesting technologies in that I think that they are at once completely inevitable and yet will still take an artificially long time to become normalized, at least in the business world. There are a lot of great companies competing in this space, but Synthesia has a very big war chest now to compete.

With that, though, we will wrap the headlines. Next up, the main episode. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in.

Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralized security workflows complete questionnaires up to 5x faster and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company.

Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vantage to manage risk and improve security in real time.

For a limited time, this audience gets $1,000 off Vanta at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off. If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms.

agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.

That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.

If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at bsuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. Hello, friends. A quick note here as we transition between the headlines and the main episode. Fun announcement I have today has to do with a new sponsor and partner for the AI Daily Brief.

You may have heard the name KPMG being kicked around a lot recently. I did part of an episode last week about their AIQ4 Pulse survey, which is a survey of 100 business leaders from companies that have a billion dollars or more in revenue. And then, of course, I had senior AI leader Steve Chase on earlier this week.

I started to get to know KPMG last year. Many of their senior leaders are listeners of the show. And now we've had a chance not only to get to know one another, but also to collaborate in other contexts as well, on learning sessions with super intelligent. And it's very clear that they are extremely diligent, conscientious, and forward thinking when it comes to generative AI.

All of my conversations with them have been extremely enriching and thought-provoking and just generally fun as relates to this crazy new future we're moving into. And so I'm thrilled that they will be supporting the AI Daily Brief more directly for the year to come. So a big welcome to KPMG. I really appreciate their support of this show and I'm excited to continue to get to work together.

With that, let's head back to the main episode. Welcome back to the AI Daily Brief. Today, we are doing something a little bit different that I'm quite excited about. It is no secret at this point that 2025 is definitely the year of agents.

or at least the year of agent pilots. And basically since ChatGPT came out or very soon thereafter, people were already racing ahead to the possibility of autonomous agents that were actually executing using AI tools on people's behalf so that instead of just having a smart assistant, you could actually have employees, a team, an army, in fact, working for you, allowing you to do more things.

And the first attempts at this, although very hyped and incredibly exciting to people, things like auto GPT and baby AGI from all the way back in April 2023, just were very, very limited in what they could do. In fact, agents have been limited in what they could do all the way through up until now.

Still, in the back half of last year, we saw a lot of very specific agents start to come to market. Salesforce announced their agent force, and that's basically all CEO Marc Benioff would talk about for the last quarter of the year. Google announced agent space in December, which went beyond just a framework for building agents and actually started to provide out-of-the-box agent experiences.

We had the nudges towards agentic from companies like OpenAI, who were clearly seeing their O1 and O3 reasoning models as a step in that direction. And we had Anthropic showing off computer use, a way that agents could actually start to manipulate and interact with websites in the same way that humans would. All of this has created a scenario where for many, many big companies, 2025 will be the first year that they experiment and do proof of concepts in the agent space.

Some of the areas that we anticipate to be the most common for that to happen will be customer service and coding, but there are many other examples as well. And yet, we also anticipate that this will happen in fits and starts. Many of the things that big companies will hope they can do with agents just won't be quite ready yet. In fact, right now, we're in the middle of deploying what we're calling our agent readiness audit. We started advertising for this at the very end of last year and have been just

absolutely inundated with companies who want to figure out their agent strategy. And one of the things that is clearest is that companies who have very clear and defined expectations are likely going to do better with these pilots and have a better experience than those who come in assuming that agents can do everything that they imagine they could,

right now. But with all of that said, an important thing to remember is that this is the worst that agents will ever be. Recently in the MIT Technology Review, Anthropics Chief Scientist Jared Kaplan gave four ways that he believes agents will be even better over the course of this year. What we're going to do today is go through his four ways agents will get better, and then we're going to add four of my own.

First up, Kaplan believes agents will get much better at using tools. He said, I think there are two axes for thinking about what AI is capable of. One is a question of how complex the task is that a system can do. And as AI systems get smarter, they're getting better in that direction. But another direction that's very relevant is what kind of environments or tools they can use. We were excited about computer use basically for that reason. Until recently with LLMs, it's been necessary to give them a very specific prompt, give them very specific tools, and then they're restricted to a specific kind of environment.

What I see is that computer use will probably improve quickly in terms of how well models can do different tasks and more complex tasks, and also to realize when they've made mistakes or realize when there's a high-stakes question and it needs to ask the user for feedback. In short, tools are going to be a key way that agents actually get more autonomous and generalizable. Next up, Kaplan suggests that agents will better understand context.

Anthropic recently introduced new features to train Claude to use a particular tone or writing guide, making it all the more useful in business settings. Doing something similar with agents might mean being able to apply a set of business logic, industry context, regulatory environment, etc. to the agent. Kaplan said, I think we'll see improvements there where Claude will be able to search through things like your documents, your Slack, etc., and really learn what's useful for you. That's under-emphasized a bit with agents. It's necessary for systems to be not only useful but also safe doing what you expected.

This is definitely what the big players are promising. A big part of the value proposition for Google agent space as they frame it is that these agents, which are way more out of the box than their previous frameworks, have access to all of the information that makes your company run.

They write, Kaplan also pointed out that recognizing context would mean cutting down on resource use. He pointed out that reasoning models shouldn't need to think very hard to open a Word document, commenting,

Kaplan's third prediction is a very specific use case. Agents, he say, will make coding assistance better. Developer assistance is definitely a breakout use case, not only of Gen AI, but agents now as well. Kaplan said, my expectation is that we'll see further improvements to coding assistance. That's something that's been very exciting for developers. There's just a ton of interest in using Cloud 3.5 for coding, where it's not just autocomplete like it was a couple of years ago. It's really understanding what's wrong with code, debugging it, running the code, seeing what happens and fixing it.

And lastly, Kaplan points to something that he seems to think is a necessity, which is that agents will need to be made safe. He said, We found it anthropic because we expected AI to progress very quickly and thought that inevitably safety concerns were going to be relevant. I think that's just going to become more and more visceral this year because I think these agents are going to become more and more integrated into the work we do.

We need to be ready for challenges like prompt injection. Prompt injection refers to the ability to sneak prompts past guardrails. He continued, prompt injection is probably one of the number one things we're thinking about in terms of broader usage of agents. I think it's especially important for computer use and it's something we're working on very actively. Because if computer use is deployed at large scale, there could be pernicious websites or something that try to convince Claude to do something that it shouldn't.

Now, one of the things that was really interesting when Anthropic announced computer use was that that's something that people have been historically concerned about. So Anthropic seems to be on a similar page, at least to OpenAI, in the idea that the best way to figure out how this is all going to play out is to release very incrementally and try to let people adapt and see how AI interacts in the real world.

So those are Kaplan's suggestion for how agents are going to get better this year. But as I said, I wanted to add a few of my own. And once again, these come out of the now dozens of agent readiness audits that we are currently engaged in.

So one way in which agents will get better, which is really kind of an extension perhaps of understanding context, is better data. Organizations are hyper aware right now and have a very strong belief that a big determining factor in how well AI works for them is going to be how good their data is and how prepared it is to be used by that AI.

In KPMG's Q4 Pulse survey about AI, which surveyed about 100 executives from firms with a billion dollars or more in revenue, those people actually identified the quality of organizational data as the biggest challenge for their Gen A strategy in 2025. 85% said that they expected it to be a big challenge compared to, for example, 71% who pointed to data privacy and cybersecurity.

This is something that we're seeing as well. Organizations are very, very conscientious and aware of how they need to improve their data to make it more accessible for Gen AI and specifically agents. Given how much of a focus that is, I think that that context that Jared was talking about won't just be from these casual plugins to existing data sources, but will also interact with real significant enterprise efforts to make data agent ready.

Next up, orchestration and multi-agent systems. Right now, a lot of the use cases where people can reasonably do proof of concepts for agents are very, very specific, single agent kind of workflows.

In fact, the agents that most people will be testing for at least the first half of this year, and probably most of the year in total, are fairly close to what might have previously been called an automation. Still, everyone knows that this is just an incremental step towards where they're really trying to go, which is agents that are capable of taking on complex tasks from end to end without a human moving them from one step to the next. Those sort of multi-agent systems require orchestration, and this is one of the most fertile categories for agent infrastructure development right now.

Companies like Emergence are working furiously on platforms that allow agents to come together to do much more complex tasks than would be available in the past. And I think in addition to seeing those very specific and singular agent proof of concepts, we're also going to see enterprise leaders start to get a little bit more sophisticated and actually hack at these multi-agent systems. I anticipate that in 2025, it will only be the vanguard of enterprises, especially those who have a little bit more in the way of technical resources internally, but that won't be the case forever.

Next is sort of a catch-all, observability evaluations and infrastructure. Basically, the tooling around agents is going to get a heck of a lot better this year as well. You are starting to see purpose-built platforms for things like observability start to emerge. Observability in this case refers to the idea of having full visibility into what the agent is actually doing so that you can see how it's working and more particularly where it's not working and what got it stuck.

If you hang out on agent Twitter, one of the things you'll hear a lot is agent companies griping about the fact that enterprise customers who are trying agents don't want to think about evaluations. But once again, I anticipate that to be something that third-party platforms start to normalize and make much easier for them.

And just in general, right now, a huge amount of the development effort and entrepreneurial effort, frankly, around agents is going into developer tooling. And more simply put, just trying to make the agents actually work as well as all the business people think they should be working. However, one of the things that I anticipate in 2025 is a massive explosion in the infrastructure and deployment support specifically focused on business use of agents.

That's obviously a place that Superintelligent is playing around as an AI transformation and workforce management platform. And I think that we are going to be far from alone in those endeavors. Lastly, let's talk about ROI. ROI has had a very interesting place when it comes to AI for the last couple of years. ROI is never far away from the conversation when you talk to people who are in charge of AI transformation. And yet, it has not been a barrier to adoption at this moment.

What I mean by that is that companies have such a strong sense that these AI tools are so powerful that they will inevitably make their workforces work better, that the fact that Gen AI tools maybe have a little bit more trouble exactly explaining their own ROI hasn't slowed down adoption. In fact, there has been such a push for adoption that ROI has been shunted backward as a thing we'll figure out later.

Think about it this way. For the last two years, if you were a CEO, what's more likely to get you fired? Saying we're not exactly sure what the ROI of Gen AI is, so we're going to hang back and let people figure it out before we get into the game, or diving in headfirst and saying we don't know how to measure it yet, but we're fairly convinced there's ROI there, and we want to be out ahead figuring out the use cases and how it actually benefits our business now.

It's not even a question. The idea of slowing down because ROI measures haven't been clear hasn't really been on the agenda even a little bit. At the same time, it lurks around the corner. And part of, I believe, why agents are so explosive right now in the marketplace is that they have an implicit ROI built into them. If an agent works, it does a certain task or set of tasks for much cheaper than the equivalent human labor. Full stop.

Now it is an entirely separate question of what organizations choose to do with those savings. This gets back to our frequent conversation about the efficiency era of AI or doing the same with less versus the opportunity era of AI, which is not about cost savings, but about reinvesting those savings into building fundamentally different, more innovative and better services and products. But still the point remains that agents will do certain tasks and categories of tasks much faster, cheaper, and eventually better than their human equivalents. And if your robot does task X,

For a tenth of the cost of the equivalent human labor, there is ROI right there. And this, I think, explains a huge amount of why agents are so attractive and so on the agenda for 2025. And yet, there is a big gap between the implicit knowledge that there is ROI if these things work and actually tracking and measuring it. And I expect this to be a huge opportunity for companies and startups in this space to actually help enterprises with, and one that I anticipate many jumping into.

As we are helping companies with this agent readiness audit and then pilot support, which involves not only scoping and partner selection, but also monitoring and evaluation, we are certainly thinking about how you measure or at least estimate ROI in real time. And once again, I do not think we're going to be alone in those endeavors.

So that's my complete list combined me and Anthropics Chief Scientist, clearly two very equivalent perspectives when it comes to expertise in this area. For those who aren't clear, that is tongue in cheek. Once again, agents will get better at using tools. Agents will understand context. Agents will make coding assistance better. And agents will need to be made more safe. And then my additions, enterprise efforts for better data, orchestration and multi-agent systems, observability, evaluations and infrastructure, and ROI tracking.

For those of you who are in the agent game, let me know what you think. If you want to talk about the readiness audit or pilot support, shoot me a note at nlw at bsuper.ai. And for all of you listeners who are just along for the ride, join the conversation. Spotify has now started allowing comments. And so far, the discussion has been really cool. Anyways, guys, that is going to do it for today's episode. Appreciate you listening or watching as always. Until next time, peace.

8 Ways Agents Will Improve This Year 21:52 Share

The AI Daily Brief (Formerly The AI Breakdown): Artificial Intelligence News and Analysis

Deep Dive

Shownotes Transcript

8 Ways Agents Will Improve This Year