Today on the AI Daily Brief, what non-engineers need to know about the state of the discourse in AI engineering. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. One of the things that I think is very exciting about AI is that it's breaking down barriers between technical and non-technical people.
AI is an intermediating technology whereby people who are non-technical can start to grok the tools of creation from an engineering and development perspective. And as people are trying to make that side of their brain work, the resource that I most often point them to is the Latent Space podcast and newsletter and the AI Engineer Summit that's produced by some of the same people.
Specifically, today's guest, Swyx, is at the center of all of that amazing work. He has an incredibly good pulse on the state of conversations when it comes to AI engineering. And so today we're talking about the big themes that people in that community are talking about and building around, and what the implications are for the rest of us. All right, Swyx, welcome back to the AI Daily Brief. How are you doing, sir?
Very good. Long-time listener and glad to be back. Yeah. So I think this will be a really fun conversation. What I was saying to you before we were recording is that what I hope comes out of this is to help my listeners, which I would say are on average a higher portion of non-engineers than your audience. So, you know, helping enterprises and non-engineers understand where the big discussions in AI engineering are. And I think it's pretty clear at this point that it has always been valuable for people who are inside technology and building with technology, whether they're engineers or developers or not, to try to keep a pulse on what builders are building, how they're building, what tools they're using. I think it's even more pertinent, obviously, with AI, right? The space between the non-engineer and the engineer is getting blurrier. Perhaps to some chagrin somewhere. So I think that would be super valuable. And I think you obviously have a unique vantage on this, in terms of the content you produce with Latent Space, but also through planning the AI Engineer summits, right? So we just got off one,
I guess a couple months ago now, it feels like just a minute ago. That was super fun in New York. You've got the AI Engineer World's Fair coming up in San Francisco again this summer.
And so I thought what would be fun is maybe we kind of just go through, like use those planning processes to kind of frame what people are thinking about and how that's changing even in this compressed period of time. And maybe to start off, I think what you were just sharing with me about how you think about planning, I think is actually very useful context for folks to get into the conversation. Yeah, sure. Thanks. Yeah.
The planning process, you know, this is my third year doing this, so I don't feel like I have it fully on lock. But, you know, I think the main thing is that our source of alpha is that we stay close to the engineers and also we react quickly.
faster than the machine learning conferences. And those are the two things, because there are obviously many, many competitor conferences, but, for example, the research conferences like NeurIPS, ICML, ICLR, all of them, I don't know if people know this, but in order to speak at one of them, you have to submit a paper six months in advance. And in AI time, that's a long time. And that's purely because NeurIPS is 38, 39 years old. Things just weren't moving that fast when it started. And now they are. And, you know, it's hard to change a tradition like that.
And then the other conferences are typically organized for business heads and talking heads and people interested in that kind of thing. And so they don't get too far down on the technical detail. And I think the key problem with that is there's a ton of fireside chats, a ton of panels. Everyone shows up with no preparation whatsoever. They yap for 30 minutes and then you're done and you don't remember any of it.
So the thing that I emphasize a lot is what engineers are going to take home with them to do their work and how to improve it. That means I demand of my speakers that they prep a lot. But that's also what gets the results that we get, which is talks that actually matter. And the people who come actually want to meet the folks that build things and are hands-on. So, yeah, it's weird, because it's slightly less prestigious to be hands-on keyboard than, you know, being the CEO of a major company going on stage and talking about how we're all not going to need jobs in five years. But the people who are hands-on keyboard also need a place to gather. And that's what I do.
Yeah. And my argument, as someone who's been involved very nominally, at least with the last one, helping MC and stuff, is that in general the action is happening hands-on keyboard. Everyone's a builder now, that's sort of the short of it, whether they're building with code or building in other ways, if they're fully participating in AI and agent land. So let's talk about the summit that was earlier this year. It felt like the big inflection or change, again from the outside, and you correct me if you were thinking about it differently, was an expansion of the question that had characterized a lot of 2024, which is: what is an AI engineer, and what does it mean to do that, and what do you need to think about? An expansion of that into: what is an agent engineer, and how does agent engineering interact with, change, modify, transform that framework? So I'd love to know if that was how you were thinking about it, if that was the big shift, and what the implications were in terms of the conversations that you wanted to facilitate. Yeah.
Yeah, I don't know if people understand that it felt like a risk at the time, because we made this decision kind of November-ish. And for a lot of last year, actually, agents was kind of a bad word because there had been a few agent startups that failed. We were telling people to take agents off of their description, because it was so ill-defined and so overused that people didn't really like seeing it anymore. It was a counter-signal that you were doing something interesting. And then it really flipped with o1 and with all the other subsequent agent launches, with Operator, Deep Research, and all that.
And Manus now is crazy. So like last year's World's Fair, we had nine tracks. Only one of the nine tracks was agents. So like we really had to decide like, okay, this is the right time for agents and we're going to go all in on this one. And I think then there's also another consideration with regards to what can engineers uniquely do as opposed to researchers. And, you know, like we had other talks about open models. We had other talks about
GPUs and inference and multimodal models and all that. But a lot of that starts to entangle with the research layer of the stack. Those are great, but they are very dominated by people with the resources to do that research. And there are already research conferences. So we really wanted to be an engineer conference. And I think that specialization in the engineering layer on top of models to turn them into agents
was the key. I was kind of waiting for that moment and it felt like the time to do it, especially with MCP's launch at the end of last year. And so I announced that we were all in on agents and we planned out, here's what we think the disciplines of agent engineering are. It turned out very different in the end, but we sort of scoped out what the call for proposals was. And people came in and it was really popular. So I can talk through the individual talks that did well, but at a high level, we made a high-level bet on agents. And then I did my keynote on why now. I think there's a very strong timing element, where if you're correct but too early, you're still wrong. And I think this whole trend of 2025 being the year of agents is probably correct. What do you think, without rehashing the entirety of the keynote, what do you think the key inflection point was? Was it the reasoning models? Was it better infrastructure? What do you think made that shift for it to become real?
If I can share my screen, I'll just show it for people on YouTube. I had sort of nine points, and a lot of them are more slow bake, right? Like these have just been improving in terms of model performance. So what I'm looking at on the screen now is the GAIA benchmark for agent performance. By the way, we interviewed the GAIA team last year if you want to learn more about what GAIA really is and what their intentions are. But things have just been improving on this S-curve, and we're just kind of in the top part of the S-curve now, where we're starting to reach human baselines. And I think the closer we are to human baselines, the more we can start actually using them instead of just reporting the benchmark scores and saying, that's cute, but I'm going to go back to using my human intuition now. But then there's also all this other stuff, right? Like there's better capabilities, better tools. But also, I like to emphasize the second tier of stuff, because I think people aren't really...
People always focus on the first tier, which is that, oh, we've got reasoning models now versus non-reasoning models, and that makes all the difference. And sure, that helps, but you could build agents without reasoning models and still benefit from all these other things. So, multiple frontier labs: Grok 3 now has an API, Gemini 2.5 Pro is arguably the best model in the world. You're not stuck with one model, therefore you can chain together different capabilities and get out of ruts.
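To make the "not stuck with one model" point concrete, here is a minimal sketch of a provider fallback chain. The `call_model` helper and the provider names are hypothetical placeholders, not any particular SDK; this is a pattern sketch rather than a definitive implementation.

```python
# Minimal sketch of chaining multiple frontier models with fallback.
# call_model is a hypothetical placeholder for whichever SDK you actually use,
# and the provider names below are illustrative, not a recommendation.

def call_model(provider: str, prompt: str) -> str:
    """Hypothetical adapter: route a prompt to one provider's API."""
    raise NotImplementedError(f"wire up the {provider} client here")

def ask_with_fallback(prompt: str, providers: list[str]) -> str:
    """Try providers in order, so a rut (refusal, outage, weak answer)
    with one model doesn't block the whole workflow."""
    last_error = None
    for provider in providers:
        try:
            return call_model(provider, prompt)
        except Exception as err:  # rate limits, outages, etc.
            last_error = err
    raise RuntimeError("all providers failed") from last_error

# Example: ask_with_fallback("Summarize this ticket", ["gemini-2.5-pro", "grok-3", "o1"])
```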
The depreciation curve of models is also a constant force, right? And it's all Moore's Law and all that other good stuff that we can talk about later. And I think the last thing on the business side I want to highlight to folks is that we're actually really...
moving from a cost-plus model, where you're just charging based on the number of tokens and then maybe you mark it up a little bit, towards the outcomes that you deliver. And that's a huge change, right? Because now the reframing is going from, all right, how well can you consume my tokens, to how much of my job as a human can you do, and therefore you are worth this much. And that's a couple of orders of magnitude of difference there.
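A rough back-of-the-envelope, using purely hypothetical numbers, shows where the "couple of orders of magnitude" comes from:

```python
# Hypothetical numbers only, to illustrate cost-plus vs. outcome-based pricing.
tokens_used = 200_000                 # tokens an agent burns on one task
price_per_million_tokens = 5.00       # $ per million tokens (illustrative cost-plus rate)
cost_plus_revenue = tokens_used / 1_000_000 * price_per_million_tokens   # $1.00

human_hours_replaced = 4              # the chunk of a human job the agent actually did
loaded_hourly_rate = 75.00            # $ per hour (illustrative)
outcome_priced_revenue = human_hours_replaced * loaded_hourly_rate        # $300.00

print(outcome_priced_revenue / cost_plus_revenue)  # ~300x, i.e. a couple of orders of magnitude
```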
Yeah, super interesting. And so what were the types of talks that you were trying to bring together to instantiate this and bring it to life? And what hit? In particular, I'd love the sort of subjective take on what was popular that you didn't expect, or did expect, out of the talks that were most resonant with people. Yeah. Well, we're still in the process of releasing all the talks, so I don't know everything in advance. But you could see, like, anything from a big lab is good. Sure. Because we were organizing in New York, we really wanted to also focus on agents in production, right? I think the subtitle of the conference was Agents at Work.
So, really, I think a lot of people see demos and they're like, that's cute, I can use it for fun, and then probably they never actually use it. But who's actually using this thing at work? How much impact is it having? Do any of the Fortune 500s care? What have they figured out, because they're so smart and so big and so resourced, that I haven't figured out? Right? So I got people from Jane Street, from Bloomberg, from BlackRock, which, by the way, their talk wasn't approved to be released. So everyone who attended got an exclusive that we cannot release. Alpha, baby. And Ramp as well. Yeah. So the production agents and AI talks at big companies like
Jane Street, Bloomberg, Ramp. Ramp, by the way, I think announced an $11, $12 billion valuation after this talk. No correlation there. And then I guess RL was very hyped, and we can talk about that one, but also Windsurf, which is always very surprising to me, that you can just be a second mover after Cursor and still do super well. As long as you design your agents well and you have good enough differentiation, people will give you a shot. And I think that's very encouraging. I think that just means that if you think something is over or a category is done, maybe you should just try harder. Yeah. When I was sort of pulling out and looking at, again, a sample, since not everything has gone up yet, but
some of the standouts included the RL for agents talk, the Windsurf thing, and then of course, maybe the one that we can talk about a little bit now, the MCP discussion. So that one I expected to do well, so it's not a surprise. Yeah. Yeah. Well, so I would love to, like, one of the things that was really interesting is you wrote a post, or you guys wrote a post, called Why MCP Won, which I think is super interesting. I think I did a whole episode about it. Basically, that's a big post. Yeah. I mean, listen, you keep creating great content, I will keep remixing it for this audience.
So I thought part of what was interesting, because it's so recent but still so far in my memory now that MCP has completely taken over the conversation, right? And you had the Google CEO a couple weeks ago asking "to MCP or not to MCP" and then yesterday answering that question, or yesterday from when this was recorded. But you were reflecting on the period following the release and the immediate reaction to it. So you basically argued that the immediate reaction was good, but then it kind of got quiet for a little while. So can you take us back to, like, the end of November when it gets announced, into December and January, and how you were thinking about MCP, and then where you saw the pickup in conversation and what you attributed it to? Yeah, I would give credit to Alex Albert, who I think has a timeline of events for MCP somewhere out there. But yeah, it was launched in late November.
There was a lot of interest. I think it was top of hacker news. But I don't think there was a ton of immediate follow-through because people were kind of used to...
big companies launching protocols and then it kind of flopping or not really working. One recent one that people might not remember is Meta actually launched Llama Stack, which is a full open-source framework and stack. And every framework has a protocol embedded in it. So that was kind of the insight there, that everyone maybe went too far in trying to impose all these opinions at once. And Anthropic took a different approach and shipped just a protocol that other frameworks could build on top of. And maybe it was that minimal viable product that actually was the only viable product, because everything else would have been too much, imposing too many opinions on everyone building stuff.
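To make "minimal" concrete: the whole surface a tool author has to implement is roughly a named server plus a few decorated functions. A sketch, assuming the official MCP Python SDK's FastMCP interface; the search_docs tool and its logic are made up for illustration.

```python
# Sketch of a minimal MCP server, assuming the MCP Python SDK's FastMCP
# interface (pip install mcp). The search_docs tool is a made-up example.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-demo")

@mcp.tool()
def search_docs(query: str) -> str:
    """Search internal docs and return the best-matching snippet."""
    # Placeholder logic; a real server would hit your search index here.
    return f"No results for {query!r} (demo server)"

if __name__ == "__main__":
    # Speaks the MCP protocol over stdio, so any MCP-capable client
    # (IDE, chat app, agent runtime) can discover and call search_docs.
    mcp.run()
```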
So I think a lot of people started exploring it and integrating it into their workflows. And I think it was probably driven by the IDEs, so Zed and Windsurf, and then eventually Cursor, which I think was the last one to add it, and maybe Copilot as well. I'm not really sure of the exact sequence there. But yeah, I mean, really, I knew that it was going to be interesting. I thought it was useful for a big lab to come out and announce something that wasn't a model, and it was about how tools should interoperate with models. And insofar as OpenAI did that in 2023, 2024 with their function calling spec and tool calling, which did well but didn't have that spark of excitement that MCP had, I think Anthropic taking a stab at it was really strong and I wanted to feature them. And that was really about as much calculation as I had there. They really took it all the way. Anthropic has been a really strong supporter of my conferences. So they showed up and they had this two-hour presentation, and they had tons of new alpha they never dropped anywhere else.
And then they also talked about their future plans, announced the official MCP registry at the conference. And it was all this stuff. And so that sparked more excitement because I think the other thing about launching these things from Big Labs is that they need follow-through. People need to believe this is an actively worked on thing. I think one of my statements that you liked in the piece was that protocols are only as strong as the people who are already using them. And so you just have to believe that
If I invest in MCP, all my buddies are going to invest in MCP. All the people that I want to be compared to are all investing in MCP. And so like, yeah, I mean, you know, that workshop did really well. We released it. It was like a good hit. And I think like I saw the numbers earlier than anyone else just because I could see like the views. I could see that. I looked at the download statistics and I looked at, you know, everything where everything was trending. So I think this was the chart that I focused on. And I was like,
okay, you know, do I call it now? Is it too early to call it? It was like, this is exactly where, you know, it's like three, four months into MCP.
There have been many, many attempts at creating some kind of agent benchmark or agent standard, but nothing like this. And I was just like, oh, I think there's a decent chance that MCP has kind of won this. And I tried to articulate to myself why it won, and I ended up with these seven reasons, right? Or six reasons, which is like, it's AI-native, it's an open standard, blah, blah, blah. You already went through this in your podcast. And guess what? I don't know if you've seen the chart since this post.
I have not. No, I haven't. Yeah, well, we can click on it. And it has done... We might need some loading time for the data. But basically, I projected that MCP would take over the incumbent of...
So this is just GitHub stars, right? So we are going from zero to 15K in a very short amount of time and faster than anyone else. But the incumbent is OpenAPI. That's the big behemoth that is basically the old industry standard. And that one is at 30,000 stars. So it's basically continued to go there. So I was doing a conservative projection. I was like, it'll hit 30,000 stars in July-ish.
No, it's crossed it this month. It's crazy. Today's episode is brought to you by Vanta. Vanta is a trust management platform that helps businesses automate security and compliance, enabling them to demonstrate strong security practices and scale. In today's business landscape, businesses can't just claim security, they have to prove it. Achieving compliance with a framework like SOC 2, ISO 27001, HIPAA, GDPR, and more is how businesses can demonstrate strong security practices.
And we see how much this matters every time we connect enterprises with agent services providers at Superintelligent. Many of these compliance frameworks are simply not negotiable for enterprises.
The problem is that navigating security and compliance is time-consuming and complicated. It can take months of work and use up valuable time and resources. Vanta makes it easier and faster by automating compliance across 35-plus frameworks. It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs. In fact, a recent IDC white paper found that Vanta customers achieve $535,000 per year in benefits, and the platform pays for itself in just three months.
The proof is in the numbers. More than 10,000 global companies trust Vanta, including Atlassian, Quora, and more. For a limited time, listeners get $1,000 off at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off.
Hey, listeners, are you tasked with the safe deployment and use of trustworthy AI? KPMG has a first-of-its-kind AI risk and controls guide, which provides a structured approach for organizations to begin identifying AI risks and design controls to mitigate threats.
What makes KPMG's AI Risks and Controls Guide different is that it outlines practical control considerations to help businesses manage risks and accelerate value. To learn more, go to www.kpmg.us slash AI Guide. That's www.kpmg.us slash AI Guide.
Today's episode is brought to you by Super Intelligent and more specifically, Super's Agent Readiness Audits. If you've been listening for a while, you have probably heard me talk about this. But basically, the idea of the Agent Readiness Audit is that this is a system that we've created to help you benchmark and map opportunities in your business.
in your organizations where agents could specifically help you solve your problems, create new opportunities in a way that, again, is completely customized to you. When you do one of these audits, what you're going to do is a voice-based agent interview where we work with some number of your leadership and employees.
to map what's going on inside the organization and to figure out where you are in your agent journey. That's going to produce an agent readiness score that comes with a deep set of explanations, strengths, weaknesses, key findings, and of course, a set of very specific recommendations that then we have the ability to help you go find the right partners to actually fulfill. So if you are looking for a way to jumpstart your agent strategy, send us an email at agent at besuper.ai and let's get you plugged into the agentic era.
Yeah, I mean, to me, picking up on the signals strictly from competitive pressure, OpenAI was fascinating, because when they announced the Agents SDK, it was sort of like, all right, cool. The natural interpretation, perhaps unsophisticated, was: got it, there's going to be an agent protocol war, right? Like, that's another vector of competition, in terms of developer allegiance, that we're going to go for. And then when, like, five minutes later, they were like, we love MCP, we're supporting MCP too, I was like, all right, well, that's a whole different ball game. Obviously Google then follows.
And, you know, I did resonate with that point that you made in terms of basically the network effect of protocols. Like, there is such huge advantage, obviously, for building where other people are building. And it just got over the bootstrap problem so quickly that, you know, it was almost like this is the type of thing where if everyone can get together more quickly and make the decision, it's so good for everyone in terms of just collective value that everyone provides each other, you know? Yeah.
You know, now you've got people who are actually thinking about this as a category of new startups, not just a new tool to use but an actual category of things to build. You've got Dharmesh Shah, who I know was on the show recently too, like, he's spitting out on LinkedIn all the MCP-related startup ideas that he doesn't have time to do, and I think he funded one of them when someone responded saying that they were doing that thing. Oh wow. Yeah. So it's very cool to see how fast that ecosystem has emerged and is starting to flourish. Yeah. So I think people think of me as an MCP shill just because obviously I did shill it a little bit, but I am a bit measured about it, right? Like, I have seen protocols get hyped and then get very not hyped. And the most recent version of this in developer land is GraphQL.
It was like, yeah, this is like a better layer over REST. Everyone's doing like REST versus GraphQL threads and all that. And it's very reminiscent of MCP versus OpenAPI. It's like basically the same. And by the way, all the issues that came up with GraphQL also are emerging with MCP. Like how do you do authorization? How do you connect with remote MCPs and do discovery of them? Like the exact same thing.
Because these are all the same type of problems, which I call the M times N to M plus N problem, right? That's how you solve combinatorial issues: by adding one layer of abstraction that has a standard interface that everyone plugs into.
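A rough sketch of the M times N versus M plus N point, in plain Python rather than any particular protocol: if every tool implements one shared interface, you write M clients plus N tools once, instead of M times N bespoke integrations. The tools and clients below are toy examples.

```python
# Plain-Python sketch of the M x N -> M + N idea (not MCP itself):
# M clients times N tools would mean M*N bespoke integrations, but with one
# shared interface you only ever write M clients + N tools.
from typing import Protocol

class Tool(Protocol):
    name: str
    def call(self, arguments: dict) -> str: ...

class CalendarTool:
    name = "calendar"
    def call(self, arguments: dict) -> str:
        return f"Booked: {arguments.get('title', 'untitled')}"

class SearchTool:
    name = "search"
    def call(self, arguments: dict) -> str:
        return f"Results for: {arguments.get('query', '')}"

def run_client(client_name: str, tools: list[Tool]) -> None:
    # Any client (IDE, chat app, agent runtime) can drive any tool through
    # the same interface, with no per-pair integration work.
    for tool in tools:
        print(client_name, "->", tool.name, "->", tool.call({"title": "demo", "query": "demo"}))

registry: list[Tool] = [CalendarTool(), SearchTool()]
for client in ["ide", "chat-app", "agent-runtime"]:
    run_client(client, registry)
```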
It's a common concept, everyone understands that. The authors of MCP are super aware of it. They talked to me about it after we did a podcast with them. So I think good governance and good judgment is still going to win the day. There is a way that they can screw this up. And I was also on the TBPN podcast, and they were like, is this going to result in an explosion of agents? And I'm like, there's already an explosion of agents. This is not really changing that trajectory in any way. But this basically improves the quality of integrations. It's a boring answer. The integrations that you write, you expect in one app, you're going to see in another app, because it's super easy to add them. And you don't have to wait for them to add, like, Notion just because it's on their backlog and they don't have it prioritized yet. No, it's just out of the box, because Notion just launched an MCP yesterday. And that's it. It doesn't make agents super capable or anything, apart from they are wider, they can integrate with more things, but we still have to solve a lot of the other core problems of agents. So I think that's a good caveat on where I look at it from. And so, again, bringing this back to an enterprise audience, a lot of what I think enterprises are trying to figure out right now is that,
interestingly, when they think about agents and agent capabilities, I actually think that directionally they're correct. When they're imagining in their mind's eye what agents can do, they're kind of right about where things are headed. The problem is just that it's not there yet in most cases, right? The things that agents can do are more limited, more discrete, yada, yada, yada. There's some gap between what they're imagining and really excited about and where things are now. And a big calculation is how much, how fast, and in what way to invest given the rate of change. And this is really, really challenging, because it's sort of obvious that the answer can't be, in most cases, just practically go all in on building the thing that you most want to build right now, because in many cases it's just not exactly where they want it to be. But it also can't be the other end of the spectrum, just wait for it to get ready, because you're going to be behind by then. And so I think they're trying to understand what to do in the interim. And so it's really interesting when you have, call it accelerating forces, which is sort of another way of describing, I think, what you just described with MCP. It's boring because it's not going to make more agents, it's not going to change the trajectory. But by, to your point, not having to wait around for that Notion integration, not having to wait around for some other thing that you're waiting for, it does feel like it is likely to accelerate the speed at which new capabilities come online. And each new thing that gets connected to the ecosystem is likely to open up some additional use case.
Yep. I broadly agree with that. Yeah, we can talk about the other elements of agent engineering that we got from the talk, from the conference. Yeah, I think MCP is a great protocol, but imagine if there were standards for all the other stuff.
Yeah, let's talk about what's outside of that, the other pieces. Yeah. Well, the other thing that happened at the conference, and was followed up on after, was OpenAI actually previewing how they think about agents. This was the OpenAI for VPs of AI talk, the one that was the day before you came.
And they said an agent is an AI application consisting of, one, a model, equipped with, two, instructions that guide its behavior, and, three, access to tools that extend its capabilities, that's MCP, and, four, it's all encapsulated in a runtime with a dynamic lifecycle.
So that was what they previewed, and then they launched the Agents SDK after that. I mean, they told me that's what they were doing, so I had full knowledge of that. So it's interesting that that is one form of the definition. And then Lilian Weng, who used to be head of safety systems at OpenAI, had a different characterization of agents, where she was like, agents are LLMs plus memory plus planning plus tool use. So everyone agrees on the model layer, everyone agrees on tool use, and then they disagree on everything else. The Agents SDK has no memory, no planning skills. And then Lilian Weng forgot that you need prompts for the models. And also there needs to be this runtime, effectively this while loop of the agent in the loop deciding what to do next. And so I think it was very disordered, and, you know, I don't like that. It seems very unstructured, because people don't really take defining what agent engineering is seriously. So I took a stab at it. And there are six elements, right? So it's I-M-P-A-C-T, just because whenever there's a lot of elements, I like to have an acronym to remember them. I'm not trying to push it. Yeah, man, it is.
You know, you gotta... Congress does this for the names of bills, too. You gotta make it memorable. I remember there was this JEDI contract or something. Anyway, it was a really interesting acronym. But IMPACT, so the only forced letter in here is I. I is intent, because intent is literally borrowing from what OpenAI just used for what they call prompts. But I think you also need to encode goals and evals, meaning that an eval is kind of a prompt,
Because once you run an agent against an eval, you can take the negative results of the eval and then prompt it again to get the positive result.
So that is your intent: what is the intent that you're encoding, or classifying, and executing on. Everything else is very straightforward. M is memory. P is planning. C is control flow, which is the runtime, the if/else driven by the LLM. A is authority, because the OG meaning, the human meaning of agent, like my real estate agent, my estate agent, whatever, is you work on my behalf because I trust you to work on my behalf, to look after my interests. And again, in the technical definitions, for the engineers, trust is the last thing that you think about. But really, for consumers, for enterprises, trust is probably number one. Like, if I don't trust this thing, I'm not going to use it. And the last one, T, is tool use, which is the thing that everyone agrees on.
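To tie the competing definitions together, here is a minimal sketch of that runtime while loop: a model, instructions, memory, and tools, with the loop deciding what to do next. The llm() stub and the toy tool are hypothetical placeholders, not any specific SDK or the definition any lab endorses.

```python
# Minimal sketch of an agent runtime: the "while loop driven by the LLM".
# llm() is a hypothetical stand-in for a model call; the tool is a toy example.
import json

def llm(messages: list[dict]) -> dict:
    """Hypothetical model call. A real one would return either a tool call
    or a final answer; this stub just answers immediately."""
    return {"type": "final", "content": "done (stub model)"}

TOOLS = {
    "search": lambda args: f"results for {args.get('query', '')}",
}

def run_agent(intent: str, max_steps: int = 10) -> str:
    # Intent (instructions/goals), memory (the message list), control flow
    # (this loop), and tool use: roughly the pieces everyone argues over.
    memory = [{"role": "system", "content": intent}]
    for _ in range(max_steps):                  # control flow / runtime
        decision = llm(memory)                  # model decides the next step
        if decision["type"] == "final":
            return decision["content"]
        result = TOOLS[decision["name"]](decision.get("args", {}))  # tool use
        memory.append({"role": "tool", "content": json.dumps(result)})
    return "step budget exhausted"

print(run_agent("Book me a meeting room for Friday"))
```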
So let's talk about maybe bringing this forward into reality with where we are now. You're living inside a fast-changing understanding of this space, and now you're once again having to put it back into a structure, in the form of the run of show for the summer summit. So how has your thinking changed about what needs to be included in that set of talks since you were planning the last event? And what does that look like in practice, for the types of tracks that you guys are doing? Yeah. The feedback loop is very tight, right? MCP did so well, so now we have doubled down, and we just announced an entire MCP track with the MCP team hosting. And then we're just letting them invite their major contributors. And it's like a little MCP mini-conference, right?
I just love that I get to make calls like this because I know that the other conferences cannot do this. So we'll just do it just because we can. And we're doing the same for Local Llama because they are long overdue for a conference as well. They are the biggest community of open models out there.
And the reasoning and RL talk, Will Brown's talk from Morgan Stanley, did so well as well that we announced a reasoning and RL track. So basically, I am not trying to push the concept of AI engineering that hard, just because I like to talk about these ideas and then let them organically gain traction, because...
I'm not going to change people's minds if the timing is bad or if the concept doesn't quite fit, right? But I just want to focus on individual problems or domains where we can have the top speakers in the world gather, and like,
they'll primarily do their talk, but really they're there to meet each other. Like I, I fully know as like the guy who curates the talks that the talks don't actually matter that much. And it's just the people just showing up and chatting in the hallways. So, you know, it is what it is. Like I, you know, we want, we want to do a good show. We want to help people who are not in San Francisco, get a sense of, of what the state of AI is. But yeah,
at the end of the day, people are just going to meet in person and talk offline, you know, to decide what to do next. So yeah, that was the whole thing: quick response to what is working and what people want more of. And then also, I think for the summer conference, there's the set of things that I think you must have, even if they're not that super exciting. Like, no one really cares about security, but they do care in the end, especially when they're putting this stuff into their work. So yes, we have a security track, because we have to. And then my job is to find the most interesting, practical speakers who won't bore you to death about things that you already know you should be doing. So stuff like that. I also wanted to focus on jobs.
One of the smartest things I did with AI engineering was just literally name it after the job that I was trying to create. And I think that there are adjacencies around engineering for AI PM and AI designer that work with AI engineers.
So I'm giving them a shot. I'm inviting design and PMs to talk about how they work with engineering or just thought lead on how they should do their thing. We'll never be a full PM thing, product management conference. We'll never be a full design conference. But I think if we can show them that they have a seat at the table with engineering, I think that's something that they want.
Yeah. So one of the themes obviously that I wanted to talk about is vibe coding. And I actually think this feels like an interesting bridge here, because we just had this note from the Shopify CEO, right? The sort of new AI mandate. And the piece of it that everyone focused on was the no new hiring until you've proven that an AI can't do it. But one of the pieces that was most resonant to me, as someone who's building a company in this new context, was effectively the: you've got to be prototyping everything with AI, right? And, you know, he didn't put it quite this crisply, but there's a soft ban on talking about product stuff as opposed to showing product stuff that you can get your hands on. And that's a shift that we've made internally as well. Like, out of the core contributing team of six at Superintelligent, there's one lead engineer slash CTO, but everyone is using Lovable or Bolt or Vercel or whatever their preferred tool is.
When they have ideas for features or things they want to change, right? It's just become sort of the norm, just from a pure efficiency standpoint. It's way easier for people to get to step two or three in their own thinking about what they're trying to articulate by actually seeing some weird little prototype of it. And it's massively easier, I mean, it's a 2x, 3x, 5x difference, in terms of their ability to communicate it to others at speed. So I think it's super interesting just to have that bridge in, because part of the new capability of AI engineering is that it is inherently more invitational, or maybe accidentally more invitational is a better way of putting it, for non-engineers to engage with engineers on their own level in some way.
Yeah, I'm not sure what to say about that apart from I generally agree. You know, I think it's an enabler for all parts of the organization. And, oh man, I always want to...
have this recommended stack of things that people should be trying out, but I know that if I do this, then people will be pissed at me for not including things or miscategorizing things. But it's almost like a necessity that you should have one of these in your company, right? And it's fascinating. I think the people who are self-driven and don't mind getting their hands cut on the bleeding edge sometimes, they will find the workflows that make them a lot more productive, and that will win them the competition of ideas, I guess. So I'm a little bit idealistic about that. But yeah, happy to double-click on anything. I think vibe coding, by the way, obviously coined by a mentor of mine, Andrej, and somewhat taken out of context,
Yeah, big time. Yeah, very taken out of context. Yeah, wildly. He was literally talking about vibing. Like, he didn't say a glass of wine, but you kind of imagined him listening to, I don't know, modern jazz and drinking wine while he's talking to his coding tool. Yeah, but I think he's coming from a place where he has the expertise to look at the code, not read every line, but get the vibes of the code. And if it looks correct, it's probably correct, so commit it and move on. And now it's being taken to mean that you don't need expertise and you can just vibe it out and hope for the best. And a lot of people actually are very successful. That's why Bolt and Lovable are doing so well. But I think then they also get into trouble and they don't know how to get out of it. And there will be a lot of wasted dollars on that. Maybe some of that waste is fine, because it's so productive when it does work.
But I think what I'm trying to emphasize here is, what are the best practices? Like, how do you stay on the rails and not go off track when you are vibe coding? Yeah, look, I think the explosion of vibe coding has clearly touched a nerve as an expansionary force for who gets to create what.
And with it has come this whole new set of challenges and problems, which need to be taken on one by one. And I think about that a lot in terms of, how accessible to the enterprise is it? And maybe it's not just vibe coding with the sort of text-to-code tools, but this entire new set of agentic, or agent-enabled, coding environments. There's strange, or perhaps unexpected, resistance to a lot of this inside big companies. And the illegitimate part of that, I think, is often just the desire to not see things change, you know, like engineers who kind of like the speed at which they get to move inside big companies and don't necessarily want to be forced to quintuple that overnight. But the more legitimate critiques are that a lot of these things aren't optimized for big legacy codebases that have thousands of different contributors, where the person who wrote the code today might not be there tomorrow. But it also feels like, for every single new challenge associated with this different approach, it seems like every day there's two new startups that pop up to solve that particular challenge. It's like whack-a-mole for these new issues. What do you think about, I guess as a quick preview, what are you hoping to bring with that vibe coding track? Is it just actually getting into what those challenges are and how to solve them? Do you have a particular take on this? Yeah.
It's more experimental than that. I probably just want to sample a set of the conversation for people to discuss. So I want a live demo of how a good vibe coder vibe codes. Just maybe because people can learn from that. I want to have a negative talk on vibe coding. Like why you should not vibe code or why vibe coding is doomed or whatever. I want to have a talk from someone building a
a vibe coding platform, probably Bolt because I'm closer to Eric from Bolt. And I want to sample that space and allow people to explore because I don't really know what I think about it myself. All I do know is I'm very pro people having more autonomy and power to create software and to...
I have so many designers and PMs telling me that just because of these coding tools, they are able to do the things that they wanted to do without the permission or the prioritization of the engineering team. And that's fantastic for them, and it's fantastic for their customers too. And so there's something here. Honestly, I don't know if vibe coding is the best name for it, but it's what people have right now, so I've got to call it that.
Yeah, it's super interesting. So this is awesome. I love getting to talk to you about this stuff. Maybe by way of closing: a lot of the conversations that we have are around leaders thinking about AI and agentic transformation across their whole companies. And, as I was kind of alluding to, one of the interesting tensions is that it feels like, outside of the enterprise, a lot of the use cases that surround coding and engineering are the places that have the highest product-market fit, or at least it's so clear that the biggest change is happening there. And yet that tends to be a more recalcitrant area when it comes to the engineers working inside big companies. If you were a sort of general leader, right, a CEO of a company who's trying to think about how to help support, encourage, demand that their engineering organization starts to evolve on the basis of these changes, how would you think about that? Or where would you have them start to pay attention or dig in?
Yeah. This is why we have a leadership track. We've rebranded the leadership day into AI Architects, because that's what Bret Taylor calls them. Yeah. So it's weird, right? Because on one hand, the engineers have all these terms and jargon they're throwing around. And then on the other hand, I feel like the leaders just have to keep their heads down and mind the stuff that their engineers are not doing, which is the compliance, the security, the legal, everything.
you know, all that other stuff that people don't want. And then, but there is one meta thing where they have to be on board, which is defining strategy and hiring.
So those are the two things where they really need to be very much in sync with the engineers and the other elements they can kind of dictate to their engineers. And so does that make sense? Is there anything that you want to double click from there? No, I know that totally makes sense. Yeah. So there is a session on defining strategy. I basically always have a hiring talk in every conference.
And I think it's really close to software engineering for now, in the sense that you're 90% a software engineer and then like 10% of your interview loop or requirements or whatever will be AI-specific. But I think that will separate over time. Once you start
building all these disciplines in tool calls and planning and control flow and authority and all that, that starts to become its own discipline, which is why I think this AI engineer job description, you know, we're still exploring it. We're still building it out, three years in. I mean, it's exciting for me, because I get to help define it, and I also get to meet everyone important in that field. But like with everything, it's a nebulous concept. The lines are definitely blurry.
No, it's awesome. Well, look, I'm super excited. I use these events as sort of benchmarks for what I should be paying attention to, and I would encourage other people to as well. Thank you for coming on the show again, and I look forward to the next time we get to catch up. Yeah, thanks for the support, man. Come if you can, come if you can. Yeah, it's June 3rd to 5th in SF, and we're making it a yearly thing, so we're already planning 2026. It's going to be outdoors, which is fun. So,
San Francisco is beautiful in the summer. Yeah, love it. All right. Thanks, Shawn. Thanks.