
MCP Co-Creator on the Next Wave of LLM Innovation

2025/5/2

AI + a16z

Transcript


I have to ask a question here: how do you define an agent? Oh, I'm not going to get into that. What do you think an agent is? I think it's a multi-step reasoning chain. It's very simple for me. Okay. Yeah. I can get behind that. For me, an agent is potentially more like...

Welcome back to the a16z AI podcast. It's been a while, but here we are again with another great discussion about the fast-moving AI space. This time, it's MCP,

or Model Context Protocol, which has been a major topic of conversation this year as it aims to open up new LLM use cases and agentic behaviors by connecting models to any number of new tools, data sets, and external applications.

And here to talk about it are a16z infra partner Yoko Li and Anthropic's David Soria Parra, who created MCP along with his colleague Justin Spahr-Summers. Among other topics, Yoko and David discuss the MCP origin story, early and popular use cases, important work still to be done (for example, around authentication), and the right level of abstraction for carrying out certain types of workflows. It's an insightful and timely conversation that you'll hear after these disclosures.

As a reminder, please note that the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. For more details, please see a16z.com/disclosures. So MCP is...

First and foremost, it's an open protocol, and that alone doesn't say much yet. But what it really tries to do is enable building AI applications in such a way that they can be extended by everyone who is not part of the original development team, through these MCP servers, and really bring the workflows you care about, the things you want to do, to these AI applications.

And for that, it's a protocol that just defines how whatever you're building as a developer for that integration piece and the AI application talk to each other. And that's really what it is. It's a very boring specification. But what it enables is...

Hopefully, at least in my best case scenario, something that looks like the current API ecosystem, but for LLM interactions with some form of context providers or agents in any form or shape. Yeah, I really love the analogy with the API ecosystem just because it gives people a mental model of how the ecosystem evolves.

It feels like the API, when it first came out, was an abstraction on top of a set of things you could do on different servers and services. Before, you might have needed one spec to query Salesforce and a different one to query HubSpot.

Now you can use a similarly defined API schema to do both. Not exactly the same, because everyone defines query parameters differently. And when I saw MCP earlier in the year, while I was building something with it, it was very interesting: it almost felt like a standard interface for the agent to interface with LLMs.
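To make that analogy concrete: on the wire, MCP messages are JSON-RPC 2.0, and every server answers the same tool methods. The method names below come from the MCP spec; the weather tool itself is a hypothetical example, not a real server.

```python
# What the "standard interface" looks like on the wire (JSON-RPC 2.0).
# Method names are from the MCP spec; "get_weather" is hypothetical.

list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",          # ask any server what it offers
}

call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",          # invoke one tool by name
    "params": {
        "name": "get_weather",
        "arguments": {"city": "San Francisco"},
    },
}

call_result = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        # Results are content blocks the model can read directly.
        "content": [{"type": "text", "text": "62°F and foggy"}],
    },
}
```

Because every client and server speaks this same shape, a server written once works in any client.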

It's like: what is the set of things the agent wants to execute that it has never seen before? What kind of context does it need to make those things happen? When I tried it out, it was just super powerful. I no longer have to build one tool per client; I can now build just one MCP server, for example for sending emails, and use it everywhere: in Cursor, in Claude Desktop, in Goose.
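A server like that can be very small. Below is a minimal sketch using the Python SDK's FastMCP helper; the tool body, sender address, and local SMTP relay are placeholder assumptions, not a real implementation.

```python
# Minimal MCP server exposing one "send_email" tool; any MCP client
# (Cursor, Claude Desktop, Goose, ...) can connect to it over stdio.
import smtplib
from email.message import EmailMessage

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("email")

@mcp.tool()
def send_email(to: str, subject: str, body: str) -> str:
    """Send a plain-text email to the given address."""
    msg = EmailMessage()
    msg["From"] = "me@example.com"           # placeholder sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
        smtp.send_message(msg)
    return f"sent to {to}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```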

So I'm curious, I guess: what's the behind-the-scenes story? What inspired you when you first realized, oh, we need a protocol for this? And how did you create it? Yeah, thank you. I think that's an interesting question. These types of ideas never happen in a vacuum. I joined Anthropic about a year ago, pretty much exactly a year ago, actually, and I was working mostly on

how we can use Claude more internally to accelerate ourselves. And as part of that, one of the original ideas I was thinking through was: I cannot be the person who builds everyone's specific workflows, their specific things, for them.

I need to enable them to build for themselves, because they know best what they need and how their workflows, the agentic bits they want to build, fit into the system and ecosystem they're working in. So that was one aspect. The second aspect was

that at the time I was using both Claude Desktop, which was amazing with its artifacts that really let you visualize things, but which had this limitation that there was no interaction with anything outside of the text box. You couldn't add Google Drive files yet, or anything like that.

And at the same time, I was using a code editor, which was amazing because it had access to all my code and all these cool things, but it couldn't visualize anything as nicely as Claude Desktop. And I was very frustrated by just copying

things from Claude Desktop back into the editor, back and forth. I thought: there needs to be a better way across these two applications. And if you take these two things together, I need some way of enabling people to build something, so some form of API, but at the same time I want it to work across multiple applications: a code editor (for me that was the Zed editor, which I really like) and Claude Desktop (obviously my favorite desktop application). You look at

how to solve this classic M-times-N problem: I have M clients and I need N providers. And the answer is a protocol. There have always been protocols for these types of things, and there are many patterns in the past that match this. And that's how I came to think: hey, I would really love some form of protocol that lets me tell Claude Desktop, tell Zed, tell Cursor

the workflow I care about and the things I miss, because I'm a developer, I want to build for this, and I know how to build for this. Just let me do it. That was really the origin. And that was just the idea. I took this idea to Justin Spahr-Summers, who is the co-creator of MCP with me, and he took a real liking to it.

He was really one of the key people to prototype the initial version, make it work within the product side of Anthropic, and he played a really big role in making this a rather big thing inside Anthropic initially. And so we basically co-created this together until we released it into the open in November 2024.

I love that. I love the creative partnership here. And with a framework or protocol, I have to ask: it's kind of a chicken-and-egg problem. Do you create a concrete instance that implements the protocol first, or do you have the protocol in mind first? And if you did create a concrete instance or example, what was the first MCP server or client you built?

Yeah, that's a very good observation. It's a very classic chicken-and-egg problem. The way we usually do this internally, and Justin is amazing at this, is very rapid prototyping. So we did have a very intense few weeks of writing prototypes: very simple things that just demo it, for the most part initially. So one of the first ones

that we wrote was the Puppeteer server, just the ability to control a Chrome instance. And one of the reasons you do this is that it's a very active process. There's something happening on the screen, and it makes people go wow, which is the effect you want. You want to convince people that there's a lot of possibility here: hey, I can control your browser and do things you couldn't do before, and Claude is the one doing it, not you manually.
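The reference Puppeteer server is written in TypeScript; as a rough sketch of the same idea in Python, a tool can drive a visible browser with Playwright. The tool name and behavior here are illustrative, not the reference implementation.

```python
# Sketch of a browser-driving MCP tool in the spirit of the Puppeteer
# server. Requires: pip install mcp playwright && playwright install
from mcp.server.fastmcp import FastMCP
from playwright.sync_api import sync_playwright

mcp = FastMCP("browser")

@mcp.tool()
def open_page(url: str) -> str:
    """Open a URL in Chromium and return the page title."""
    with sync_playwright() as p:
        # headless=False so something visibly happens on screen,
        # which is the demo effect described above.
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
    return title

if __name__ == "__main__":
    mcp.run()
```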

But while we were doing this, we were refining the concept, and we spent a lot of time discussing,

I wouldn't say fighting, but definitely having an interesting discourse about certain primitives to put in and leave out. And there were a lot of changes to the way things worked in those first few weeks. The first MCP client, I think, Justin wrote into Claude Desktop, and I wrote one into Zed, so that happened in parallel. And then I think the first real use-case MCP server we had for ourselves was

one of these very boring ones, maybe a GitHub integration or something like that to just help me do my work better, or a Postgres server. Nothing super fun, nothing super creative, just the most obvious things you would want to do.

I would say the Puppeteer example is super creative, you know, because you can have the agent do almost anything for you. Recently I saw an example of the Ghiblification process: someone has an MCP server that controls the browser to ask models to generate Ghiblified images, so they don't have to implement API endpoints.

That blew my mind. That's pretty cool. That is very good. Yeah. I guess beyond your first initial use cases, since you've probably seen every MCP client and server out there in the community, what are some of the most interesting implementations of MCP servers or clients you've seen?

I like when people get creative. I think it's great that people build a lot of these integrations that are very sensible and quite straightforward, again, the Postgres servers, the GitHub servers, the Asana servers of the world. But what I really like is when people get creative. One of the things that made me just laugh was this person, very early on around Christmas, who hooked up Claude and Claude Desktop

to their Amazon account and just had Claude buy their Christmas gifts. I always thought this was hilarious. It's so funny. That's amazing. How was that implemented? Does it handle payments? I forget the exact details, but I think it was some combination of Playwright or Puppeteer controlling the browser, deliberately built around the things from Amazon they wanted to buy, the set of gifts. I love these types of things a lot. And yeah,

I've seen your Morse code MCP server. I love these kinds of things, these playful engagements with technology. Years and years ago, I was a pretty active member of local hacker spaces and that sort of thing in Germany, and I love the creative ways people interact with technology and try to build things.

And so every time I see these kinds of combinations, they're just beautiful. And we'll talk about this a bit later, right, when people connect synthesizers, Unity, and Blender. But there's obviously a fun, interesting technology part to that too. I thought JetBrains did a really good job with an MCP server that can control their IDE. That's a bit more of a complex setup, and I love that part.

And then there are fields I didn't even think about. There's a somewhat famous YouTuber called Laurie, who's a reverse engineer, and they used Claude to help reverse engineer some files, using MCP.

I thought that was kind of cool, because these are things nobody would ever build first-party. Right. Totally. A reverse-engineering tool in their desktop app. But that person can just go and build it themselves, because of course they have the ability and the skill to do it. And that's the kind of stuff I love. I just love it when a protocol unlocks the long tail, and the long tail is really long, because, as you said, no one else will

build it first-party, but now everyone can build the software for one. Yeah. I'm actually a little curious: what are one or two examples you've built that you found quite funny and interesting yourself? Yeah. There was one I built for a very practical use case: sometimes I'm so into coding, I skip dinner.

So obviously my husband will be texting me: where are you? Are you home for dinner? And I just use a recent MCP server I built. This is another beauty of it: with the same MCP server, you can unlock very different experiences by entering different prompts.

So instead of sending him an email, I asked Cursor Agent: can you text my husband at this number and explain why we're late for dinner? Because Cursor Agent had done most of the coding; I was just reviewing. And it texted my husband, from a number my husband can reply to, too. So it's a very practical use case. That is such a good use case. It's so fun. Right, it even explained: we got stuck here, I couldn't debug this. Yeah.

I felt really bad for the agent. I love this. This is so creative. This is exactly the kind of little bit of magic people get out of using it. Yeah. And the Morse code example was so much fun to build. It started with someone on Twitter who asked: I want the coding agent to notify me when it finishes a task, because sometimes it takes five, ten minutes.

So I thought: what's a really funny way for it to communicate with a human? Obviously you can text, you can play some music, but we have a lot of Philips Hue light bulbs at home. So I thought: what does it take for the agent to get access to my local network? Because it's on the same network, it can just control my lights. And how do you speak through the lights in Morse code?

So I kind of picked up Morse code that week. There's a lot to debug: what's long, what's short, what's the interval.
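For anyone curious about those timing details: in standard Morse timing, a dot is one unit, a dash is three, gaps inside a letter are one unit, and gaps between letters are three. A minimal sketch of the blink loop follows; the light call is stubbed out, since the actual Hue API depends on your bridge setup.

```python
import time

# Standard Morse timing: dot = 1 unit, dash = 3 units,
# gap inside a letter = 1 unit, gap between letters = 3 units.
UNIT = 0.3  # seconds per unit; tune to taste
MORSE = {"d": "-..", "o": "---", "n": "-.", "e": "."}  # extend as needed

def set_light(on: bool) -> None:
    """Placeholder: call your Philips Hue bridge here instead."""
    print("ON" if on else "OFF")

def blink(message: str) -> None:
    for letter in message.lower():
        for symbol in MORSE[letter]:
            set_light(True)
            time.sleep(UNIT if symbol == "." else 3 * UNIT)
            set_light(False)
            time.sleep(UNIT)      # 1-unit gap between symbols
        time.sleep(2 * UNIT)      # pad to a 3-unit gap between letters

blink("done")
```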

So in the end, the experience is: when Cursor or Claude Desktop finishes the task, it starts a Morse code sequence for whatever it has to say, and you just need to listen for it or watch it very closely. That was a lot of fun to build. And we have three cats at home, and they were all freaking out because the lights kept turning on and off. Another one: since I started using MCP as a developer,

I've been going back to previous projects I built just for fun and thinking about how to rewrite them as MCP clients, so I can plug any MCP server into them. As an example, last year I built this Raspberry Pi cat-narration project, using the Raspberry Pi camera to detect whether my cat is jumping on the kitchen counter,

and it will narrate what the cat is doing, or yell at the cat. I'm actually in the process of converting that agent loop into an MCP client, so it can use an ElevenLabs MCP server to actually yell at the cat. It just unlocks net-new examples like this. I love building and playing on the side. I need that version for my dog. Oh,

I'll send you a Raspberry Pi later. Most LLMs nowadays are still too big to run on-device, so I still have to call Claude or some other model to make it happen. But the fact that I can now make the cat detection very extensible is very interesting to me. So now not only can I call

the ElevenLabs MCP server to yell at the cat. One undersold feature of MCP I've found is that the client can chain together different tool calls. So not only can it use ElevenLabs to yell at the cat, it can also send me an email saying what the cat is doing. So, speaking of underutilized protocol features: most people today implement MCP servers with tool calls.
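That chaining happens entirely in the client, which can hold sessions to several servers at once and route one tool's outcome into the next call. Here is a sketch with the Python SDK's stdio client; the two server commands and the tool names are hypothetical.

```python
# Sketch: one MCP client chaining tool calls across two servers.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

SPEECH = StdioServerParameters(command="python", args=["speech_server.py"])
EMAIL = StdioServerParameters(command="python", args=["email_server.py"])

async def scold_and_report() -> None:
    async with stdio_client(SPEECH) as (r1, w1), stdio_client(EMAIL) as (r2, w2):
        async with ClientSession(r1, w1) as tts, ClientSession(r2, w2) as mail:
            await tts.initialize()
            await mail.initialize()
            # Step 1: yell at the cat via the speech server.
            await tts.call_tool("say", {"text": "Get off the counter!"})
            # Step 2: chain the event into a second server's tool.
            await mail.call_tool("send_email", {
                "to": "me@example.com",
                "subject": "Cat alert",
                "body": "The cat was on the counter and has been yelled at.",
            })

asyncio.run(scold_and_report())
```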

But we know that there are so many other features to be unlocked. So curious about your thoughts here. Like what are some underutilized features that you feel like people should start experimenting with? Yeah, this is an interesting one because when you're creating a specification, you have all these use cases in mind and you think about it in a very...

principled way, and out of that comes a set of primitives that you want people to use. And then reality hits, and people use it very differently. Obviously people use it, as you said, for tools. But there are two or three things that I think are quite underutilized, and I wish people

would use them more, though I think the problem initially is client support. The one thing I really love in the protocol is a very poorly named feature called sampling, because it's quite confusing what it does when you read the name. Do you want to explain what it does? Yeah. When you really think about what you're trying to do, it makes a lot of sense. What sampling is: it's a way for the MCP server

to say, I want to call an LLM, but because I'm an MCP server, I don't know which LLM the client is using. I could bring my own SDK, but then I'm binding myself to that SDK. That might be an Anthropic SDK, it could be an OpenAI SDK, but now I'm expecting an OpenAI API key or an Anthropic API key from the user, and that's really not great. And maybe they use a different model in Cursor.

And so sampling is a way for the MCP server to go back to the client and ask: "Hey, can you give me a completion with the currently selected model, a sample from the LLM?" That's where the name comes from. "And give that back to me." That way I can build MCP servers that summarize a Reddit post, or whatever I might want to do, or even have their own agentic loops.
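On the wire, that round trip is a single JSON-RPC request flowing in the reverse direction, from server to client. The method and field names below follow the spec's sampling request; the summarization prompt and model hint are just examples.

```python
# The server asks the client's model for a completion. The client stays
# in control: the model hint is advisory, and the client can require
# user approval before fulfilling the request.
sampling_request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {
                "role": "user",
                "content": {"type": "text", "text": "Summarize this thread: ..."},
            }
        ],
        "systemPrompt": "You are a concise summarizer.",
        "modelPreferences": {"hints": [{"name": "claude-3-5-sonnet"}]},
        "maxTokens": 400,
    },
}
```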

But the controller of the LLM inference is still the client. And that's the really cool bit: you can build MCP servers that are very rich, that go way beyond tool calling, and have them be completely model-independent. That's really what it's for. And we can talk later about how, combined in the right way, this has a lot of cool properties. Right.

But that's one of the features I would have loved to see more people use. Again, it's a matter of clients not supporting it very well, or at all. I wish more clients would support it, so more people could build these richer things that go beyond tool calling, whether agent loops, summarization bits, and so on. So that's one of those features.

This is so interesting. One very concrete example I've always wanted to build with the sampling model you mentioned is a code review agent.

I'd want to build a server that does code review, and it may want an LLM to complete parts of the review, but it doesn't want to bring its own LLM. So it feels like a very natural jumping-off point. Yeah. What does it take for clients to support this? They just need to do it. There are obviously reasons why certain clients wouldn't want to: clients with fixed subscriptions might prefer not to, because

it suddenly becomes an API. But other than that, I think it's just a matter of client support and priorities. Obviously, clients support what people do, and so they're mostly focused on tool calling. There's so much going on in the spec that still needs to be added. And the heavy lifting in all of MCP land is, very deliberately, on the client side, because we expected there to be way fewer clients than servers.

And so we wanted to make it very trivial to build a server; every bit of complexity we could shift to the client, we put on the client. As a result, it's just hard to build a really good, fully spec-compatible MCP client, while it's very trivial

to use any feature you want on the MCP server side. So clients are just a little bit behind, and it will probably take time. For some of them, it might just not make sense, given the way they deal with inference in general. But at the end of the day, it's just a matter of waiting and seeing people implement it. That's the end of it. Right. Sampling is such an interesting concept. When I first saw it, I thought: oh, this is so powerful, because the divide between client and server

is less a physical one and more a logical one. Yeah. So technically you could write a server that requests sampling from another client that's also a server. I know it sounds complex when you describe it, but can you give us an example of how to best use this kind of chained server-client combo? And how does that relate to sampling? Yeah, you're alluding to a very interesting piece. Interestingly enough, very early in the process we

built this ourselves. We had a prototype of what you're describing, I'll go into detail in a second, before we even released MCP to the public. What you're describing is: you take an application that is an MCP server exposing tools to an MCP client, but within that MCP server you also use an MCP client, so you can use other MCP servers downstream. And so you have this little program

which is an MCP client and an MCP server at the same time. I think about this as upstream and downstream connections. And you can chain these things indefinitely. It's probably not very practical to chain them indefinitely, but you can definitely imagine a few links, and you can even go as far as creating whole graphs out of this. You can very quickly envision worlds where there's an MCP server with an agentic loop that orchestrates

two or three other MCP servers and their tools into a really good agentic loop. Then you can take this entity made of three or four servers and hand it to a client, like Cursor. I think that's a very interesting concept that feels very agentic, particularly if you then use additional primitives that go beyond tool calling, such as resources or prompts: additional data streams that MCP servers can expose upstream and downstream.
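A minimal sketch of such a middle node, assuming the Python SDK: it is an MCP server upstream while acting as an MCP client downstream. The downstream command and both tool names are hypothetical.

```python
# A process that is an MCP server (upstream) and MCP client (downstream).
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orchestrator")
DOWNSTREAM = StdioServerParameters(command="python", args=["worker_server.py"])

@mcp.tool()
async def research(topic: str) -> str:
    """Fan the request out to a downstream MCP server, relay the answer."""
    async with stdio_client(DOWNSTREAM) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("lookup", {"query": topic})
            # Pass the downstream text content back upstream.
            return "\n".join(
                block.text for block in result.content if block.type == "text"
            )

if __name__ == "__main__":
    mcp.run()
```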

And I think then you can actually model quite rich interactions. I would love to see people play around with that more and use, for example,

an AI framework, a Pydantic AI or a LangChain, whatever, to build a chain of client upstream, client downstream, server upstream, and see what happens. Then you're suddenly free: you can go to a user and say, hey, which five MCP servers do you want this agent to control? You might have a very general agent loop,

and people can experiment and suddenly have cat-monitoring software connected to an agent that also speaks email, WhatsApp, whatever it might be. And as you mentioned before, there's a lot of power in using LLMs for these orchestration tasks. So you can build these complex systems, these complex

agent graphs, quite quickly using the technique you described. Yeah. You also mentioned resources and prompts, which are the other two very powerful and underutilized functionalities in the spec today. I really think they're the sleeper hits of MCP.

Do you want to briefly explain how a developer leverages resources, and what prompts are as a concept? Yeah, I'm happy to. One of the things to understand about MCP is that it's focused on

how the primitive you're exposing interacts with the other side, usually the user, though it could be an agent. Prompts are meant to be driven by the user; for example, the user explicitly adds one to the context of a call. And so prompts are text

that people can insert. The interesting bit is that, on one hand, they can be very static templates, say, an example of how to use this MCP server, but they can also be very dynamic. They can just as well be

API calls under the hood. We had, for example, an MCP server that exposed prompts that download a stack trace from the Sentry API. So that goes into the prompt, but I, as the human on the other side, say: I want this in the context now. I don't let the model decide; I decide. And that's the difference between a prompt and, for example, a tool. Resources, on the other hand, are quite unique, because resources are

just blobs of data, and they can very easily be used to model something like a file system toward the MCP client. In the interaction model I described, tools are model-driven and prompts are user-driven.

Resources sit in between by being application-driven, whatever that might mean. So an application, for example Cursor, could choose to let a resource be added to an agent, similar to how you can add a file to an agent. But it could also, for example, first ingest a resource into a RAG system

and do retrieval, right? Because these resources can be arbitrarily long. So one of the things we thought about very early on was: do you actually need to build something for retrieval into this? And we came to the conclusion: hey, if the client controls the retrieval bit,

resources can just go into that retrieval system and be used that way. And if you wanted to do it on the server side, you would use a tool. Those are distinctions I think people haven't really caught on to yet. These primitives are also fairly rich: both tools and resources can be audio in the new spec, and they can be images. So there's a lot

people could do. You could expose your current screenshot as a resource, these types of things, which I think leaves a lot more use cases open to explore in what MCP has to offer. But I understand people build tools, because it's the most obvious thing to do.
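Both primitives are a few lines each in the Python SDK's FastMCP. The Sentry-style prompt and the screenshot resource below are illustrative stand-ins; a real server would fetch live data.

```python
# One user-driven prompt and one application-driven resource,
# alongside the usual model-driven tools. Bodies are placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("context-demo")

@mcp.prompt()
def debug_stacktrace(issue_id: str) -> str:
    """User-invoked: pulls a stack trace into the context on demand."""
    # A real server would call something like the Sentry API here.
    trace = f"(stack trace for issue {issue_id} would be fetched here)"
    return f"Help me debug this stack trace:\n{trace}"

@mcp.resource("screenshot://current")
def current_screenshot() -> str:
    """Application-driven: the client decides when and how to attach it."""
    return "(screenshot data would go here)"

if __name__ == "__main__":
    mcp.run()
```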

This is such an interesting point. When I first looked at resources, it almost felt like a mind shift. Traditionally, as a developer, I always thought resources would live on the client side: the client would expose resources and query them locally.

But in this case, it's almost like the MCP server is exposing a file system that the client can query. Yeah. I'm curious about your thinking behind that. How did you arrive at the model? How did you decide it would be a server-side rather than a client-side thing? And what does it entail for the transport layer? So, for the first part:

I think the initial model of MCP was: how do I provide context in these different user-interaction models? And for that, resources came quite naturally out of the question of how to actually

enable an MCP client that doesn't have access to the local file system by itself, when I want to give it that access. And now a bit of history: looking back to July, August 2024, Claude Desktop did not have this. You could upload files and such, but there was no natural way to attach a file system to it,

and similarly for some agents we might have internally. So it felt very natural to have something like that. That was really the genesis: how these servers are supposed to provide context. Now, for the transport layer: MCP, at the end of the day, is transport-independent, which was quite important to us.

Initially that came out of the local use case, where I wanted to use stdio, which has a lot of niceties: the lifecycle of the MCP server is controlled by the client automatically, and there's a lot the client can do. But it also means

you can't really speak HTTP. Technically you could, but realistically you're speaking something line-based, and in this case that's JSON-RPC. That's very heavily inspired by how the Language Server Protocol does it, which is very, very similar. It has an interesting property that I'm somewhat ambivalent about nowadays, because it has some drawbacks and requires certain things

that would probably be better done in a more classic API-like way on the HTTP layer. But it still enables people to implement MCP over other transports. I used to work at Facebook for ten years, and there you use these Thrift RPC mechanisms internally, with all this security infrastructure built around them. You could just run MCP over that, and no change would be required. You just

use a different transport, and both sides are still happy. So that's one of the reasons we chose it: for that flexibility, and partially because it was an evolution from stdio to HTTP.
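Concretely, the stdio transport is just newline-delimited JSON-RPC on a child process's stdin and stdout, which is why swapping transports is cheap. A bare-bones handshake, assuming a hypothetical server started as `python server.py`:

```python
# Speak the stdio transport by hand: one JSON-RPC message per line.
import json
import subprocess

proc = subprocess.Popen(
    ["python", "server.py"],  # hypothetical server command
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",  # a published spec revision
        "capabilities": {},
        "clientInfo": {"name": "hand-rolled-client", "version": "0.0.1"},
    },
}

proc.stdin.write(json.dumps(initialize) + "\n")
proc.stdin.flush()
print(proc.stdout.readline())  # the server's initialize result, one JSON line
```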

Yeah, that's so interesting. One of the top questions from talking to a lot of developers building with MCP is: how do I authenticate MCP, both from client to server and from server to tools? There are so many different ways to make it happen, and there's spec involvement too. So what are your thoughts on how auth will shape up around MCP in general? Oh, that's such an interesting and deep topic. The interesting bit is that everybody wants authorization. I think it's clear that

the current approach people effectively use with local MCP servers, which is just "give me an API key or some form of token via an environment variable," is usable, but it's not exactly great. And for the case where servers are remote, it's impossible. And so we

have an early part of the specification around authorization, which just uses OAuth. There are some caveats to that, and we're working very closely with the original OAuth authors and experts in the field to make it go really well. But the initial focus is on how the user authenticates, which is potentially different, we're not sure yet, from how agents will interact and authenticate with each other.

For now, we want to solve the user-to-server problem, and for that we just use the OAuth spec in the best possible way, because it turns out that when you innovate on the level of primitives and other things, you want to stay as boring as possible on everything else. But what authorization does, of course, is enable a very different set of MCP servers.

It enables MCP servers that are remote, that are bound to a company account, that are driven by a professional service offering something you have a subscription to. I think PayPal has an MCP server, for example. You can see it: I want to use this MCP server, I log in with my PayPal account, now I can use it,

and I'm authorized. It opens up this company and corporate ecosystem that I think will be super important in our day-to-day lives, while at the same time MCP retains the

bottom-up hacker mentality it had originally for developers. Authorization is just the key step toward this much, much richer ecosystem of professionally developed MCP servers, at the end of the day.
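For a feel of what "just uses OAuth" means on the client side: the spec builds on OAuth 2.1, where public clients must use PKCE. The challenge generation below is the standard RFC 7636 recipe in stdlib Python; the actual endpoints come from the server's metadata and aren't shown here.

```python
# Standard PKCE pieces of an OAuth 2.1 authorization-code flow,
# as a remote MCP client would use them.
import base64
import hashlib
import secrets

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, per RFC 7636."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

# 1. The client invents a secret verifier and derives a public challenge.
code_verifier = b64url(secrets.token_bytes(32))
code_challenge = b64url(hashlib.sha256(code_verifier.encode()).digest())

# 2. The challenge rides along on the authorization redirect
#    (where the user logs in with, say, their PayPal account).
auth_params = {
    "response_type": "code",
    "code_challenge": code_challenge,
    "code_challenge_method": "S256",
}

# 3. Exchanging the returned code for a token, the client proves it
#    started the flow by presenting the original verifier.
token_params = {
    "grant_type": "authorization_code",
    "code_verifier": code_verifier,
}
```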

I guess when we talk about auth, there are two layers: authentication, do you get access to this thing at all, and then authorization, what are you scoped to access?

It's very interesting, because I see these concepts sprinkled across different layers of MCP. For example, you could scope access to certain resources, say, I can only access resources in this specific folder. And then there's also, obviously, third-party auth from the server to all the API providers.

How do you think about authentication versus authorization? And what would you want to see from auth providers in the wild? What needs the most help when it comes to making developers' lives easier?

That's a good distinction, and a good question. What we're focusing on at the moment is mostly authorization: am I allowed to access this resource? Because that's what people want. I have yet to see a lot of use cases for the authentication part, which identity am I, who am I, and there are other parts to that.

I think that will come later: who is acting on behalf of whom. Particularly in an agent world, this will be important. But for now, we're tackling one thing at a time, the biggest boulder in the way first, which at the moment is: how can I get access to something that sits behind some form of authorization? That's what we're tackling. So at the moment, the focus is

100% on authorization. From there, I think we'll potentially move on in the future to authentication and identity and those types of aspects. Now, for auth providers, the helpful thing, and luckily a lot of the big ones are doing this, is just engaging with us and telling us

what the common denominator is that everyone has and that we can build upon, so that developers feel they have some safety and it's not: oh, you can only use this with this provider. And talk to us about what you're willing to implement where things are potentially missing from the authorization pieces in this agentic world.

And luckily they do this, right? The authorization specification development currently going on is driven by a combination of very engaged people on the security and identity side from Microsoft, from Okta, from AWS.

So the right people are already in the room, people who in many ways are far better suited to make these decisions than I am, because I'm not an identity and authorization expert. I just want experts in this field to tell me the right way to do this, so we can all figure it out together. That's really what I want from people. Yeah, amazing. I love this community-driven development and iteration on the spec, too. Every time I check out the MCP spec, there are hundreds of issues.

So, a lot of respect for how you all groom the issues day in, day out. Another topic I want to dive into, which we touched on a little, is the creative field. How does MCP work there, and what are some use cases? Today, most of the clients we've seen are very developer-focused, which is natural in a new technology's adoption cycle, because we developers know how to configure things; we like to put in a JSON blob.

But recently I've started to see very interesting, cool use cases with more creative tools. With Blender, you can now use words to create a 3D model. You can use MCP servers from a Unity instance, you can have your own synthesizer, and so on. What are some top creative use cases you've seen or are most excited about, and want people to build more of?

I'm actually curious about your take afterward, because you're a very creative person. But for me, it goes back to what I love about MCP: this ability to bridge gaps between what you care about in the world and in your life. When I saw, for example, the Blender MCP server, which I think was one of the first big ones, or the one where a person connects Claude to Ableton. Yeah.

I just find it so fascinating and really cool, because on one hand I'm astonished that LLMs are actually really good at this, and you're surprised because it's a side of LLMs you would never have seen without MCP. On the other hand, I just love the creativity of connecting these tools and actually getting something useful out of it.

Of course, in the creative process, and you know this better than others as a creative person yourself, there's an aspect of control that every artist wants to have. LLMs and MCP don't give you that, but they give you a different set of interfaces to something. And I think it's very interesting and creative to play around with how you can describe, for example, a 3D environment. I think that's a very unique thing. Right. Because

an artist who builds environments in Blender has probably never had the ability to really express themselves in words. Maybe: how can you write a poem and have it translated into a 3D environment? That's super fun. And then, of course, you want to go back into Blender, because you need control. But I think it's a great, fun exercise and bit of experimentation that, if anything, helps creatives look at things in different ways. And then, of course, I...

I love synthesizers. I'm a terrible musician myself, but I love them. And I love this idea where people use, for example, Claude to program patches onto physical synthesizers. It's fascinating to me that Claude can do it, but also just cool to see that people have thought about connecting

the LLM to a physical thing in the world that makes a sound afterward. I love that part. But I'm curious what you think, because you're a very creative person. How do you think about this aspect? You know, I've been thinking a lot along the lines of what you mentioned: the input to clients. Today the input is mostly words; we describe what we want to see. But we know that

words and actions, or words and visuals, are never one-to-one. So it's very cool to have a starter template described by words, but the later iterations have to be dictated by the artist's choices. For example, I'm a huge user of Procreate, and in Procreate you don't describe what you want to see; you just draw what you want to see.

And so much of that is controlled by the latent space in my brain. My brain isn't describing what I should be drawing; that's not how my model works. It's more about controlling the muscles, deciding how to draw the curve of a line, what color looks good to me.

So to some extent, I almost feel like the MCP client heavily dictates what the whole experience will be. For example, the client could send some Bezier curves to the server and have the server decide: is this something that looks good to you?

That's not something we've seen very often yet. Today, the input is either code or language; that's very common. But I wonder what kinds of experiences we'll have if every design tool becomes an MCP client. I have no clue what this is going to look like, but I think it's a very interesting thought exercise. Yeah. Here comes a philosophical question on agents, based on everything we've talked about:

What do you think is the ultimate communication mechanism or modality for agents? On one side, we have natural language. On another, we have programming languages; technically, we could frame every problem in the world in a programming language, if the language supports it. And on yet another side, we have input modalities like pixels,

screenshots, sometimes videos. What do you think, based on what you've seen of MCP server and client interactions, would be the ultimate abstraction layer, where you'd say: this is the right way, or a great way, to provide all the necessary context to agents? Yeah, such an interesting bit. I think

"I don't know" is one part of the answer, but the real answer is that there's probably merit in each of them. I think programming languages are a very good interaction pattern between agents; there's a lot to be said for a dense, mathematical, slightly different form of language.

A programming language is very clear about its intent and constrained in a certain way. And then there's the very free form of natural language. Personally, I think natural language alone will not be good enough.

That's a personal opinion. A combination of them might be the right thing. So I don't really have an answer, because I feel it's a bit too early to tell, and I want to see this space explored a little more. When I look at development in this field, I feel it's a bit too early to really say what the right abstraction is. But things like MCP let people experiment with different approaches,

and then, of course, other frameworks and a bunch of other things in the space let people experiment too. But there's a lot more experimentation to be done to really understand what the actual general abstraction should look like. And if you think about it, under the assumption that MCP sticks around, as I hope it will, MCP arrived

two or three years into tool calling's existence. We had seen a lot of these interactions before we got to a somewhat general abstraction. And I think we're a bit too early with agents to

see what this is going to look like. But your observation stands: there are so many different modalities and options. I only talked about the text side of things, and you already had pixels and other bits in there. There's so much interesting space for communication. Who knows, maybe models really like to talk about things over video streams and we just don't know it. Maybe that's the modality we end up with:

video streams everywhere, because they just like watching pictures of things. That's so interesting. You know, these modalities bleed into each other. I do a lot of random projects on the side, and one of them is called AI Tamagotchi.

It's basically an AI-driven take on the Tamagotchi. Instead of just eating one thing, the Tamagotchi can request 10, 20, 50 things, whatever the LLM's state will let it do. One thing I realized is that I could use most of today's models to generate ASCII art, and even ASCII animation. When I thought about it, it almost felt like a visual task, yet a language model still generates it as a sequence of tokens.

Whereas if I give the task to, say, a diffusion model, it doesn't generate tokens; it generates pixels. So the question is: which is the better way to generate a sequence of images, or a sequence of ASCII frames, to animate something like this?

So it really... What have you found? What do you think about this? I'm actually more on the language-model side today for these stateful, very predictable animation sequences. It almost felt like a modality I didn't think would have worked before.

But it did, because it turns out predicting the next token also works for predicting the next ASCII character. A lot of things, if you think about transformer models and attention, fit this:

sequential things are probably somewhat good to generate with them. Yeah, smart observation. And the funny thing is, I tried a lot of different generation tasks, especially generating cats. Search the internet: ASCII cats are really well represented in the data set. This actually brings me to another, higher-level question.

When you think about the future of MCP, what do you want to solve, what do you want to keep evolving, and what do you not want to solve? Because it feels like a lot could become the MCP spec's problem, right? You could implement RAG, you could implement the database, you could implement anything in the world. So how do you think about it? What kinds of things do you want to keep executing on, and what kinds of tasks do you feel are just not the spec's to take care of?

Yeah, that's such an interesting question. Everyone who builds a spec faces this type of problem: you need to stick to your guns, so to speak, focus on the area you want to be good at, and not try to boil the ocean. For MCP, there are a few things. There's evolution of the current parts of MCP; I think there's a very clear path for evolution around authorization and other parts like that.

But then I think there's potentially still a place for a bit more abstraction around agents. That's a very low-conviction opinion as of yet, because, again, I need to watch this a little longer, and I really want to explore the space.

I have to ask a question here. How do you define an agent? Oh, I'm not going to get into that. What do you think? What do you think an agent is? I think it's a multi-step reasoning chain. It's very simple for me. Okay. Yeah. I think I can get behind that. For me, an agent is potentially more

about agency in this world: something that does some form of autonomous orchestration, autonomous task-solving. Anything that's multi-step is, for me, already an agent. The moment it does two steps and reacts to the first step, it's basically an agent, because it now has some agency over what it's doing. And I think that's

what it is for me, at the end of the day, for the most part. But there are a lot of definitions of agents out there, so I think there's potential to think about this. MCP is somewhat well positioned in the sense that it allows for these graphs, and some of the graph pieces that MCP indirectly enables can also be dynamic, which I think is a very interesting and unique part of it. So maybe there's a little more to do around agents.

I'm not fully sure yet, but it's something I'm definitely taking a look at. Beyond that, the rest at the moment is just evolution: streaming and other bits, modalities. There are other interesting questions for MCP, like how something like it might fit model types that are not purely text-based.

That's an interesting long-term question: what does this look like for video, audio, images, whatever it might be? I don't know if there's a use case for it, and it doesn't have to be MCP, but I think it's an interesting question to think about different modalities.

But yeah, for the most part it's modalities and evolution, and then maybe, big maybe, big question mark next to it: do we need more for agents, or can agents already be formulated very well in the MCP abstraction? And again, that's back to experimentation. Yeah.

That sounds like such fun experimentation. It is a lot of fun. Yeah, I often try to refactor my code and turn a single agent into multiple agents, where I just need multiple calls to make decisions along the chain. And interestingly, most of the time, for the tasks I'm trying to do, which are very simple, sending an email, pinging someone, nothing like very long transactional workloads,

a single agent works just fine. So I haven't really come across a use case myself that requires multi-agent collaboration on a very complex task. But what's your view there? Do you feel like we're going to go pretty deep on a single agent, where it's almost a technical detail, a single call graph with an LLM? Or do you feel it will be multiple processes working together?

For me, one observation is that agents are less a function of how different the tasks are and more a matter of trust boundaries.

If you have a travel agent that needs access to your bank, or whatever it might be, there may be interesting bits where there are trust boundaries, and that's where you want a protocol in between, rather than everything living in the same framework. So I do feel there will be some form of composability based on these trust boundaries, because

you will probably eventually want to use whatever interface your bank gives you for agents, and nothing else. So there's a boundary where this needs to interact with something else. And these things will happen in the parts of the world that require a bit more trust.

Beyond that, it's a bit tricky to see how these things will work out. I can totally see a single agent or agent framework being quite powerful. But again, composability, the ability to switch things out for users who are not developers,

can be very useful. And there's also the question of whether there will be two or three meta-agents driving other MCP-shaped pieces, or whether everything will be very specialized, with developers building all these different agents. It's a complex question that comes back to experimentation.

But for my use cases at the moment, a lot of these single-agent, few-interaction setups do all the things I need them to. But then, similar to what you're saying, we're also very early. Yeah. Very, very early, right? Really early in exploring agents, and the models are just reaching the point where these things become very powerful. So, yeah.

We'll see what this looks like in a year. But again, that trust boundary is an interesting bit that I'm watching, along with how an agent acts on behalf of another agent; for those aspects, a protocol might be needed. That's awesome. Amazing. Well, last question. I just love that MCP has been an open protocol from day one, and as a result you've amassed a huge community contributing and giving suggestions.

When you think about where you need help the most in the next phase of MCP development, can you talk about where you'd like more contributors? How do people reach you? How do people collaborate on the spec, or on other things related to it? Yeah. For contributions, at the moment we run this as a very traditional open-source project.

What we're looking for is people maintaining and helping: writing issues, reviewing issues, reviewing PRs, writing PRs, building trust with us as maintainers so they can hopefully help us longer-term. We're looking for people who just want to be active in the community, whether driven by companies or by individuals; it really doesn't matter to us.

A big part of that is just going through the Python SDK issues, helping people there, reproducing some of the reported bugs to see if they're actually a problem and get more detailed information, reviewing PRs when necessary, and, even better, writing PRs and fixing bugs. Those are great starting points. When it comes to the specification itself, the lift is a bit higher and the bar is a bit higher.

There, it's probably best if you either address a very specific need or write a very detailed RFC for it, which might sit there for a while. If you're a company, you might rally up some support for it and come to us together; I think that helps quite a bit.

So those are good starting points. It works very much like a traditional project, again. We're looking into governance models that are a bit more sustainable in the long run and a bit more consensus-driven, and we're going to work toward that. But besides that, just come help out with the code.

The specification is a bit harder to work with, but if you feel strongly about something, go for it as well. And yeah, build trust. We have a lot of people helping us: the Pydantic people, for example, do a great job with the Python SDK; the Microsoft people did a great job with the authorization specification, same with the Okta people and the AWS people. So there's a lot already happening. People help us so much,

with the Inspector, for example. There are community contributors I really, highly appreciate. So yeah, just come help and work with us; that's really what we need at the moment, for the most part. This is awesome. I really enjoyed the conversation. This has been so fun, chatting about everything from Tamagotchis to cat-monitoring apps to the MCP protocol and its future. Thank you so much for making the time, David. Until next time.

And with that, another episode is in the books. Thanks for listening all the way through. If you enjoyed this episode, please do rate, review, and share the podcast among your friends and colleagues. And keep listening for more exciting discussions about agents and more, we promise, as well as more insightful interviews with founders and builders across the AI space.