
Building Developer Tools, From Docker to Diffusion Models

2024/11/15

AI + a16z

People
Ben Firshman
Matt Bornstein
Topics
Ben Firshman: This episode covers techniques for building products and companies that appeal to software developers. He shares lessons from his time at Docker, including the importance of building a developer business bottom-up and the risk of trying to sell to enterprises too early. He also introduces Replicate, a thriving developer community that lets developers host and fine-tune their own models to power AI applications. He argues that multimedia models have greater application potential than large language models because they enable products that were previously impossible. He emphasizes the importance of API design, fast execution, and ease of use in developer tools, and shares what has worked for Replicate, including harnessing the power of community and open source projects. He also discusses the GPU shortage and how Replicate has responded to it. Finally, he shares best practices for building AI applications, including exploring new use cases and avoiding over-reliance on existing products. Matt Bornstein: Matt Bornstein shares his observations on AI startups from an investor's perspective. He notes that AI applications built on multimedia models are more diverse than those built on large language models. He believes now is a good time to build AI applications, since foundation models are stable enough and our understanding of them keeps deepening. He also highlights the volatility of AI company growth and advises founders to stay calm and avoid overreacting. He argues that AI is just another form of software, so much of what works in software development applies to machine learning. He suggests AI startups focus on sustaining growth and maintaining momentum between major model releases.


Key Insights

Why did Ben Firshman focus on building tools for developers at Replicate?

Ben was inspired by the challenges faced by machine learning researchers, particularly the difficulty of turning academic papers into running software. He saw an opportunity to create tools that could bridge the gap between research and production, similar to how Docker simplified software deployment.

What are the key differences between multimedia AI models and language models in terms of application diversity?

Multimedia models like Stable Diffusion allow for a wide variety of creative applications, from image generation to video editing, which were previously impossible. Language models, on the other hand, are more limited in their applications, often resulting in similar-looking chat or code-based tools.

How has the GPU crunch impacted Replicate's operations?

Initially, Replicate could easily access GPUs, but as demand surged, they had to purchase large blocks of GPUs to ensure availability. They now offer a mix of high-end GPUs like A100s and H100s for training, along with more cost-effective options like L40s and T4s for inference.

What lessons did Ben learn from his experience at Docker that influenced Replicate's strategy?

Ben learned that building a bottoms-up developer business requires starting with individual developers, then scaling to teams, and eventually targeting enterprises. Docker's early focus on enterprise sales alienated the developer community, which was the core user base.

What are some common mistakes developers make when building AI applications?

Developers often underestimate the complexity of turning prototypes into real products. AI systems require significant duct tape and heuristics to function reliably in the real world, which can be time-consuming and challenging.

How does Replicate handle the diversity of AI models on its platform?

Replicate hosts over 20,000 models, with many coming from fine-tuning existing models for specific styles or objects. Users also pipeline models together to create unique combinations, such as combining language models with image generators for multimedia applications.

What advice does Matt Bornstein have for founders entering the AI space?

Matt advises founders not to overreact to market fluctuations, as AI companies often experience periods of rapid growth followed by slower months. Staying the course and focusing on long-term vision is key to success in this dynamic market.

What role does open source play in Replicate's ecosystem?

Open source is central to Replicate's multimedia models, with the community heavily contributing to model development and sharing. For language models, proprietary models like GPT still dominate, though open-source alternatives like LLaMA are gaining traction.

How does Replicate balance ease of use with developer flexibility?

Replicate offers high-level APIs for quick integration but also provides open-source tools like Cog, allowing developers to customize models and deploy them on their own infrastructure if needed. This balance ensures developers can start easily but still have the flexibility to scale.

What trends does Ben see in the future of AI development tools?

Ben predicts that AI will become more integrated into the software development stack, with higher-order systems emerging from combinations of lower-level components. These systems will combine language models, image models, and traditional software to create new, more powerful applications.

Chapters
This chapter discusses the experience of building Docker and the lessons learned from it. The main takeaway is that for a bottoms-up developer business, it's crucial to start by building for and selling to developers directly, gradually expanding to larger teams and enterprises over time.
  • Docker's initial focus on enterprise sales proved ineffective.
  • A bottoms-up approach, starting with developers, is more sustainable.
  • Growth should be gradual, expanding to larger clients over time.

Transcript


I think Docker built this incredible bottoms-up developer motion, but they jumped too fast to trying to sell to enterprise. So almost from day one, they built this enterprise product that they sold top-down to very large companies. And the people inside those companies just didn't know what Docker was. The people who knew what Docker was and were getting the value from it were these people on the ground who were using it in their day-to-day work. So I think the lesson there is if you're building a bottoms-up developer business, build it bottoms-up.

step by step. You know, make something for developers, sell to developers, then maybe sell something that's useful for their team and then work your way up. And then maybe in five years you can sell something to the CTO of Walmart or whatever, but you're not going to be able to do it from day one. You're listening to the a16z AI podcast. I'm Derek Harris. I'm joined on this episode by a16z partner Matt Bornstein and Replicate co-founder and CEO Ben Firshman to discuss the nexus of developer ecosystems and generative AI.

Ben previously led open source product development at Docker and created Docker Compose. So he has a well-honed sense of what developers want and how to build tools that deliver on those needs. Throughout this discussion, Ben explores some of that history while digging deep into how developers are using generative models today and how a community approach like Replicate's allows more people than ever to access, deploy,

build upon, and even release their own fine-tuned models. Matt also shares some of the things he's seen and lessons he's learned after a solid two years of being neck deep in AI startups. What works, what doesn't, and how to ride the wave of popularity and the subsequent trough that comes with each new model release. If you're working on developer tools in the age of generative AI, you should come away from this discussion with more than a few pointers. So without further ado, here's my talk with Ben and Matt.

As a reminder, please note that the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. For more details, please see a16z.com/disclosures.

So Ben, after traveling the world on a bike and in a van for a few years, you started Replicate in 2019. What did you see a few years before generative AI really took off that led you to the conclusion that Replicate was going to be your next company and this was a field you wanted to chase? It actually started with science, with academic infrastructure. So I got really interested in academic infrastructure just as a field. It was just sort of a field that...

operated like it still did 100 years ago; it just happened to be on the internet. I came from the open source world and looked at all of this collaboration, this incredibly fast-moving world that is open source, and then looked at science and thought, why doesn't that work more like that?

That's what got me collaborating with my now co-founder, Andreas, because he was a machine learning researcher, back when it was still called machine learning. This was all pre-AI hype. And a lot of his job was implementing papers, because back then machine learning was primarily published as research: PDF academic papers on arXiv, which is this repository of academic papers. A lot of his job was taking these academic papers and trying to turn them into running pieces of software. The tragedy of this is at some point, somebody in the research lab who produced the paper produced a running piece of software. They compressed it down to prose and diagrams of math in a PDF. Andreas' job was trying to uncompress it

back into running software. And it often was just impossible. And this is what kind of then got me on the track of, oh, actually machine learning is this really interesting subset of science that is really fast moving. It's software, so it connects back to my software world. And that's what got us building tools for machine learning researchers. How did your earlier software experiences actually help inform that decision? Because you built Docker Compose, which obviously was no small feat in the world of software engineering.

Did the other stuff play into it in the sense of, you know, just general interest in terms of, okay, I have the skill set that I probably could apply to this? I think at its heart, I like building things. And I particularly enjoy building tools for people. Building tools for developers comes very naturally because I am a software developer, but I just generally like building tools for people. And I think particularly my experience at Docker informed this startup because I kind of connected this...

thing that Andreas was doing. So Andreas used to work at Spotify. I was connecting this thing that he did at Spotify of taking these machine learning models and trying to get them running in production, and connected that back to what we did at Docker. Because at Docker, we kind of solved this problem for normal software by telling software developers, hey, put your work inside this metaphorical shipping container. And then you can know that other software developers are able to run

this piece of software. You can ship it to test environments. You can ship it to all sorts of different clouds. And we just took that analogy for machine learning. We were like, what if these researchers put their work inside this box? Well, other researchers can run it. And software developers can then run it to deploy it into production and things like that. It was that line of thinking that led to what Replicate was.

What was the delta, I guess, between doing that for software overall and then doing it for these AI models to make that jump into AI specifically? One of the main things is they have to be hooked up to GPUs, unlike normal software. So that is now difficult to procure. It was less difficult to procure when we started. But for a start, that's difficult hardware to procure. There's also just lots of plumbing to hook that up, like lots of really complicated CUDA things that I think is the bane of the life of machine learning researchers, CUDA errors. But it also just behaves

quite differently to normal software serving as well. Like specifically, they're quite often longer running. They quite often need batching systems so you can run multiple requests all at once on a GPU.

You often need queue-based systems; they're quite often batch workloads rather than the round-robin serving systems that are typically used in web systems. They're also a much more specific and narrow type of software as well. When you're packaging up something in a Docker container, it's really just arbitrary software running on a computer, whereas a machine learning model is really just a single function call. It's quite often just a single pass through the model with some arguments and a return value.
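For illustration, here is a minimal sketch of the queue-plus-batching serving pattern just described. Everything in it is hypothetical (the predict_batch function stands in for one batched forward pass on a GPU); it shows the shape of the loop, not Replicate's implementation.

```python
import queue
import threading
import time

MAX_BATCH = 8      # most requests one GPU pass will take (assumed)
MAX_WAIT_S = 0.05  # how long to let a batch fill before running it

request_q: queue.Queue = queue.Queue()

def predict_batch(prompts):
    """Stand-in for one batched forward pass on the GPU (hypothetical)."""
    return [f"result for {p}" for p in prompts]

def batching_worker():
    while True:
        first = request_q.get()                 # block until work arrives
        batch = [first]
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(request_q.get(timeout=remaining))
            except queue.Empty:
                break
        outputs = predict_batch([prompt for prompt, _ in batch])
        for (_, reply_q), output in zip(batch, outputs):
            reply_q.put(output)                 # hand each result back

def submit(prompt: str) -> str:
    """What a request handler calls: enqueue, then wait for the reply."""
    reply_q: queue.Queue = queue.Queue(maxsize=1)
    request_q.put((prompt, reply_q))
    return reply_q.get()

threading.Thread(target=batching_worker, daemon=True).start()
print(submit("hello"))
```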

One more Docker question, and then we can move on to Replicate. But I'm curious, because you joined Docker during this period of hypergrowth, and

I wonder what you learned during that experience or what that experience was like, especially in terms of productizing developer tools and open source products and then also about running an open source business. Yeah, for sure. I learned an awful lot from that. And it was just quite a wild ride, honestly. I think I joined as employee 20-something and left when it was almost 300 people. They obviously struggled with the business, but they just built this thing that's now a core part of what we use now to build server applications.

And the thing they did there was just build a really great tool for developers. And I think that's something we've really taken to heart in how we've built Replicate. And also just the power of community. They had Docker Hub, which is where all of these pieces of software were shared, which is very akin to Replicate. But Docker itself as a piece of technology was just inherently about sharing pieces of software with other people. We've taken that as well in what we're building at the core. We've got this core open source project called Cog, which is-- we call it containers for machine learning. We've been very inspired by what Docker did there and trying to apply that to machine learning.

There's lots of hard lessons to learn from Docker as well. I think the core thing we really took to heart is that Docker built this incredible bottoms-up developer motion, but they jumped too fast to trying to sell to enterprise. So almost from day one, they built this enterprise product that they sold top-down to very large companies. And the people inside those companies just didn't know what Docker was. The people who knew what Docker was and were getting the value from it were these people on the ground who were using it in their day-to-day work.

So I think the lesson there is if you're building a bottoms-up developer business, build it bottoms-up step by step. Make something for developers, sell to developers, then maybe sell something that's useful for their team, and then work your way up. And then maybe in five years, you can sell something to the CTO of Walmart or whatever, but you're not going to be able to do it from day one.

Taste and aesthetics have become a really big deal for developers now. I think in the old days, like nobody cared or nobody thought about it. It's like developers weren't real people. They should just, you know, use whatever clunky tools they had. That's clearly changed. And Docker was part of that change. And I think you were part of that change. And you were in a really interesting position where you weren't actually an employee of Docker, right? Correct me if I'm wrong. You were just

Some guy who wrote this tool that turned out to be the best way to sort of spin up Docker containers locally. How did you come up with that idea? And where do you think taste comes from for developers? I'd be super interested to know just where that idea came from. I think at the very heart it comes from the fact that developers pick the tools they want, and developers will be more likely to pick high-quality tools. That's the heart of it.

And I was actually heavily inspired by Heroku early on. So my first startup was a Heroku clone for Python, back when Heroku was Heroku for just Ruby. And that startup didn't go anywhere because unfortunately Heroku became Heroku for Python and their product was better than ours. But that's really where it all started. You know, I was used to the days where you had to log into servers

and FTP or SCP code around to deploy it and manually install Apache and this kind of thing. And just being able to git push was just, this is fantastic and just saves so much time.

And actually on the taste thing, like thinking about it, I think another really interesting area where taste comes from is I think it was around this time as well where-- it was around the time when Heroku first came out that a lot of developers were switching to Macs as well. Like traditionally, developers were building on Windows. They were building on Linux. And Mac OS X was based on BSD. And I remember when that came out just being like, holy crap, this is amazing. I was developing on Linux at the time.

And it's like, oh, this is like Linux, but not a complete pain in the ass. And so I, like pretty much every other developer in the world, switched over to Macs if they had the choice. And Macs are obviously just incredibly well built. Apple has just like great taste for what high quality tools are. And I think that influenced a lot of that world as well. I know Ruby and Heroku and GitHub and all these kind of people were very influenced by just Apple's high design. And I think that...

culture permeated developer tools. Stripe was next, then Vercel, and us as well. But anyway, the story of where Docker Compose came from is: we built this Heroku clone, and I went off to do another startup.

I realized that deployment was still a complete pain, because that startup particularly couldn't be deployed on Heroku: we had to route to particular nodes for live document editing, and you couldn't do that on Heroku. I was like, oh, I wish there was a better-- like somewhere between EC2 servers and Heroku, something in between that just gives me the right abstractions. I kind of want processes. I want load balancers. I want networks. I want volumes, all this kind of thing. But I don't want to manually install stuff on servers and set up auto-scaling and all this kind of stuff.

So we then started building out that platform, like a more advanced Heroku. And then Docker came along and we were like, oh, damn it. Docker is the thing. That's exactly what we wanted to build. They're operating at the process level, that kind of stuff. Very light. We called them very lightweight virtual machines, and they, you know, then latched onto the container metaphor and stuff.

So we just pivoted into building tools around Docker. And at the heart of it was we had this YAML file, which was the thing you deployed an application onto our Heroku PaaS with. And we just translated that YAML file into deploying on top of Docker instead. And that became... When we first created it, it was called Fig. It then got rolled into Docker and became Compose, and that was the Docker Compose file. So it was the configuration file for that Heroku competitor that wanted to exist. And so you've sort of done this...

twice now. I mean, if you think of Docker Compose, where you didn't invent Docker, but you created the tool that everyone used to orchestrate Docker containers in a development context early on. And now with Replicate, I would argue, you haven't trained Stable Diffusion or Llama yourself, but you've created the tool that arguably is the easiest to use and that developers often gravitate to first.

Can you generalize at all across these experiences of what makes really great developer tools, how you develop taste, how you develop aesthetics, that sort of thing? Yeah, I haven't got a really well-thought-out recipe for this, but three things come to mind. One thing is that at the heart of a developer tool is an API, and that API needs to be really well designed. When you're using Docker, it doesn't feel like you're using an API, but really there's a data model behind it.

And you as a developer are kind of interacting with data models to mold it into the thing that fits into your system. And you need to fully understand that data model. You need to design really good primitives and how those primitives interact with each other, and expose that really clearly through the product. I think that's the heart of it. It's all about API design. Secondly, it needs to be really fast. I think that's something Docker got incredibly right. The thing that was mind-blowing about Docker was not that you could create these isolated environments, because we had virtual machines. The mind-blowing thing about Docker is that it could boot this thing up in 100 milliseconds. And thirdly, just make it incredibly easy to get started, incredibly easy to integrate.

Part of that is just making a really simple product, and part of that is picking the right primitives, but part of it is just explaining it really well and making sure that, as a developer, if I can't get some value out of something in 30 minutes, then I'm just going to give up on it. Ideally, something should be working in five minutes.

And this is the magic of Docker, in that you just run one command on a blank Linux machine. This is the magic of Heroku: just git push heroku on my Ruby on Rails app, and it just works. And this is the magic of Stripe. You just copy and paste this line of code, and you're making payments. And this is the magic of Replicate as well, I think, because you can just copy and paste this line of code, and you're doing AI in a few minutes. And I think that's really key to good developer tools.
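As a concrete illustration of that copy-paste experience, here is roughly what it looks like with Replicate's Python client. The model slug is illustrative, and a REPLICATE_API_TOKEN environment variable is assumed.

```python
import replicate  # pip install replicate; assumes REPLICATE_API_TOKEN is set

# One call: pick a hosted model by slug, pass inputs, get outputs back.
output = replicate.run(
    "black-forest-labs/flux-schnell",  # illustrative model slug
    input={"prompt": "an astronaut riding a horse, watercolor"},
)
print(output)  # typically a URL or file handle for the generated image
```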

How did that look at the beginning when you started to build Replicate in the sense that one, there weren't a lot of like open source or open models out there to even host or to deploy. So like,

What was that early work to say, hey, listen, this is what I know the product should look like, but here's actually the state of these AI models at, whatever, five years ago now? When we were getting started, it was relatively simple. There was lots of really interesting work going on, actually. Everyone was obsessed with ImageNet and identifying things in images and image segmentation and things like that, which is very advanced stuff. Just a few years before that, it was impossible to identify what was in an image.

There was plenty of other stuff going on there that was really exciting. It was obviously useful to build products out of, but it was just really hard to run these models and use these models. It was the same problem, I think. It was just on a smaller scale, I suppose. The thing that really got things going for Replicate is we noticed all those kind of models were still...

I'm thinking of image segmentation, I'm thinking of embedding models and all those sorts of things. They were all kind of academic pursuits. But something we noticed that was really interesting was happening in, God, when was this? Early 2022, I think. People were building some early text-to-image models, like Big Sleep,

which was taking CLIP, a text embedding model produced by OpenAI, and combining that with a GAN, which generated images, smushing these things together in a very naive way to turn text prompts into images. And they were not very good at all, but they had sort of interesting aesthetic outputs. And if anything, that sort of caught the imagination of a technical-artist crowd, you know, creative coders and things like that.

What was really interesting about this is that it wasn't academics working on it. It was just a bunch of people in discords and on Twitter who were sharing these things with Colab, which is a Google-hosted notebook service.

And what was really neat about it is they were sharing them as Colabs, posting them in Discord, and somebody being like, oh, that's cool, but I've got this other idea for how to improve that. And they were pressing File, Save a copy, because it worked like Google Docs, you know? File, Save a copy. And then they got a new Colab notebook and then tinkered on it and made a variation on it. And it just created this community of people just like,

Forking, editing, forking, editing, sharing things. That stuff was happening really quickly. Instead of things being published every six months, people were just doing things hour by hour. I looked at this and I was like, "Oh, this looks like open source software." It was still very early, but you could tell the rate of change was incredible. Then we got things like DALL-E Mini, which I think was a very visible thing that lots of people saw. That eventually led to Stable Diffusion. Stable Diffusion came out of that community.

But we just saw this community super early on the Discord, and just started building around that. And that's really how the whole thing started. So my Twitter profile picture to this day is a generation from one of the early GAN-based models on Replicate. Pixray, I think. Yeah, Pixray, that's right. It was like a pixel art generative image model. And I was just so amazed that it worked. And like, it's great. I haven't changed it ever since.

So if your early users were this Discord community and you were building for that crew, what does the typical Replicate user base or the typical Replicate user look like today? We've got on the order of-- we've got more than 3 million users. We've got about 20,000 models on the platform. We've got on the order of hundreds of thousands of paying customers. So we've got kind of two sides of Replicate. There are people who are making models and publishing them to our community, and people who are building products out of these models.

And the vast majority of people are building stuff, building projects and features and products and things like that. It's a mix of all modalities, really. I think the sweet spot we're really seeing is people building multimedia applications with image, video, audio, 3D things, and smashing all of those together in pipelines, often combining that with language models as a sort of glue. We're also seeing some people build pure language model products as well, but typically we see people use the big proprietary models for that kind of stuff.

And something that works really well on Replicate is that we make it really easy to customize those models. More often than not, people building these multimedia applications need to fine-tune the models somehow. They need to tinker on the code. They need to pipeline models together. And that suits the platform really well. When Stable Diffusion got started, I think it was people building a lot of consumer applications --

you know, just being able to run these models was magical enough to start with, being able to conjure up an image out of nothing. And that's still a large part of what we do. That led to things like image editing software, where you can kind of add things to your image or fill in things in your image using these models. Next big thing was AI avatar apps, where you take a picture of your face and it can generate pictures of you. That's still a huge use case. And then we just see lots of people building really interesting consumer apps like chat apps.

building stuff in sales and marketing where you're automatically generating ads, you're automatically generating sales material. We're seeing people who generate talking-head avatars, which is this whole pipeline of video, audio, and language models. We see people generate, like, games content. There are things people are doing inside businesses where they're trying to annotate content, to turn unstructured data into structured data. We're seeing large companies build marketing apps. We've got advertising agencies using us to generate content. It's just a whole ecosystem.

You can just see this stuff being adopted all over the place for all sorts of different use cases. So 20,000 models-- if I were just following the headlines of AI, I might be able to name, I don't know, 10, maybe a dozen models.

If I go onto Replicate and peruse the community of models, what am I going to see there? How is the community tweaking these things to create these distinct models? Most of those are coming from fine-tuning things, particularly fine-tuning image models, which just works unreasonably well compared to language models. So you can stuff like 10 images into one of these models, of either an object or a style or something like that.

And then you get another model which can just flawlessly output that object in images, or output things in that particular style. So, you know, people have fine-tuned on the style of GTA, so you can make images that are in GTA, and then you can put your face in it so you're in GTA, or whatever, or in a particular style. And our customers use this for, obviously, these avatar generators, but people are generally...

making things that are in the style of their game or a particular style that makes sense to generate in their application. But we also see lots of people who are pipelining those models together to create interesting combinations of models. So you might want to use a language model to generate a better prompt that you then feed into an image generator that you then want to--

apply some kind of corrections to, and then you want to upscale it. That's some pipeline that you would create. In these systems that generate videos, there are these big pipelines of language models, video models, audio models, all this kind of stuff. There's just a huge variety of all sorts of different models that people are creating. There are the big large language models, the big image generation models, whatever, that get all of the headlines these days. But people are still creating lots of useful smaller models as well, still in the academic community, on Replicate. And quite often, they do a better job than these huge models at specific use cases, for much, much cheaper. And there's a lot of those kinds of things on Replicate.
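To make the pipelining idea concrete, here is a hedged sketch using the Replicate Python client: a language model expands a rough idea into a detailed prompt, an image model renders it, and an upscaler finishes the job. The model slugs are illustrative stand-ins, and output types vary from model to model.

```python
import replicate

idea = "a cozy reading nook by a rainy window"

# 1) Language model expands a rough idea into a detailed image prompt.
#    (Language models on Replicate typically stream tokens; join them.)
prompt = "".join(replicate.run(
    "meta/meta-llama-3-8b-instruct",  # illustrative language model
    input={"prompt": f"Write one vivid sentence describing an image of: {idea}"},
))

# 2) Image model renders the prompt.
images = replicate.run(
    "black-forest-labs/flux-schnell",  # illustrative image model
    input={"prompt": prompt},
)

# 3) Upscaler applies corrections and enlarges the result.
upscaled = replicate.run(
    "nightmareai/real-esrgan",  # illustrative upscaler
    input={"image": images[0], "scale": 4},
)
print(upscaled)
```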

Really, what these things on Replicate are is just arbitrary software hooked up to GPUs. So you can make some new interesting piece of software that happens to be hooked up to a GPU-- or not hooked up to a GPU. You can actually just deploy plain code onto Replicate on a CPU as well. People are making these things and publishing them to Replicate. One of the bigger developments over the past year or so seems to be the influx of open source models. There are any number of open source language models now. There's Flux and these other large-scale image models.

Do we see a shift in how people are thinking about or utilizing open models versus proprietary or closed-source models? It depends a lot on the modality, I think. For multimedia models, that actually started with open source. There's never really been any great proprietary image models, for example. That's always just been open source. You mean for developers, you know-- because Midjourney, I guess, is on the consumer side-- but you mean for developers.

For developing products, yeah, exactly. And that community has really been entirely open source. For language models, open source came later to some extent. It really started with GPT, and then Llama was really when open source started getting going for these large language models. These proprietary models actually work really well for large language models. And it's much easier to prompt large language models than it is with these...

with these image models and things. There's definitely lots of use cases for these open source language models, but I think we're finding that there's not been a huge shift from proprietary models over to open source models in language. But for the multimedia models, open source is really the core of what it is. Like, you mentioned all these different modalities too. Have you seen a spike in that as more models have come online, or as more people are doing more stuff? And I'm curious also how

usage changes, right? When something like Flux comes out, how does that impact, say, Stable Diffusion usage? Or what do you see from the inside? It's grown pretty steadily over time, I'd say, as the models have got better and better, as the models have got faster.

As people have figured out what they can do with these things. I think there are still so many unexplored parts of the map of just what you can do with these models, which are slowly being chipped away at. But new models change things all the time. When a new model comes out, it's so much better at particular things, but it also makes new things possible as well. I think Flux has certainly been one of those moments in image models. We just see tons of people using Flux right now. It's just so much better than anything we've seen before. Yeah, lots of people are switching over from Stable Diffusion to Flux at the moment, but I know Stability have got a new model coming up soon as well. So maybe it'll kind of flip back and forth, but we'll have to... It's always just like...

this race, you know? What I can say very consistently is that almost all AI startups are highly responsive to new model launches right now. So this is true for infrastructure hosting companies like Replicate.

This is true for model development companies like OpenAI and Anthropic and so forth. This is also true for application-level companies like Udio, which is one of our portfolio companies, or Ideogram or Midjourney and companies like this. You can actually track this very closely. Whenever a new model comes out, there's a huge spike in usage because people just want to try the exciting new thing. And the difference between V1 and V2 of any given model

is massive these days. If you think back to the old days, you would upgrade to Photoshop version 8 because it was the new thing, and V7 was kind of the same, but it was just the old thing. Now there's actual major improvement from one model to the next, and these are capabilities computers have never had before. So there's a huge surge of excitement no matter who you are when a new model comes out.

The trick is, how do you sustain and ideally continue to grow between major releases? This is something where it's great for Replicate to be able to keep adding new models continually to their platform. They're not tied to just one model development cycle. But it is a dynamic of the industry that everybody confronts. And I think the key is really to--

you know, for all the AI founders out there is to kind of embrace it and just know what the shape of the market is and know what user demand looks like and kind of build around that. Do you think there's a disconnect to some degree in terms of how fast new models come out and the ability of users to like keep up?

And learn and really wring all of the interesting uses out of that before the next thing comes and their attention is distracted. Just to your point, like it used to be a release was a release and it might have some incremental differences. You basically knew it and now it's like a new thing. Oh, 100%. There's still so many unexplored parts of the map. If anything, I think the capabilities are increasing faster than we can catch up and build things with them. I think this was...

This was actually quite well articulated by Nat Friedman, who worked on Copilot at GitHub. When GPT-3 came out, around that time, I think,

they built Copilot from that, and they were just astounded by the capability of these things and the products they could build with it. And obviously Copilot has become a core part of how developers work now. And he's like, well, obviously there's going to be hundreds, thousands of other people using this model to build incredible products. And then they waited a year or two, and nothing happened. And this was the premise of him starting the AI Grant accelerator, which we actually went through a bit later.

And the premise of that is we need more product builders. We have all of these capabilities, and we need more product builders to build around this stuff and get excited about this stuff, to be able to use it and put it in the hands of users. And that's still happening. If anything, product builders are piling in, but the capabilities are improving even faster.

Product builders, please build with AI, because there's so much gold just lying on the ground that can be picked up if you build a good product with this stuff. A lot of our role at Replicate as well is helping these software developers use AI, because they're not going to be able to retool into machine learning fast enough. So we want to take machine learning and bring it to them and show them how to use it, and show all these interesting things they can build with it. And yeah, hopefully lots of cool stuff will get made.

I've noticed there are a lot of really diverse multimedia AI apps out there. Meaning, when you give someone an amazing primitive, like a Flux API call or a Stable Diffusion API call on Replicate, there are so many things they can do with it. And we actually see that happening. Versus with language, like you said,

All LLM apps look kind of the same if you squint a little bit. It's like you chat with something. And there's obviously-- there's code, there's language. There's a few different things. But I've been surprised, too, that even today we don't see as many apps built on language models as we do based on, say, image models. Do you think that's true, or am I just kind of seeing a narrow slice of the world?

I think we also see a narrow slice of the world, because a lot of people using Replicate are building these kinds of multimedia applications. But it certainly maps with what we're seeing as well. I think these language models go beyond just chat apps. The thing they're particularly good at is just

turning unstructured information into structured information, which is actually kind of magical. And that's like-- computers haven't been very good at that before. And that is really a kind of core use case for it. But with these image models and video models and things like that, people are creating whole new-- lots of new products that were just not possible before and these things that were just impossible for computers to do.
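A minimal sketch of that unstructured-to-structured use case, with an illustrative model slug; a real system would add validation and retries (some of the "duct tape" discussed later in the episode).

```python
import json
import replicate

text = "Acme Corp raised $12M in March, led by Jane Smith of Example Ventures."

# Ask a language model to pull typed fields out of free text.
raw = "".join(replicate.run(
    "meta/meta-llama-3-8b-instruct",  # illustrative model slug
    input={"prompt": (
        "Extract JSON with keys company, amount, month, lead_investor "
        f"from this text. Output only the JSON.\n\n{text}"
    )},
))

record = json.loads(raw)  # real products validate and retry on bad output
print(record["company"], record["lead_investor"])
```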

Yeah, I'm certainly more excited by all the magical things these multimedia models can make. So I want to switch gears a little bit. I would love to get a sense of the GPU crunch problem as like how you experienced it. I mean, does it vary? I just love to get a sense of how you think about that because it does seem like that is not necessarily a product building or developer centric frame of mind to begin with, but it definitely is a key part of running an AI business today. When we first got started, there wasn't a GPU shortage.

We could just get a hold of-- we're an inference business, so our usage goes up and down. We just ran the whole thing on Spot Instances. And you could get a hold of A100 Spot Instances no problem, because nobody else wanted them. And then obviously they became harder to get a hold of. There was a big crunch last year, particularly.

And we then had to start buying big blocks of GPUs, because that's really the only way to get hold of them. If you go to AWS and ask for one H100, they'll just laugh you out of the room. But if you go to them and say, I would like 500 H100s for three years, they'll be like, of course, here you go. And that's how they sell their GPUs, because they just want to pack in as much usage as possible.

But it's not great for us, because we have lots of varying usage. It's fine for training, because you do buy these big blocks of GPUs for training. But for inference, it's up and down. So to some extent, part of our job is just making that market liquid. We buy these big blocks of GPUs, and we sell them at a higher price, just so people can get hold of one H100. A100s and H100s are great for training big models, but not necessarily the right tool for running models.

So we run lots of these models on smaller, inference-focused GPUs. The L40S is like the state-of-the-art inference GPU, for example. We also run things on some older hardware like T4s.

We run things on A40s as well, which weirdly is a workstation GPU, for like Pixar doing 3D graphics on workstations, but it works very well for image models. So we're lucky in that sense. And there's never really been a shortage of those GPUs. They've been very, very easy to get hold of. But this year the market's loosened up a lot. So it's much easier to get hold of

H100s, for example. I think this is quite common knowledge now, that that's just much easier. Yeah, it's just up and down. It is funny. I mean, we're talking about the most advanced, sophisticated computing capabilities that humans have ever been able to-- like, we're conjuring images out of thin air.

And yet, the limiting factor is still the supply chain whiplash, where it's up, it's down, it's up, it's down. It's expensive, it's cheap, it's available, it's not, which is the oldest problem in human history. If you stop and think about it, it's like, how do you feed a village? How do you provision an army?

Like, this is thousands of years old, and it's something we're still dealing with, with all our new fancy AI stuff. Yeah, I'm curious when the serverless movement comes for GPU access, right? Well, Replicate is sort of serverless in a sense, right? I mean, from a developer standpoint, they don't even have to scale down to zero. They just either make an API call or don't. And when they do, they get a result back.

And we do actually have a serverless product as well. So if you're deploying custom models, we do effectively let you scale to zero on your own custom code. So yeah, we've done that basically. Do you think developers want more serverless kind of flexibility over time that you've observed? Or do they want more control and more lower level access? Because there are many more developers coming to this market now, and they're getting smarter and more sophisticated. So I'm super curious what

abstraction they really want and how you've seen that change. I think this goes right back to the start of my career when I was building this Heroku clone. An EC2 server is too low level. Like, developers-- unless you're doing something very esoteric, like, developers don't want a blank EC2 server. But they also don't want to feel like they're being constrained. They don't want to feel like they're using a toy. They always want to know that they can open up the box, and they can dig down, and they can do more complex stuff.

I think this is like a design philosophy that we've taken very much to heart with Replicate, where we have these very high-level APIs that you can get started with very quickly. So you can copy and paste this API and use AI and just get started in a minute. But if you want to be able to customize this stuff, those models are open source. You can take that source code and publish your own customized versions to Replicate.

where you just have complete control over what that source code is doing. And you can upload custom weights and all this kind of thing. We've tried to find just that sweet spot where there's enough flexibility that you can do 95% of what you want to do, but with none of the pain of building this infrastructure out from scratch. And I think that's really important. And then if you want to do really complicated stuff, there's always an escape hatch. So

On Replicate, the technology powering Replicate, called Cog, this containerization system for machine learning models, is open source as well. You can just take it and deploy it on your own Kubernetes cluster. If you want to do this thing from scratch for whatever reason, you can do that. And I think that's a really important part of building a developer tool like this: at no point do you feel like you're being constrained. At no point do you feel like you're being locked in.
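For a sense of what that escape hatch looks like, here is a minimal sketch of a Cog predictor. The model itself is a stand-in; a cog.yaml alongside this file would declare the environment and point at the class.

```python
# predict.py -- a minimal Cog predictor sketch. A cog.yaml next to it would
# declare the environment and reference this class, e.g.:
#   predict: "predict.py:Predictor"
from cog import BasePredictor, Input


class Predictor(BasePredictor):
    def setup(self) -> None:
        # Load weights once when the container boots (stand-in model here).
        self.model = lambda prompt: f"echo: {prompt}"

    def predict(self, prompt: str = Input(description="Text prompt")) -> str:
        # One forward pass: arguments in, return value out.
        return self.model(prompt)
```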

Because those are very real reasons why people won't pick technologies, and why people move off as well. There's actually a problem named after Heroku. Sometimes it's called the graduation problem, where people hit a wall with the product and have to build it themselves because they're doing something too complicated. It's sometimes called the Heroku problem. We've intentionally been trying to avoid this, and we very rarely suffer from it, because we've made the platform flexible enough that people can scale and grow, and it doesn't get too expensive for whatever reason-- you know, the pricing scales non-linearly and all this kind of thing. We thought really carefully about this. We see very few people move off. Like, how far are we from AI,

or generative AI in particular, becoming, you know, quote-unquote boring tech from a systems standpoint, in the sense that people understand it, there's a market full of products and tools, and it's just another part of the software development stack or the software development cycle? I think we're already there, basically. Yeah.

I think there's a lot more work to do to make it more accessible. But for using AI, you can just run GPT-4 with an API. You can run Flux with an API. It's all there and ready to use. I think the key bit is that currently these are quite low-level components that you're using. More and more of what developers use will be higher-order systems built out of combinations of these lower-level components. This is a thing we see in software. It's just like--

We started with assembly, and then we built compilers, and we had high-level programming languages. We started with TCP/IP, and then we built Apache, and we built JavaScript and React and Next.js. We just build these higher-order abstractions such that you get more and more capabilities and you can do more and more things with these systems. I think a similar thing is going to happen in AI as well. I think it's going to be built out of

combinations of these systems. It's going to be prompting language models in interesting ways. It's going to be plugging language models into an image model in an interesting way. It's going to be combining pipelines of these models together to build higher-order systems. It's going to be combining it with normal software as well. I think something that people underestimate a lot is that you can't just use a model out of the box.

90% of the work is a bunch of duct tape and heuristics to get it into a usable product. And we imagine those being published as these systems as well. Yeah, I think we're just going to see a lot more of that. And I think that's something we see already on Replicate, you know, people publishing pipelines and models and things like that. And it's something we're just going to see more of, I think. Knowing what we've just discussed,

are there best practices for developers if I'm trying to get started writing an app using an AI model, using an API? Are there just kind of things that everyone should know how to do off the bat, or mistakes to avoid that you see repeatedly happening? I think my biggest bit of advice is that people don't know most of those things yet. And I think one of the most exciting things about being a developer building on AI right now

is that there's just so much unexplored green space. The way to build great AI features, the way to build great AI products right now is not to copy what somebody else does. It's for you to just tinker about and see if you can find something new that applies to your

product. Please don't build another chatbot. There's plenty of them. There's some new interesting thing that applies to your product or your problem space that you can make possible with AI now. So just tinker with it, experiment, and don't get too attached to things. Try like 50 different things and see what works and what doesn't. I think that's the way to find the really interesting things right now, because 90% of what is possible just hasn't been discovered yet. And then, and this is something we kind of touched on as well, it's really easy to build prototypes. It's very difficult to build real products with these AI systems, because

they're so unpredictable compared to normal computer systems. Just be prepared that once you've tried these 50 different prototypes and found this one neat thing that works really well, you've only done 10% of the work by that point. There's the 90% of duct tape and heuristics and prompt engineering to get it to behave well with the mess that is the real world. But once you've got through that gauntlet, then you'll have something really interesting on your hands.

I'm now picturing a roll of duct tape that just says AI everywhere. So wherever you see a problem in the real world, you can just tear off a piece of your AI duct tape and tape up the problem. And then finally, Ben and Matt, I'd love to get your insight on this too. Now that we're a couple of years, at least, if not more into this era of like commercially available generative AI and foundation models, what have been some of the big lessons you've learned in terms of running an AI company?

And then Matt, I'm curious to see what you've been seeing and how you've been seeing this mature, as it came from out of nowhere a couple of years ago and now it's this huge market on the tip of everyone's tongue. What are we seeing, or what's shaping up, in terms of the right way to build a company in this space? I think for building developer products,

I think something that we say a lot at Replicate is that AI is just software. Like it's an incredibly extraordinary piece of software that is doing things that we didn't think were possible with computers before and frankly, superhuman. But it really is just a form of software. And at its heart, this machine learning model is just...

We like to say it's a forward pass on a machine learning model, an inference on a machine learning model that you pass params to or whatever, but it's really just a function call with some arguments that has a return value. It just happens to be this model running on a GPU inside. A lot of the same problems that apply to software also apply to machine learning. And this is certainly something that we...

we've been just pattern matching with: OK, what tools have been built for normal software that we can apply to machine learning? I think Replicate is like we kind of smushed GitHub and Heroku together-- and Docker. And that's really where a lot of Replicate came from. And you can just look at everything else that's happened in normal software and be like, hmm, does this thing need to exist in machine learning? There are some new problems in machine learning.

Like, you can't review the code in machine learning. So the only way to understand the behavior of the system is to pass data through it and see how it behaves in the real world. And that's a new thing about machine learning. You need new tools there, but so many of the tools we can just map over from normal software as well. Advice for founders getting started now? It's definitely not too late. I think there's constant fear as a founder, or an aspiring founder, to think, oh, I missed the boat,

all the great stuff's been invented already. And it's just obviously not the case right now. And this is a time especially where building applications is now a very real opportunity. I think a year ago, the foundation models were really being trained for the first time. And it was hard to know how to build an application because the ground was shifting underneath you. Now, I think there's enough stability that you can actually build an application. And there's enough understanding of how

these models operate, and what their performance characteristics are, that you can integrate with them pretty deeply and go beyond just basic wrapper applications. You know, some of the things Ben's talked about, for instance, about

pipelining multiple models together and sort of building this application around it. So it's definitely not too late to start now if you're thinking about getting into this world. And the second thing I would say is just not to overreact. This is the thing that I've consistently seen across our AI companies. All of our AI companies have periods of incredible growth. They grow 300% in one month.

and then some months where they don't grow as fast. And this is a characteristic of an early market, where the whole market is expanding and contracting. It's like the early days of the universe: the clouds of gases are compressing and then exploding and then compressing again. And the founders, like Ben at Replicate and many of the others in our portfolio, that have done the best

are those that don't overreact to either of those changes, meaning stay the course in a tough month to make sure you're staying true to your vision, and don't change what you're doing if you happen to grow a million percent in one month either, because that may be fleeting. And so a bunch of the companies in our portfolio, including Replicate, really weathered that storm well. And that's sort of my piece of advice for founders who are getting into this space.

With that, another episode is in the books. If you enjoyed it, or at least learned something about how this current wave of AI models is shaping the developer toolkit, please do rate the podcast and share it far and wide. Until next time, keep building.