Ben Firshman learned that Docker built a strong bottoms-up developer motion but made the mistake of jumping too quickly to selling enterprise products top-down. This alienated the developers who were actually using Docker, as the enterprise buyers didn’t understand its value. The lesson is to build a developer-focused business step by step, starting with individual developers, then teams, and eventually scaling to enterprise customers.
Ben Firshman emphasizes experimentation because 90% of what is possible with AI hasn’t been discovered yet. He encourages developers to tinker, try 50 different things, and avoid copying existing solutions like chatbots. The goal is to find unique applications of AI that solve specific problems in new ways, even if it involves a lot of trial and error.
AI systems are unpredictable compared to traditional software, making it difficult to transition from prototypes to real products. While building prototypes is easy, creating reliable products requires significant effort in prompt engineering, heuristics, and duct tape solutions to handle real-world complexities. Only 10% of the work is done after finding a promising prototype; the remaining 90% involves refining the system for practical use.
Tony Holdstock-Brown notes that much of the infrastructure needed for AI, such as orchestration and queuing, is similar to what existed 5-10 years ago. While AI-specific telemetry and tools are emerging, foundational concepts like orchestration and queuing theory remain unchanged. He suggests that those who solved infrastructure problems in the past are well-positioned to address AI infrastructure challenges.
Decagon uses a per-conversation pricing model for its AI customer support agents. This model avoids the misaligned incentives of a per-resolution model, where the AI might prioritize quick resolutions over customer satisfaction. The per-conversation model simplifies pricing and ensures predictability for both Decagon and its customers.
Nikhil Buduma advises AI engineer founders to deeply understand their users and industry while leveraging state-of-the-art AI models. He emphasizes focusing on high-value use cases where expensive human time is wasted or where inconsistent quality can be improved. Founders should integrate AI into workflows and invest in delivery and change management to ensure customers realize the full value of the product.
Socket uses a hybrid approach to manage AI model costs by first running data through smaller, less expensive models. If multiple smaller models agree that a package is risky, a more sophisticated model is used for further analysis. This approach balances cost and accuracy, ensuring efficient scanning of open-source packages without overspending on expensive models.
Dean DeBeer advises early-stage AI companies to avoid hosting their own models or building infrastructure from scratch. Instead, they should leverage existing platforms like Azure, Amazon Bedrock, or GCP to prototype and prove use cases quickly. Focusing on solving the core problem, rather than managing infrastructure, allows for faster development and cost efficiency.
Mohammad Norouzi highlights that at Google, the focus was on research novelty and innovation, whereas at Ideogram, the focus is on building products that meet user needs. Startups require a more holistic approach, considering community, technology, and product, and must efficiently manage resources to deliver value quickly.
Naveen Rao emphasizes that AI technologies must be economically feasible to succeed and proliferate. He values building technologies into products because when customers pay for them, it signifies that the product solves a meaningful problem or adds significant value. This economic validation is crucial for AI to have a lasting impact.
Thanks for listening to the A16Z AI Podcast. I'm Derrick Harris, and today we're doing a best-of episode featuring advice and anecdotes about building AI companies. We've had some great founders on the podcast this year with a wide range of experiences, from product leadership at Docker during its heyday to building state-of-the-art models inside Google. And during those discussions, they've shared some of their personal thoughts, journeys, and insights on building products, spotting opportunities, and so much more.
and we've compiled them for you in this episode. You'll hear from the folks behind the Flux family of image models, Ideogram, Replicate, and more after these disclosures.
As a reminder, please note that the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. For more details, please see a16z.com slash disclosures.
One of the major shifts in the AI landscape over the past year or so has been the market's maturation from experimentation with and education about generative models to a solid understanding of how they work, where they excel, and how to build around them. Along with a plethora of widely used tools and products, we've also seen an influx of systems and developer tooling experts, as opposed to AI experts and researchers, move into the space.
One of those people is Replicate co-founder and CEO Ben Firshman, who appeared with A16Z partner Matt Bornstein on a November 15th episode titled "Building Developer Tools: From Docker to Diffusion Models." Ben has years of lived experience in the world of developer tooling, including as the creator of Docker Compose and a key member of the Docker team during its hyper-growth phase about a decade ago. Here's what Ben had to say about being realistic early on when building any product for developers.
There's lots of hard lessons to learn from Docker as well. I think the core thing we should really take to heart is that Docker built this incredible bottoms-up developer motion, but they jumped too fast to trying to sell to enterprise. So almost from day one, they built this enterprise product that they sold top-down to very large companies. And the people inside those companies just didn't know what Docker was. The people who knew what Docker was and were getting the value from it were these people on the ground who were using it in their day-to-day work.
So I think the lesson there is if you're building a bottoms-up developer business, build it bottoms-up, step by step. Make something for developers, sell to developers, then maybe sell something that's useful for their team, and then work your way up. And then maybe in five years, you can sell something to the CTO of Walmart or whatever, but you're not going to be able to do it from day one. Later in his episode, Replicate's Ben Firshman discussed how, from a product perspective, AI is not too dissimilar from other types of software.
The key is to get building and be prepared to use a lot of duct tape in the early going. I think one of the most exciting things about being a developer building on AI right now is that there's just so much unexplored green space. The way to build great AI features, the way to build great AI products right now
is not to copy what somebody else does. It's for you to just tinker about and see if you can find something new that applies to your product. Please don't build another chatbot; there's plenty of them. There's some new interesting thing that applies to your product or your problem space that you can make possible with AI now. So just tinker with it, experiment, and don't get too attached to things. Try like 50 different things and see what works and what doesn't. I think that's the way to find the really interesting things right now, because 90% of what is possible just hasn't been discovered yet. And then, this is something we kind of touched on as well: it's really easy to build prototypes. It's very difficult to build real products with these AI systems because
They're so unpredictable compared to normal computer systems. Just be prepared that once you've tried these 50 different prototypes and found this one neat thing that works really well, be prepared that you've only done 10% of the work by that point. There's the 90% of duct tape and heuristics and prompt engineering to get it to behave well with the mess that is the real world. But once you've got through that gauntlet, then you'll have something really interesting on your hands.
I think for building developer products particularly, I think something that we say a lot at Replicate is that AI is just software. It's an incredibly extraordinary piece of software that is doing things that we didn't think were possible with computers before and frankly, superhuman. But it really is just a form of software. And at its heart, this machine learning model is just...
We like to say it's an inference on a machine learning model that you pass params to or whatever, but it's really just a function call with some arguments that has a return value. It just happens to be this model running on a GPU inside. A lot of the same problems that apply to software also apply to machine learning. And this is certainly something we've been pattern matching with: OK, what tools have been built for normal software that we can apply to machine learning? I think Replicate is just like, we kind of smushed GitHub and Heroku and Docker together, and that's really where a lot of Replicate came from. And you can just look at everything else that's happened in normal software and be like, hmm, does this thing need to exist in machine learning? There's some new problems in machine learning,
you can't review the code in machine learning. So the only way to understand the behavior of the system is to pass data through it and just see how it behaves in the real world. And that's like a new thing about machine learning and you need new tools there. But like so many of the tools are just, we can just map from normal software as well.
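To make that framing concrete, here's a minimal sketch of what "inference is just a function call" looks like in practice. The endpoint, payload shape, and function name are hypothetical placeholders rather than any specific provider's API.

```python
import requests

def generate_image(prompt: str, width: int = 1024, height: int = 1024) -> str:
    """Looks like an ordinary function: arguments in, return value out.
    That a model runs on a GPU behind this call is an implementation
    detail the caller never sees."""
    response = requests.post(
        "https://example-inference-host/api/predict",  # hypothetical endpoint
        json={"prompt": prompt, "width": width, "height": height},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["output_url"]  # e.g., a URL to the generated image

# Usage: the caller treats the model like any other library function.
# image_url = generate_image("a lighthouse at dusk, watercolor")
```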
Also on the topic of the systems-level thinking that's helping us take AI products into production, we have Inngest co-founder and CEO Tony Holdstock-Brown from our June 14th episode titled Building Production Workflows for AI, which also features A16Z partner Yoko Li. Tony also spent time at Docker during its peak, and here's his response to my question about how folks like him are helping advance the utility of AI.
So, across the gamut, there's a few different things that need to be in place. You've got your PhD folks that do auto-differentiation and do things like MLIR to LLVM, which is super helpful for running AI workloads at a foundational level, that actually build the AI stack. And those are the NVIDIA folks that are super cool, that are doing a lot of really interesting things.
And then you get the people that actually create the infrastructure to run these things. You could literally be running Kubernetes clusters or connecting to various different computers that have your NVIDIA GPUs attached. And that is really similar to infrastructure that already existed 10 years ago. The stuff that we do, orchestration, making it super easy for you to productionize application-level code that calls LLMs across multiple different containers or applications without worrying about queues and infrastructure, is also similar to stuff that you'd have to do pre-AI. And so one of the interesting things and observations is that a lot of people are trying to create AI-specific infrastructure. And while that is potentially cool, it's not dissimilar to what we were doing five or 10 years ago. And maybe there's a few constructs that are added to do AI-specific telemetry. But at the end of the day, telemetry for OpenAI is still OpenTelemetry. And it hasn't necessarily changed too much.
And so foundationally orchestration is still orchestration. A lot of the queuing theory stuff still applies whether or not you're doing it for AI or you're doing raw orchestration. And so I genuinely think that a lot of the infrastructure lessons people are trying to learn with AI may have already been learned five or 10 years ago. And the people that have already solved those problems are in a good place to continue to solve those problems for AI.
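As a rough illustration of that point, here's a minimal sketch of LLM calls sitting behind an ordinary work queue, the same pattern any pre-AI background job system uses. The `call_llm` function is a hypothetical stand-in for whatever model API you'd use; the retry and backoff logic are the old queuing lessons applied unchanged.

```python
import queue
import time

jobs: "queue.Queue[dict]" = queue.Queue()

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for your model provider's API.
    raise NotImplementedError

def worker(max_retries: int = 3) -> None:
    # Pull jobs off the queue and call the model, with retries and
    # exponential backoff -- the same treatment as any flaky upstream API.
    while True:
        job = jobs.get()  # block until work arrives
        for attempt in range(max_retries):
            try:
                job["result"] = call_llm(job["prompt"])
                break
            except Exception:
                time.sleep(2 ** attempt)
        jobs.task_done()
```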
Relatedly, Decagon co-founder and CEO Jesse Zhang explained on a December 18th episode titled "Can AI Agents Finally Fix Customer Service?" on which he appeared along with A16Z partner Kimberly Tan, that he's seeing enterprise buyers also looking at AI tools the same way they look at other software. Namely, the primary concern is whether it delivers value. If it does, they'll learn to dull some of the sharp edges.
I think people do care about hallucinations, but they care a lot more about the value that can be provided. And so pretty much every enterprise we work with cares about the same things, like literally the same things. It's what percentage of conversations can you resolve? How happy are my customers?
And then hallucinations might kind of be lumped into the third category, which is like, what's the accuracy? Generally, when you're evaluated, the first two matter. And let's say, hypothetically, you are talking to a new enterprise and you just completely knock it out of the park on the first two. There's going to be so much buy-in from the leadership and from just everyone in the company that, like, holy crap, this will not only transform our customer base, it's like the customer experience is different. Every customer now has their own personal concierge in their pocket. They can ping us anytime. We're giving them good answers. They're actually happy, any language, 24-7. So that's like one piece, and you're saving a ton of money. So there's a ton of buy-in, and there's a lot of tailwinds into getting something done. Hallucinations obviously have to be solved, but it's not really the top thing on their mind, right? So the way you kind of address hallucinations is the things I mentioned before. Like, people will test you.
There will probably be a sort of proof-of-concept period where you're actually running real conversations and they have agents on their team monitoring stuff and checking for accuracy. And if that's good, then generally you're in the clear. And as I mentioned before, there's a bunch of hard protections you can put against the sensitive stuff. Like, you know, you don't have to make the sensitive stuff generative. So it's a talking point for most deals, where it's not an unimportant topic and you'll go through that process, but it's never really the focus for any of the conversations.
Jesse also weighed in on how Decagon thinks about its business model and how one key to selling a new service, like AI agents, is to figure out what they're replacing and what is their actual value and to price accordingly. Selling per seat, for example, as with traditional software, doesn't scale in customers' favor and selling per resolution, in the case of Decagon, which is selling customer support agents, can introduce misaligned incentives.
Our view on this is that, in the past, software was priced per seat because it roughly scaled with the number of people that could take advantage of the software. With most AI agents, the value that you're providing doesn't really scale with the number of people that are maintaining it. It's just the amount of work output. And this goes in line with what I was saying before, where if the ROI is very measurable, then it's very clear what level of work output you're seeing. Our view on this is, OK, per seat definitely doesn't make sense. You're probably going to be pricing based on the work output. So the pricing that you want to provide has to be a model where the more work you do, the more you get paid.
So for us, there's two obvious ways to do that. There's like you can pay per conversation or you can pay per resolution, like a conversation that the AI actually resolves. I think one fun learning for us has been that most people have opted into the per conversation model. The reason is that per resolution, the main benefit is you're paying for what the AI is doing. But then the immediate thing that happens next is what is a resolution?
First of all, no one wants to get into that because then it's like, "If someone came in and they're really upset and you sent them away, why are we paying you for that?"
Right? So that's a weird situation. And then it makes the incentives a bit odd for the AI vendor, because then it's like, OK, well, we get paid per resolution. So why don't we just resolve as many as possible and just deflect people away when there's a lot of cases where it's kind of a toss-up and the better experience would have been to escalate, and customers don't like that. So it just creates a lot more simplicity and predictability on the per conversation model.
Up next, we have a collection of clips generally focused on how to find and then execute on good opportunities for AI-powered startups. Although they vary in scope and topic, there are some common threads around the idea of seeking out opportunities in domains you know well and then working to solve tangible problems for customers without getting hung up on solving novel engineering puzzles. It's a mindset that involves thinking deeply about a number of concerns that all impact both your company and your customers.
including infrastructure, capital and operating expenses, and customer enablement. We'll start with Nikhil Buduma, co-founder and chief scientist of a healthcare startup called Ambience, replying to my question about how AI engineer founders should think about balancing the desire for state-of-the-art performance with just using what works.
The discussion then continues into how Nikhil would advise people trying to start companies that use AI to tackle problems specific to particular industries, including those involving human nature. I spoke with Nikhil in our September 13th episode aptly titled Balancing AI Expertise and Industry Acumen in Vertical Applications.
I can speak to our personal experience, which is that a lot of the kinds of tasks that are important to us are at the border of what machines can versus cannot do today. As a company, we actually obsess over the frontier.
And that's a combination of both what's being published and what people are actively sort of talking about on the Internet, as well as sort of what are some of the best researchers in this field thinking about and how do we anticipate the field is going to move over the next 12 to 24 months? Because the kind of R&D investments that we make ourselves have to be complementary not only to the state of the art today, but how we expect that frontier to shift over the next 12 to 24 months.
That being said, you know, if the techniques that are available are serving you well for use cases today, you can operate at a level where you just say, look, the foundation model makers are going to continue to make the next generation better and better and better. My goal actually is just deeply understanding my users, my customers, their business, and continuing to build out the rest of the connective tissue around the product that's required to make sure that as those models get better and better and better, I've got the chassis to then deliver it.
But I think it really depends heavily on the kind of use case. I obviously have strong opinions as to what companies are going to end up becoming valuable when all is said and done. But I think that's probably the lens I would look at that question. Yes, let's dig into that. I mean, because it sounds like maybe the difference is, are you building a vertically focused application or company versus are you building a horizontal? Are you building what you might call an AI company versus a company that uses AI?
I think in some ways, building an AI company versus building a traditional enterprise SaaS company, there's a lot of parallels still, right? Which is, first, you've got to pick the right set of use cases to go after. Figure out what is that burning, painful need, that hair-on-fire need that people are going to be able to allocate budget to today. Two questions we oftentimes ask internally that help us kind of answer that question are: first, where do people whose time is expensive end up wasting or spending a lot of their energy and attention? That tends to be a really good place to look. And the second question is, where is there inconsistent quality in work product, where if you could actually fix that problem, if you can get to consistently high quality, that actually has high economic value? If you can get to a place where you're solving something that fits one or both of those shapes, it can likely be a really high-value use case. And then once you've discovered a high-value use case, the next question is, all right, well, how do I make sure that the models perform well in that use case? And you're going to have a range of companies that go from, hey, the off-the-shelf GPT-4 models just knock it out of the park, all the way through to the models have no idea where to even begin on this problem. And that dramatically changes sort of the shape of the team that you need to build.
And then I think one cannot underestimate the importance of integrating with the right sources of data to make sure that models are able to reason over the problems that matter, making sure that you nail the design, user experience, the workflow, the change management, and actually build out a strong delivery muscle because chances are you're changing the way people work and
That means that your customers may need more support than you might expect to actually realize the full value of what you're building. And so my sense is a lot of these machine learning companies and modern AI companies actually are going to be investing more heavily in the delivery muscle. And it's probably familiar to a lot of enterprise SaaS motions, but I think this is especially important for machine learning and AI companies.
So I think all of those pieces actually do have to come together. I think there's a lot one can do in terms of being thoughtful about how you deliver new technology and the sequencing with which you deliver new technology that can make it much easier to cross the chasm, so to speak. I think one thing that's definitely true is that if you're somebody who's sort of the middle of the pack and you need to hear more evidence of something working before you're willing to adopt, being surrounded...
by others who've actually tried the technology, use it every single day, and speak its praises can mean the difference between actually being willing to give it the time of day versus just saying, "You know what? I don't know if I've got time for this." One of the things we think a lot about is how do we figure out who are the right users to start off deploying with before we start going after that middle pack? Because you want to create these nuclei of successes, where people start talking about what their life is like after making the change, to eventually create a wave of increasingly more believers within the institution.
Inngest CEO Tony Holdstock-Brown also weighed in on this topic during our June 14th episode, drawing back to his time working for a medical software company compared with his ample experience working on systems- and developer-level products. I think this is a super interesting question and also very relevant to DevTools founders, infrastructure founders, people that build things for engineers. It's really common for engineers to get hung up on the problem that they are solving
versus the problem that their users are solving. And that's a really big difference, because if you're solving this orchestration layer, you can think about how to solve it in a really nice way, like Mesos, and how to solve all of these distributed systems problems. And foundationally cool, but at the same time, your users don't actually care. They care that their problems are solved and that the APIs are really smooth and easy to use. And so working in a particular application of engineering of a product is super good, because you get to see that the engineering is basically a tool to solve problems.
And what you do as an engineer is super important in order to make it work. But ultimately, the end users don't necessarily care. Like, when we were building medical record systems and treatment planners, the doctors didn't care that we had these state-of-the-art algorithms to connect your teeth and the roots and the CBCTs and the X-rays and all that stuff. They just cared that the treatment plans were the most accurate treatment plans in the world. And we built some really cool technology to make that happen. But that was a byproduct of solving the problem for the end user.
On May 3rd, we published an episode featuring Socket founder and CEO Feross Aboukhadijeh and A16Z partner Joel de la Garza titled Securing the Software Supply Chain with LLMs. We discussed the application of generative AI models to cybersecurity and, specifically, the software supply chain.
Here's what Feross had to say about how Socket adopted and then began utilizing AI models in its product, a process that began as the company was trying to address customer feedback about how they preferred to receive security alerts and culminated, at that point at least, in a diverse set of models designed to right-size API calls and keep costs down.
Think about how a smartphone app works. When you install a new update and it suddenly needs your camera or your contacts or your microphone, it doesn't just get to use those permissions. You have to approve them. It has to ask. Unfortunately, that's not how open source upgrades work. And so our idea initially was we'll just tell the developer and the security team whenever the permissions change and also whenever they're bringing a new package kind of what those permissions are. But then
We kind of quickly encountered teams where they just said, look, we don't have the capabilities or the sophistication to be able to actually interpret that level of alert. Like, tell me if it's bad or not, and just intervene in the case that it's bad and otherwise get out of my way. We were in the process of actually building out some AI, non-LLM-based, that would have taken kind of a synthesis of all these different signals that we were looking at and determined whether something was safe or not. And then, kind of right into our laps, dropped the, you know, the ChatGPT API right at the right moment. And so we built a prototype using it where, when we identify these risk factors within a file, within a package, we'll ask the LLM to kind of do chain-of-thought reasoning and explain, what is it doing? What are the risks of this? And make a determination, and
Unfortunately, it actually was way too noisy and unusable with GPT-3.5. But then GPT-4 came out, and we actually had something we could ship to users. And so basically, that was the key moment for us, when we got GPT-4 access. We were actually able to get access to the unreleased GPT-4. And then we realized, oh my goodness,
Like, this thing is just popping out malicious packages, like a hundred per week, right? And so we were like, this is incredible. This thing's working. But now it's obviously evolved beyond that. And we're kind of looking at a whole bunch of approaches, other competitors besides OpenAI, as well as kind of our own models. And then there's also multiple levels. You don't want to just use everything. You shouldn't just use the most expensive approach, because you'll just go bankrupt trying to scan every, you know, every open-source package out there. There's just so much code. And so we ended up with more of a hybrid approach where we'll put it through kind of a dumber, smaller model, and then have almost like a consensus model, where if enough of the dumb models agree that this is risky, then we'll ask a smarter model. And there's a lot that goes into it, actually, over time. It's quite cool.
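Here's a rough sketch of that tiered, consensus-style routing: cheap models vote first, and a stronger model is called only when enough of them flag a package. The model callables, the vote threshold, and the return values are illustrative assumptions, not Socket's actual implementation.

```python
from typing import Callable, List

CheapModel = Callable[[str], bool]  # returns True if the package source looks risky

def scan_package(source: str,
                 cheap_models: List[CheapModel],
                 smart_model: Callable[[str], str],
                 votes_needed: int = 2) -> str:
    # Cheap tier: several small models vote on whether the package is risky.
    risky_votes = sum(1 for model in cheap_models if model(source))
    if risky_votes < votes_needed:
        return "clean (cheap-tier consensus)"
    # Only the suspicious minority of packages pays for the expensive call,
    # e.g., a stronger model producing a reasoned risk assessment.
    return smart_model(source)
```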
Another security founder, this time Command Zero co-founder and CTO Dean DeBeer, shared similar insights in a July 26th episode titled Augmenting Incident Response with LLMs. In this clip, Dean goes into some depth about the model-adjacent considerations that come with building vertical applications using LLMs, including hosting, cost, and latency. As he points out, giving customers what they want and expect trumps your concerns about controlling every aspect of the tech stack. Don't spend time on hosting your own models. Don't spend time scrounging for GPUs or building infrastructure in AWS and building out sets of APIs around the implementations of the models you hosted. Don't spend time attempting to train models early on until you truly, deeply know your data and your use cases.
And quite honestly, like how models function and the expectations around how you would build appropriate training sets for that data if you feel that you need to train. There is more than enough infrastructure available today where you can prove out your use cases and
build a phenomenal product without spending time on building infrastructure to run your models. Whether it be in Azure or Amazon Bedrock or GCP or getting direct with one of your providers out there, take advantage of that today. It's certainly a lot more cost effective. You will be able to prototype and
prove out use cases exponentially faster than spending your time bringing in an ops guy to manage and run your model infrastructure for you. That's not what you're solving for, right? You're solving for, in our case, a security problem. So focus on the problem and not the shiny part of, well, we run our own models. Ultimately, people out there care about results. They care about outcomes. Certainly latency. You'll feel like I come back to that a lot, but it exists in everything you're going to do here.
And especially when you start operating at scale, latency plays a very real part. Scaling out infrastructure has a lot of limitations: the APIs you're using, tokens inbound and outbound, the cost associated with that. The nuances of the models, if you will, right? Not all models are created equal. And they oftentimes are very good for specific use cases. And they might not be appropriate for your use case, which is why we tend to use a lot of different models for our use cases, whether it be a data response in a few seconds, because it's a small amount of data and the user will never see it, or I need to be able to take 10, 20, 30 different data sets, reduce them, or
map them and then reduce them, applying sort of a MapReduce approach to how you build out a report or a refinement approach if it's a series of interrelated questions or how we term it a facet, right? So your use cases will heavily determine the models that you're going to use. Very quickly, you'll find that you'll be spending more time on the adjacent technologies or infrastructures.
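Here's a minimal sketch of that map-then-reduce pattern for building a report from many data sets with an LLM. The `summarize` and `combine` callables are hypothetical stand-ins for whichever model calls fit each stage; a small, fast model might handle the map step and a stronger one the final reduce.

```python
from typing import Callable, List

def build_report(datasets: List[str],
                 summarize: Callable[[str], str],
                 combine: Callable[[str], str]) -> str:
    # Map: summarize each data set independently.
    partial_summaries = [summarize(d) for d in datasets]
    # Reduce: merge the partial summaries into a single report.
    return combine("\n".join(partial_summaries))
```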
Next, we go back to Nikhil Buduma from our September 13th episode to talk about how one might go about building a founding team and a company culture that takes into account all of this technical, financial, and product complexity. In his estimation, that requires having all the technical and industry expertise you might expect, as well as an ability to navigate ambiguity and uncertainty. On that latter point, he suggests that utilizing generative AI tools internally can help keep teams small and agile enough to keep up.
If you're a founder today trying to build a company, if you believe that the most valuable companies are going to fall out of some of this, some level of vertical integration between the app layer and the model layer, this next generation of incredibly valuable companies is going to be built by founders who've spent years just obsessively becoming experts in an industry.
I would recommend that someone actually know how to map out the most valuable use cases and have a clear story for how those use cases have synergistic compounding value when you solve those problems increasingly in concert together. I think the founding team is going to have to have the right ML chops to actually build out the right live learning loops, build out the ML ops loops.
to measure, to close the gap on model quality for those use cases. And then I think as we kind of talked about, the model is actually just one part of solving the problem. You actually need to be thoughtful about the product, the design, the delivery competencies to make sure that what you build is integrated with the right sources of the enterprise data that fits into the right workflows in the right way. And you're going to have to invest heavily in the change management to make sure that customers realize the full value of what they're buying from you.
That's all actually way more important than people realize. I do think that there's something really interesting about building at this point in time. In some ways, the world is changing more rapidly today than at potentially any other time in technology history. I think most people will probably say that the last time we saw anything quite like this was the proliferation of the Internet.
And I think one consequence for founding teams is you're actually navigating a crazy amount of uncertainty and ambiguity. And so you think about the teams that are going to succeed in this environment, it's going to be the ones that are disciplined enough to keep track of how the world is changing and have the internal aptitude to be able to respond and to be capable of shifting strategy as new information arises and as you have to challenge some of the assumptions that have defined how you've built.
I think part of what makes this challenging is, as companies scale in size, you get less and less agile. For most companies, I think the question you're going to have to ask is, how do you do more with less? Like, how can you have outsized impact as a small team? Because keeping your team small may be critical to survival and critical to preserving that level of agility. All these AI productivity tools that are available today, you can not only build them to sell them, but you
you should likely also be a massive consumer of them as a company yourself and use them for company building. In fact, I think if you walk through the Ambience office today, my guess is that every single person, regardless of function, has some sort of AI productivity tool open on their computer. That could be ChatGPT, it could be Claude, it could be Cursor, it could be something else. And my guess is that they're probably using these tools multiple times an hour,
like multiple times every 10 minutes to do things that otherwise might take them two to 10x longer to do, which is kind of exciting because I think, you know, leaning on smaller teams, it might not just be a relative competitive advantage anymore. It might be like critical to long term survival and success for a lot of these companies, given the way the world's moving.
Of course, a major theme of AI going back at least a decade is the journey that many AI researchers take from university or industry labs into the realm of entrepreneurship. It can involve a stark change of priorities and skills and isn't always easy to nail. We've had some great guests on the AI podcast this year who touched on this transition, including the people behind popular models and tools such as Flux, Ideogram, and Ray. We start with Andreas Blattmann, Patrick Esser, and Robin Rombach of Black Forest Labs, the creators of Flux.
We spoke with A16Z General Partner Anjney Midha for our August 16th episode titled "The Researcher to Founder Journey and the Power of Open Models." In this first clip, they touch on some early work around autoencoders that didn't get a great response from the research community and use it as a jumping off point to discuss the differences between novel academic research and building commercial products that just need to work.
If you look at it from very far away, you were just training an autoencoder and then training your generative model in that latent space. And it's a very simplified view of it, because it's not the entire story. And I think that might be one of the reasons why it was constantly challenged. Why would you work on this? Like,
Why would you do another latent approach again, now with the diffusion model? I think we had the debate ourselves. Yeah, I was worried about whether we could do another one of those. That's where you see where the limits of research are. You have to propose something novel. If it just works better and it's not clear to everyone that it's novel, then it will be questioned in some form.
But as opposed to that, if you're building a business, you just focus on what works, right? The kind of novelty is not as important anymore. It's just like you use what works. That's why starting a business is actually also a really nice experience. Even before you guys got to starting a business, if you just think about the difference between research and product...
and just building tools that people can use outside of a paper. What may have seemed not novel to you while you were in the research community was actually extraordinarily novel to creators and developers around the world. And it wasn't really until you guys put out, a few years later, Stable Diffusion that it may have become clear to the research community. Is that right, or is that the wrong framework? No, I think that's exactly right. I think there's a nice intermediate step between doing research and doing business, which is working on models that are being used in an open-source context, because then everybody will use your models. Right.
And we made that experience pretty early, because we were just used to making our models available all the time. And one of the first ones that got pretty popular was this VQGAN decoder, which, because it actually achieved pretty realistic image textures, was used in combination with this text-to-image optimization procedure where people used it together with CLIP and optimized the image to match the text prompt.
And because we had put out the model and it was used in this context by lots of people, that was one of these moments where you realize, OK, you actually have to make something that works in general. And then I think it's a nice intermediate step. Because if you want your models to be used in this wide context, then you just have to make sure that they work in a lot of edge cases.
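For context, here's a high-level sketch of that CLIP-guided optimization procedure: keep a latent tensor as the optimization variable, decode it to an image, and push the image's CLIP embedding toward the text prompt's embedding. The decoder and encoder handles, the latent shape, and the hyperparameters are placeholders for real pretrained models and settings, which this sketch does not load.

```python
import torch

def optimize_image(decode, image_encoder, text_embedding,
                   latent_shape=(1, 256, 16, 16), steps=300, lr=0.1):
    # The latent is the only thing we optimize; the pretrained decoder
    # and CLIP image encoder stay frozen.
    latent = torch.randn(latent_shape, requires_grad=True)
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        image = decode(latent)            # VQGAN-style decoder: latent -> image
        image_emb = image_encoder(image)  # CLIP image embedding
        # Maximize cosine similarity between image and prompt embeddings.
        loss = -torch.cosine_similarity(image_emb, text_embedding, dim=-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decode(latent).detach()
```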
Next up, we have Ideogram co-founder and CEO Mohammad Norouzi, who, along with A16Z partner Jennifer Li, was on our June 7th episode titled "The Future of Image Models as Multimodal." Here is his response to my question about the decision, as an AI researcher, to leave a leading lab like Google, followed by an exchange between him and Jennifer about what he has learned since then, especially about building for a community of creative users rather than for the financial goals of a giant corporation.
I really think Google created this environment for exploratory research, and it was great. Lots of amazing research came out of Google. Some of the work that we are using on diffusion models again came out of Google. But I guess the mandate of the research team at Google wasn't to build products. We tried to do it a bit, but it was not easy in that, I guess, we were a bunch of researchers. We hadn't really shipped products before.
I don't think anybody believed that we could even do a good job. But also...
Fundamentally, I'm not sure product innovation can happen as easily inside big corporations, because you have a thriving business and it's working. So I do recall some conversations back then about, okay, putting text-to-image into different productivity tools that Google had at the time, like, for example, Google Slides. I don't remember exactly the context. And the question was, okay, is this a hundred-million-dollar business? Now it turns out it is. Yeah.
Yeah, exactly. But it's harder to see it when it's not a $100 million business. So I think fundamentally, it's just harder to do that kind of product innovation inside a big corporation because there's a lot of low hanging fruit. You might as well spend your time improving ads by like 0.001%. Now that you have been at Ideogram,
more than a year now, looking back on the Google experience, what have you learned that are lessons which are still instructive and valuable for training these large models? And what are the things, now that you are building products and shipping things very fast, that are different and a big contrast from what it was like at Google? Yeah, I guess now we need to think about this whole pipeline much more holistically, and we also need to use our resources, whether that's compute or manpower, more efficiently. We are a nimble team, and the compute resources that we have are actually amazing, but it's not at Google scale yet.
So that's one reality; we've got to make it work. And I think that might be a good thing to have. Like, you know, lack of resources pushes you to work harder and achieve more. One more thing is, the way we pitch the company is we are a vertically integrated AI company. We are AI-first, but we are also product-first and community-first. So the pillars of Ideogram include community, technology, and product.
And I think when we were at Google, as researchers, there was a lot of focus on novelty and research innovation, new ways of generating images, text, new ideas. And that's great. But in the context of Ideogram, there are many more pieces, in terms of, okay, what do our users want? What do our users come to Ideogram for? How can we enable that even further? And then,
how can we create new foundation models that can enable new ways of ideation, creativity, editing? What's the future of all kinds of creative applications? What's the future of video creation? What are achievable milestones before doing general-purpose video? So all these questions that we were starting to ask ourselves weren't coming up as much when we had a researcher hat on.
And I think that's really exciting, especially, like, sitting down with our users. I have this kind of weekly chat. Basically, it's a completely organic way of interacting with users. People come in with all kinds of random requests. Some need hand-holding in terms of prompting or using the product, and then some come with some feedback or comments. And it's really nice to see product love from our users.
When we launched the product, we saw so much attention from our users, and they are very appreciative. We also have a very generous free tier. So it's really nice to interact with our users at this level. I think it's much harder to do that at a big corporation, because you usually have comms and PR and community, et cetera. Our July 19th episode also featured Jennifer Li, this time with Anyscale co-founder Robert Nishihara.
Robert helped create the popular Ray framework for training and running large AI models while part of the RISELab at UC Berkeley and launched Anyscale with the support of fellow researchers and RISELab leadership. Here's what he had to say when Jennifer asked about how important it was to be at a place like that when building Ray, which is for a different set of AI and ML users, and then launching Anyscale to commercially support it. It's an incredible lab.
And I don't think it would have been possible, or it would have been far less likely, if we had been somewhere else. Because, well, when we started Ray,
Remember, we were AI people. We were trying to do AI research. And we wanted to build systems for AI, which was not something we knew a lot about. We hadn't done much system building in the past. So we had a great understanding of the requirements. We had a great understanding of what properties we needed in a system because we were the target users. We were building this for ourselves. But how to actually build production-ready systems
rock-solid, mature systems, there was a lot that we had to learn. And the fact that we were in this lab, which was, you know, this incredible mixture of AI people, systems people, researchers with all sorts of different backgrounds, the people who had built Spark and made that project successful and created Databricks, the fact that we were able to learn from those people made all of this possible and provided
a lot. You know, there's so many lessons we learned from those people that would have been far harder to recreate without them. So, you know, I feel very lucky that we got to work with them in that space. Before we started the company, we knew the open-source project was important. We knew we wanted to keep working on Ray and make it really useful, because there was just a huge need; people needed to scale their AI workloads. But it was less obvious to us whether it could make sense as a business.
And we spent a lot of time thinking about that and asking ourselves, like, could it make sense as a business? Should we start a business? And the example of Databricks, which showed one data point of what that could look like, was extremely informative. And being able to work with people who have started companies before, who have started some very successful companies before, we were able to learn a ton from them.
And then we return to the Black Forest Labs team, who also shared their approach to planning ahead for future products. That might be particularly relevant in the AI space where, in their case, continuous development on a state-of-the-art image model is a natural precursor to a state-of-the-art video model. As a bonus, they were able to ship their Flux image model just four months after starting the company, instead of waiting much longer to ship a state-of-the-art video model.
I think one also shouldn't underestimate the need to think about the development plan in itself, because, first of all, it takes different amounts of compute to train an image versus a video model. There's also a bit more experience with image models, which makes it a bit safer and a bit faster to get started. And so I think that was actually also
part of the decision to do it that way: you don't want to aim for something where you say this will only be ready in 12 months or something. I think it's super important to have continuous progress, where in the intermediate steps you also get something really useful, like the image model. And from that, I think it really makes a lot of sense to go that route. There's not much that you lose. You can much better overlap different developments by starting the image training relatively quickly. Then we can work in parallel on all the video data work, and
yeah, that just comes down again to the overall efficiency of us as a team and our model development strategy. To add to that, you see that, by the way, in the fact that we started our company four months ago and we're already putting our first model out. We had to rebuild everything. But, as Robin mentioned a couple of times, we have this really specialized team,
which just optimized all the parts of the pipeline and combined that with the continuous development of features, image features for instance, which can then be reused for video. These combinations led to, I think, really good progress, which you're seeing right now, because we're putting a really powerful and large model out after four months, which I'm personally very proud of.
Finally, we're going to wrap up this episode with a little inspiration. We begin with Databricks VP of AI Naveen Rao, who has been working in AI since before it was the talk of the town, and who joined me and A16Z partner Matt Bornstein in one of our first two episodes on April 12th, titled Scoping the Enterprise LLM Market, which we recently republished in its entirety. Here's what Naveen had to say about his commitment to making AI happen during his tenure as a repeat founder, including at points where people viewed AI as a sideshow.
I have been in this field for a while just from a sheer interest standpoint. I spent 10 years in industry as a computer architect and software architect, and then went back to get a PhD in neuroscience for the reason of, can we actually bring intelligence to machines and do it in a way that's economically feasible? And the last part is actually very important because if something is not economically feasible,
It won't take off. It won't proliferate. It won't change the world. So I love building technologies into products, because when someone pays you for something, it means something very important. They saw something that adds value to them. They are solving a problem that they care about. They're improving their business meaningfully. Something. If they're willing to part ways with their money and give it to you for your product, that means something. I think once you've established that and you can do it over and over again, now you're starting to see real value.
Now, you're absolutely right. I've been waiting to see AI add more value and really work for a long time. I mean, you know, we had all these wild ideas in the mid-2000s around how to build intelligent machines. A lot of those are going to come back around. We come into these kind of local minima of these paradigms, you know, backpropagation, convolutional nets, transformers. They're all sort of local minima, but they're all working toward maybe, you know, some sort of greater view of what intelligence could be. So I'm very happy and excited to see the idea of machine intelligence go mainstream. All the discussions we have, even the stupid ones when we're talking about, you know, robots killing the world and all the kind of crazy doomer stuff. Even that, to be completely honest with you, is actually interesting to me from the perspective of: we've now made this part of the conversation of our normal social construct. It's not a weird side thing anymore. I was always part of the weird side thing for many years, but now it's not. It's something that is going to be big and is going to add a lot of value. I mean, being the sideshow is never as fun, to be honest with you. It's like, you can get passion from that, because I think it actually...
When everyone's telling you you're the sideshow, you kind of have to be passionate to keep going. And I think you can use that as strength. But really, the whole point of having that passion, that strength is to make something that's meaningful and something that's lasting, something that really does change the course of human evolution. And that's what I'm here for. I'm here to like...
be a part of building the next set of technologies that really enable humans to influence the world in greater and more profound ways. And for our very last excerpt in this episode, we turn again to Ideogram's Mohammad Norouzi from his June 7th appearance, explaining how the challenges of running a company can make you a better, stronger person.
To be clear, we didn't start the company because AI is cool. My last day was two days after ChatGPT launched. So we started the company because we felt like there's a big potential in the space. It's really exciting. We want to be at the forefront of it and build the technology and have fun doing it.
And what happened was, I guess, AI got popular too. So that was a good coincidence, but it wasn't a causal relationship. In terms of the company, it's actually kind of interesting. I feel like it's an opportunity for me to learn more about myself, develop more skills and become a better person. Because there's a lot of tension and a lot of complexity aligning different stakeholders and aligning all of these launches together.
And I think it's kind of a personal journey as well. It's challenging me and everybody on the team at a different level. But then the good news is...
you go through some challenges and suffering. And then as a result, you learn more things about yourself and hopefully create some value as well. So that's one way to look at it in that I feel like it's stepping outside of my comfort zone. And that is exciting because it'll help me and everybody else at the company to become more capable, better people. And that's it for this Best Of episode. I hope you enjoyed it.
and we'll catch you in the new year.