
Public Markets, Image Gen, and Specialized Models, with Sarah and Elad

2025/4/3

No Priors: Artificial Intelligence | Technology | Startups

People
Elad Gil
Sarah Guo
Topics
Sarah Guo: I find the progress in image-generation technology exciting; converting the world into anime or manga is a big step forward for humanity. Image generation is advancing rapidly, and more commercial applications will emerge. There is still a great deal of room for the technology to grow, and user expectations for image quality and controllability keep rising. Current market volatility has little impact on most tech startups, particularly software companies that don't touch hardware. Its effect on the tech industry is minimal; over the long term, the industry will keep growing. Models can be classified by speed, performance, and reasoning capability, with different types suited to different applications. I think AI is in a relatively stable period right now: the market is consolidating, and some winners are starting to emerge.

Elad Gil: I think current market volatility has little impact on the early-stage market; high-quality projects can still raise plenty of capital. Tariffs can be useful in certain specific industries, such as automotive, to help protect domestic industry. We need a broader industrial policy to support the industries we care about, which will require substantial investment. The large language model market is converging, but there is differentiated competition in both capabilities and product areas. Beyond large language models, there are big opportunities in other kinds of models: physics, materials, robotics, and biology. Whether LLMs become the center of all models or the landscape fragments is still unclear. Many model companies are focusing on efficiently collecting or generating data and using it to train models. There are enormous opportunities in biology, materials science, and physics, but data collection and model training in those fields are very hard. State-space models are a promising direction; they are highly efficient for compressible data. Over the next few years, people who combine deep learning, domain and user understanding, and product engineering skills will be extremely valuable. The AI stack has become relatively stable; the model, infrastructure, and application layers will all keep developing. Model Context Protocol (MCP) will accelerate model development.


Chapters
This chapter explores the advancements in AI image generation, from early GAN models to the current state-of-the-art diffusion models. It discusses the increasing quality and fidelity of AI-generated art, its potential impact on various industries, and the role of user feedback in shaping its development.
  • Early AI-generated art was considered 'kludgy' but still impressive.
  • Midjourney's aesthetic point of view significantly improved user experience.
  • Ease of controllability will empower users with greater creative power.

Transcript


Hey, listeners. Welcome back to No Priors. Today, you've just got me and Elad again. It's a favorite type of episode. Sarah, how you doing? I'm great. I'm so excited. Everything is adorable cartoons that are also, like, slightly nostalgic and sensitive. And tell me about how you react to...

Studio Ghibli and also just better image generation. I mean, I'm a longstanding anime fan. So I think converting the world into everything anime or manga is a very positive step for humanity. So I view this as something I've been waiting for for a while. I feel like every year or two, there's sort of this moment in the image gen world where people have a, wow, that's amazing moment again.

And the first version of that was like, oh my God, I think maybe even the GAN wave was the first wave. There was a GAN artwork in like 2019 or so, or 2018, that went to Sotheby's for auction, which was one of the first sort of AI-generated artworks, back when people were doing these adversarial network-based approaches to generating artwork. And it was kind of these kludgy tool chains, but even then people were like, whoa, look at what AI can do right now. And it was super bad, you know, in comparison to what you can do today. And then there was kind of the Midjourney, you know,

early Stable Diffusion wave where those models came out and people are like, oh my gosh, this thing is amazing, but everybody has seven fingers in the images, but oh my God, it's amazing. And look at all the things we can do with it and it's going to transform society, et cetera, et cetera.

I feel like we've periodically had these, and I feel like this is the latest version of that. And part of it is we're just on this amazing curve of quality and fidelity in this artwork and the ability to do, I mean, even back in the GAN world, there were, like, style transfers and, you know, do this in the style of Van Gogh, et cetera. But the degree to which it does it so well now and so cohesively and in so many styles and with so much

aesthetic beauty and oversight is really striking. And I think we're just hitting another one of those moments where people are like, wow, this can really do it for

forms of animation and other things. And all this is obviously in the context of ChatGPT and OpenAI and sort of the 4o models sort of incorporating a lot of this stuff directly. And so I think it's fantastic. We're going to see another thing like this in another year, I think. And then I think there'll be the very commercial versions of this, which are already sort of happening. But look, we can use it for graphic design completely seamlessly versus it kind of works. And we can use it for all these different use cases. And so

I feel like we're doing the horizontal version of it and soon we'll have the vertical versions all come out. And obviously there's companies like Recraft and others working on the vertical versions directly. But I just view this as a super interesting evolution of the technology. So I think it's super exciting. What do you think? I think it is funny how much, at least, like, our little niche of the technology ecosystem... but, like, you know,

anime and manga is pretty popular. The world reacts like they want more cute. They want more beauty. I think it's really exciting. One of the interesting things this exposes is that users, people overall, are not very good at projecting where we are in terms of quality and controllability and how much more room we have. Right? I think, like, going from, you know,

eight bits of grayscale to images that might be perceived as photos of real people was a huge jump, to your point of people being shocked

at some point, two generations ago in image generation. Then I think one of the things that Midjourney did was really have an aesthetic point of view and take a bunch of user feedback into account in terms of what was preferred. I actually feel like a lot of people thought of image generation, like end users, not researchers, as a little bit more of a solved problem. I think this is another data point of how much more we're going to get and that people want, never mind in video and everything.

Also text and logos. There's just a lot that's coming. The things

people haven't done are sort of these truly integrative things where you can start truly clicking into the images and modifying pieces. And there's apps that are doing that, or there's things like VIA that sort of do these real-time modifications as you're working on things. But I do think there's so much room still. We're very early, but it's still so striking. So it's a very exciting area. And I think ease of controllability is also going to give people a lot more creative power. Like one of the things that HeyGen is

demonstrated, and very recently came out with in product, is the ability to use natural language to describe emotion and voice, right? So you can, like, whisper ASMR and just, you know, say, I want the whole video with this person in this way, with three, you know, three words of text description. I think that kind of controllability is going to be really powerful. You can incorporate it into augmented devices. And then I would just be working through an ASMR world. That's all I would do. I would just live in that. Is that the ideal? No? Okay.

Maybe the manga part, but the rest not so much. Are you freaked out about the macro? You mean the NASDAQ or what? The markets? Yeah, the markets. Tariffs, inflation, which part of it? You know, consumer confidence is at a multi-year low. The NASDAQ's down 8%. Yeah.

tariffs on Chinese imports and on autos. I think there are investors and companies in the market talking about how stressed they are about that. Yeah, I'm not very stressed about it. I feel like there's a degree of uncertainty in the world right now, for sure. But from the perspective of people building technology companies, barring something truly existential happening, it's kind of business as usual. And I've been through a few of these cycles now where markets are way up and everybody's

freaking out in a different direction and markets are way down. And the main place where it impacts the venture world or the startup world sometimes is if it soaks money out of the venture capital ecosystem and therefore valuations come down or there's less funding for the marginal startup or things like that. But other than that, these sorts of cycles tend to really wash away unless you're a super late stage company that's about to go public and there's some issue with your valuation in terms of expectations versus where you'd want to go out or something like that. But for day-to-day

technology startups, particularly ones that are not doing hardware, which would be impacted by the tariffs, right? For people who are just writing software, it should really be of minimal actual day-to-day impact, especially if your startup's working; like, you'll be able to get customers to pay you or find funding or whatever it may be. I've been through a few of these, and every time it's been a bit of a shrug.

I actually remember I went to the R.I.P. Good Times presentation that Sequoia did in 2008. So back in 2008, there was the global financial crisis. And I was running a startup at the time. I was CEO of this small company. And Sequoia did this big all-hands where they pulled together all their founders and they had people come in and tell war stories from when the dot-com bubble collapsed and how it's time to batten down the hatches and do layoffs and the world will never be the same again and everything's over again.

And they were doing this as a service to the startup community, right? They were trying to help their founders kind of figure this stuff out. And I remember talking to one of Sequoia's partners during it. I'm like, we're like a six-person startup. Like, who cares? And he's like, yeah, you're right. You shouldn't worry about this at all, you know? And that's as all these financial institutions were collapsing around us. And so this strikes me as very small in comparison to that. And I think back then that didn't have that much of a real impact on tech. You know, maybe Google did its first layoff ever. But other than that, tech just kept coming along. And if anything...

The biggest tech companies in the world are now 20 times bigger than they were back then. So I think this is an even more minor blip from a long-term tech perspective. Like, who cares?

Again, barring some unexpected path that's splitting off of this. I don't know. What do you think? It has, like, almost no impact on me, right? I think especially at the early end of the market, I'm like, well, for the really high-quality opportunities, there's just plenty of capital. I keep discovering that the capital markets, and we should talk about this, are much deeper than I thought for very expensive plays, foundation model plays, for example. I still expect, like...

capital availability and a lot of inflow there. I think it's probably a little different for investors who have more public equities exposure, right? I bet pre-IPO crossover investors are getting more cautious, right? You have those sort of much more long-term issues of liquidity having been starved for, you know, several years now. But I think, you know, the return of M&A and, like, several companies being ready to go public

will help that somewhat. The place where the tariffs kind of matter, that I think is interesting, is for very specific industries where to some extent it's useful for America or the West to protect themselves. So I think automotive would be a good example, where some of the Chinese car companies seem to be getting so good that if I was Europe, for example, given the industrial base is so automotive-dependent, I would probably be pushing for tariffs relative to

Chinese imports of cars, right? Because the domestic car industry may not be as competitive. And so I do think there are some areas where the tariffs may be useful. There'll be some areas where they're probably being used as, like, a negotiation tool. And then some areas where, you know, they may be either net beneficial or net harmful

in terms of actual costs passed on and things like that. But I think there may be a few areas where we should make sure that we actually have some in place. And then there may be some areas where it's going to be negative or destructive. And then there may be some areas where it's just good for negotiating

broader policy or relationships with certain external parties. People are kind of using a catch-all for all of them versus, you know, looking at it item by item. Yeah, I agree with that. And I think the productive version of tariffs is, you know... I think there's a need for a broader industrial policy

that is more supportive of the industries that we care about. And, like, that's going to be a big investment, right? If we want to make key components for defense or automotive in the United States, like, we are quite behind in many domains in terms of getting competitive from a scale and cost perspective. And some of those things are worth investing in, on both the positive and the protection side. Yeah, I guess you mentioned depth of funding for models as part of all this.

What do you think is happening in the foundation model world? You and I were just talking about these Artificial Analysis charts showing convergence, like a kind of monotonically more competitive market for capabilities, and amazing improvement over the last 18, 24 months. But you just had the most recent Gemini release from Google. Like, they're clearly still in the game. I don't know who was

Doubting that given they have infra, they have researchers, not just researchers, but, you know, very smart people at the helm. They're competing here as well. I think one of the more interesting things is that you have convergence not just on capability, but also in the, like, product surface areas. Like, most people have search. They have a research product.

They have reasoning in the models. I think like a lot of it is going to end up with like consumer surplus and distribution being the question. There's actually a really great website called artificialanalysis.ai that shows different benchmarks that they've run.

against these various models for reasoning or for different aspects of, you know, how you test a model for knowledge base or for, you know, other forms of performance, speed of tokens per time unit, et cetera, et cetera, et cetera. So I think that's really worth taking a look at. And you see that for certain areas, there is really strong convergence. And then there's almost like a cluster of models that seem

reasonably within the ballpark, and again, certain things spike dramatically in one form or another, around coding or around reasoning or other things. Then you have a longer tail of other models.

And so at least for the core language model world, which those benchmarks are for, there definitely seems to be some forms of convergence happening. And then there's outliers, right? Like, Grok or xAI coming out of nowhere with a roughly SOTA model in, like, nine months was super impressive. Or, you know, some of the things DeepSeek or others have been doing. And then, you know, they don't really have benchmarks for image gen. Those obviously exist on a variety of sites and other places.

But then there's a whole other suite of models that I think are discussed a lot less. Part of that is just the economic value, part of it's what's in the market today. But that's things like physics, it's materials, it's robotics, certain types of science. There may be things that are more specialized in terms of post-training like health-related data on top of some of these core models. I do think that there's a lot of other types of models that

people spend a lot less time on, some of which are becoming quite interesting. Probably the place that gets the most attention outside of the foundation model world, or the core LLM world, I should say, the language models, is probably actually biology, right? I feel like there's a new biology model every week. But there's all these other fields and disciplines where I actually think there's some very big opportunities. And opportunities, obviously, are both societal in terms of impact, but also

In some cases, actually, there's very big markets behind them. I think often the interest level of people working in the industry to build models is divorced from the economic value of these models. Sometimes that's rightfully so. There may be really interesting scientific applications that aren't very commercially applicable.

And sometimes it's really misaligned, where you're like, why are all these things getting funded when there are these wide-open spaces for certain types of models that just nobody's working on? And so at least I've been looking a lot at what are these alternative models that are interesting from a market perspective that maybe are getting a little bit ignored right now. And then I guess there's the other question, and I'd like to hear your thoughts on this, of how many things get subsumed into these core LLMs versus being their own standalone thing. Like, do you think it's all

one ring to rule them all? Or do you think it's going to be a fragmented landscape? And where do you think that fragmentation happens? It's somewhat too binary a distinction to say it's a model company versus not a model company, actually. Even many of the companies that you and I in the industry would consider to be, like, model research companies are starting with some base of pre-training on

existing knowledge and reasoning that is more and more readily available. In the case of robotics, you start with video pre-training. In the case of other domains, if you're going to start separately focusing on code, and we can talk about whether or not that's a good idea, you want both language and code in terms of being able to interact with the model. I 100% believe that there are big opportunities in some of these domains, but one of the biggest distinctions to me is...

What does like the data collection engine for this look like? So if you are thinking about physics, chemistry, biology, robotics, like and maybe even some more near term commercial applications, the data you would want, the understanding for the model to learn from, it often doesn't exist yet. And so I think a theory of many of these companies that is interesting is our job is to go collect or generate it efficiently.

And use that to train the model. And in that case, I think the question of like, does it need to be, you know, will it be in this single model to rule them all? There's a question of, well, is it reasonable to expect one of the existing large labs to go do that?

data generation, right? Like, if you have to set up a physical lab with robotics to do experimentation on new chemicals, that feels more far afield than code generation RL environments, for example. Anytime you go into the physical world, it's always harder to generate data. And that's one of the reasons that the language models, where you just effectively collect the wisdom of the internet digitally, are the first places where we've really seen this scale of sort of breakthrough happen in recent times.

And coding is a great example where you not only have a lot of the data resident, either online or digitally, but also you have very clear utility functions or things that you can test against in terms of code and its performance and etc. Is it doing what you think it's going to do? So those are always going to be the easiest areas. It's kind of funny. This is an odd pet peeve of mine, but it always annoys me when people who do really well as founders in traditional software and tech

start telling everybody else to go and do the hard stuff in biology and materials and physics. And, oh, you know, you need to go be hardcore. And you're like, well, you made all your money in fucking software, what are you talking about? And so I feel like there's been a long history of that, right? Like, I remember interviews with Bill Gates from 20 years ago. It was like, if I was to start today, I'd go into biology.

So I feel like sometimes there's the model versions of this. You're so funny. I feel like you're the opposite. You're like, I actually have a PhD in biology. Yeah, that's why I know. That's why I know reality. I think the other distinction I would draw is like, is it some like orthogonal, like totally different technical thesis? Yeah.

Do I think there's, like, a research advance that is just very different, architecturally quite different? I'll, like, describe categories of companies that could be relevant here. We had Karan and Albert from Cartesia on the podcast. I think state-space models are an interesting direction; they're highly efficient for certain types of data that are compressible, right? If you look, there are several plays on, like, formalism, and, like, translating

problems into Lean and taking that as a path to increasing reasoning capability for math and code.

I think there are a number of companies that are trying to train models that are better at taking actions in software and on the web. This is clearly also right in line of the large foundation model labs, but I think they're at least trying to work on a question that doesn't feel fully answered in terms of

consistent, generalizable RL environments for agents. And so there are spaces where I think there is a theory of why the company should exist, if true, versus just being like straight in line of the

OpenAI, Anthropic, xAI, like, steamroller, of course, and the Google steamroller. What did I miss? What else do you draw as a distinction, or where do you think there is opportunity? To your point on state-space models, there may be advantages in terms of the speed and size of some of those models, on a relative basis, for very specialized tasks.

And so usually I think of it as a two-by-two matrix where you have, like, one axis which is sort of speed, performance, cost, because those are roughly the same thing for many of these models: inference time, effectively. And then there's reasoning, fidelity, whatever you want to call it. And depending on where you are in those different quadrants, you have one quadrant which is, like, it's slow and it's expensive and it's not very smart. And obviously nobody wants to use those models.

It's very slow and expensive, but it's very smart and very capable. And that's where you're like, I'm going to upload a 100-page Supreme Court brief and it'll give me this amazing analysis I can use to argue a case or whatever. So, high value. And it'll take a while to process and do it. And then the super fast, super performant tends to just be these very specialized niche models for specific applications. And I think some of the state-space models tend to work very well for that, some of the SSMs,

for very specific application areas. And then there's the last quadrant. And based on which of those quadrants you're in, I think it really determines the type of things that you can build. And the really fast high performance tend to be more vertical focused or tend to be more focused on very specific types of tasks. And the really slow, expensive ones that are actually very performant

You could imagine bespoke versions, but it seems like the backbone for a lot of those are actually these very generalizable models, where a big chunk of what you're getting is the reasoning and broader linguistic capabilities that you then apply to a domain. And then, of course, there's stuff that people build on top of it in terms of orchestration layers and specialized bespoke things that route things to different models differentially relative to your use case. And it seems like everything that's quote-unquote agentic right now is basically doing that, across customer success and code. And you go through every domain that has a specialized...

and they always have this sort of orchestration layer built on top, so...
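The routing idea in that discussion can be sketched in a few lines: an orchestration layer picking a model tier along the speed/cost vs. reasoning axes Elad describes. This is a toy illustration, not any company's actual router; the tier names, latency numbers, and "reasoning" scores are all invented.

```python
# Hypothetical model tiers along the speed/cost vs. reasoning axes;
# the latencies and reasoning scores are made-up illustrative numbers.
MODELS = {
    "fast_cheap": {"latency_s": 0.3, "reasoning": 2},   # specialized, niche tasks
    "slow_smart": {"latency_s": 30.0, "reasoning": 9},  # long-document analysis
}

def route(needs_deep_reasoning: bool, latency_budget_s: float) -> str:
    """Pick a model tier the way a simple orchestration layer might."""
    # Only pay for the slow, smart model when the task needs it
    # and the caller can tolerate the latency.
    if needs_deep_reasoning and latency_budget_s >= MODELS["slow_smart"]["latency_s"]:
        return "slow_smart"
    return "fast_cheap"

print(route(True, 60.0))   # prints "slow_smart"
print(route(False, 1.0))   # prints "fast_cheap"
```

The point of the sketch is that the quadrant you sit in determines the product you can build: interactive features are stuck in the fast/cheap quadrant, while batch analysis can afford the slow/smart one.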

You know, I think it's super exciting to watch all this stuff. And I do think some of the applications and some of the less just purely linguistic domains may be interesting in the short run. I think going back to the question of like, is the macro stressing you out? There's like such a virtuous cycle in technology happening right now. This is actually quite dominated by the fact that M&A is alive again. And so we're going to have outcomes. But like to your point, there's exploding surface area of stuff that...

these models can attack. You have, you know, research progress, people making different technical bets. You mentioned DeepSeek. I think model development, and the continued increase of, like, just more aggressive use of reasoning and test-time compute, is quite expensive, and training continues to be more expensive. So I think the fact that there are now people

trying to solve data and scale and latency problems. Like, that'll help everybody too. Do you know if it's true that the DeepSeek researchers are not allowed to leave China? I do not know if that is true. I think you, in any country, should want to hang on to your best talent, but perhaps not restrict people's movement.

I think we should be trying to attract great talent here. We should keep all the AI researchers in the Mission District and just never let them leave. Somewhere between the Mission and Dogpatch. Yeah. Like, actually, we could just draw a line between our offices. They all have to go to Atlas Cafe every day. Let's talk through the, like, talent categories. Actually, for anybody who is not thinking about their kids 10 years from now, but just thinking about the, like, next

two, three years: like, what type of expertise is valued, where you should stay between, you know, my office and Elad's office and the Mission and Dogpatch? Okay. Like, you have researchers, you have infrastructure scaling and efficiency. We welcome all of you. Hardware-software co-design.

Right. Like, design, you know, the next-generation TPU or whatever. There's a special visa for you to move into that region. Yes. We're here to sponsor you. Visa program. Yes. If you are ready to design chips to better handle sparsity or massive MoE models or something, like, I've got a visa campaign for you. Kind of what you said, right? Like, anybody who has got deep learning,

like, domain and user understanding, combined with the basic product engineering (it's not basic, but the product engineering sense) for this orchestration, applied-ML area: evals for agents, setting up RL environments. It's still a very nascent area of, like, gather context, plan, make, like, a bunch of model calls, parallelize, verify, retry. Like, this whole orchestration layer.
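That loop — gather context, plan, fan out model calls, verify, retry — can be sketched minimally. A hedged illustration only: `call_model` and `verify` are stand-in stubs for a real model API and a real checker, not any particular vendor's interface.

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    # Stand-in for a real model API call; purely illustrative.
    return f"answer({prompt})"

def verify(result: str) -> bool:
    # A real verifier might run tests, check citations, or ask a judge model.
    return result.startswith("answer")

def orchestrate(task: str, subtasks: list, max_retries: int = 2) -> list:
    """Gather context, plan, fan out model calls in parallel, verify, retry."""
    context = f"task: {task}"                          # 1. gather context
    plans = [f"{context} | {s}" for s in subtasks]     # 2. plan

    def attempt(prompt: str):
        for _ in range(max_retries + 1):               # 5. retry on failure
            candidate = call_model(prompt)             # 3. call the model
            if verify(candidate):                      # 4. verify
                return candidate
        return None                                    # give up after retries

    with ThreadPoolExecutor() as pool:                 # parallelize the calls
        return list(pool.map(attempt, plans))

results = orchestrate("handle refund", ["find order", "draft email"])
```

Every "agentic" product mentioned above is, at some level, an elaboration of this loop with real models, real verifiers, and real routing.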

We've got a visa program for you. We're thinking about naming it. We'll hire somebody to run it. It'll be great. We'll call it Gillingo. We're going to work on the marketing. I feel we're in the business as usual phase of AI. I think the stack is reasonably well defined and obviously it'll change and there'll be new things in it. But I feel like if anything, the last couple of months have been very clarifying in terms of the consolidation of the things that are short-term crucial. And there's the model layer of that and all the various accoutrements around agentic stuff and reasoning and etc.,

And obviously that will only accelerate and get dramatically better, and it's on its own scaling curve. And then on the infrastructure layer,

I think that solidified a bit. I remember when RAG was a big deal as a new thing. All these things, I feel like, are kind of falling into place, evals and how do you do them? And I think things are solidifying there with companies like Braintrust and others. And then I feel like on the application layer side, I think I've bought into a notion we've been discussing for a year or two now around AI really starting to impact different services-related industries and vertical applications and different use cases.

Then I'm starting to finally see some inkling of consumer stuff again. I think it's nascent and early, but at least people are trying. I feel like there were two or three years where nobody was really trying to do anything consumer, although one could argue that Perplexity and ChatGPT and Midjourney and all these prosumer-y things were early consumer forays. Maybe ChatGPT is the world's biggest new AI consumer product. Google was really the original one in some sense. It feels like a period of brief consolidation.

And in a handful of verticals, I think we're starting to see some of the winners emerge. And so I think it's an interesting, clarifying time. And of course, the thing I say about AI is that the more I learn, the less I know. It's the only industry where I feel like the more I learn about the market,

the more confused I am. I feel like there's this brief moment of clarity. And then I'm guessing in a year, all bets are off and all sorts of things will scramble again. But at least for now, it feels to me like a few things have kind of fallen into place, at least temporarily. This actually feels like a very comfortable time to invest for me, because to your point, it feels more like, I don't know, maybe it's like inning three instead of inning one, where

there's a little bit of stability in the ecosystem. There's a real goodness around standardization, some standardization of integration with different... Like MCP, I think, is going to accelerate a bunch of development for people. Like I'm meeting companies where they set up a data source that is useful to the enterprise in some way that...

these models can interact with well, and they're like, oh, MCP server. Do you want to quickly explain to people Model Context Protocol, MCP, what that is and how it works? I'm going to fudge this, but I will try to describe it. So this is an attempt by Anthropic; it came from Ben Mann's group in labs. It's called Model Context Protocol, which is

an attempt to spec out a standard interface for connecting model capabilities to systems where you already have useful data. That could be documents, it could be logging, it could be business tools, it could be the IDE, whatever. And Sam from OpenAI said they're going to support it as well. And I think

This is not a complete solution. It has gotten a lot of popularity with developers over a very brief period of time, but it's just like how you expose your data to the model. And it's an open standard, so it's not proprietary. Anybody can use it.
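Concretely, MCP is built on JSON-RPC 2.0, and the spec defines methods such as `tools/list` and `tools/call` for exposing tools to a model. Below is a stdlib-only sketch of that request/response shape; the `lookup_order` tool and its payload are invented for illustration, and a real server would use an MCP SDK and a transport like stdio rather than a bare dispatch function.

```python
# Hypothetical tool the server exposes; the name and payload are made up.
TOOLS = {
    "lookup_order": lambda args: {"order_id": args["order_id"], "status": "shipped"},
}

def handle(request: dict) -> dict:
    """Dispatch a JSON-RPC 2.0 request roughly the way an MCP-style server might."""
    method = request["method"]
    if method == "tools/list":
        # Advertise the available tools to the model.
        result = {"tools": [{"name": name} for name in TOOLS]}
    elif method == "tools/call":
        # Run the named tool with the supplied arguments.
        params = request["params"]
        result = {"content": TOOLS[params["name"]](params.get("arguments", {}))}
    else:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}

# The client (the model side) discovers tools, then calls one.
listing = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
call = handle({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
               "params": {"name": "lookup_order", "arguments": {"order_id": "A17"}}})
```

The value of the standard is exactly this: any data source wrapped this way becomes reachable by any MCP-aware model client, without a bespoke integration per pair.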

And it's like a two-way connection between data sources and AI-powered tools. And big companies have done it. Yeah, I think there's still a bunch of work for developers to do in terms of describing their tools and how to use them very specifically and cleanly. But it does make it much easier. And I think it will accelerate agent development a lot. But going back to this idea of what does it mean for the ecosystem, I think the fact that you have...

Like, you're accelerating the ways for models to interact with existing ecosystems. We expect agents to get better. You have a bunch of choices around model availability. As you said, there's this, like, clear pathway about how to automate certain types of work that is orchestration of these capabilities. And I think that's going to be super fertile. I do think it's very unclear what types of winning consumer experiences are possible here. There aren't

consumer agents that don't look just like search or research in the large model products that are really working yet, that I've seen, but I expect to see them this year. I'm excited about it. Yeah, I think it's cool stuff coming. When everything destabilizes, Elad and I will be back on No Priors. We'll talk to you all then. It's going to get unstable again, but I think it's a moment of calm. And calm is all relative, right? There's enormous innovation, huge changes coming, big technology waves, new things every week.

But at least there's a little bit more of a view of, okay, who are going to be some of the main players in some of these areas? And, you know, how do all these things fit together? So I think we should enjoy the calm while it lasts for, you know, the next week or whatever it is, the next few hours before the next thing drops. All right, signing off, y'all. Good to see ya.

Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.