
Questions we’re asking of AI startups in 2025.

2025/3/12

Hallway Chat

People: Fraser, Nabeel
Topics
Nabeel: I think whether being consistently 10% better than existing products is enough to win in the market depends on how we define "winning." Current AI applications are concentrated on the low-hanging fruit, while truly disruptive innovation usually comes from breakthroughs that fix what is broken today. Early-stage investors favor products that are 10x better at one thing, even if they fall short elsewhere. My view has shifted because I used to assume users were already broadly familiar with existing products; now I think users are still exploring. The AI product market is early and users are still trying different products, so even a small edge can lead to success. The 10% advantage I mean is not an improvement on model evals but in the experience users actually perceive. The early search engine market was fiercely competitive, and a search engine that stayed consistently 10% better could have won the market before Google arrived. Users care about perceivable improvements in the product, not eval scores. While habits have not formed and switching costs are low, consistent small improvements can compound into an eventual win; once habits and switching costs exist, even a product that is not the best can hold its market position. The rapid progress of reasoning models is surprising and opens new possibilities for product building. Because reasoning models need less compute to train, more people will be able to take part in model development and product innovation, and we need to explore more ways to apply reasoning models in products. Current deep search products differ little from one another; the main difference is reasoning ability. Readability and how engaging the output is also matter, because they shape the user experience, and deep search products still lack access to more context and data. Computer use is a new direction for AI with broad potential, but it is still early: it needs more accuracy and reliability than the technology can deliver today, though in some scenarios, such as games, errors are acceptable. Mature computer-use applications will take time, and surprising products may emerge; computer use will need a development arc similar to large language models', with time to explore its use cases. Its application may not be directly operating a computer but assisting humans with tasks; today it is not good enough to fully automate complex work, so it fits better as an assistive tool that evolves toward automation. Advances in reasoning models let computer use understand and execute tasks better. We need to build more AI tools that augment human thinking rather than trying to replace humans.

Fraser: I think that even if an AI product is consistently 10% better than existing products, that is not enough to win: incumbents like ChatGPT already have enormous market share and brand trust, and a slightly better newcomer will struggle to catch up. My advice to startups competing here is to focus on a 10x improvement in one area rather than being slightly ahead everywhere. As the user base grows, users compare notes with each other, so a product does not need a huge lead to gain share. My earlier 10% assumption was about the model itself; now I think it should be about the overall product experience users perceive. Users care about perceivable improvements, not eval scores. While habits have not formed and switching costs are low, consistent small improvements can win; once habits and switching costs exist, even a non-optimal product can hold its position. We need to think about which AI markets reward first movers and which reward second movers. Some markets will be won by first movers, but others can be won by later entrants with a better product experience, as Google showed in the fiercely competitive early web search market. We need to judge where to move first and where to wait and exploit second-mover advantage; if the product experience can deliver significant value, the later entrant may have the bigger opportunity, and the new reasoning models may enable product experiences that did not exist before. When evaluating second-mover advantage, do not over-index on revenue or hype; focus on what is broken in existing solutions. Some early AI products collapsed soon after posting high revenue, which shows you have to ask whether a product truly solves the problem. Deciding whether a second-mover strategy fits means assessing the competitive landscape and the degree of differentiation; in an already crowded market, differentiate rather than simply iterate. Some companies have succeeded in seemingly saturated markets by shipping products that genuinely solve the problem. Internally, we regularly step back, reflect, and pose key questions to guide investment decisions. The pace of reasoning-model progress is unlike previous iterations: the product-experience gains are significant, the use cases may differ from earlier models', and we need to adapt quickly and change how we think. In law, reasoning models can identify contract risks rather than merely summarizing text. We need to explore which domains most need reasoning and how AI reasoning might apply to fields like economics. The differences between deep search products mainly come down to reasoning ability: products with stronger reasoning deliver a better experience, though readability and how engaging the output is also matter. Computer use is a new direction with broad potential but is still early; it needs more accuracy and reliability than today's technology offers, will take time to mature, and will likely follow a development arc like large language models', moving from an assistive tool toward automation, with reasoning advances helping it understand and execute tasks. We need AI tools that augment human thinking rather than replacing humans, but today AI tool development focuses more on efficiency gains than on augmenting creativity, partly because efficiency is easier to pitch to investors; the current culture favors efficiency and arbitrage, and creative innovation gets neglected. Truly creative products are harder to displace and have more potential. Diffusion models have driven innovation in art, but their commercialization model still has limits; Midjourney succeeded by focusing on creative value rather than efficiency, and tools like Cursor augment human creativity as assistants rather than replacing humans.


Transcript


We are seeing everybody do the low-hanging fruit first, right? Like, all variations of deep search or deep research are the obvious first thing to be doing. Once they've picked their toothpaste, like, you're not switching them off their toothpaste. But when toothpaste comes out for the first time, you go through this period where you're like, "I don't know, which-- what kind of toothpaste am I supposed to use?" Whatever is broken often is the first step to leading to a new solution.

You think if Grok is 10% better than ChatGPT, you thought that if it remained 10% better, it could win. I can't get my head around that. And I'm so baffled by it that I know that you and I must be talking about two different things. I so strongly disagree with it that we must just be talking, like we're making different assumptions about what we're talking about.

Or why don't we just handle it right now and that'll be your opening? Yeah, well, welcome back, everybody. Welcome to Hallway Chat. I'm Nabeel. I'm Fraser. There we go. That's the way to keep it loose, just get in the middle of it. You dropped that on me when I had to run, and that's been bouncing around my head since you did it.

I so strongly disagree with you that my only conclusion is that we're making different assumptions. Right. And so you set it up and tell me your assumption here. I liked your phrasing. My assertion was that if an LLM model that we're using today, so particularly a chat-based model, let's talk about ChatGPT, Grok, Claude, DeepSeek, or whatever,

That has some measure of traction that it's in the comparison set. Let's just say that. But what I actually said was, if the product is 10% better, it's going to win. That it doesn't need to be two times better or 10 times better. If it's 10% better, it's actually going to win over the longer arc. What's wrong with that? Doesn't the better product win, Fraser? Yeah.

No, no. ChatGPT has 400 million weekly active users now. They have escape velocity. They have the brand. They have the trust. They're like a global entity. And if you're always just 10% better, I can't imagine there's any hope in hell that you catch up.

I just can't imagine. The share of users that care about a 10% improvement in that product has got to be so small. We might just disagree. We might actually disagree with this one. But I think there are definitely a couple of assumptions. If you were advising a founder who was always a competitor to a player in the space, you would advise that you could win at the model game by going after some other affordance. You hit

ChatGPT where it's weak, go after some area where you're 10 times better. And I have to be honest, like obviously most of my life when I'm talking to a founder I'm saying the same thing, saying like, listen, you just can't be a little bit better. Like that's how startups die: they find some minor optimization arbitrage in the world and think that that's going to be enough, and like people don't care. And so you're not 10x better. It's funny when I...

We're generalists at Spark and we do all kinds of different investing. I find that that's a struggle to try and explain to these seed and pre-seed and angel investors who are like, what are you into now? And everybody speaks in these stupid VC invented market maps and verticals and stuff like that, which ironically we're about to do in a minute.

It's like, let's talk about these as categories when really the most interesting companies are the ones that are inventing new categories that we hadn't thought of before. Those are the places that get really excited. But one of the ways I talk to somebody at a pre-seed fund and I'm like, oh, what company should I go talk to you about? It's like, well, don't bring me the things that are like 10% better. Bring me like,

If it just like you open up the product and you use it, and even if in its early and mildly broken form, you're just like, oh my God, there's a real bolt of lightning. It's 10x better at one thing, even though it's maybe worse in lots of other things. That's the thing to bring to me. So you're right. There's some cognitive distance between that and me saying what I said before.

Here's why. I still think most people aren't using any product like ChatGPT or Claude or Grok at all, really. And they don't really fully know how to use it yet. And the second thing is that I don't think anybody has...

switching costs yet. There's an early life cycle of any new category where for anybody who gets even marginally interested, you have this ongoing conversation where you're like, which is the best one and what should I use? And so you start by opening up the app store and you download

a mail client and then you're like, "I wonder if there's a better mail client?" And for the first couple of years of the app store, you probably download two or three or four mail clients. And maybe you end up at Apple Mail at the end like you, which drives me crazy, or maybe you end up using something else. But before the incumbent is really set in stone, there is this jump ball period

This is what CNET used to live off of, and all the tech media outlets. This is what Cool Hunting lived on back in the dot-com days: a new category emerges and somebody needs to write reviews where they're like, I don't know, which of these three pieces of software should you use? And I still think we're in that phase. If suddenly...

you were consistently 10% better than everybody else. I'm just thinking forward to the 5 million, 10 million, 50 million, 100 million, 400 million conversations. Like I'm just thinking of as 8 billion people come in to use these products, they are continually going to have this conversation where they're like, oh, I tried ChatGPT today. It might be the first thing that they try. Sure, because it's the best known thing.

But then you're going to go talk to your friend who's the ChatGPT guy. Like you're not going to do it alone. You're going to go find the guy you know or the woman you know who's like played and is like, hey, is this the right one? And they're like, oh, yeah, yeah. There's like four guys that do it. There are four companies that do it. And the one that I like the most is X. And so you don't have to be 10x better.

when the consumer base is comparison shopping before they've settled in. Once they've picked their toothpaste, you're not switching them off their toothpaste. But when toothpaste comes out for the first time, you go through this period where you're like, I don't know, what kind of toothpaste am I supposed to use? Maybe I should try three of them and decide which flavor I like. And so even just having a marginally better flavor gets you there. I think the

assumption that we have that's different is that, even though the number of users that are using ChatGPT, in OpenAI's case, was announced this week, I still don't consider them the incumbent yet.

And so I think the model you're using, which makes sense to me, is the startup attacking the incumbent kind of model. And I'm just not even sure where they're at. That's fair. But I think the thing that I was assuming is that 10% better here was constrained to the model. And I think that like so much of my worldview is still model first. And I just can't get my head around that.

the population caring about a 10% delta in the quality of the model. But I think you're saying that the product experience, if something is 10% better overall. Let's be clear. I'm not talking about 10% better on evals, right? I'm talking about- I get it. 10% better to a person using the product. And another example, like the analogy is it's like,

We're a year into search engines. We're still five years before Google. Yeah. Right. Yeah. We're six years before Google. It is AltaVista, Yahoo, you know, Ask Jeeves time. And I'm just remembering in that time period, everyone who was trying the first search engine, it was a viable conversation on a month to month basis to be like, which search engine do you like more?

And if one had consistently been 10% better than all of the others, they would have won up until Google wiped the field. I don't know if we have a Google asteroid coming in five years or whatever, but do you see what I mean now? Yeah, yeah, kind of. Because Google was like 10x better. And so that's why like- Google was just unequivocally better. Yeah, yeah, yeah.

I think I kind of get there if I change my frame of reference away from thinking of it as the model is 10% better. And maybe like the really cold way of saying that is, I was thinking about it like 10% better on evals. No human is going to care about that. But you're saying- Yeah, let's switch. You know me, Fraser. Like, I don't care about evals. Like, it is 10% perceivably better to a customer. Right. Like a

customer couldn't explain maybe even all the reasons it's better; they might muddle through it. And also, consistently: if they're 10% better, and then their competitor is 10% better a month later, and then another different competitor is 10% better, if it bobbles around, then people will just kind of go to either the first thing they started with, or they'll just go to the default, most popular thing over time. But if there's some model of some company that is giving you a

chat experience, which is, I do think, a new affordance that you'll just use every day.

And they're doing it in a way that is consistently marginally better than the competition. I don't think you need to be. All right. All right. All right. Yeah, I got it. You know, I got it as well because I'm going back to a handful of conversations. Now my, my worldview is that this is just a new behavior that we're all going to have to adapt to. And like, that's going to take a long time. And part of that adaptation will be figuring out what's the right best product. And it,

over a long arc, something that is 10% better should win in that world then. Yeah. Yeah. It's a long race. And more importantly, you're comparing, like I think the really important thing is most consumers are still comparison shopping and will comparison shop because most of them haven't even started using these products yet. And two, to go back to me, there's no lock-in yet.

There is a world where you've uploaded enough documents, you've told it enough things about your kids, you started to build some real rhythm with this thing where there's just real lock-in, then switching costs become real. And then the kind of B+ product will do just fine if it's the incumbent.

but I'm just not sure we're there yet. And I wouldn't cede the ground just yet. Everybody likes to call the game over. Everybody called the game over the week after DeepSeek and sold all their NVIDIA stuff. Everybody loves to call the game in the first quarter, but that's just not how this works. These are wars of attrition often.

That's a very natural transition then to topics that we've been thinking about and discussing as a firm, as we think about where we are and looking forward to this coming year. One thing that's been on my mind is, we talked about it briefly on a previous episode, was Kevin's presentation from Spark around the early web, who were the first movers and how did they do, and then who were the ultimate, quote unquote, winners of the space.

And I've been thinking a lot about a second mover advantage. And like, what is the chance that we are really just seeing the interesting markets illuminated by people who were quickest to pull, you know, a good new capability into them, whether that's law or code or what have you. And, you know, I think there's a...

There's a high likelihood that some of these will be won by the first mover. And there's going to be inevitably a lot of new companies that are starting today that are going to win markets that we think, you know, are already, like, won, so to speak.

Yeah, it's, in fact, we just talked about it, right? Like we just used the search engine example from the early web where you have this massive fight, you know, between AltaVista and Ask Jeeves and everything over search engines. And then it turns out that everybody, you know, needed a couple of years to digest it all, come up with a new novel thing, and then Google comes in and sweeps the field

a little bit later. So that is a, we already referenced a very obvious one. You know, one way of thinking about this is I think about whether there's first mover or second mover advantage in a market. And let's just talk about which markets, right? Like, so where do we have real traction today? We have real traction in AI coding. Those companies are

taking off like crazy. We have real traction. There are a lot of law AI startups. Anything that is voice that calls people seems to be doing quite well right now, like making phone calls to humans and then charging other people for those phone calls to do some kind of AI thing. Those seem to be working. We could go down a list. An easy way to tell what seems to be working is to look at the

YC class and look where they're centered right now. So you could look at the growth rounds being done, but also for some reason it's completely mimicked in a YC class where of course founders are opting in and being like, well, that looks kind of good. I'm going to do my thing that's 5% different than that. I guess the question is, where are there places where it's important to look for second mover advantage and wait for

or search for something deeper? And where is it go time? Right. Where do you feel the heat and you feel the run and you should be out there in the market with something and learning with your customers? How do you decide where you do one versus the other? I wish there was a tidy way to answer that question. When there's a real differentiated product experience that matters and is valuable, that probably is a case where the second mover can do very well.

Apple is famously the last mover in the markets, and they delivered the right product experience and have shown time and time again that that matters. In many markets, the right product experience, which is going to be a combination of the product and the technology, and here we're super early and the underlying technology continues to improve dramatically.

Do the new reasoning models introduce new capabilities that the previous products can't even absorb, in like a very native way, into these markets? I don't know. Like maybe, but maybe not. Right. I think it's the second mover advantage, like where I'm trying to imagine being on the board of a company or talking with a founder where your advice would be like, don't go too hard right now, even though there's revenue or traction there.

Like it's not the time to go in or go all in. And for me, in a way, it's somewhat ignoring the revenue numbers

or the hype numbers or the markup numbers in the industry and trying to realize how broken the real solutions are. Because we did already go through a wave where a bunch of people, customers really wanted to spend tens of millions of dollars on a bunch of very early AI products at the beginning of ChatGPT that had huge spikes in revenue and then dissolved. And so

you know, for me, the second mover advantage thing just comes out of, are they fully solving the problem or not? And I understand that's a, that's even, that sounds simple. It's still hard because of course, like we're the business of early stage investing and founders are in the business of starting something. It's always broken, a little bit broken in the beginning. It's always like not perfect. And,

And so none of these things fully, fully solve the problem. You're trying to set marks and problems that are worth working on for a decade of your life. It's not going to get solved immediately. It's like applying the amount of fuel and the amount of certainty that is mapped to how much you believe in the thing. And so a good example for me is if you see five competitors...

that are relatively undifferentiated and all doing kind of well, but customers are already doing bake offs between them. That's a sign that one, it'll be hard to compete. Two, the cost might go to zero. And three, if they're switching it, yeah, it means nobody's doing anything that satisfies that customer fully anyway. And so maybe it's worth being more differentiated in

in those worlds. That might mean that you go after a different customer segment. It might mean that you try to solve the problem a different way, but it's like dig deeper. The harder thing is if you're a founder, that's one thing if you're starting a company or you're looking at the market and you're like, should I start another law startup? It's harder when you've already started the company and now there's five competitors and you're looking around and you're trying to make the really hard decision that actually

Now, you maybe weren't a commodity six months ago, but you kind of are a commodity now. And like that means that you're not going to increment your way

to the future. You need to go set some mark way ahead. Meanwhile, by the way, because there's five of you, you feel like you're in a dogfight. You're trying to hit revenue marks for next month. There's revenue to get. That's very hard for founders. I've seen a few founders really do that really well. We had Granola on last time. That's a good example of looking at a market that looked undifferentiated

meeting transcription was a solved market. There's five people that have all raised tens of millions of dollars and blah, blah, blah. And then being like, I don't think any of these people actually really solve the problem. That's a perfect example where there was a clear new capability that allowed you to, you know, quote unquote, solve a problem, right? Transcription. But it has taken real great product work to like show that there's differentiation in

delivering that capability to an end user. So let's move on to another one, but just to back up for a second, I like the spirit of all this. We're not a thesis-driven firm. The founders come up with theses and we're there to partner with them when they come up with amazing theses and try and help them along the way. So we're not a thesis-driven firm, but that doesn't mean we can't have a prepared mind and we can't ask curious questions. And so big picture is every year and then sometimes along the year,

we come back to these questions. We kind of ask ourselves, what areas are we curious about? So this is not a, which markets are you investing in? This is more of a, what questions are we asking? Yeah. I mean, to add credibility to that, the last investment that I did unannounced, so we won't share, I happened to go see them on a Wednesday. I came back and I'm like, you got to

come with me to see this company tomorrow. And we drove and saw them. And yada, yada, yada, very quickly it all came together. And it was not something- And that was nowhere on this map of questions that we're asking or anything like that. We love a good surprise. But it is worth sharing. Like, look, we just, this is supposed to be

an extension of the conversations we're having internally. And that's really the spirit of what we're trying to do here and to bring founders and others into this little process that we, this muddled process we're going through in AI right now. So I think it's just worth translating our internal conversations externally to the questions that we're asking. And so second mover advantage is a good first one. Let's take one more of the questions that you put down internally. I have them in front of me here and then we can take some of the questions I've been asking internally.

I've been spending some time just really appreciating what has been done with the reasoning models. And I think I've said this to you now a countless number of times, but the surprise of last year for me was how quickly o3 appeared after o1. Like,

o1 totally pointed to a future of o3, and I would tell you about how exciting that's going to be, and I can't wait for some number of years when that arrives. And then you woke up the next day and there it was. I think that we are seeing that there's an entirely new vector for training models that is going to unlock a whole lot of different use cases for product builders. I also think that

The thing here that is maybe underappreciated is the amount of compute that's required for this is less than like a pre-training effort. And so I think we're going to see a lot of different academics as well as like hobbyists being able to explore with this type of post-training. And I think that we're going to see people shape products differently.

with the hosted reasoning models like o3 and others that are coming, as well as being able to train their own off of Llama. And then the question is like, what...

Like deep search and all of the derivatives that have then come up are so obvious. Like we've spoken about at length that research and synthesis is like a beautiful use case for these models. And that reasoning now is able to deliver that in a very beautiful way. But like, where else are we going to see reasoning models applied to products? I think there's going to be a lot of beautiful experiments in this area.

And that we're going to see significant stuff in this next year. Yeah. And more importantly, like I love this because more importantly, it's saying like, listen, we all tune ourselves very quickly for pattern matching what's going to work and not work. And it's just acknowledging that the solves for reasoning and deep reasoning, the things that might come out and be new opportunities because of reasoning.

might be different. It's a different sort ordering stack than the things that might have come out of, say, a good agent or ChatGPT or GPTs previously. Even if you just take certain vertical markets for a second, legal summarization and transcription as a company, that's a great 2024, 2023 company. There's an entirely different company you would build in legal if you

are assuming that reasoning is your differentiation versus, say, summarization or transcription as the effect of the model. And so, you know, there you might do things like, I want you to identify subtle contractual risks in this contract versus today's Harvey's and so on are like, just summarize the contract for me, like write a brief, right? Or I want you to predict

I'm coming off the cuff here, but in legal, I want you to predict how this clause and this contract might interact in future scenarios in some weird way. Draw me parallels until I can understand why this might go wrong. And so, yeah, it is a reminder to us to sort order differently based on reasoning, because different things and different companies and different ideas might rise to the top. I'm also curious on this side,

to just, instead of taking the market view down, where you're like, oh, how will reasoning apply to legal or healthcare or finance or education, take the reverse side, which is like, hey, what are the

what are the most deeply reasoned areas of the world? So forget market down and use case in, just like, what are the places in the world where we just apply incredible amounts of reasoning? And then what does this feel like to work from that way in? And another area I think about that is areas like, to use the law example, like,

I'm paying attention to the Supreme Court right now. Constitutional law is, like, there's no hard-and-fast answer. A lot of times, it literally is like an interpretive principle. And so trying to look through many, many layers of a question

I don't know what the business or the startup output from that is. I wouldn't jump to that conclusion, but it's an interesting situation where you're like, oh yeah, that is an area where reasoning really, really matters. A lot of game theory stuff, either in economics or elsewhere, strategic decision-making involves lots of reasoning around what the rational actors are going to do in a situation. And economics uses...

all kinds of really simplistic principles to try and come to those conclusions, because they haven't been able to build off of reasoning agents until this year. So what does that mean for the field of economics? That kind of stuff. That stuff's fascinating.

Only questions right now, which is going to be the nature of this entire podcast, and it'll be very frustrating to people: we're not giving you answers, we're asking good questions. But I'm fascinated with that. I agree. The interesting thing for me as well, like a meta point, is that we went from GPT-2 slowly to GPT-3, and like that arc slowly unfolded, and we had time to experiment to figure out how this new capability could be brought into products.

And I think we have just smashed into an exponential curve that's very different with, like, reasoning. And we're seeing that it is... Why is this any different, Fraser? Why wouldn't this just be...

The same thing that we went through before, which is like, everyone's going to do the dumb low-hanging fruit thing first that involves reasoning. It will be wrong, and it will take 18 months for it to really be thought through and internalized and productized. We are seeing everybody do the low-hanging fruit first, right? All variations of deep search or deep research are the obvious first thing to be doing, but

The difference here is that it's not inconsequential. Like this is a profoundly new product experience that's adding value in a lot of people's lives. And so I think the difference is that you're not seeing people like,

The first version of GPT-3 was great for a fantasy role-playing game because all the limitations were great for that, and then it did simple ad copy, and now it's writing your high school essay. We have just jumped to the point where these reasoning models are great and that they're going to become great this year already in many new products, undoubtedly.

Yeah, because they're standing on the shoulders of giants thing. They're stacked wins off of the previous work as well. Yeah, for sure. Like, absolutely. Like, that's another aspect to it, for sure. Can we sidebar here? You're a fan of deep research. You use it a lot. Or you used it. Do you find much differentiation between the various deep research products? Like, I'm sure you've used Perplexity's deep research and you used...

OpenAI's and so on. Do you find much difference? Which one would you use for different reasons? The honest truth was, it was Perplexity's integration of R1, the DeepSeek version, that, you know, I came running to you and I'm like, no, people, you have got to try this. It is crazy. And that is in the evolutionary, like, lineage of those things that followed. It was just more reasoning, deeper search, which is to say, like,

No, like they all feel like of a kind. I think if you're doing very sophisticated,

nuanced types of problems, the thing that has access to o3, the better reasoning, no surprise, is better than the alternatives. But if you're asking, like, what USB cable am I going to want to hook up to my Mac, you know, my Mac Mini, the reasoning in the search is pretty amazing across these things. I don't know, do you? Do you notice a difference? You've been playing around with all of them as well. I love trying to understand the nuances.

I think one of our first podcasts, I was talking about, like, when do you go to Perplexity? When do you go to Google? When do you go to ChatGPT in that world, and trying to work out rubrics for that, which feels very clear to me. Although it's changed a little bit over time as these guys have moved. But no, I don't. The thing that I have found clear is that writing style still matters deeply. Like, I don't care what

Google Gemini's research product is coming back with, because the writing is so bad that I just don't want to read it. I can't, like, my eyes start glazing over when I try to read Gemini's prose. And so you have to get above some bar of readability and interestingness, which is actually not easy to get above, no matter what the reasoning is inside of it. But Perplexity is there, partially, because DeepSeek and

OpenAI are there. I don't know when I would go to Perplexity's deep research versus OpenAI's deep research. I would suspect that OpenAI is doing something at a much deeper level. So maybe if I had some, like, internal...

lever that was like, this is really hard, think about it more, then I might go to OpenAI. But I don't know, or my default might just be there because I'm there for other reasons or something. I'm not sure. And then obviously the other competitors will come out with their work. It'll be interesting to see if Anthropic releases a deep research product, or somebody else, what they do and what that means, and the kind of second iteration of this.

Yeah, the thing that feels like it's missing with deep research for me is

access to more context and data. It's weird to me that they released this product, and I get that it's going to go out and read things off of the web, but it's still weird to me that they released this product without a kind of notebook LM style. Why don't you go grab these 15 academic journals and toss them in here? Or why don't you give me all the internal PDFs that you also would have read through for research? Let me synthesize that with internet data and give you back stuff. I assume that will come.

It'll come. You know, I mentioned this to you earlier, where I feel like they've gotten a little bit of their groove back. Like, I love the fact that they released it without any of that stuff. Right. It's,

Here is a research preview. We are a product in service of the model. There's as minimal a product experience built around it as possible. Let's just get it out there and see what we can mold from there. I love that. And so, yeah, I think all of that stuff will come. Whereas if they had waited to launch until they'd built all of that out, they'd be navigating the idea maze blind. Now they have...

users and usage and, like, feedback to help guide them through that. Yep. Okay. So let's go to the next question that we're asking for 2025. There's a natural bridge from one of mine that we just talked about to one of yours. And that is: reasoning was a new capability, but it accelerated so quickly in terms of how good it got that we're going to see these profound products soon.

Computer use is another new capability that you've been mulling over and asking questions of. So why don't you talk a little bit about that? Yeah, and for contextual background here, we did invest in Adept, which was originally trying to build its own model for this. So I've been looking at this space quite closely for some time. OpenAI has obviously released Operator, their computer use product. Anthropic has computer use. It is an area that feels...

the way that reasoning was last year. It feels like it's 12 to 18 months from turning into real applications now that there are APIs available. But at the same time, to be slightly skeptical, we've had some measure of research-oriented computer use, maybe not available to the general public, for a couple of years now. And so part of me would be skeptical and say, well, if it was really obvious, why hasn't it appeared already?

But the other part of me just watches people using these products and feels like none of them are quite productized well enough yet. You know, they're not perfect. They're okay 87% of the time, or 92% of the time, or 96% of the time. And unlike trying to tell a story or do an RPG dungeon ChatGPT experience, that variance is a real problem. Yeah.
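Just to make that variance point concrete: per-step reliability compounds brutally across a multi-step task. Here's a back-of-the-envelope sketch, assuming each action succeeds independently, which real tasks won't exactly satisfy:

```python
# Back-of-the-envelope: if each action in a computer-use task succeeds
# independently with probability p, a task chaining n actions succeeds
# with probability p ** n. Illustrative assumption, not a real benchmark.

def chain_success(p: float, n: int) -> float:
    """Probability that all n independent actions succeed."""
    return p ** n

for p in (0.87, 0.92, 0.96):
    print(f"per-step {p:.0%} -> 35-step task succeeds {chain_success(p, 35):.1%}")
```

At 96% per step, a 35-step task completes end-to-end only about a quarter of the time; at 87%, well under 1% of the time, which is why that variance is a real problem.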

And so the question is, what are you going to use computer use for? I think there are probably two threads that I'm very curious to go down. One is, do these reasoning agents help these computer use models? Is there a way that might help it get better at understanding when it's about to screw up, and go ask for help, or reason through how to fix it itself?

You know, call your boss in and say, I'm not sure I understand how to scroll through this, or whatever. And then the other thread is, what are the areas? And I have some sketching on this that I've been doing. But, like, what are some areas of the world where that variance is okay? Mm-hmm.

And being a pixel off, or five pixels off, is okay. When people think about computer use, they mostly think about RPA-like applications. They think, I want you to go to this website, click on these five buttons, scroll, copy, paste, that kind of thing. But one place where being a pixel off is maybe okay is playing a computer game. The way that you play a computer game every time you do it, the way you go through things,

It's not even deterministic in many cases, especially with social games. Now, what's the business or startup idea there? That's a whole other question. But as a thought experiment, just like the process we went through with early GPT, what are the areas where the randomness is okay or even ideal?

Yeah. What's computer use going to be? I don't know. I've been thinking about it for a long time. It feels like in 12 months, we're going to have an answer because it feels like we're finally getting to the point where these things are getting to the public. Yeah. They're getting exposed. I think enough startups, frankly, are trying to play with those APIs and are really pushing on it. I think this is...

The opposite of reasoning, where it's a brand new capability that requires, as you said, great performance and reliability in many use cases for it to be good. And it's just not there yet. And so we will go through the equivalent of a GPT-2 and a 3 and a 3.5 and all the other arcs as we figure out the use cases, and we'll wake up in 12 to 18 months and there will be a lot of profound stuff.

I would be surprised if there's great value delivered, like deep research with reasoning, in the next month with these computer use products. I just would be so surprised. There's some little thing inside of me that says there might be an RPA company here, and it might be great and huge, and that might happen. And I'm certainly open to it. But there's something that smells like the thing a couple of years ago where we were trying to have the first versions of

ChatGPT be agentic and run around and do things. And it would just go off the rails and be terrible and just wasn't good enough. And it turned out that treating it more like a copilot was a good idea, or that it's better at evaluating than writing. You know, back then, I could say, please write this in the style of Paul Graham, and it was terrible. But if I gave it a piece of writing and said, how different is this writing from Paul Graham's?

It's actually quite good at that. It's better at analyzing text than writing it. And I think there might be something here that's like that, where, hey, just because it knows how to work a computer, the answer might not be that it actually does the work on the computer. The answer might be that it, like,

Maybe it's watching a worker do something and then reaching in every once in a while and being like, you're about to do that wrong. Or, you seem confused there, can I help in this one spot? And because it understands the language of the world that you're in, it can come in and assist. But it's actually not trying to do

35 actions in a row autonomously, because we're just not there yet. It's not ready to be Devin. Everyone would love to have an AI engineer go run on code for five hours, and we probably will get there. And there's a reason to maybe think about trying to get there, and maybe you try to be early. But I think computer use is more in the soup of, like,

you know, we're in the GitHub Copilot phase, not the Devin phase. And yet most people aren't articulating or trying computer use in that context. Yeah, that feels good to me. Like, I can totally imagine it.

We have to go through the copilot step of just, like, cheap code completion before you can get to the miraculous thing where it's doing crazy automation for you. That feels good. There's a way to tie these two things together. Bob McGrew, who is a buddy and was formerly chief research officer at OpenAI, had a tweet around OpenAI's deep research. I'll just read it: The important breakthrough in OpenAI's deep research is that the model is trained to take actions as part of its chain of thought.

The problem with agents has always been that they can't take coherent action over long time spans. They get distracted and stop making progress. That's now fixed. And so the interesting thing is that computer use may actually be something that

you and I interact with through our products on a regular basis, but indirectly, because it's the model calling it through its chain-of-thought reasoning process in order to go and get the information that it wants to help us. And we aren't even aware that it's doing that. Yeah. Yeah. I think that is actually a great reconnection. What else we got? You want to do AI as Muse and not Oracle?

I think that's a great question. Why aren't we seeing more Cursor for X? Right. Yeah. How can we build more AI tools that enhance human thinking rather than trying to replace human thinking? The AI-as-Muse-and-not-Oracle kind of phrasing.

I think that is because Silicon Valley is broken and people are lazy. That's all. Simply put, huh? I think, look, I have this phrase I wrote down, one of these mantras for yourself, for me, which is just: trying to nudge the world into taking creative risk over arbitrage. And I think we...

have an engineering mind instead of an artist's mind when we attack many new problems, when often the more joyful thing to build, the thing consumers frankly like more, and frankly the larger potential outcome, is a company that

is not efficiency-oriented. And I understand that engineers and economists try to run the world, and all they know how to do is walk in and say, well, if you just took the profit margins from 12% to 14%, wouldn't you be in better shape? And that's a lot easier to think through than inventing a new world.

And so I think it's all of that thinking baked in here. It's a lot easier to pitch a VC on efficiency gains. And if you're pitching efficiency gains, you're going to go to a customer and say, well, Morgan Stanley was taking five minutes to do this job before, and now it takes two minutes. That's the same mindset that drives you towards arbitrage. The thing about arbitrage is that somebody else is about to arbitrage you right after that. Right. And when you really, truly invent something new,

or if you invent something that people are doing for the joy of it, it's very hard to replace. And so those areas of the world have always intrigued me more. And I think that used to be an area where Silicon Valley was very, very focused. That kind of "Apple is a liberal-arts-oriented company as much as an engineering-oriented company" pitch that Jobs used to do. I think in this more recent,

machismo, efficiency-oriented era, we get a little bit less of that. So anyway, that's a sidebar rant, but I think that's why we're not seeing it. I think we're not seeing more tools for thought, more AI as Muse and not Oracle, because we are sitting in a cultural pocket that is very efficiency- and arbitrage-oriented.

But I think that's giving up the larger goal. A really good example of this for me is exactly what happened with diffusion models. You had every single attempt at making new art as diffusion models became able to make art. You had the Midjourneys of the world, the DALL-Es of the world, the Leonardo AIs of the world. And I remember all of them pitching that same first year, and

VCs got very interested in Midjourney because, of course, they could see the revenue and they got excited. But also...

all the advice about what they were supposed to do was bad. Hey, when are you going to go talk to Paramount Pictures or Vivendi or Blizzard or some game company and make art for them? Look at how many production artists are on this game; you could make it more efficient. It was all an efficiency-arbitrage play, because that's the way we process the world. And I give David at Midjourney a lot of credit, and it was the right pick:

I'm not going to go work with the old world and try and make them a little bit more efficient. This is going to be its own thing. People are going to do this thing for the joy of making the art in this world.

I can't size that TAM for you, man. I can't build that TAM slide. And that's okay, because I'm just going to do the thing I believe in. And so they have, I would consider, built almost a net new thing in the world. So I don't know. There are lots of questions to answer when it comes to what AI even is as a Muse versus an Oracle. Cursor, I think of that way. It's not trying to be Devin. It is also not trying to be Copilot. It is

your partner. It's going to go off and do 30 seconds of work, not 10 minutes of work. It's going to come back and ask you questions. It is like your little partner for coding. And I would love a little partner in most of the activities I do that have some creative element to them. And so "what is the Cursor for X" sounds like the stupid-VC-language version of processing this. But I mean something a little bit more profound.

I love the interactions of coding with Windsurf and Cursor, at that level. And I have no desire, although I know the world wants it, for something to just be an AI engineer that goes off and does all the coding for me. Right. That should exist. It's less interesting to me.

Yep. Yeah, yeah. I heard Satya on the Dwarkesh podcast say that there's white-collar work, and then there are white-collar workers, and that white-collar workers are going to continue to do cognitive tasks. They're just going to change, but the actual white-collar work that they do may look very different. And the reason that I bring this up is that it just resonated. I think that we are going to continue to be in a place where there are workers.

In your Muse example, it is white-collar work, working with AI in a new way, that changes the way software engineering has been done. Right. And it's not that AI is going to get rid of the white-collar worker, the engineer. It's just going to change the way they do their work.

Yeah, I think there are lots of really incredible embedded questions to ask when it comes to what those tools for thought look like in the future. In a creative field like Midjourney's, what are the types of affordances and UXs that you want for a user as they're trying to walk a solution space that is, like,

trying to get something that's inside of their brain out into the world, but there's no English language for it. If you're trying to describe a song that you want to exist, or a poster that you want to exist, or a painting that you want to exist, there just is no perfect language for it. And so how do you navigate through the sea of possibilities with an AI to get to something that is kind of like what's in your head, or, more likely,

that the back and forth with the AI ends you in something that you didn't imagine in the first place. Right? And that's a little bit different from less...

creative fields, more analytical fields, areas where you're trying to get to a solution. You might not know how to get there, and so asking really good questions along the way will help you get there. Those feel like two wildly different product areas. Analytical: I'm trying to figure out the truth of this company, whether retention curves are working well, whether we have product-market fit, as an example. That is actually a copilot-y thing you could try and figure out: do we have product-market fit

right now? It's a creative exercise, but you are trying to get to an answer. And obviously the arbitrage version of that, the Silicon Valley arbitrage version, is, "Hey, we can write your SQL queries faster." But that's not interesting. That's not getting to the root, core, first-principles

question that a person is trying to answer there, the white-collar-work question that somebody is trying to answer there. Yeah. Yeah. I have lots of questions there. I think it's super interesting, and it's also an area where, for founders navigating it, there just aren't going to be 35

other pre-seed companies doing the same level of investigation, because of the air we all breathe right now, which is very arbitrage-oriented. Yeah. Yeah. Well put. Did we talk old markets at the beginning? We talked old markets at the beginning, right? No, we didn't. We talked about it as something that we could talk about. Yeah.

The last question I'd ask is on old markets, then. Let's end on that one. I think we have other areas of curiosity for us internally. I try to constantly update a list, but there's a handful of these. And I actually hope this episode leads to something.

Maybe you all listen to this and you think we should be asking different questions, or you have an answer to one of these questions we're asking. Or maybe you'll have three other areas you're curious about, and we'll do next week on what everybody else is asking questions about right now. That might be fun to explore as well. So yeah, basically: which legacy markets are ripe for AI reinvention?

And obviously, this is a question that everybody asks, but I think the general lens people look at it with is the last few years. They're looking at the unicorn from four years ago that they might be able to reinvent. And so, yeah, the area I am drawn to, curiously,

is what are the areas that kind of are almost always reinvented? There's some truth to things and industries that are always on the front end of innovation and so kind of always getting reinvented whenever there's a paradigm shift. And so this is things like

We talked about the lineage of Discord, going back to AOL IM, going back to IRC. Messaging just seems to get reinvented with every paradigm shift. Marketplaces, task management, review systems like Yelp: each of these just seems to absorb each new paradigm. I don't know what it is. We probably could invent some acronym or

phrasing, if we're trying to write a Harvard Business Review article, for why these particular industries always get reinvented. But they kind of tend to. And so those are the areas where I'm just trying to be present and realize that even if there isn't a startup coming in and pitching this week, they are likely to be upended, and that we still haven't answered

really basic questions like, "Which restaurant should I go eat at tonight?" Or, "How do I arrange my tasks for the week?" These are, in a way, interminable questions. We'll never fully answer them; there will never be the perfect answer. And so they can always absorb new technologies and come up with better answers.

You know, again, it's a question. I don't have a great insight into it. The whole point is just being curious about it, right? It leads to more curiosity and more questions for me. We might've talked about this before, but a lot of these things have to be reimagined for a world where the previous business model also isn't as feasible as it had been, right? Like,

Yelp reviews made sense because of the social exchange that happened within that community. And then the business worked because of ads. Both of those feel broken in a world where your AI assistant is somehow providing the recommendations and the reviews. Where's that coming from? What's the engine that makes that whole thing work? Yeah, I mean, I think it's almost like we talked earlier about some themes, like being able to do reasoning, or...

Another theme is just this idea that I've been talking about lately, which is: if Web 2.0 was the wisdom of crowds, then this age is really the wisdom of experts. You're not trying to get the average of how everybody would solve a physics problem. You're trying to get how a PhD, an absolute physics master, would solve this physics problem. And that's a very different paradigm. Summarization is another paradigm that comes out of this. That's an old one, but it's another one.

Computer use might be one too. It's taking these paradigms that emerge. Maybe we should do an episode on the paradigms that have emerged. That's actually a future episode: which lenses of AI do you apply to each of these things? But that's exactly right. It's like, how do you look at each of these things

and then apply that paradigm to it. What is this new world we're in? And what comes out of it? Another one is malleable software. We can now write software and change software. So for instance, every messaging application, for me, this is a perfect one. Every messaging application has a left bar

which has some levels of categories on it. It's either oriented by the people that you're talking to, or the channels that you're able to communicate in, whatever. They all have some fixed ontology. Why would they have a fixed ontology in a world of malleable software? Why wouldn't those things rearrange themselves dynamically? But what does it mean to build a messaging platform from first principles that is meant to be that way? It's probably not a thing you slap on later. It's probably a new kind of platform. What does that mean? Right.

Like you said, what does it look like? Yeah. What does it look like or feel like to use? And what new promise are you making to a consumer? Yeah. Yeah. You could go down the list. You know, what does it mean when you're recommending a restaurant? Does that mean that I get to pick the expert?

I want to know what somebody who really knows this restaurant is going to pick for me. And at this point it should be, ideally, one-shot, three-shot. It should really know, if it's an AI. I shouldn't be glancing through a bunch of photos. At some point, maybe I'm training a model on my own restaurant recommendations. I don't know what the answer is, but.

Yeah, I also wonder where we even get new data. This is a totally different subject, but, like, I don't even know where we get restaurant reviews in a world where the wisdom-of-crowds model is fundamentally broken, right? If the ad model is broken, then the principal

promise of the internet is broken at this point, right? The idea was: I made content for free. That gets served from the search engine. You click on the ad and you get exposed to the thing. That's the virtuous circle of the internet. That's just broken.

Like, what do I want to watch? I used to go to Rotten Tomatoes, and it's page load, page load, page load, page load. I recently asked Claude, and, I haven't told you this, it was awesome. I'm like, here's what we feel like. Here are the temporal aspects of the shows that we want to capture. Give us a recommendation. And it gave us a list of three, and we read one, and it sounded awesome. And we went to, of all places, Peacock. It was on some, you know, and it was good.

It was good. And we didn't go to Rotten Tomatoes. We didn't give them 25 page views. They certainly probably got their information from sites like that. But where are they going to get that in the future? It's very, very, very curious. And you see little bits of it starting to emerge, right? Reddit is selling their data.

Does that extend back to the user? Does the user of Reddit now get a share of that fee? Or do you still contribute to Reddit knowing it's going into a model? Is there a new version? Is it just Yelp selling their data? Or is there a new reason why you might contribute to a Yelp in the future, to help inform a model, to help

inform some other user in the future. Where we get our data from, and how, is also an interminable thing that we'll have to work out in the next five years, and nobody knows the answer, because the Web 2.0-era paradigm is broken for now. And some of these ideas of old markets being reinvented might literally come from that. It might be, hey, we figured out how tasks get done, or a new version of task management, or a new version of a marketplace, or a new version of reviews,

because we have figured out this new economic model. And because we have this new flywheel economic model, that leads to a different kind of experience, which also leads to a new revenue stream. And, like, I don't know. I don't know what comes out of that. Whatever is broken is often the first step to leading to a new solution. Yeah.

Yeah, well put. Well put. The answer is, for the next number of years, it's going to be Reddit. It's going to be all the places that already have a network, because those network effects are strong, and they're going to be able to sell and make good revenue from that. But that's going to diminish with time. Yeah, those things feel like FM radio in a way. That's right. It feels kind of valuable, but it feels like somebody's going to build a new market that brings these things in through,

not brute force, not "I'm just going to recruit a PhD to answer the question," but, like, how do you create a real, novel market structure? Yeah. That'll be fun. Good conversation, man. Should we be done? Yeah. Thank you. Thanks for chatting. We'll figure out how we package all this stuff and make sense out of all these questions we're asking. See ya. See ya. Take care.