So then what are the long-term implications of the idea that these are going to have very, very long contextual memories for what we put into them? Everything that you just said is the most important set of problems from a product perspective to be working on over the next little bit. Human behavior, first of all, doesn't care about privacy. And second of all, finds ways around it if there's a consumer benefit. It reminded me that I'm now in an environment where somebody is watching absolutely everything that I do.
We do not as an economy reward that level of intellect. That's the most optimistic story I've heard yet for why a horizontal agent that's not from any of the labs could still become quite important. Hey everybody, welcome to Hallway Chat. I am Nabil. I'm Fraser. Welcome back. One of the great things about technology, certainly over the past 15 years,
is that if you were the world's richest person or somebody making a median wage in middle America, you had access to exactly the same technology. You're both using an iPhone. That's right. Doesn't matter. Bill Gates uses the same- Yeah, he uses the best phone. Uses the best phone. Maybe not Bill because he's at Microsoft. And so, yeah. Sorry, he doesn't use the best phone. Yeah.
By the way, I resent this because I lived through an era where I was trying to import random quirky Japanese phones that were trying to be a camera or a note-taker or whatever. I lived through the era of plentiful phones, and now I just have the same glass sheet as everybody else. But yes, it is democratizing. Equal access to the very best product. Yeah. So that's a good framing. The question I was going to ask is,
Because we were covering this a little bit last night. Do you think the same is going to be true? No, more assertive: assume that's going to be true in five years. I see. If you felt like the same thing was actually going to happen to AI models in five to six years' time, which is to say that if you were a founder today, that is inside the window from starting a company to getting an exit. So within the lifetime of your company, the best models will be accessible to everybody at a reasonable, accessible price.
How would you behave? What advice would you give them? How would you behave differently? I'm going to answer that, but stick with me. Because I think you need to tighten up what has to be true. What do you have to believe for that future to be the case? Oh, no, I want to do that at the end. You want to do that at the end? I want to do that at the end. I want to jump to a world where you're sitting down with the founder and we just had this hallway chat together and we convinced each other 100% that in five to six years' time,
Like everybody has access to the best model at a reasonable price. Yeah. And you're like, oh, I need to go take action. I guess you could talk to founders about this. Yeah. What would you say? Provide the best possible experience at a low cost and absorb the losses in the short term. So we see people say, oh, I'm losing money on these queries.
And so I either have to put in a credit system or some other type of friction on usage so that they can be unit-profitable. And I think you would tell them, F that. You say, no, you are going to provide the best experience
And you're going to absorb the cost, because in five years... Presupposing that you have enough cash in the bank to do that. Yeah, for sure. Which is why you need VC dollars. Sounds like something a VC would say. No, the point is, if I sum it up: if I knew that it was about to become a commodity, that the cost was really going to fall that fast, then you would be okay taking a loss leader.
If you believed it provided a better experience. Yeah, look at Google's new video model as just the latest example. You should just see what happens when you throw tokens at the problem. Yeah. I had dinner with my in-laws the other night. Yeah. And I was telling them about a remarkable experience I had with o3. Yeah. And...
My father-in-law said, "Is that available on the free version?" I have no idea what their pricing is anymore. I don't know if the free tier has a very limited amount of o3 tokens or not, but I said, "I'm not sure." And in that case, I think if you believe that we're heading toward that world, you would provide the best model to all of your users. I think probably more than anybody, OpenAI is trying to exhibit that behavior. They are trying to move as much stuff to the free tier as possible, eating an incredible amount of money because they do believe...
that this is true. And so the reason I asked it that- I'd go further than that. I think that ChatGPT, the phenomenon, hitting the zeitgeist the way it did, can be chalked up to a handful of wonderful serendipitous strokes of good luck and good fortune, as well as the ability to redeploy a cluster of the size the research team had at the time and continue to offer it for free. No startup in that moment in time could have done that.
The reason I asked it as an assertive yes is because I don't know whether it realistically plays out like the iPhone, where everybody has access to it. But I also think that nobody can predict how quickly the price is going to drop, especially over that kind of timeframe. And so effectively, you should behave that way anyway.
I got into a debate with a CEO last week where I was literally having this conversation, actually. And I was reminded of it last night. Hey, you're doing something that's agentic. It's cool. The thing is running away and it's doing a thing. I saw a demo. This is for a portfolio company. It looks amazing. What if you threw four times the tokens at the problem? Profitability is a problem. Not sure how we'd price it. We all say that. Have you done it yet? No, it wouldn't make economic sense. It's like, well...
It's possible that you throw five times the tokens at the problem and nothing happens. But at the very least, you better know before your competitors do what that does. Go...
Figure out whether throwing $400 at a single query from a user instead of four cents generates materially different and more interesting output. Because if it does, you better make sure you're there before other people are, because, of course, the cost curve is going to go down. And predicting how quickly it's going to happen is a fool's errand. You just have to be ready for it to drop.
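To make that concrete, here's a minimal sketch of that experiment, assuming an OpenAI-style chat API. The model name, budget tiers, and task are all placeholders, and "more tokens" is approximated here by raising the output budget; you could just as well run more agent steps or best-of-n samples.

```python
# A minimal sketch of the "what happens at 100x the tokens" experiment.
# Model name, budgets, and task are placeholders, not a real setup.
from openai import OpenAI

client = OpenAI()

# Roughly: a four-cent query, a forty-cent query, an "uneconomic" query.
BUDGETS = [1_000, 10_000, 100_000]

def run_at_budget(task: str, max_tokens: int) -> str:
    """Run the same task with a progressively larger output budget."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever frontier model you already use
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": task}],
    )
    return response.choices[0].message.content

task = "Plan, draft, and critique a migration of our billing data to a new schema."
outputs = {budget: run_at_budget(task, budget) for budget in BUDGETS}

# The evaluation is deliberately manual: put the $0.04 output next to the
# $4.00 output and ask whether it is materially different and more interesting.
for budget, text in outputs.items():
    print(f"--- budget={budget} tokens ---\n{text[:500]}\n")
```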
And maybe you hold it back. If you only have a million dollars in the bank, okay, maybe you don't release it to everybody; you release it to a few people, or you just show it on Twitter, or frankly you just show it to investors to raise the money, whatever it is. But you best be testing what it feels like to throw an uneconomic amount of money at a set of agent behaviors and seeing what comes out of it, to figure out where you are on the curve. I wonder if there's two separate things here. And I think what you just said is...
undoubtedly the case. Okay. Like...
You should not worry about the budget of your tokens as you iterate to deliver the most delightful product experience that you can. Yep. And find product market fit. Yep. Because delivering that product, you have high certainty you're going to be able to reduce the price. And the much harder thing that we've seen is being able to deliver something that people actually care about today. Sure. There's another question as to whether you always have to just scale compute.
So you're putting tons of tokens at the problem, beyond the frontier model. Should you just peg yourself to that place and stay there? Because your scenario was a finite moment in time. And if you deliver that and you can't... So, we had a company who delivered that. They launched their product, their spend exploded. They then spent the next couple of weeks driving down the cost 50% and then 50% again. And that's even before riding the deflationary nature of it. Yeah.
If they hadn't been able to reduce the cost and they had to actually sit on the frontier as every new model comes out year after year, should they stay delivering that product and absorbing that loss over the next handful of years? Yeah. What's your time horizon is really the question. Yeah. Yeah. I think in a world where most of the feedback cycles are too short, it's three months in an incubator, it's one year until your next fundraising round, being long-term greedy while being short-term high velocity is...
the winning formula for most of the outperforming startups that I interact with. Those CEOs might word it differently, but that's basically what happens. The people that are incredibly short-term oriented, asking what do I have to do just to raise my next round, tend to make a lot of very short-term decisions that don't play out. The people that are only long-term oriented, a lot of times what that actually means is, I don't want to face failure early. I'm hiding. And
the balance of those two things is where the great ones sit. Now, there are times where I have three months of cash flow left. I've been there as a founder, blah, blah, blah, where you're like, I can only think short term. Life gets in the way sometimes. And similarly, if you have a million dollars in the bank, you're not subsidizing for the next five years anyway. But in terms of orientation, what are you trying to get done? Yeah, I think you want to be long-term oriented, right? No? I think so. Yeah. I think so. Another question. Okay. This is related-ish. Would you rather have...
1,000 iffy agents that are pretty good at what they do, reliable, but not crazy brilliant? Or one crazy brilliant one? The smartest person at Google versus 500 mid-level Google people. Do you want one Ilya or 500 of me?
I think it totally depends on the task. Yeah, yeah. Undoubtedly, it depends on the task. And what does that mean, though? Well, so first of all, do you disagree with that? I think that there are certain science questions where I don't want to throw the wisdom of crowds at it. Like, you want somebody who's brilliant to solve the hard science problem that hasn't been solved. And then you don't want the one brilliant person to crash through the expense reports. So the dinner party, academic...
like, EA version of this conversation is: you of course want the one brilliant person, because the one brilliant person will figure out how to manage all the other people, and they'll run the economy, and they'll also make themselves more brilliant, and there'll be self-improving ASI and all of the other things, right? So the party-conversation version of this is fairly obvious. I think an economist would answer that it's 500 mid-level people, because if you
just look at supply and demand in the market, it's very clear that the most brilliant person is not paid 5 billion times more than 500 average people. We do not, as an economy, reward that level of intellect. Now, you could also make the argument that intellect is not directly related to economic output anyway. So is that why you don't see that loop? Is that true? What? Does the average worker...
not make however much less than Greg Brockman? I think compensation is tied to a wide variety of things. Yeah, yeah, yeah. I mean, the risk that he took and all that other stuff. And yeah, yeah. I think we give an incredible amount of economic benefit to extroversion. Yeah. We give an incredible amount of economic benefit to...
males for being male, to people who are friendly. There's a bunch of social science studies about this kind of stuff. - Agreeable. - Yes. All things being equal, meritocracy, yes. But all things are not equal. Humans have preferences. - This isn't a knock on economics. Sorry, it's just the reality. - I take your point, fine. - No, the loop I would come back to is: how many things do you do every week, and how much corporate output, how much personal output
are you trying to be in the top 1% for? And how much of it is just work that needs to get done? I do not need the top 1% best house cleaner on the planet. And I am not willing to compensate the best 1% house cleaner. I just need somebody to help with the dishes sometimes. And my Uber driver, I do not need the best driver on the planet.
I just need somebody who's going to stop slamming on the accelerator and making me car sick. Put the dots on your iPhone. What about your Thursday at work? I think we ask this question all the time. I would argue that
I don't know what it's like. We could talk about what a startup is like. But Fraser, look, I guess you're throwing it back in my face, because what I'm trying to lean into is: you just want 500 people doing really good work. But my actions would betray that, given that I strongly advocate at Spark to keep the partnership to six or seven people and to not scale. No, but I'd push us there not to throw it back in your face, but because I actually think it's interesting. My guess is that
the lion's share of your job would benefit more from having 500 people running around. And then I think there's very acute moments where you are probably in the P99 and it's decisions in those moments and your ability to perform in those very small moments that make the difference though. Is there a rubric that...
Because this would lean into how you think about model selection if you're a founder. This would lean into how you think about expenses and revenues. It would lean into what products you take on now versus three years from now. I mean, this is a parlor conversation, but I actually think it has a lot of implications for where you'd lean. So is there a rubric for the things that you would want 50 or 500 or 5,000 or 500,000 mid-level people to do
versus the one genius? Which would be another proxy way of saying: what are the tasks that we as a startup community should try and take on right now, and which ones should we wait on? That'd be another way of wording it. But that's not the phrasing I wanted. I wanted to ask: is there a way of framing or thinking about the job to be done, the work, that would help you figure out one or the other? Are there marginal returns
for being exceptional at the task? And as much as I don't like to admit it, for most of our tasks, there aren't. Right. And then there's a small, finite set of tasks where there are. And I think that
What does this mean? I think that there's- That's the thing we found with coding. Yeah. The thing we found with coding over the last few years is that when I sit down with really amazing programmers and they look at vibe coded code, they're like, this is horrible. And the whole point is, yeah, it gets the job done. It's fine. Like, I understand it's not as structured or as efficient as you would have done it. And like, it can be done better, but like average is fine. It executes, the webpage comes up. Yeah. The search function works. Yeah.
Yeah. And that's knowledge work. So I'm not even talking about like brute force, copy these things out of an Excel spreadsheet and put them in my database stuff. I'm still of the mind that you are going to want to have software engineers orchestrate 500 agents to do those tasks. And then on the stuff where they can actually have an edge and add value, they're still doing that. What's the UI layer for orchestrating 500 agents? That's a great question. What is the UI layer...
Let's start with the simpler question, and that is, do we need a new UI layer for orchestrating agents? This is the conversation we were having with one CEO recently, who was just saying, we already have a way to talk to 500 agents, and it's called Slack and email and all of the other office affordances. You're just building Microsoft Office for agents, basically; you don't even have to refer to them as agents. I...
am sympathetic to, and love, the simplicity of that. You're not teaching a person anything new. They're using project management software the same way they've always used project management software. They're just assigning the task to an agent or not. There's a simplicity to that. Maybe that's an in-between step for me. I don't buy any of that. You don't buy any of it? I don't buy any of that.
I Slack an engineer and I say, listen, we've got to make something. It's due by Tuesday. And they run away and they do stuff and they try and get it to me by Tuesday. I don't know whether that's a human or an agent. It doesn't matter. Like, why not use the things that we have evolved software to be good at, which is communicating with other entities? And this is just an alien in our midst. I can believe that. Okay, I'll push back. Yeah. My response initially was: because I think we're a ways away
from having agents be able to take a task like you just said and come back with the output. And in essence, that's what the product is today for the most part, or the products that are getting built. And I think whether it's in Slack or otherwise, the more interesting thing is mimicking what happens with that teammate today, where it would be a very special work relationship for you to
give three directions to somebody who runs off and works for two weeks and comes back with exactly what you were looking for. Or else it was so mundane a project that there was no nuance that had to be teased out and understood around what the objectives are, what the goals are. Okay. Okay. I think you're touching on something that I have been thinking about all morning,
And I think I just got to a little bit of a... All right, let's do it. A little bit of clarity. So now you can muck it up once I say it. I don't think it's complexity of the task. Uh-huh. I think it's visualization of the output. Okay.
And when a lot of people have a conversation about an AI model and how it gets better, and especially if we're talking about something like, say, vibe coding, one of the ways that we talk about why it works for coding is because you have evals. You have an evaluation of output that's deterministic. You can figure out whether the code runs or not. Does it compile? That's supposedly why it's helpful.
I think that's a little bit of it, but it's actually not the broader point because that actually doesn't abstract out to everything. As a good example, then why does something like Midjourney work? How about random web pages where you don't even know if they run or not, but they just come up really fast? Those companies seem to be doing really well as well. I think it's proof of work. I think that the correlation is the parts of the economy where the AI can very quickly generate proof of work is the stuff that's working.
And the stuff where it can't generate proof of work easily is the stuff that's struggling a little more. And even inside of coding, this is true. And I'll explain what I mean. Okay. If I am coding a Lovable webpage. Yep. Really fast. Yeah. Comes up in 90 seconds. Okay.
I have a visual display of all of the things that were done well. And I can look at it in half a second. I can be like, this is broken. I didn't like that that thing was over there. Stop doing that, blah, blah, blah. I don't know how many hundreds of lines of code were written, or thousands, or millions. But in 15 seconds, I have proof of work, an evaluation of all the things that went into something. Right. You do the same thing on server-side coding. Yeah. You don't trust it at all. Yeah.
It runs away for 10 minutes, it makes a bespoke database from scratch, it does a little... You know what you're doing? You're inspecting code for the next 20 minutes.
And there's no proof of work. There's no way of having a visual proof of the thing that it just did, in a way that is evaluable by the person on the outside. This is also why producing text output works really well. Hey, make this deep research report. The research report comes out. I glance at it really quickly and I'm like, no, this whole thing is too pedantic. Can you be harder on me, and blah, blah, blah. And also find evidence. That's proof. Really quickly, I look at the thing. A visual evaluation. Yep. And so I think there's these two
camps of... we just get in these loops of these conversations where it's, do we need new UIs, or is chat going to be enough? Chat's not enough. There's going to be new ways to talk to AI in the future. And then the other side is, no, that's how you talk to humans. That's how you talk to the aliens. I think it's more: if you are in a knowledge work economy in which the thing that's being output or produced is...
From the AI, not from you. So think output, not input. Yep. It's quick and visual in nature. Yep. And you can instruct it quickly through chat. Then great.
Okay. Probably the lowest common denominator is going to work. Chat, we already know how to chat with people. I can text your friend, you can send me back an image. I can be like, "I don't like that image. It's not Ghibli enough," and then you can go at it again and bring it back to me. That works. That's the loop that all of AI art is in as well, and all the text stuff, and some of the coding stuff, when the coding stuff is making an artifact, when it's making a quick and easy artifact. Okay. If the work is more abstract in nature, I think it will struggle for AI adoption,
no matter how good the models are, until we invent new visualization layers that help show the work that was done. I have a friend who's working on a way to do AI prompting of games. And so I can say, make a first-person shooter or make a chess game or whatever. And
the output to a certain extent is visual. I can fire it up and I can see a thing. But the problem is, let's say it's made five levels of a game. Without me playing through the whole thing, there's no visualization layer, like a site map or something. We don't have a ludology or an encyclopedia or a visual way of explaining how Assassin's Creed or Fortnite actually plays. What is the feel of that game? We haven't yet invented some layer. And so the problem is that the feedback is incredibly slow.
I can't figure out that you made a decision on level two to make it almost near impossible. And by the way, the gun should have come in a little bit earlier, not later. And I don't like the recoil amount, and that's not really fun because I suddenly lose track of who I was targeting, and a million things that go into building a game that we just don't have a visual language to describe all that dense data with. And so that means I have to play through it for 20 minutes every time. And if I play for 20 minutes every time, my feedback loop is slower. So I just think there's many pieces of work
that if we fast forward five, 10 years and we look back and we say, hey, when did the AI take off in that environment? It will not just be about model capability. It will be that we possibly invented a new way to visualize the work that was being done, so that the AI could say, oh, here are the 35 decisions I made. And this loops back to: when do I have 500 agents going and doing things? When 500 agents have some way of giving me a report
of the work that was done. Okay. That is coherent and makes sense. And that I can evaluate and say, I think you did two of these 25 things wrong. Go fix those.
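As a thought experiment, here's what that report might look like as a data structure. Everything below is invented for illustration; no product we discussed works this way.

```python
# A hypothetical sketch of the "report of the work that was done" idea: each
# decision the agent made becomes a small, scannable record, so a reviewer can
# evaluate 25 decisions in a minute instead of replaying hours of agent work.
from dataclasses import dataclass, field

@dataclass
class Decision:
    step: int
    summary: str             # one scannable line, e.g. "Chose Postgres over SQLite"
    rationale: str           # why the agent made this call
    reversible: bool = True  # hard-to-undo decisions deserve extra scrutiny

@dataclass
class WorkReport:
    task: str
    decisions: list[Decision] = field(default_factory=list)

    def render(self) -> str:
        """Render the proof of work as something a human can scan in seconds."""
        lines = [f"Task: {self.task}"]
        for d in self.decisions:
            flag = "" if d.reversible else " [hard to undo]"
            lines.append(f"  {d.step}. {d.summary}{flag}: {d.rationale}")
        return "\n".join(lines)

# The reviewer points at step numbers ("2 and 7 are wrong, go fix those") and
# the agent resumes with that feedback, instead of the human inspecting code.
```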
I think I started to get it at the end there. And then I just had a sip of my Diet Coke, so I'm getting focused. Let me see if I can say that back to you. First of all, the output has to be human-interpretable. But it has to go further than that: the adjudication of the quality of the work
has to be readily available. I can see it and render judgment. Yeah. Yeah. Profluent outputs an amino acid sequence. Yes. Not human-interpretable. Yes. Very hard for us to adjudicate without going to the wet lab and everything else like that. The interesting thing is, when you do it in the wet lab, I'm sure you then get the charts at the end of those tests that are human-interpretable. That's right. So maybe that's the path toward automation there. Yeah. I see.
And so then back to your 500 agents, and when you want a mass number of median workers versus the exceptional worker, and when you want a new UI: the way to think about all that stuff is whether you can look at the work product and either have it engender trust or give feedback and more direction. That's the path. Yeah. Okay. Sure. I get it.
And in software engineering, you have- That's why front-end tasks are perfect. Front-end tasks, for probably a bunch of other reasons as well. But the reason that you can put the model in and get good product out of it for automating code is this reason. But then on the backend, there's a bunch of different complexities as well. I'm just calling out that... I have this right now. I'm working on a website called...
something's messed up on the front end. I can see the problem I made. And then yesterday I had a login flow that got messed up, and I've been trying very hard to not look at the code, because I'm only allowing myself to talk to it in prompts, just for discipline purposes. Can I get used to what this feels like to use if I'd never been a coder before? And I can't just say, let's figure out what's wrong with the login. Why is that? It's because we don't have some way of actually showing
the logic of what's going on yet when we use one of these code editors. Yeah. Like, we can show the visual layer, but we cannot show the logic layer. Okay. Very interesting. So the AI-researcher version of this argument, which we have had before with some of these places, is: oh, that's just model capability. All you've got to do is wait for the model to get smarter: more evals, smart people solving bugs, and then the model gets better and it doesn't have any of these problems. And I was like, maybe, sure. But more importantly,
You can't show me a chart of this logic which would let me point out
where you made the mistake. - On front end, you run it locally, you deploy, you look at it, you tweak your code, you fix it, you go. - Yep. - On backend, there's like a whole process. It exists on front end as well, but like you have your tests, your unit tests, you have the concept of a pull request with code review. - Oh yeah. - Like you pull in the person who's an expert on these other systems so that they can do the code review. And the absence of that today,
I get it. That makes sense. Or put differently, we've gotten to this point where the AI models in codegen, some of the more advanced ones, are leaning towards mini-PRD land. You talk to Replit and it's like, I think I'm going to do these five tasks. It's not even a PRD, right? It's really just a task list. I'm going to do these five things. I'm going to say that back to you and you'll feel better. Yeah. And maybe the right visual...
feedback is to think of it like it should be generating a real PRD. It should be generating a PRD at the logic layer of what is going to be going into the coding, so that I can glance through that. And even not as a coder, if I'm a PM, I can read a good PRD and be like, dude, that logic doesn't test out. That's dumb. We're not ready for that yet. That kind of thing.
Maybe the whole point is: how do I manifest that in human-readable language? And until we're there, if the human-readable-language proof of work doesn't exist, then those are the areas where you just have to invent it or add it to your product. And that will compensate for a weaker model, because then you've got humans in the loop. Yes, I agree with all of that. And I feel vindicated by you, because I think that everything that you just said is the most important set of problems from a product perspective to be working on over the next little bit.
Because I think the models are quite capable of doing an awful lot. Oh, this feeds back into your: listen, with the models as they are today, we've got 10 years of innovation. If the models don't even get better, we get 10 years of innovation. But the thing that needs to be innovated on is getting feedback from the model and giving direction to the model. Yeah. And think about what you just said. There's a finite number of tasks in our lives that give you a human-verifiable
work product where you can adjudicate the quality and give feedback immediately. Yeah. The path toward... And those are the ones, more importantly, it's not random that those are the ones that seem to be working well first. It's not just that the model is quote-unquote good at that. That's right. It is that we as humans are good at evaluating it. That's right. That's right. And so then everything that's outside of that zone should get lit up increasingly by the current...
state of models. That's the role of designers too. That's where the designers come in and say, look, here's a task you need to do in the world. We don't know how the model speaks to you about the task it just did. Yeah. And so, that's... Go figure out what that proof of work is. That's it. And it will come from both ways. It will come from the model getting better at understanding that it should come back for feedback or input. Yep. And this goes full circle to our friend Noah's observation that there are generally three ways for a model to be able to do that.
I can't remember what they are. One is like to present the options. One is... we should ask him. Do we have notes on this? No. Well, let's see if we can suffer through it. Do you know what I'm referring to? No. Oh. It was that dinner that he was at most recently, where he was saying that there's really only three ways to elicit feedback from the users for the model. Yeah. And I very much agreed with him at the time. Yeah. It sounded good.
It sounded right. That guy, just if you want to know what's happening two years from now, we should just ask him. Yeah. The model can just ask for direction, free form. Yes. The model can present the set of options that are in front of it. And there was a third one that was profound, but I can't remember. This is it. I found it. Where did you... AI searching works. Okay.
Noah said that the model can interact with users in three ways. This is Claude. Okay. One is making good assumptions on behalf of the user and hiding complexity. Two is coming back and asking the user for input. And three is presenting what the model is about to do and allowing the user to make changes, with smart defaults and a rich UI that is ready to modify those defaults at a moment's notice.
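If you squint, those three buckets look like the possible outcomes of a single agent step. A rough sketch, with names and thresholds invented for illustration, since the conversation only gives the taxonomy:

```python
# Noah's three interaction modes, sketched as an agent step result.
from dataclasses import dataclass
from enum import Enum, auto

class Mode(Enum):
    ASSUME = auto()   # 1: make a good assumption, hide complexity, keep going
    ASK = auto()      # 2: come back and ask the user for input
    PROPOSE = auto()  # 3: present the plan with smart defaults, ready to modify

@dataclass
class StepResult:
    mode: Mode
    message: str                  # what the agent says to the user, if anything
    defaults: dict | None = None  # pre-filled choices, used in PROPOSE mode

def choose_mode(confidence: float, cost_of_being_wrong: float) -> Mode:
    """One plausible policy: proceed when confident and the stakes are low,
    propose when the stakes are high, and only ask when genuinely stuck."""
    if confidence > 0.9 and cost_of_being_wrong < 0.5:
        return Mode.ASSUME
    if confidence > 0.5:
        return Mode.PROPOSE
    return Mode.ASK
```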
I think that's right. Three broad categories, and you can imagine a very vast product space within each of those. Yep. I think that's the most important thing. Yeah. Innovation on that axis. And it's not surprising, because, back to your comment about being in Slack and giving direction to humans, that is what great collaboration looks like. Yeah. You dictate something for me to go and do.
I run off and I make sensible assumptions. And if it turns out that I show you work product where the assumptions were brutal and I spent two weeks on it, that's horrible. But if I make sensible assumptions and I show you something that's like directionally what you wanted, that's great. If I come back and say, I'm stuck, I don't wait two weeks. I come back in an hour and I say, I'm stuck.
Here is the situation summarized. Here are the three options in front of us. And here are the perceived pros and cons of each of those. What do we do? Those are the right feels. And yet I feel like most of the founders that I've been interacting with, and frankly most of the way I evaluate companies, hasn't even been asking the question of whether that product is doing a good job of knowing when to do bucket one, bucket two, and bucket three.
I think there's very few products that even do any of that. Deep research in OpenAI does it a little.
But it's an if-then statement; it always asks for a little feedback. It always does. That's not smart. It's not a great product experience, but it's the first step towards something. I've used one of those long-horizon agents that we don't need to name, where I gave it a job and it spent... six hours? I don't know, like 45 minutes grinding on something. And it came back and it showed me the output, and in like the second step
there was a flaw in its assumption and it was all wasted. Which also goes back to visualizing the work. Yeah, absolutely. Yeah. When you were running away there for a second, on the prior topic, you said that whether it has proof of work has implications for our day job. What did you mean by that?
It's kind of like what you were just saying around: I haven't been asking these questions. I assume that this is a solvable design space from a product design perspective, rather than just a model perspective. And so, more to the point, what I mean is,
we just mentioned that the products that we've seen get traction have been ones where the verifiable proof-of-work feedback loop was present without having to innovate on the feedback mechanism. There are going to be things that are one step outside of that, where with very simple innovation on the feedback between the user and the model, or the user and the agent,
there will be domains that become solvable. That's right. With today's models. That's right. And similarly, if you are trying to build the team, it's also about casting properly for the problem. It's understanding, oh, do we need another prompt engineer or do we need another AI model builder? And it's like, no, actually, you need somebody who's going to be thinking about proof-of-work problems. Like, how do you visualize the decisions that the model made back to the person? The technical version would be some kind of
mechanistic interpretability, but I think we were also talking about the application layer. 100%. Which is just like: make a PRD. It's both. Hopefully, we'll see innovation on this at the model layer. Yep. But it's totally at the application layer as well. Yep. At this point, what is the clarity that you require to be able to go and do the next meaty piece of work? Got it. Next question. Okay. What are the
secondary and tertiary implications of the idea that these models now have long-term memory, a.k.a. ChatGPT releases memory? The first implication is, oh, lock-in, right? Like, that was my first: oh, maybe I should chat with ChatGPT more than I bounce around to 15 other different chat products, because now I know that I can ask it questions about how I thought about things over long periods of time and blah, blah, blah. And I want that memory in one place. Sure.
I somehow don't think that's how it's all going to shake out. So then what are the long-term implications of the idea that these are going to have very long contextual memories for what we put into them? If anything. Did you have the same emotional response with ChatGPT when it said we have memory, and you're like, tell me more about myself, and it grabbed everything from the last however long, which for you is probably longer than most people? I didn't, no. No, you didn't? Did you give it all the little prompts they put on Twitter that were like, tell me about myself? Oh, sure. Sure. Yeah. But it had no implication for you? Uh...
No, I mean, it's interesting. It's revealing. It's frightening. Didn't change your behavior at all? You went right back, sent your next text into Claude or Perplexity or whatever, bounced around. No, I've been wrestling with, I think, probably a second-order effect of all of this. And that is: am I going to make the decision myself, or have it thrust upon me by my employer, to live in a world where I have a work ChatGPT and a home ChatGPT?
Yeah. Like a consumer and an enterprise version. I don't even think you have to get into all the normal work-home profile context to simply say: do you want two different sets of memories? Yeah, that's right. Do you effectively want AI severance? Yeah. Yeah, that's right. Yeah. Do you want your primary handle on Reddit to be aware of your non-Reddit handle? Yeah.
To also be aware of your home email account, and also be aware of your texts with your wife. Yeah. I do. You do. I do. Work as well, everything. We work in a weird place. We have lots of agency about the IT that gets involved here. But if we worked somewhere else... Here's what I think is going to happen. I think a whole bunch of
founders are going to make the horrible mistake of assuming that there's a split between work and home for very similar products. So there's a world where a product I need at work is different than the one I want at home, even though they feel similar, right? So the spreadsheet I use
at home and at work are different. So maybe there's two spreadsheet products, to use the analogy loosely. But if we play out the work-home analogy for most corporate software today, most of the ways I interact with software are the same in both: email is basically the same in both, spreadsheets the same in both, PowerPoint, people use it for both. Most of the things that you use blur the line. The only times that they don't is when
a corporate security mandate, a we-won't-let-you, happens. And I'm not saying that won't happen. It's happening right now. I have a buddy at Amazon, like, senior at Amazon. We were just talking the other day about all of the rigmarole that he goes through in order to try to stay on the cutting edge of AI. And that's not easy. But I look at human behavior, and human behavior, first of all, doesn't care about privacy. And second of all, finds ways around it if there's a consumer benefit. And
in order for an AI to be the most helpful thing it can be to me, it needs context. And it needs context about my wife and my work and my hobbies and blah, blah, blah. In order for me to, for instance, at the end of the year, be like, hey, Jimmy, my AI friend, can we talk about next year's goals?
I think as we evolve these creatures around us, we are ostensibly in a context war. That's what all of this boils down to. The same way that there were previous social wars, where it's like, who's going to get your friend graph? And then just after that was the data war, who's going to get all the data? Yeah. I think this is different than a data war in that it's not all the data on the internet, although we're certainly playing a little bit of that war too. This is the value to you of an AI truly understanding you.
The user benefit of all of that is too high for people to not break whatever corporate rules are put in place, is my contention. In three years' time, are IT and security and privacy going to be able to monitor all of the information that gets sent to these? No. I don't believe that. Look, there are going to be total lockdown procedures.
Yeah. Where I work in a government job and I wasn't appointed by Trump, so I actually have to obey all security procedures, and I have to use whatever software happens to be there. But by the way, there was a time period where the government decided no one was allowed to use cloud software. What happened? There was a time period where you were definitely not allowed to use Gmail or Google Cloud. That's insane. Those people, like, make search engines and stuff.
And guess what happened? The benefit to the user was high enough over time that all that stuff crumbled. And so if you fast forward... we're saying different things here. I think there's going to be a user benefit for home and work context to be blurred, because we are one human. And I want this thing to understand all the context of everything that's happened in my life, because it will be more useful to me.
And that will override any one job. Because, more importantly, I want it to know about my last job. I've had four jobs in 10 years or something like that. And I'm trying to get advice from this thing about what my next job should be. The best way it can give me advice is if it knows everything I did at all those jobs. I can't imagine. I'm trying my best. I'm trying my best. I can't imagine...
a world where that happens within a corporate environment. Don't get me wrong. I think there'll be a couple of wonderful enterprise, lockdown, security-focused, boring software companies that are just there to give you a gimped version. They will rise to a billion dollars and then they will die.
Because over time, I think, except for maybe government or law, there's a couple of very, very small areas where you really have to. But otherwise, outside of that, in general corporate environments, I think the benefits will win out. Because again, you said, can an IT person lock it down? By this point, five years from now, I'm wearing AI smart glasses that see everything in my life. And streams it to IT? No, streams it to me. I bought the glasses, man. Okay, okay. Because it's beneficial to me.
Okay. You're not logging in with your Procter & Gamble account into this environment then? I am on my computer, but I'm still wearing my glasses. But who's getting the...
I'm so confused right now. I somehow avoided our IT onboarding. Yes, you did. So did I. Yeah. And I got a note- Don't touch my computer. Oh, well, I got a note that said you have to get software that allows us to monitor it. So I added that last Friday. Oh, yeah. They should have done that. I've learned my lesson because I got an email today that said you have a virus, an application with a virus on your computer and it's sitting in your trash can. Mm-hmm.
First of all, I was like, I don't think I do. My second response was, how on earth do you know that it's sitting in my trash can? I felt so uncomfortable. You felt violated. I felt violated. It turned out it's a startup that agentically takes over your computer. I can understand why they thought it was a virus. It's an agent. It's an agent that you cede all control to your... Yeah, I do that every two weeks. It reminded me
that I'm now in an environment where somebody is watching absolutely everything that I do. Yeah. We certainly don't have that on our Claude accounts. And I ask Claude for things with my family, like help me work through this complicated situation. And all of a sudden, I'm going to have to start thinking about: is this the one that I want to put into this profile? I very much agree that people give up privacy all the time for consumer benefit. I'm coming off more determined now
than I really am for the purposes of discussion. Yeah, of course. I understand that even today I log into Google and I have to switch between my Gmail work account, my Gmail personal account, and I keep my Google photos in my personal account and they're separated from work account. I'm not saying there might not be two memories. Okay. Or frankly, there might be like 50 because I'm logged into AI agents that are doing all kinds of different things in my life. Yeah. I suspect that
an AI does so much more with more context that I can't imagine personally living in a world, once these things are smarter, where I don't have an AI that has all of the context. And I suspect, much like a lot of the privacy arguments, that as long as that benefit accrues to a user, and a user can see that this thing knows me and really knows me, they will do whatever they have to do in order to make sure that thing can really know them.
And maybe that becomes an IT tit-for-tat war. Right. I'm in- Yeah. But there's supply and demand. Some founder makes a startup that gets around IT's weird thing and installs itself so it can get context, because the consumer can then ask it personal questions and it knows what their work and personal life are like. Or maybe it's, again, an external device that gets attached to your body, and they're trying to pat you down every time you walk in to go work at Fidelity or whatever it is. There's a lot of people who carry around two phones. Yes. Yes.
Sure. Like, when you soften it like that, sure. Sure. When you back down. Sure. Not back down. You're being bombastic to drive home the point and stimulate conversation. I think we are seeing that these are exceptionally helpful in the work environment. Yes. And I think that we are seeing that they are exceptionally helpful in a personal environment. And personal environments are ones where...
Like, a lot of people are lonely. A lot of people are isolated. A lot of people are dealing with all sorts of personal struggles that they are turning to these things for. And I can't imagine most people doing that on a work account. Last question from me. It will not sound related, but in my brain it's related. Would you stop using Windsurf now that it's been acquired by OpenAI?
And use Cursor. Presuming that next week, Windsurf only uses OpenAI models. I downloaded Windsurf. So this is not the question that you're asking, but I downloaded Windsurf and it is so clearly still a hardcore engineer product that I deleted Windsurf. And I went to Replit and Lovable and WebSim. There's just no need in my life for that type of product today. Yep. But to your question, I don't care about that.
You could answer the same question by the, if Replit was bought by OpenAI. Yeah. No, I don't care. You don't care.
I don't care. No, what I've been wrestling with is: undoubtedly there's been, for these pro users, very early adopter pro users, value in being able to switch between models. And you even see that from Cursor. They're like, "This model's now available in the product, but we don't recommend it as the default." Things are marginally better, tit for tat, week over week. It's an amazing time if you love that type of stuff. I think there's going to be a really interesting question as to whether the vertical integration of adapting the model
to the UI and the job to be done is better for the end user, versus the benefit of being able to take the best model of the week, tit for tat.
Yeah, I think we're talking past each other a little bit. We do that a lot. But this helps though, because I don't disagree with you. I use Replit still regularly. I probably spent $40 on Replit literally this week. Talk about a great pricing model. They're like, what if we just increase the token price by 10x? People will still pay. And there's no model exposure there in the primary agent interface. We have some sense of what they're using, but they're not telling us. And so they could switch models back and forth and I wouldn't know. It just gets the job done or not.
So I agree with that. I just always love your orientation to this, which is to think about how a consumer is trying to get a job done as simply as possible. And if you serve that need, then we don't have to be tweaky about the whole thing. And that is generally your North Star. And it's usually right. Whereas I just am a control freak. But in the case of Windsurf,
I specifically have information that would make me change behavior, which is that I do not like OpenAI's models for coding.
Now, if it were Anthropic that bought them, I'd be like, "Oh." Is the difference 10%? Is it 10% better or worse? It is certainly 10% better or worse. It's not a thousand percent better or worse. But also, the difference between Windsurf and Cursor is less than 10%. They are in a tit-for-tat war. They're very close. Every time somebody launches something that's interesting, the other guys run at it and get it out soon afterwards. And so I used to bop in between them fairly regularly.
And so I would say something a little bit more expansive, which is: in a red ocean situation, which forks of VS Code certainly are, you're not just communicating to your consumer that you are the best in a given week. What you're trying to communicate to a consumer is that you will be with them for the next 20 bests. Yeah. Yeah. And that's the break that just happened. Yeah. Like, I don't even care if they show me the model.
But if Cursor hid the model, fine. I just want to make sure I have access to the best model because I know life's going to change a lot in the next 18 months. This is another good reason why in a very red ocean market,
I take the opposite of my usual view on polish. I normally think you want to ship something super polished and super great that just does the right thing by the user, and make sure there are no errors. But in a very red ocean market, you want to be on the opposite side of the coin. Not because being slightly buggy doesn't matter, blah, blah, blah, but because you are trying to communicate to a user that they should be committed to you. Nobody wants to change software every week.
And so what they're trying to do in a red ocean market is figure out who's going to be there for the next 35 features. And so being a little bit on the edge, launching the thing first or very fast following right afterwards... look, there's some empty calories in there that feel like waste. It feels frenetic. But guess what? You're a founder, you bought into a red ocean market. This is the basket that comes with that fight. And the worry I have for Windsurf now is,
kudos to the founders, and I actually thought the product was better than Cursor, so I have been a Windsurf customer. But my problem is that my trajectory of what I expect, of what wars they're willing to go into or not go into over the next year, just changed. They are not willing to change from OpenAI models, would be my guess. And I want a company that is willing to change anything.
Which is an interesting way of thinking about... Of course, I don't bring this up because of Windsurf or Cursor. I bring it up because I think this is true for lots of categories of things that are competing at the AI application layer. I think what you're saying is there's basic parity between the UI layer of those products. Yes.
And you as an end user can absolutely absorb the value of the model improvements. The model that is best is changing so fast. Regularly. And you still benefit from all of that improvement. Yep. Yeah. Sure. Another way of saying it: being part of OpenAI means they've hamstrung a whole particular area of possible product improvement. Yeah.
It's not about me controlling which model I can switch between. It's that model switching is a feature, not a bug. We shall see. Because there's a world where having the ability to adapt your model to the UI that you're serving it in...
might actually lead to a better product. The promises of vertical integration. Yeah, yeah. We'll see. We'll see. That is the counter. That is the Claude Code playbook. That will hopefully also manifest itself over time as Claude Code becomes what it's becoming. I am very interested in that war. Yep. That's the most optimistic story I've heard yet for why a horizontal agent that's not from any of the labs could still become quite important.
Yeah. Because you might have your 500 P50s, your 500 median workers, and your one superlative agent all working in the same platform. And those 500 could be from Gemini. And that one could be o6 from OpenAI. And you would want both of those. You know what we don't advise our startups to do that we should? Exactly what you just said.
Which is: if you're trying to make a case for a product, an application product you have out in the world that can talk to multiple models, and if you're trying to make a case to users as to why they should not just use ChatGPT for that task, then...
It doesn't mean you need to have a dropdown that says, would you like to use Gemini for this task or Claude for this task? That's a choice that no one actually really cares about. That's right. But saying, hey, we have found that all the math should go to this model and all the deep research should go to this model. And by the way, we're dynamically changing those every day depending on how these models evolve. Every three weeks it changes. And you make that apparent in the interface to the user, so that-
That is a new product feature that you can now believe in this product for. Oh, you make good model choices for me. Yep. Yeah. That's like a product decision. Yeah. You make good model choices for me, and the freedom to use the best model. And you have, yes, you have the independence to keep doing that over time. Yeah. We've asked that of a lot of different founders, and nobody's given me that answer. Yeah. Why is that?
I think the only answers you've gotten are people making model switching a user capability. So things like Poe, where you can ask five models the same question to see what you get. The culmination of this rambling conversation coming to this point makes me very much believe that's true. What's an example of a company that would benefit from surfacing it to you? First of all, making good model choices for you, so one query or sets of queries go to sets of different models. And then two, would benefit from surfacing that to you.
Surfacing it to you? Yes, because you want trust. The trust comes from saying, oh, that's an interesting query. That's mathy. We're going to go here for that, and that kind of thing. I think in the fullness of time, users won't even care, because they'll just be able to feel it. If there really is differentiated value in different models, you'll just feel it. I think Elicit should do this. So yes. Yeah, they are.
No, I think Elicit should surface to a customer- I see. ...that they did great model routing, that model routing is a feature- I see. ...and that they're doing a great job of it, so that I then trust them to make better model-routing choices over time. I don't know why Perplexity doesn't do this. Perplexity has handcrafted switching; I can switch. Yep. And it has an auto button. Yeah. But it doesn't have a router button. No. Ooh, this seems like a deep research question. Yep.
Do you want me to ask this as deep research? Going back to your question of who should do it: the broad agents that are doing long-horizon tasks absolutely should. Oh, Manus. Manus, Cognition and Devin. Everybody who's routing multiple tasks across what I have to assume are multiple models that change consistently should do it.
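For what it's worth, the mechanical part of routing is simple; the hard part is taste and keeping the table current. A toy sketch, with made-up model names and a keyword classifier standing in for whatever a real product would use:

```python
# A toy sketch of dynamic model routing with the choice surfaced to the user.
# Model names are invented; a real router would use a learned classifier and a
# routing table that gets re-pointed as the model leaderboard shifts.
ROUTES = {
    "math": "reasoning-model-of-the-week",
    "deep research": "long-context-model-of-the-week",
    "default": "general-model-of-the-week",
}

def classify(query: str) -> str:
    q = query.lower()
    if any(word in q for word in ("prove", "integral", "solve", "theorem")):
        return "math"
    if any(word in q for word in ("research", "survey", "sources", "report")):
        return "deep research"
    return "default"

def route(query: str) -> tuple[str, str]:
    kind = classify(query)
    model = ROUTES[kind]
    # Surfacing the decision is the product feature: "ooh, that's mathy, we're
    # sending it over here" is what earns a user's trust in the router.
    return model, f"Looks like a {kind} query; routing to {model}."
```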
Yeah. And I know the internal debate is: do users care? I think they do. Because I think you want to work with a good model. The models change. The idea that you don't have to read one more tweet about how Gemini 3.1b and 0397 is better at blah, blah, blah, and then try to remember to make sure you use it for that. Yep.
I think that's great. I very much agree. That's the bull case for a broad horizontal agent that is not from a large lab. Yeah, that's right. I have a question for you. Yeah, a venture question. I've been doing this now for two years. Jesus. Which is amazing. I now have had some investments that are raising their next round. When would we as the existing investors want to extend an offer ahead of them going to market? When do VCs preempt?
When do VCs preempt? I'm not a huge fan of preemption. It's becoming a more common thing in the market because people want to get ahead of things. I think generally preemption is obviously to the benefit of the firm more than it is to the founder. And the pitch is obviously saving the founder time and letting them do less work and so forth. But obviously you're giving some investor a better price in exchange for your lack of desire to go out to the market.
And I generally just try and think about what's best for the company long term. And if you do that, things generally work out. The two times you do preemption, the first one that often happens is you do preemption because you're worried about a round coming together. And you are not really preempting the round. You're pricing the round so a round can happen. So there's a $40 million round. It's a $40 million round that's going to have two to three players that are involved in it.
You don't know if the market's going to price it really well. And so often the conversation internally that a founder will bring up to a VC, if the founder's savvy or pro or being coached smartly by another VC, is like, hey, you're pro-rata and this was going to be $10 million anyway. What about if we take that plus my two buddies and one strategic and we've already at 20 of the 40? Why don't we call that around and then I can go out and raise the rest and that'll maybe help catalyze things?
That's the first time that quote-unquote preemption happens. More often than not, that's the case. Every VC knows that and sees through that move, by the way. So there's no, like, oh, my insiders are super excited. You've done this long enough, you look at it for two seconds, you understand exactly what's happening. So it doesn't work. I think especially with the second one, if I understand where this is going... because what's the second time when you would preempt?
Well, the second time is when you literally are just trying to get more of this company than you have, and you're trying to get it at a better price than you think the market is willing to give. And so you're saying, well, he could price this at 100 if he goes out, but internally, I would rather this be at 80. So why don't I try and write a check? Obviously, it doesn't benefit the company. Right. They're giving up more dilution. It quote-unquote saves the company time. But I tend to think of actually going out to market and fundraising, yeah,
at least every 18 months, as a good forcing mechanism to test yourself against the market. A big reason startups are more effective than corporate R&D is that you don't get to go hang out in a lab for five years and never test your ideas with the world. And sometimes we don't like the results of those tests. And sometimes we think, these stupid VCs. I often think these stupid VCs don't think long-horizon enough. I can complain as much as I want about it, but it is better than every other market mechanism that's been
invented. It's like the Winston Churchill line about democracy: the worst form of government except for all the others. Look, we know that, for instance, innovation happening inside of R&D labs has a way worse track record. Way worse. Government, way worse.
Each model for innovating does contribute things, but for startups, you have to get market signal. Signal. It's iron sharpens iron. It's hard and it's stressful, but I don't think it's a distraction. I think it's good. So I think you're cheating the founder a little bit if you do it. Sometimes founders don't do the math, Fraser. It often doesn't benefit us. If we were the largest investor in the Series A-
then what's the problem? The problem is that if we're writing a $20 million check where we're the largest investor in the Series A, then who are we diluting? We're mostly diluting ourselves. Like, the amount of the round you'd have to invest
in order to actually increase your ownership is incredibly large. So when does it happen that you want to do that? Basically, if you have more money than you know what to do with. One of the features of this new VC-meets-private-equity class, where assets under management
are your product and so you want $10, $20, $40 billion funds, is that you're really just trying to find a way to deploy cash, not for great returns, but just for average returns. So I'm okay just dollar-cost averaging into you: private-equity-style investing versus venture investing. But we have to admit that lots of large funds
are basically doing that now. Again, I don't think that benefits the founders, even though it's sold as something that benefits the founders. The times when a VC actually wants to put the $100 million to work are always the times when you could go out and raise and would want to raise anyway; if you can raise $100 million, you will go do it. Because you'll get a better price than the insider is going to give you anyway. So that's why you really shouldn't be a big fund.
At least I believe that. And that's how we arrived at the fund sizes that we raised. So that's why we don't preempt that much. It's the same reason we have relatively simple term sheets and we don't really negotiate a bunch of tiny little things on term sheets. This is a job that is very easy to describe
and very hard to execute. Yeah. Yeah, yeah. And usually people are trying to whittle around the edges instead of innovating on the core of it, which is just: invest in good early-stage companies, help them, and then help them raise more capital. Be aligned. Be aligned is the right way to put it. Yeah. I spoke to our friend that we've discussed recently. Yeah. I spoke to him yesterday, and he came back today.
And he was like, yeah, it's not just about me learning how to do that and getting good at that. I think it's the right thing for the business for these fairly nuanced reasons. Yeah. What were his nuanced reasons? So it's going to be a capital intensive business. Yep. Having a mature set of investors around the table who can continue to help with that. Yep. Signal to the market. Yep. I think he looked at it through the lens of long-term, what is best for the business. Yeah. And there's just a bunch of different things that went after that.
There is a third reason why people do inside rounds. The first one is the soft, "I'm worried about my founder being able to raise." The second is, "I want to buy up more ownership, mostly because I have too much money to deploy and I can take advantage as a VC." The third reason, which is sometimes common, is that seed investors do it because it's not from their fund;
they're forming an SPV. And so for them, quote unquote, a preempt isn't really a preempt. It's just more free money. And so different set of math that honestly sometimes can be beneficial for a founder to take. I'm not speaking on Spark's behalf. We don't do that kind of thing. We don't do SPVs. But like from a founder standpoint, especially if you're worried about going out to market and you just want the capital, then those can be okay. There are a bunch of reasons that founders have found why SPVs suck and are
not great, but we don't have to worry about that right now, or turn this podcast into an SPV podcast. Maybe some other time. Cool. All right. Should we be done? We should be done. Let's do it. Thank you. Cool. Take care. Bye-bye. See you next time.