
Distilling Lessons from AI in 2024

2024/12/26

Hallway Chat

People
Fraser
Nabeel
Topics
Fraser: I think the biggest takeaway in AI for 2024 is that the AI workflow interface came into focus. It has three panels: a context panel, a chat window, and a panel where the model renders artifacts. This three-panel design has become mainstream and shows up in products like ChatGPT, Cursor, Windsurf, and Notebook LM. AI summaries in mail and messages are useful but have limitations, for example underperforming on small amounts of text. Google Gemini's Deep Research tool shows the potential of future research tools but isn't a mature product yet: its output is overly technical, lacks a human tone, and its interface needs work. Overall, AI products improved significantly in 2024 in both features and interfaces, but further iteration is needed to better meet user needs.

Nabeel: In 2023, the big companies all released AI products, but their market penetration was disappointing. 2023 also produced some "almost-products" that only became real products in 2024 after a year or more of development, such as agent-based reasoning models. Google Gemini's Deep Research tool is very promising, able to do deep web search and synthesize information, but its output format and interaction model need iteration. When summarizing 2024's Hallway Chat episodes, I found Claude performed best: its summaries were accurate, deep, and captured the key points, while ChatGPT's and O1's were too generic or lacked depth. I think the word of 2024 is "taste": the success of AI products depends not only on technology but on user experience and product design, and in a data-driven era the product qualities that are hard to quantify are the greater competitive advantage. Desktop real-time streaming and the O1 model show enormous potential but are still early and need iteration before becoming mature products. O1 is more authentic and opinionated, in contrast to other overly sycophantic models. The WebSim tool shows the potential of software self-expression; its practical value is limited, but it is thought-provoking. Many AI products are not brand-new inventions but rebuilds of existing products on a new technical framework. In the AI era, information gathering needs to shift from "wisdom of crowds" to "wisdom of experts," but the current path to that shift isn't elegant or sustainable. Overall, AI products developed rapidly in 2024, but many challenges remain, such as how to incentivize users to contribute to information generation, how to improve models' tone and expression, and how to design more human interfaces.

Deep Dive

Chapters
This chapter reflects on the AI product releases of 2024, analyzing whether they lived up to the hype and identifying trends such as the rise of agentic AI and the evolution of AI workflow interfaces. The discussion also includes an evaluation of Google Gemini's Deep Research tool and the role of 'taste' in creating successful AI products.
  • AI workflow interfaces came into focus in 2024, characterized by a three-panel design (context, chat, artifact).
  • Agentic AI, initially slow to develop, saw productization in 2024.
  • Google Gemini's Deep Research tool offered a glimpse into the future of research synthesis but needed product iteration.

Transcript


I would say my biggest takeaway from this year is like this is the year where the AI workflow interface kind of came into focus. For me, it's this three panel interface where you have a dropbox of context

where you can now, in both ChatGPT and Claude and others, you're like, you can pull in a Word doc, you can connect it to Google Drive, whatever. It's like, this is where you get context for me to talk. And then there's the chat window itself, which is still the user interface du jour of the way that I interact with this model. And then there's the template, the tablet or the play space, the Claude artifact area, where the model is now rendering something for you.

The thing that I love, it might be one of my most beloved AI features of 2024, is the summaries that occur in mail and in messages. It is so good. I don't know.

I don't know, man. Like, I got it. It's just bad. I disagree. I respectfully disagree. I do use the messages summary feature. I find that it's not great. It fits the amount of characters perfectly, and it gives you the gist of what's been discussed in these emails. It's delightful, and you're looking at me like I'm a maniac.

Hey, everybody. I'm Nabeel. Welcome to Hallway Chat. Welcome to Hallway Chat. It's Fraser. Welcome back. If there's any, this is take two because we forgot to hit record and got five minutes into a wonderful discussion that we will try to play back impromptu now because, as you know, these are not the most scripted things on the planet.

It is year end. We wanted to kind of reflect on what has been a crazy month of releases in AI from a bunch of different parties and try to figure out how to make sense of it is maybe the summary of what we wanted to discuss today. Is that fair? I'd pop it up a level. You know, when we had a conversation about this, the December releases are interesting. There's been an unbelievable number of things coming in with the last couple of weeks that

I think I'm more interested less in hot takes for the week because neither of us are trying to turn this into some situation where we hop on a Zoom and have a conversation on headlines. Right. Here's the news. Here's how we feel. I think it's a good opportunity to reflect back on 2024. And maybe in the spirit of collective learning, we can try and, you know, like just...

What are the takeaways in how AI products evolved this year that maybe we could, you know, write on some sticky notes and put on the wall next to us as we go into thinking about what we're working on next year? Sounds good. I got started by looking back at 23. So I spoke to ChatGPT, Claude, Gemini, and

You know, all the models. I feel like I have five therapists around me at all times giving me conflicting information. And I asked them what happened in 23 in AI as a way of kind of thinking about the trajectory we've been on. I often forget how quickly this is all moving.

And obviously in 2023, a big one was Sam Altman was fired briefly from OpenAI in December of 23. But other than that, 23 felt like when you look back on it, the year that all of the big companies released something. So that was...

Adobe Firefly, Canva Create, Spotify DJ, Bard, Snapchat AI, like literally like big company into the chat. They just all came in with some product, which as we talked about a couple of weeks ago, if you reflect back on those launches now,

I think everyone was quaking in their boots that the incumbents are going to win. And if you fast forward a year, like I would imagine that most of those teams internally and we've heard some of this are kind of like disappointed with the penetration of those products. It didn't really work. And then

The kind of second thing I noticed when I looked through a bunch of that stuff was there were a couple of things that kind of quote unquote hit the market that were really not yet products yet. And we got this little glimmer of the future, but it was going to take time. And that was the summer of agentic stuff is a good example. And we saw some very early voice stuff. And then if you fast forward a year, I think that's when you get,

you know, now agentic reasoning with a little bit of what O1 is doing. It's a little bit of what companies like Cognition are doing, and in coding you're really seeing agentic work actually execute and do very interesting things. Not across every product category and not across every vertical, but it feels like it's finally been productized. And that took, let's remind ourselves, like that took a year, right? A year, year and a half. Yeah. Yeah. Pretty wild. Which isn't surprising, right? Like the

That summer, the models were a number of generations earlier than where they are today. All of the tooling to stand these things up hadn't really been built yet for what you're supposed to do. And we hadn't gone and plumbed the surface to figure out

what use cases work well and which ones don't. And we've made tremendous progress on all three of those vectors. So with that in mind, like what are the kinds of things that you think launched in 24 that were a glimmer of the things that you think might be great products next year, but maybe weren't quite there yet?

I played around at length with deep research. I don't know how to describe it. And if you try to describe how it fits into their product, well, I'll just get lost. It is from Google. I think it's from the Gemini team. I think it's just a brand new model that has been trained to do extensive research.

web research, and there's a UI within Gemini.Google.com that allows you to use it, I think. Because it's freaking hard to find. It is. I think I had to ask you for help to find it the other day and I'd already found it once beforehand. Yeah. Actually, you've got to go to Gemini.Google, and then I think you have to drop down from what looks like the Gemini logo to switch it to the Gemini Deep Research 1.5 model or something like that.

It is undoubtedly a glimpse of the future. It is not necessarily a product today, but my guess is in six to 12 months, there's going to be a lot of different product experiences that are providing benefits

this type of value. And so what is it? The name is literal. It is deep research. You do a search, it goes and it combs the web. And in some cases it was finding like 86 different sources. And then it synthesizes those sources based on the question that you asked. And then it generates, I don't know, like a long report. So long that one of the top level features is to open it up in Google docs so that you have like your traditional reading and editing experience for it. It's

awesome from a research perspective, right? Like the idea that it breaks down the line of inquiry that it has to go to the web to search for, and then it finds all of these sources. And along the way, actually, the product experience is actually pretty nice. Like it says, here's my plan. Do you agree? Do you want to edit it? Do you want to like muck around with it? The plan portion is a perfect example of the

the product leap from 23 to 24. No product manager would have launched that in 23, but we're in this kind of like agentic show your work 24. And so of course it shows you the plan and then it gives you a chance to edit. That's a very good point.

That's right. And then, you know, for another kind of theme from the past little bit is it's a UI that handles latency measured in minutes or tens of minutes rather than, you know, hundreds of milliseconds because it's actually going out and doing work on your behalf. And then it comes back and it gives you the report. It feels like I'm looking at really compelling research wrapped with a fairly simple UI around it. And it feels like we're mostly looking at

interesting research coming from the Gemini team and it will get refined at the model layer. And in a year, year and a half, a lot of people will be having these like really thorough, effortful research and synthesis, you know, flows within products.

but it doesn't feel like it's there today. You know what it feels like to me? Recently, when you were talking about Notebook LM and the podcast feature, you were saying how it works because they've made a lot of great product decisions within the model itself and how it generates the audio. Like there's two hosts that have some interplay and there's, you know, the mannerisms. It's missing that. It feels cold and technical. It feels like, I don't know, like an LLM has written a very basic, scripted

technical report on the question that I asked. In general, I think that's my biggest problem with Gemini overall. One is I think the Gemini models, which just came out and updated, are...

undoubtedly really well performing. Just a huge credit to Google. There are several headlines about Google roaring back and stuff like that going on right now in the space, and I think all of that is fair, especially for API work. They still don't have the tone, right? The tone has a, like, 2022, 2023 feeling, this kind of antiseptic, almost grating kind of tone.

That is unfortunate. Claude's obviously, I think, the best at that thing. But even ChatGPT over the last year has gotten, you can feel it, it's gotten better and better at it. It's not all the way there, but there's clearly somebody in there that's put a little bit of time and effort into trying to get that right. And Gemini seems very far behind when it comes to that. And so that's the first bit is just its writing style. The second thing, though, is I think you're right. I don't know that the output

of this deep research is something that's supposed to look exactly like a generic web word document research report. There's something about the format and interplay between you and the document that just needs some product iteration. And that's right. I don't know what it is. I'm thinking back to some of our conversations over this year with

folks, like a little bit of the meter situation. Like, is there a more malleable interface version of this idea, where I'm playing with artifacts, I'm playing with it almost as a dashboard of these ideas? Or is it a situation where it's doing all this really deep research, but then it's almost... or is it augmenting me while I kind of write or flow or speak, the way that, say,

I don't know, the guys at Granola would have designed this interface more as like a silent co-pilot in the background that's like making you smarter and filling things in versus trying to yell over you and write the paper for you. Like, it feels like the research task is really great. But yeah, the product instantiation of it is probably another cycle away from really hitting something that is mass market.

It's interesting, right? Because there's two product problems that have now been smudged into one, right? Like one is the agent is going across the web to do the laborious research and synthesis for you. Right. And I'm pretty particular, like.

I want to have a lot of control over what I write. Yeah. And you don't get the deep research and synthesis right now without the report written in like the style of however they've set it up at the same time. Right. And so you can imagine, as you said, like there is likely going to be a lot of good product work and discovery to be done. Maybe the output of that

Deep research is literally just the research rather than then trying to put it into the digestible report, which is the final step. You know, that's right. By the way, I asked, I had a funny, you don't know this, but I fed in the transcripts of all of our hallway chats for the year. And I also had all these models try and tell us what our own review of the year was.

Okay. Which I'm going to get to in a second. But I would say my biggest takeaway from this year, if I'm trying to think about it, is like this is the year where the AI workflow interface kind of came into focus. For me, it's this three panel interface where you have a Dropbox of context.

where you can now, in both ChatGPT and Claude and others, you're like, you can pull in a Word doc, you can connect it to Google Drive, whatever. It's like, this is where you get context for me to talk. And instead of just me being in a raw chat, and then there's the chat window itself, which is still the user interface du jour of the way that I interact with this model.

And then there's the template, the tablet or the play space, the Claude artifact area where the model is now rendering something for you. And this kind of three paneled like Dropbox, chat, and then artifact is

I now see that pattern. It's in ChatGPT. It's been fully adopted now, but it's also in stuff like Cursor and Windsurf. And if you look at Notebook LM, like I think it's the trend. And maybe that's the way of surmising the deep research product. The deep research product is doing something at the model layer. It's doing work. Great. And now, at the very least, you should surface it up into an interface where you

That's just providing stuff to the context window. That's just the left panel. It's just throwing a bunch of interesting stuff into the context window. And then by the way, let me drag any documents I want over there and then let me chat in the center.

And then on the right hand side, now we can build a paper together. Now we can build this thing together. I would at least start there. Now, this is just, you know, two VCs opining on the product. Like all of the nuance and wonder and joy of product comes from the doing. So that would be the, I'm sure you do that and you get it up there and you'd find the 15 things wrong with it that you'd have to iterate. But at least if I'm building a product in 2025,

January 1st, I would at least start with that as my palette and then go. Yeah, well, yeah, I mean, it is always good to dunk on ourselves, but there's another way to think about it, which is more kind. And that is, I think you and I can have a user-centric point of view on these things. And if you think about the technology that they delivered, that's cool. But the value that they're delivering to the end user is a report.

Right. And so what they're basically saying is we have a middling written report for you because that's the right work product for this experience. And I would be shocked if the work product that actually people care about is the automation of the actual deep research rather than the synthesis and the writing of a really oddly framed report on it. Yeah, I agree. So review of 24.

from ourselves to ourselves. I took all of the transcripts. Well, in the case of some of these, like ChatGPT, when it has search, I just said, look at all the podcasts for hallway chat for 2024.

and write me up a high-level review. I want you to focus on the most creative and insightful topics. I do not want a generic summary. I want an insight-filled set of headlines. Even subjects that were mentioned only once can make it into the summary if they are valuable enough to founders and product builders of AI startups. So that was my prompt. I use the same prompt everywhere. I gave this to ChatGPT, and then ChatGPT, the O1 model as well, and then to Claude. And...

I have to tell you, like, it's an interesting situation that I've struggled with. I'll read them out to you. But my feeling was that Claude beat all of them. And it leaves me wondering, going back through a year with the transcripts, whether something like O1...

would just do deeper research. To go back to the previous topic, and in real time while we're talking, by the way, I decided to drop this into deep research. You can get feedback from deep research as well. So I'm going to come back to that in just a second. So ChatGPT basically gave me a list of 10 or 12 things. And then I said, that's too many, like basically summarize to the top three. ChatGPT's answer was user-centric design over model obsession.

Okay. So, you know, focus, right? And then second, avoid the AI hype trap. And then third was tailor AI solutions to diverse user needs. Okay. It's terrible. It's terrible. I'm not listening to this. You didn't even internalize any of those words. And neither did anybody listening. So, GPT O1. Now this is at least processable. Again, it gave me a list of 12.

And again, I said, listen, like sum it up, dudes. And it said specialization beats size. So startups are discovering that smaller domain specific AI models deliver more value than massive general purpose ones. Second, data is your differentiator. Curation and careful selection of training data has become as essential as product building, blah, blah, blah. Third, culture over code.

Engineering a team that's curious, adaptable, and strategic about AI technology is more important than any single tool or algorithm. I don't know when we ever talked about that third one, by the way, but it seems like a half-decent point. I like it, yeah. But all those seem fine. They honestly don't feel like the things we actually focused on. It feels like O1 did a lot of research and came back with some generic advice. Here's Claude. Now, Claude, I had to do a little bit more work. Actually, I had to copy and paste a bunch of transcripts into the project.

In some cases, because it didn't have web search. But here's what it said. Here are the three major strategic insights for founders and builders. One, the OKRification trap. The obsession with short-term metrics and quarterly OKRs is actively preventing innovation in AI products. After a decade of optimizing for efficiency gains, the startup ecosystem has developed a muscle memory for quick iterations and measurable growth.

The companies making breakthrough progress are the ones willing to resist this pressure for immediate metrics and take a long-term view on some things that aren't measurable, which literally we talked about like four minutes ago in this podcast. Second, software should be soft. I like that. Pulled out a quote from somebody that we spoke with. A fundamental shift is happening from shipping fixed software products to generating malleable software on demand. It keeps going. I'll go through the rest.

Go deep to stay ahead is the third one. While many startups have tried to move quickly to build on top of the AI APIs, the most defensible positions seem to come from going deeper in the stack. This means not just prompt engineering, but looking at key parts of your data exhaust and what unique insights they may develop for your users over time. Like, what do you think when I say all that back? Does this sound like insights at all? Are they usable? The Claude one more so than the others. But yeah, sure. Like, I think that that gets the...

rolling conversations that are unfocused over a 12 month period. It's hard to like cut through and summarize, but that's a pretty good job. Okay. So I also have real time feedback from Gemini. This is deep research Gemini and it might be better. Okay. So what it actually did was pull out

Headlines. Like it actually took quotes from our podcast and turned them into headlines, which I guess is okay. But it's also basically just the headlines of podcasts that we've done. But it's forget incumbents. It's startups versus the LLMs. Okay. Right. Software should be soft. Okay. Right. Design an AI for user agency instead of your interface. Right.

How AI is empowering hobbyists and creatives, which we did a whole episode on, but also touched on. Otherwise, what does it feel like to build for hobbyists versus the creator economy? And then this one's just a headline, making product on the S-curve of AI, adapting to disruption, which is just like literally a headline for one of the shows. Not bad. Not bad. I think that's been a reasonable theme throughout the year, though.

It's fair. Is where are we and what type of features should you be building? Absolutely. You know, I think you glossed over something because we had that false start today. The OKR-ification and careful what you measure and you can only drive improvements on what you measure, et cetera, et cetera. I don't think we've talked about Claude and the tonality. What is there to say? I think, you know, we talked about Gemini. Yeah, it still sounds like a totally generic AI model speaking to me. 23, I think, was when Claude really got its tone right.

We're two years in. I don't think anyone's really matched it still. I think that's largely because there's no eval for it. Right. Right. There's still not enough effective eval for it. And so the word of 2024 is taste. The overused word of 2024 is taste. An unfortunately overused word. Yep. But it's a cliche for a reason. It is the thing. Right. And taste is a very hard thing.

to measure? Very hard thing to measure. If your pipelines and processes are around optimizing for evals, you know, good luck getting a great eval for this. And so I think, why did I want to re-raise this? I think it's because

These soft things are what make products great, right? And they might've gotten the Claude tone right in 2023, but it is clearly getting better and better and better. Like it is the vector of progress is in place and others are noticing it, right? There's a whole New York Times article on what you and I know from day to day. People are going crazy for it in the Bay Area. I had that moment that I talked about where it was like helping me bake the roast and the tone was so

so good that I thought of it as a companion for one quick second. And I'm very much not in that camp. Look, I think it's always true. We can always almost at any time in the last 20 years, 30 years, 40 years, talk about the value of design and product. And I could bring up, just play quotes from Steve Jobs from like 1985 for the next 20 minutes. It would be fine. I'd love that. That's always true. I wouldn't mind it at all. I would say the...

Thing that's true even more so is in a world where we have now been data centered for the last 15 years and we're getting better and better at measuring more things, then you have to assume that all of your competition is also measuring those things. And, you know, it is true that what you measure is what you get better at.

But in a world where more things are measured, the leverage therefore increases in the things that are unmeasurable. And so your challenge, I think, as a leader is like, oh, it's so easy to put up an OKR or an eval. It's so easy to drive towards the goal. And in fact, that's the cleanest way I can help motivate my team and understanding we've hit it so we don't just like

self-satisfyingly pat us on the back and say like, good job, but we actually know we've really delivered. But at the very same time, the most unmeasurable parts of your product are probably its most defensible parts. Right.

And that is the little bit of the conundrum that will actually only grow in leverage over the next couple of years. Because, of course, everything you can eval is not just your team evaling it with an OKR now, but it's the models you're building, evaling against it as well. So your model is going to be self-improving and these other models are going to be self-improving. And so no matter what, it's like it's a rat race, right?

And so therefore, whatever parts of your product philosophy, whatever parts of your customer experience are the parts that seem of value but are very hard to quantify, those things as the overall competitive edge will grow in relative value over the next couple of years. That's probably like a, frankly, a five, 10 year trend. For sure. If you can get it right. But I recognize like,

How do I then motivate my team toward the unmeasurable? That's a very interesting, longer discussion that we don't have time for now, but it's what we should all be searching for. Yep. Yep. So that was a long rambling road that you and I took from your question of what have we seen now that is likely to become compelling products

sometime next year. And I would put deep research in that bucket. Like I think great research to get it there. I think if you think of it through the lens of a team building like a demo platform to show that research and what it can do, they did good work to get it out, you know, on probably a very tight timeline, but it's not the product that is ultimately going to materialize as, and we're going to see that sometime next year. And it's going to be awesome. Like there's

no doubt that research and synthesis that goes super deep is going to be so valuable for so many different use cases and so many different users.

Yeah, I've got two others. So if your first one is that one, I've got two others: a big one and a small one. My big one is desktop real-time streaming, which Google launched, and there are also some startups like Highlight that have launched desktop real-time streaming. And this idea that the context window for the AI you're chatting with is just the things I'm doing on my computer right now is obviously and clearly initially magical.

But I would argue is still in a way pre-product market fit, or we are still figuring out what the proper affordances are for that relationship in a way that again, it's like agentic stuff from a year and a half ago, summer where like you get little glimmers of what it feels like. And you're like, oh, this is definitely the future. But then you kind of like reflect back on yourself a week and a half later. And you're like, I

I don't know exactly how it fits into my life and I don't know which workflows and I don't know when to talk about what and all the rest of those things that have yet to really kind of be negotiated and worked out. Right. So that's my one I'd be watching. Like, I think it's a real thing and it will probably take a little bit of iteration before we get there. Do you agree, or have you found patterns for Google's desktop streaming?

No, I totally agree. It is so intoxicatingly fun, though, to think about what types of products are going to be launched on top of this in the coming future. The issue, and you raised this to me the last time we were chatting about it, is that I don't want what feels like an accessibility option narrating what is on my screen in front of me. Right. Like you had the same experience as me. It

I see a dashboard and beside that is a webcam video showing somebody and da, da, da, da, da, da, da. Yeah. It's like, thanks man. I'm not blind. I'm looking at the window. I understand it's a window. I don't need to describe all the windows on the screen. Yes.

Yeah, but it will be amazing because in the short term, you'll ask it how to do something and it will tell you and then you'll just go and do it. And then in time, there will be products that then just control your mouse and go and do those things with that technology. But we're just getting a glimpse of the capability that will get better and better and better and then refined into the product that is able to actually deliver those things. But it doesn't feel like it's there today.

Yeah, I agree with that. But that makes me excited for the future. We get to see all the experimentation. Oh, sure. I've told you this before. I think the one that I'm most excited for in the future are these reasoning models first seen in O1. Yeah. So part of the problem is the audience of people who are paying attention to demo research

is multiple orders of magnitude bigger than it was even a handful of years ago. Right. And you and I played around with GPT-2 when it was released and we would have, you know, kind of squinted at it and tried to make sense of it. But the total audience of people who were playing with that was probably in the thousands. Right.

But now you land something that is really like remarkable research in O1. And people are like, oh, it doesn't write my essays as well as GPT-4o or Claude, right? Like it's not good. And it's because the audience now for these demos and these research releases is basically...

basically like, you know, hundreds of millions of people. I messaged a whole bunch of friends there to congratulate them when I got to experience it for the first time, because it really does feel to me like GPT-2, where it's not GPT-1, it's GPT-2 in the sense that they've got it good enough that you can now experience what's about to happen. Yeah. And we're going to see in two to three or four years, I think the same level of change that

that we've just lived through from GPT-2 to now all over again because of that type of model and that architecture. Yeah, the phrase I had when I was trying to, somebody asked me about O1 last week at a dinner and my phrasing was like, oh yeah, it feels like a great model. It's just missing its ChatGPT moment.

Yeah, yeah, yeah. But the path from GPT-2 to ChatGPT was like three or four years and a whole bunch of different, you know, generations. And I very much, that's a great way to put it, is we are going to see it get orders of magnitude better. And we're also going to learn how to shape it into a product experience that works. It already is profound. It's so, so remarkable. I think that's one of the most amazing technical releases of the year for me. For sure. And I think we're a long way away from seeing product.

I still don't know when to turn to it, like when to use it and when not to use it. There are times where the tonality, for instance, of Claude is just a better product for what I need, even though it was kind of a researchy question. It just still answers better. But the thing I like most about O1 is it seems more authentic,

assertive, and more disagreeable in a perfectly wonderful way. Like most of these models are so ridiculously sycophantic at this point that they are just here to make you happy. And yes, I will go do that, master. And like they do whatever. And in fact, I've gotten to putting "please disagree with me" inside of the Claude instructions, inside of projects, to try and get it to disagree with me. And O1 just seems more fundamentally

willing, possibly because of its context window or the research it's doing, to just come back and say, hey, I think your argument here is flawed or you're doing something wrong here, and so on and so forth. I fed it a transcript of a conversation I was having with some partners on a weird side project where I'm opening a board game library in Berkeley. And just for fun, I took like a two to three hour conversation, took the entire transcript, dropped it in and basically said,

what do you think about the topics discussed? What do we miss? And what do you disagree with most and provide reasoning against that? And it did a very good job in a way that ChatGPT was horrible at and Claude was good at, but I thought,

O1 was really interesting in the way that it kind of dissected out the logical fallacies that we had in the things that we were talking about and then kind of pushed back with a little bit of research against it, a little bit of thought process against the different topics we were discussing. Really helpful for obviously a very low stakes conversation for our little weird side project, but it was great. It was great.

There's so much to like about that little anecdote. Even the idea that you have with friends a side project to start a board game store in Berkeley. That I'm also then randomly recording all of and then also feeding into the files. Yeah, yeah, yeah. Everything that you just said is what people said when GPT-3 was first introduced, right? Like, oh, I don't really know when to use it. I don't know how to use it yet. And it feels like...

They are meandering that path with us. They being the model builders, right? That's what's so wonderful about this moment is there's a lot of people who

pushing and pulling on these things and experimenting in real time to see what they can do and what's novel and what's interesting and what's not working. And then it just, it gets fed back to the people building the models and they improve it in those directions. And so a couple of generations from now, we're going to collectively like heave it forward. I think you're absolutely right. And it makes me feel a little bit more optimistic about the whole ecosystem because I do worry about everything becoming closed off.

And you're right. These things are pretty raw, which means we're all collectively in the playground together trying to figure it all out, which is exactly what you want to do,

what you want to be. Yeah, for sure. Here's one looking back. It's not from December, but we haven't really discussed it at length. You and I spent a good amount of time playing around with WebSim. I think we mentioned it way back when, but then they went from just hallucinating websites to basically like hallucinating web apps. And I was reflecting on this year. I think that's the most fascinating product that I've used this year. And I pick fascinating carefully because it's not entirely...

useful and it's not entirely valuable, but oftentimes those things emerge from things that are fascinating. And, you know, we like Sean and Rob so much, and they have framed it as self-expression through software. And I think the things that are obvious when writing code becomes automatic and, like,

democratized or free. Like you're going to have software apps that become disposable. You're going to write your SaaS on demand. You're going to have all these other things. And this was an experience where you looked at it, you played around with it, you were part of the community. And self-expression through software with WebSim feels like we are going to discover awesome things in the future.

I agree. It reminds me of Tumblr, reminds me of MySpace. It reminds me of the first

BBS I ever built way back when. Like it's just kind of this manifestation of self in a way that is necessarily more playful and more expressive than it is some kind of utilitarian function, customer development, B2B, vertical task process thing. You know, that reminded me of something that we talked about a while ago, but I don't think we ever brought up here together, which is this thing that's been teasing away in my brain about

how none of these products ever really change. They just get reinvented for the new technical framework. Like that's totally what that brought up for me, man. You know, Discord is basically the same as AOL IM, which is basically the same as IRC. You know, this view that

That kind of real time messaging and communication layer is probably a thing that still exists, either with Discord or some other company, in 20 or 30 or 40 years. It's an immutable need. And then if you take that lens that there are almost canonical behaviors that people want to do, self-expression being one, you can use that lens and then look through the major categories and try and find

the areas that, you know, maybe our founders in 2025 can spend more time plumbing away at, because even though we're a couple of years into AI, there's probably some of these canonical categories that haven't really been fully explored. What's funny is I totally forgot about doing this, but there's another situation where you and me had talked about it. The partnership had talked about it as well. We covered it briefly at an offsite.

I talked to some models about it as well. So I have some Gemini and some Claude and so on conversations that I just had in the car mostly while driving around. That is my new default, by the way. You should short podcasts as a medium because when I get into the car now, it's just voice. Like I just turn on...

ChatGPT voice mode. And I dump out some of the things I have that came from the last meeting. And I'm just trying to think out loud about it. I know I'll have a transcript of it. I also get some feedback on it. I have a sparring partner for it in a way that's like, I normally would have primed up a podcast to be filling that time when I'm commuting, but I don't know what that means for the thing we're doing right here, but that's fine. Okay. So here's what I wrote.

I'm trying to consider the major categories that have endured in consumer and creative software, considering the set of primitives that the internet and consumers just need. For example, IRC becomes AOL IM, becomes Discord; eBay and Amazon Marketplace and Etsy; Print Shop becomes Adobe, becomes Canva, just kind of like, and I gave some other ideas. And then that said, please consider other canonical products from the late 1990s, whether they were large companies, doesn't matter, just that they felt popular and essential.

And then think about modern companies from the 2010s that match the same usage pattern. There does not need to be a contemporary post 2020 company, blah, blah, blah. Think of five to start and then we'll hack away from there. It gives me back some lists. I go back and forth a little bit because, of course, it's slightly off. I didn't get it quite right in vector space, and so I kind of reoriented around that.

I'm going to read off a handful of these. I want you to think about whether you think any of these, especially in light of the conversation of what we just talked about, which was 2024.

We have some experiments in real time. We have some experiments in deep research. And we have this evolution or manifestation of this like almost like three panel view artifacts, chat and context window. So with that in mind, here are the canonical things that it came up with. It's information retrieval. This is an obvious one. Ask Jeeves becomes Quora becomes something to the kind of Q&A format, that kind of stuff.

No, I don't like this list. I'm looking at this list now. I have a lot of lists that these models gave me and I don't love the list to be honest. They're like, okay, but they need some human love.

They're still not great. That sums up the past couple of years. That's right. Okay. I'll read off some of them that seem kind of interesting, but I'm going to skip some ones that seem dumb. So, yeah, I think there's a question of what the future of Q&A format Quora is and whether that gets completely solved inside of ChatGPT or whether there's something new to be invented there.

Writing and document tools, the kind of like WordPerfect becomes Google Docs becomes Notion. What's the next version of that? It might well look like Notion, but I don't know. There's a world where that's something that's quite different. Data analysis, a core canonical thing you need to do. You started doing it with Excel or Google Sheets or SQL. What is the next version of that? We've seen lots of startups experimenting with that, but it feels like we haven't quite gotten to what it's supposed to be.

What I think about a lot that is on this list too is online reviews and recommendations, you know, Epinions to Yelp. What does it feel like to get judgment on something in the future? The way I would rethink that is, you know, if Web 2.0 was wisdom of the crowds, then I think of this AI age as the wisdom of experts. Like it's a different thing. If I'm asking about this rash on my face, what it is, I don't need the wisdom of the crowds version. I don't want the Google version of what the rash on my face means,

which, as we all know, is just going to mean cancer, because everything you look up on Google is going to be cancer. But what I want is the wisdom of experts. I want, you know, 40 doctors who are incredibly smart, who have inputted into the model thoughts about what this could be. And we're seeing this, obviously, with the model companies who are spending ridiculous amounts of money right now, hiring a bunch of PhDs to do a bunch of data entry for the models. And so I wonder if there's a similar version here where...

You know, if this era is the wisdom of experts, then is there a version of that that has to do with this particular product category? Otherwise, I don't know where we get new. If you're not paying people to figure out what the good coffee maker is, I don't know where we get new information on the Internet.

Versus internet slop versus all the stuff that ChatGPT is just going to spit out with no net new insights, right? No ground truth data of somebody who walked into that restaurant and thought the ambiance was good or bad. Very interesting. Certainly that's going to continue, though. Like there will be a new Reddit. Like I use Reddit today for discovering what to buy online.

But there will be an entirely new one, undoubtedly, right? Like, tell me: you used to contribute to Yelp and you would get a little bit of fame and fortune because you'd be SEO'd so people could find it later. And then Yelp would get traffic off of that, which they could then monetize as advertising revenue from stores who are trying to get awareness.

But if I'm just speaking to the model and the model, first of all, removes me as an active actor, even though I reviewed that restaurant, it doesn't have to say my name. So I don't get fame and fortune. I don't feel like a happy Yelp elite, right? So then that disincentivizes me from contributing to the model because I'm not getting recognition. Then there's no advertising either because the model is just responding.

The wisdom of the crowds model of Web 2.0 just gets broken in that. And I don't know what the new model will be for net new knowledge generation on the internet, other than model companies hiring PhDs by the hour to put smarts into their models, which is the answer in 2024. But that feels brute force and ugly and not scalable or sustainable over time. That's not an ecosystem that's healthy. That feels wrong.

That feels like you're starting a university by paying a professor to just write as many papers as possible per month and paying them by paper output. That's not an ecosystem. We stumbled into what I think is an awesome discussion, but I don't think we should do it all today. No, no, this is a really meaty topic. It took me a second to even realize what you were talking about, but it breaks Web 2.0's core proposition.

Maybe there's a new product experience that leverages AI, but it's not just like you to the model. And maybe there is still some sort of dynamic where the collective is present or some small community is present. But for me, the summation is I wanted to know the general wisdom of crowds in Web 2.0. I wanted to crowdsource thousands and thousands of opinions and get the kind of net average of that opinion. That is not the way I want to interact with an LLM model. I want an LLM model to know

who's smart, where the experts are on that topic, weight those far more than the

person who is uneducated or doesn't know about that topic. The PhD in that topic, I want to understand and be able to derive their knowledge. And that is what people do. That's implicitly what evals are doing, by the way, right? You have a bunch of PhDs upvoting and downvoting the right and wrong answers and therefore leaning it towards what the PhDs think is right. That's what's happening. That's how it gets better math answers over time, right? And that's how it gets better philosophy answers over time.

Only they're doing that in a way that feels brute force and unsustainable by just literally paying smart people by the hour to do it as much as possible. That does not feel like a way you have reconstructed the global economy for the future. That feels like a short term, small brained fix where you're throwing money at the problem.

There's something teasing away at me that feels like there's a more elegant way that we should be organizing the economy for the future, if this is really how we're all going to be smart and how we're all going to do work over the course of the next couple of decades. And we can stop there and we can come back to it. But I think that's the summation of what I think the issue is.

We started with a Yelp conversation, but obviously it's much deeper and broader. That's how these models get smart. You think it's small brained to hire doctors to not just rank the results, but to create the reward data, the fine-tuning RLHF data? I don't know that the purist Capitalism 101 view of this is like, yes, you pay the people for the labor and then they make the thing and then you get leverage off of the labor, and that's how the economy should work.

I am still enamored with the very elegant way, not that it didn't have its flaws, but the very elegant way that we allowed people to do the things they were really passionate about, kind of intuitively and intrinsically, in Web 2.0. They got obsessed with reviewing restaurants. And so they went and did that. And then we found a way to make that work and make use of it.

If there's a way we can solve the knowledge problem in a way that rewards people for intrinsic behavior versus extrinsic behavior, we will get much better results and we will operate at much larger scale. And that doesn't mean they don't get paid. I'm not saying we don't pay people like people should.

find ways to be remunerated for their work. But that's probably the summation of what I'm trying to grasp at. Mm hmm. Mm hmm. Boy, there's a whole lot of different things here. Like, where's the data for training the large models? Where is the outlet for self-expression online? Like the business model of Yelp and Reddit and everything else has allowed these communities to thrive and flourish. But in a world where like the business model changes, how are the services going to change or which new ones are going to emerge? Yeah.

to take their place. There's a lot of really interesting stuff there. Yeah. Okay. So I'm going to finish up on a couple other categories that are canonical categories, and then you'll see if any of these trigger another deep, hour-long conversation. I don't think so. I mean, the other ones that are on here are...

Media consumption. That's an obvious one. What's the future of music and movies and things like that? How do diffusion models and the rest of that play into that? That's a clear one. It's a canonical thing we do. Marketplaces. So the eBays and Amazons and Etsys. And I think there's areas on the search side

where you probably can maybe reinvent the interface. And then we've seen some folks that are experimenting also on the creation side, right? What happens to Etsy when you can imagine everything that you could possibly make? And is there a way that that becomes a dialogue with the creator and the buyer? Are there ways you just change that relationship in a way that feels more collaborative? I think there's lots of fun, interesting ways that you rebuild what an online marketplace feels like there. Anything else on here? Uh,

Project management? Sure. There's always some version of Asana, monday.com. Like there's some version of keeping track of everything. There will be a new version and it will be rethought for a world of AI capabilities, for sure. Right? For sure. Yep. 100%. Yeah.

The old ones are ridiculously sticky. Like Jira is still around. Right. Unfortunately. But there will be a new modern experience rethought for a world of AI. Yeah, that's right. We could stop there. Those are some things to mull over during your Christmas break, when you have a minute with your family and you're bored because they have some Christmas show on that's just way too boring, but they like to watch it every single year, and your mind is wandering. Yeah.

You can let it wander into this area. I have an update. Recently, we talked about how there was no product from before 2023 that has introduced AI features that are actually usable. And this like squarely puts me on like the left side of the curve, no doubt.

I think that on the whole, the integration of Apple Intelligence is really, like, janky and it's, like, ham. No, no, no, no. Wait, wait, wait. And it's ham-fisted. And like, I don't want the rewrite feature when I just want to copy-paste this stuff. Like, don't, don't give me that.

I am super proud of the gang who built the new camera button that allows you to go right into vision for ChatGPT. But I demoed that like twice, and it's cool, and I don't know how often I'll return to it. The thing that I love, it might be one of my most beloved AI features of 2024,

is the summaries that occur in mail and in messages. It is so good. You love it. It is so good. I love it. I consume it, you know, a hundred times a day. Yes. And, and they're great. I don't know. I don't know, man. Like I disagree. I respectfully disagree. I do use the messages summary feature.

And I find that it's not great. Maybe it's better in the mail context, where it has more words to work with and more paragraphs to work with, so there's something to summarize. But when it's trying to summarize five texts, it gets a little off.

I'm looking at it right now in mail. It is spot on and useful. They must have trained their own model, I'm guessing. And somebody who like really cares about the product experience, it's delightful. And you're looking at me like I'm a maniac.

The one thing I do want to use is from my friend Dan Shipper over at Every. He shipped this thing called Cora yesterday and I haven't had a chance to install it. But it is an inbox agent, which basically summarizes all of your emails and sends you a summary of your email twice a day.

And also with some draft responses in your own voice, theoretically. I'm more likely to do that than I am to switch to Apple Mail. That's for sure. Oh, last bit for me on 2024, like a takeaway. I got to tell you, I had a conversation this week with a founder, who shall remain nameless, but who is running an AI startup. And somebody I talk to frequently,

And he was just talking about how he came to realize over the last couple of months that about half of his AI team and maybe only 10% of his total team are actually AI red-pilled at this moment. And let me say what I mean.

After the ChatGPT moment, there's a second moment that happens, I think, in people's brains where they're like, oh, this isn't just like a smarter search tool that I can debate with. This is like a junior coworker and partner. It's not just, is AI going to be helpful?

But there's this moment that you have on this journey where for a lot of people, it's like watching Devin or some kind of agentic thing run around and actually like really do work. And I had a conversation with a founder who was just like, yeah, I realized that like most of my product team and even most of my team wasn't yet at that level of belief in what was coming down the pike. And then I

I relayed that to a portfolio CEO yesterday, and he said to me, like, I have this exact problem. Most of my executive team uses ChatGPT regularly, but for kind of, you know, Google-search-plus-plus kinds of needs. He called it not at all living agentically,

and not really considering what the world feels like if you're living agentically, this world where you're almost thinking like you're giving instructions to an intern: let me write in paragraphs and really describe this problem to you so that you can really get to the root of the answer and feed it back to me. Interesting. I don't know that I have a takeaway other than, like, holy crap. There's a whole bunch of different thoughts in my head all at once.

The saying is that the future is already here, it's just not evenly distributed. The leverage given by this technology is going to be so dramatic that the people who are early to adopt it are going to have tremendous productivity gains relative to everybody else. That's the obvious thought. The other thought that I keep coming back to, that I think a lot of people don't appreciate, is:

People are busy with busy lives and they have hobbies and they have stress. And for you and me, like we love this stuff and our lives would be this even if it wasn't a big part of our job.

But for most people, it's kind of like, hmm, that's interesting. And then they go watch the Lions or the Bills, right? Like no shade, but it's just like different there. I am constantly surprised that even in technology, even with people who are at the forefront of technology, there's oftentimes a lack of plasticity in thought.

They might not have the imagination to appreciate that if it's doing this today, a year from now this is what it's going to be doing for all of us. And I think if you combine those last two, especially, like if you're not playing around with it and you also don't have a great elastic imagination, it can be tough to see a month or six or eight or 12 ahead.

I think I agree with you on both of those points. And maybe that's the challenge to take back: just not to take for granted that even the people that are closest to us are operating with the same priors and therefore making the same decisions. For people on our team, I'm going to go do this for our own team. I'm going to make sure everybody's opened up WebSim and

Replit Agents, and maybe Gemini Deep Research, like three agentic computing things, spend 15 minutes with each, build something in each, watch it work. It's a different mental model than the way you interact with something like ChatGPT, and it will help you think differently about product. And I have to first try and get my wife there. But then I think it's a good challenge to try and make sure, frankly, all the CEOs

that I'm working with, and frankly all of the people that they work with, have gone through a process like this, because maybe I underestimated how much of an edge it will actually get you. Even inside of this little bubble of AI we're all already in, there's still a disparity in where we're sitting in our worldviews.

Mm-hmm. Let alone the rest of the world. Right. Let alone the rest of the world, for sure. So build your best WebSim, build your best thing in Replit, or v0 from Vercel, which is also quite good. I've been playing around with that lately. Windsurf is awesome. I lost hours of my life to a random Windsurf project from one o'clock to three o'clock in the morning earlier this week. Go play with these things over the holiday break. You'll enjoy it even if you've never played with them before. You know, one last thought that all of this you just raised has given me

a reminder of: a lot of people in the past, like, three months have trotted out the Gartner hype cycle curve. And they're like, hey, we're in the trough of disillusionment or whatever it is, and we'll soon be up on the plateau of productivity. There's been, honestly, like 10 people who have raised that in the past couple of months. And my feedback to them is, I actually don't think of it that way at all. The Gartner hype cycle was

invented in the 90s for desktop scanners and webcams, right? Where I think, for the reasons that you just said, the CEOs of these companies who are on the frontier of building the future in different fields are still maybe underappreciating what this technology is going to do. I think we are overall still not nearly

as excited as a society as to like what's going to happen on a 10 year time horizon. I think we're still underestimating it. I agree. I'm sure I am too.

Right. It's one of those things where the number of zeros is too hard to totally internalize no matter how many times you stare at it kind of thing. Yeah, that's right. That's right. You and I should be better positioned than almost anybody to appreciate how crazy it's going to get. And I told you like five times in the past two months that I'm still underestimating how profound some of these products are going to be in our lives in a couple of years.

Yeah, agreed. Well, on that note, maybe that's how to wrap up. That's it for now. We have plenty of time over the break to play with things. I'm hoping I find three more releases next week to play with more AI things and think about the future and also maybe spend a little time with your family as well. We can do a little bit of that as well. I'm going to be hitting the Apple summaries in my mail app hard. Take care, man. Talk to you later. See you.