Building an Interestingness Leaderboard

2025/3/29
Hallway Chat

People: Fraser, Nabeel
Topics
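Nabeel: I think the way we evaluate AI models should differ from the way we evaluate AI products. We should create an interestingness leaderboard for AI products to encourage innovation and growth in AI applications. I think AI-generated apps should be sorted by "interestingness" rather than by other metrics. Through vibe coding we can create more interesting AI apps, and the success of Levels.io is a good example. I think the next-generation AI games platform should resemble a platform that combines Cursor and Reddit, letting users access and modify other users' code and view and learn from other users' work. I think there is currently no platform for showcasing and sharing AI apps, which hinders learning and exchange among developers. We need a platform that promotes exchange and learning among AI app developers, and it should be a self-contained platform rather than a simple collection of links. It should offer some filtering mechanisms, such as categorization by app type, to help users quickly find the apps that interest them. We could gather launch information by scraping Twitter and AI newsletters, and build the platform by analyzing web traffic data. The platform should showcase emerging and trending AI apps rather than only the most popular ones.

Fraser: I agree that there is currently no platform for showcasing and sharing AI apps, which hinders learning and exchange among developers. I tested some vibe coding apps with Claude 3.7 and found that it works better than other tools, but I haven't yet found any vibe coding app that went from non-viable to viable because of Claude 3.7's improvements. I think early AI-generated games are more like entertainment and diversion than games in the true sense. I plan to pull the open-source Command & Conquer code into Cursor and see what vibe coding can accomplish with it. I think some ideas that were not viable before may now be viable thanks to advances in AI. I think one reason AI app iteration cycles have slowed is rising cost, along with the lack of a leaderboard to showcase and compare apps. I think the platform should be loose enough for developers to discover and learn from related apps. Some earlier platforms, such as Flickr and Midjourney, formed their own creative ecosystems through their unique mechanisms. We need a platform that promotes collective conversation and learning in AI, and right now we lack a suitable metric for measuring the value of AI apps. I think this platform may need to be a closed platform, similar to Roblox. I don't think every vertical needs its own platform; we need a unified platform to move the whole AI ecosystem forward. The early internet had similar platforms, but their user bases were small. The platform needs to provide real signal rather than fake signal, and objective metrics rather than subjective evaluations. Taft is an example of an attempt to build a similar platform, but it failed because the apps it showcased were not interesting enough. The platform should rank apps by "interestingness," just as Flickr once used "interestingness" as its sorting criterion.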

Chapters
The conversation explores the idea of an AI product leaderboard, similar to app store rankings, to foster innovation and inspire new AI applications. The discussion touches upon the viability of AI-built games and the potential for community building around AI development.
  • AI product evaluation is discussed
  • Vibe coding and its role in AI game development are explored
  • The potential for a breakout success of an AI-built game is considered

Transcript

It's like watching the sitcom get built as you're watching the sitcom. It's going to feel like goofy toys and entertainment and community. Probably will be, but it will at least be a good test of where the systems are today. I vibe-coded this piece. It's so silly, but look at this other thing. I want it sorted by interestingness, Fraser.

Hey, Fraser. So the conversation we're going to have today, it's almost like an open call, right? I do think there's somebody out there that has an interesting idea that will riff off of the conversations we were having at dinner last night and this week and help us just increase the amount of interestingness in the world of AI apps with a product that probably we should be able to vibe code our way into the future. Let's do it. Yeah, let's start from the beginning. Fraser, you said you've been checking out some stuff this week, and then we'll dive right into it.

I went back and used all of the vibe coding apps using Claude 3.7 to check, but, I don't know, it just works better. The reason that would matter is because it's like, oh, it's just Claude 3.7. It's so much better at coding.

Sure, everything gets 10% better, fine, or 100% better. But in your read, when you went and tried out all these apps, it didn't move any company that you looked at from non-viable to viable. Because that would be the crazy thing. If there was stuff that was like somewhat janky product that couldn't quite make that parlor game or...

whatever it is, app platform or some derivative of Cursor going after some other vertical or whatever. And then like suddenly 3.7 is like, oh, we can do that now. At least you didn't find that in the first cursory pass this week. No, I didn't. But have you seen levels.io? Have you been following what he's done with Cursor and Sonnet? No. He basically vibe-coded a flight sim dogfight game that's multiplayer, and he's gone up to having 26,000 simultaneous people dogfighting in the air.

He then had people saying, like, "Hey, I'd pay for this," so he integrated, like, lovably, the concept of an F-16 that you can have as your premium dogfighting thing. He has sold some number of those. And then a company reached out and said, "Can I advertise?" So he, again, like, vibe-coded a blimp into the game that then has their branding on it. Should we just be quitting our jobs and building in public? Like, what's the video game you would vibe code yourself to do right now, Fraser?

I'm too anchored on this because it's so hilarious. What would I vibe code? My answers were too mundane. Well, so maybe this is an interesting conversation because before this, we were vibing...

Asteroids or, like, brick breakers into existence, or, you know, people are still excited about multiplayer Snake. Right, right. Like, the only thing that's not great about this is the graphics. I mean, not the only thing. There's a lot, but...

Well, there's also just the, it's vibe coding. Like, obviously, if this was a packaged good, no one would buy it. Part of what you're doing is just following along with the levels.io guy and you just, it's community building, right? You're like, it's part of the entertainment. It's like watching the sitcom get built as you're watching the sitcom. It is. What do the WebSim guys call it?

self-expression and entertainment through software development. It's not people who are into software development. It is like his fans and then it's become a meme. And like every great meme, it's taken on. So maybe there are just ideas that we all are not thinking about right now that were too janky or unviable if you were playing in the play space two years ago, if you were just trying to build a company and prototyping that you should go back and...

think through. I don't know what they would be, but it's got to be true that some non-viable things just became viable, especially related to coding. I mean, there was a whole era of, you know, addicting games and all these other Flash website game generators. And there is a crop of AI gaming, but none of them have quite gone yet. But maybe with 3.7, we're going to see, we're going to be really surprised in 30 days. It might be the time. I think it's probably the time would be my guess.

Could be. My guess is that the AI-built games are going to be in that first bucket that you mentioned a couple minutes ago, in that it will be entertainment and popcorn rather than games. People who are building an AI games platform, I would be surprised if they're the ones who are the breakout. What do you think it's going to feel like or look like? It's going to feel like goofy toys and entertainment and community more so than it's going to feel like a games platform is my bet.

Yeah. Do you play with anything? I've certainly been playing around with all the deep research stuff. The thing I want to try this weekend, on the same topic, is Command & Conquer.

The old game, the first kind of RTS game from Electronic Arts, was just released as an open source project. And so here's a very viable modern... And it's not modern in the sense of like Call of Duty from last year, but it's modern in that it has open source code. And I cannot wait to get to tomorrow and pull that thing into Cursor and just see...

what I can get done vibe coding inside of a really structured environment like that and see what happens. Probably will be, but it will at least be a good test of where the systems are today. I don't know if it'll be terrible. My guess is that you'll be able to do stuff. I had a fun conversation at this dinner last night that almost turned into, almost like, prototyping a product live with like six designers. I'm curious where it might go if we talk about it a little bit here. The question is, why aren't there more weird applications, AI experiments, right? If you look at the beginning of mobile or the beginning of the internet or even the beginning of the Facebook app platform, there were many more weird experiments coming out. We had one person just say, like, maybe this technology lends itself more to B2B SaaS-like solutions. Maybe ChatGPT just lends itself more to B2B SaaS-type solutions, which I don't buy at all. Not one bit. Yeah. Have you evolved your thinking on that at all?

Like we're now a couple of years into this thing. It is a very horizontal explosion, and the opportunities are literally everywhere, but we don't see as much weird stuff. And it feels like the iteration cycles are actually kind of slow. Like, it's just, oh look, another guy's got another model and maybe it's a little bit cheaper or whatever. I don't know if my thinking has evolved in any way that's more profound.

Certainly the industrialization of startups and what we celebrate as a broader community has probably changed the way that people are prioritizing what they do for at least some subset. I think we shouldn't underestimate the cost associated with this as well. You could experiment in the mid-2000s.

With Web 2 or you could experiment on the Facebook platform or even with iOS for no marginal cost. You're right. You're right. And maybe in that world, a product like DeepSeek is just net beneficial to the amount of experimentation. And so it just increases the velocity, not just of models, which is a whole other thing, but of just this whole ecosystem. Yeah. My read was that it's because there's not a leaderboard.

Hmm. I was thinking about it more. Friend said he thought it was because there was not distribution, that you had this mobile distribution advantage with an app store where suddenly you have a new thing, a new way to grow. And Facebook, certainly the Facebook app platform, part of its promise to founders was like you can go viral and get in somebody's feed with your lonely cow and grow.

But I think it's the lack of a leaderboard or, another way of wording it, the lack of a common news outlet to talk about who's winning and who's not winning. Our news feed right now is almost like trying to watch the stock market ticker in order to figure out what the Fortune 500 companies are gonna be over time, right? Yep.

Like, just imagine you were just staring at a stock market ticker and then I was gonna ask you at the end of the day, like, "Hey, which stock was up the most today? How'd it go?" Or this week or this month. Like, it's a terrible way to figure out things. And the thing that I remember as a founder in the Facebook app platform was like all of the other founders, including me, you'd log in every day and you'd look at the top charts of the top apps and some new thing would spike up for the week

And, you know, look, most of them were terrible, but it gave you a thing to kind of experiment with. And then you go see it and you see that, oh, that guy experimented with their login flow a different way. And, oh, that's kind of cool. He just drops you right into the game. I've never seen that before. You know, like, and then you're like, oh, I could try that in mine. And it was same with mobile. You had the mobile app store.

And so in those early days, you were looking at that every day and you were looking at new types of games and new types of business apps and new types of travel apps that were trending. And it was just like inspirational juice for you to then stand on the shoulders of those giants and then build your next thing. It's weird that in a world of everything being ranked, like there's no, you know, App Store or Product Hunt equivalent of AI products right now.

Does that resonate at all? I saw a couple head nods, but- No, it does. Part of my response initially was going to be like, that sounds like you're making the argument that your friend made that you dismissed around distribution. But I realize that you're now talking about inspiration and discovery and recognition more so than distribution. Yeah, I am talking about builders being able to see what other builders are working on. It's like we're in a common salon.

where we are critiquing each other's work, analyzing each other's work, and iterating quickly. I understand that the output can also be that it gathers you more consumers if they see that, but that's not the outcome I'm talking about that's valuable. Yep. Although having tens of millions of consumers come into your app is also valuable, let's be clear. That resonates. I mean, frankly, that's part of what's so appealing about WebSim, right? The idea that there's this creative place where people are congregating and finding and creating

forking and finding inspiration and building and then it's there and then you can contribute to it. The best thing about WebSim is that it offers a version of that. It doesn't offer it for the whole ecosystem of AI just for WebSim things, but I'm really glad there's no version of that for Cursor. There's no real leaderboard or explore page of AI apps. It needs to be loose enough

as a discovery mechanism for other builders that you can find something that's close to what you're thinking about building as well. That's, I think, part of the issue here. It can't be super constrained. Because if you get that also, like that thing develops its own meta. I got into Flickr in the early Web 2.0 days, one of the very first Web 2.0 darling companies. And it's like a photo sharing website with an explore page. I can't tell you how many times with my first kid I took photos specifically trying to...

Think about the Flickr Explore page and the meta of that page and what would get me to trend. Midjourney did the same thing early on. Like, Midjourney almost developed its own creative meta. People were learning how to prompt by banging their head blindly against a prompt and only looking at the slot machine pull to get what they want. You're exploring the Discord and you're looking at the Explore page and then you're looking at how other people are using the product. And I think we just...

grow a lot faster that way. We're not having that level of collective conversation at the level of playing with

Claude or playing with CodeGen or playing with AI generally. So instead we're left with like, well, what are the milestones that makes you look? It's like, I don't know, like they just raised $30 million series B. Like, okay, that doesn't happen often enough. And it's not an interesting milestone to be the creative nexus of what you should be looking at. So the two examples that you shared around the Facebook platform and the iPhone app store,

There was a solution where as the consumer, you got the discovery as well as you seamlessly used it on that platform. I think that's probably an important piece. What I mean by that is like, it can't be a Product Hunt or a Reddit type thing where it's like, here are the five cool apps of the day. And then you have to link off and use it.

It is like a self-contained platform. And so this is where then I go to my comment earlier that I think the next gen AI for games is going to feel something that's not like an AI for games platform, but it's going to feel exactly like what you just described. And it will be,

some version, you know, Cursor for, like, where's this for Cursor vibe-created apps or Reddit-created apps. Yeah. Lovable should do this tomorrow. Yeah. Yeah. It's not going to be those, but you're going to be able to like go and find the playable or usable versions of the software that people have brought into existence. And you're going to be able to like remix aspects of it. And you're going to be able to like,

go into the source code and take what you need and fork it into your own thing. - Yeah, I think you're onto something. You're right, part of the issue is this is all on the internet.

And so your point is it might happen inside of a closed platform, like the way it happens in Roblox. Like people look at what's trending and it helps inform what you should build next on Roblox. People look at the Steam pages and Steam Spy to look at what's working and that influences what's going to happen next. I kind of think that's true, but then that leaves it up to each cantoned village to

vertical market to go do their thing that they're going to do in their market. And I somehow don't think, you know, sure, so Glyph is going to have an explore page and Wordware is going to have an explore page and Zapier is going to have an explore page. Fine. I don't think that's the collective conversation we're looking for.

as a group of founders and investors and so on to move the whole ecosystem forward. So sure, that can happen, but I don't want to give up on the larger goal. And so, yes, you're right. We don't have Apple. There's not a single platform that's pushing it, like say Facebook or Apple. In a way, didn't this happen in the early internet as well? Like in the very early internet days, Yahoo's a directory. And we also had a bunch of like cool hunting and products like this. That was an earlier world and we only had like

5 million people on the internet or whatever it was back then. But I think we had versions of it that were federated and maybe Product Hunt is an example, not a crazy successful example, but an example of a federated version of this. You just need signal. You need real signal, not fake signal to make it work. Because the thing that the App Store has, that Roblox has, is they can see the data.

right? They can see number of views. They can see the things that are moving up in a way that is not true when it's just Product Hunt and it's a separate other rating system. That's just a Digg system or Reddit for the thing. It's less a measurement of a movement and more of a measurement of the community's endorsement of a thing, which is a different thing, right? It's the fans of content as a community like the object, which is a wildly different thing. So you just need a way to have some kind of objective measurement. Maybe we should build this, Fraser. So I wonder if we could build it just by looking at traffic stats. Like if we paid for a feed, uh,

of web traffic. And so you could see how many people were going to X domain versus a different domain. Would that be good data? I think I've lost the thread of what you're trying to do. Just imagine the app store. I want to log onto a site and it's only AI products. And I want to see out of the last 30 days spiking in AI products. What should I be paying attention to? What should I go try now? The reason why I said like I'm losing the thread is like

I don't know if that's where you started. Like you said that you wanted to see weird experimentation. Yes. And what products are spiking that have AI within them? It might be like the sales agent, the SDR sales agent. Yeah, but you don't want to see that. That's not in this bucket, is it? I'm not saying it's a deterministic list that is sorted by weirdness. Oh, that would be a different question we should come back to. I'm just imagining what happened in the Facebook app platform days. I would log in on a Friday morning, I'd look down the app store and the app store would be organized by the things that were new and trending. So not just if I was trying to look at the top list, it's still going to be FarmVille at the top. It's not that helpful.
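
As a rough illustration of the difference between a "top" list and a "new and trending" list built from a web traffic feed, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the domain names, the visit counts, the `top` and `trending` functions, and the `min_visits` and `smoothing` values are hypothetical, and nothing here is something the hosts built or specified.

```python
from typing import Dict, List, Tuple

# Hypothetical feed: domain -> (visits last week, visits this week).
Visits = Dict[str, Tuple[int, int]]

def top(visits: Visits, n: int) -> List[str]:
    """Rank by raw visits this week: the incumbents always win."""
    return sorted(visits, key=lambda d: visits[d][1], reverse=True)[:n]

def trending(visits: Visits, n: int,
             min_visits: int = 1000, smoothing: int = 500) -> List[str]:
    """Rank by week-over-week growth instead of absolute size.

    min_visits drops domains too small to be a real signal, and
    smoothing keeps tiny baselines from producing huge ratios.
    Both thresholds are arbitrary choices for this sketch.
    """
    def growth(domain: str) -> float:
        last_week, this_week = visits[domain]
        return (this_week + smoothing) / (last_week + smoothing)

    eligible = [d for d in visits if visits[d][1] >= min_visits]
    return sorted(eligible, key=growth, reverse=True)[:n]

# Invented numbers, loosely following the farm games and pet turtle example.
weekly_visits: Visits = {
    "farm-game.example": (900_000, 910_000),
    "another-farm.example": (400_000, 390_000),
    "pet-turtle.example": (2_000, 30_000),
    "tiny-experiment.example": (10, 300),
}

print(top(weekly_visits, 2))       # the two big farm games
print(trending(weekly_visits, 2))  # the pet turtle game comes first
```

With these made-up numbers, the top list is dominated by the two large farm games, while the trending list surfaces the small pet game whose traffic jumped. It is the same feed with a different sort key.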

But you look at new and trending. And what you will see there is, of course, if you just glance down it, you will see four different farms. And like, that's fine that you see that. I don't need to click on those. But then you might, after that, see like my pet turtle. And you've never seen a pet game before. And you're like, oh, that's weird. What's that? And like, why are people trying that thing out? And then you drop in and you take a look at it. So it's more like some version of...

all of the things that are being released, which is fine. We can probably scrape Twitter and all the AI newsletters and all the rest of that stuff to get like launches that have happened over the last month or two months. But that is quite frankly, too long of a list to ever try. And so it's just, it's getting a heat check against that group.

to get down to say 10 or 15. And then, yes, I assume that maybe only half of them I want to check out, but I'm now looking at a list of 10 instead of a list of 400. Does that make sense? Yeah. Yeah. I mean, no, you don't think it's going to get me then. You don't think it's going to get me the list I want. No, I think that the field of view was narrowed on the Facebook platform dramatically because of the, the, the constraints of the platform and the nature of the audience and the people who are building on it. And like,

If instead of that time you had like a list of here's the trending web apps or people who are using a technology like, you know, HTML, then you'd be disinterested. If you do it just based on do they use AI? That's not what you want. You want that first layer of curation is what I'm saying that the Facebook platform instilled. Maybe that's as simple as everything that I've been talking about.

but then there's a little bit of automatic categorization. So you can look at the customer service oriented ones and I can look at the games oriented ones and another person could, you know, something along those lines can get something that feels a little bit more constrained to the space. I'm surprised that there's not

A modern take on TechCrunch for curious AI experiments where like Michael's celebration of entrepreneurship and these like crazy web app experiences was so pure that

So your answer is it's editorial. Well, that's the easy first solution, right? Yeah, that's right. This brings us back to your PRD that we need to write because this is what you're talking about. This is what you want to build, the curated place for this. Here's a weird little app that I tried. It's cool. I vibe coded this piece. It's so silly, but look at this other thing. So let me give you an example of something and then you tell me why this doesn't work. And I agree.

I don't think it's quite it, but there's a product called Taft: There's An AI For That dot com. You can drop into there and then go to trending. So this is supposed to be a directory, almost Yahoo-like, of all of the AI things being built right now. And trending is supposed to be, in their own heuristic,

the thing I'm asking for, which is hot stuff right now. Yep. But look at that list and tell me why, why did this go wrong? Cause they're on, they're uninteresting. Yeah. They're uninteresting. They're shallow. They're shallow in an uncouth way, like create AI course and create coloring pages. Those are like GPTs, but you're not looking for full things. You, you just want like different creative new novel content.

It could be as shallow as it can be. I remember at Flickr, when Flickr was trending, they didn't sort it by views or popularity. It was sorted by a word called interestingness.

Yeah. I want it sorted by interestingness, Fraser. Yeah, yeah. Well, that was a good exploration of the idea. I'm sure folks online will have some thoughts as well, and we'll loop back to it later. Thanks, everybody. And as always, if you see an AI product worth chatting about or you have some comments on this, let us know. See ya. Great.