Gaurav Misra & Dwight Churchill - Building Captions - [Invest Like the Best, EP.405]

2025/1/7

Invest Like the Best with Patrick O'Shaughnessy

People

Dwight Churchill

Gaurav Misra

Topics

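Gaurav Misra: I believe the major breakthrough in AI is the ability to train larger and larger models. That takes better hardware, more advanced machine learning architectures (such as transformers and diffusion models), and more effective training techniques. But in the end, what decides success or failure is data. For a video generation and editing company like ours, video data is harder to obtain and more expensive than text or audio data. So it is essential to build a sustainable data flywheel that continuously collects and uses data to improve our models and maintain our competitive advantage. There is also a fundamental difference between types of AI companies. Companies pursuing artificial general intelligence (AGI) are trying to solve an unbounded problem, whereas we focus on video generation and editing, which is a relatively bounded problem. That makes it easier for us to build a sustainable business model. Our model improvements rely mainly on continual fine-tuning with more data to serve different use cases and visual requirements. We believe that as models keep developing, video generation will reach Hollywood quality within 18 months. Our company's initial success came from a simple AI captioning app that quickly gained a large number of users. That made us realize that even a simple AI application can have a huge impact. We designed a data flywheel from the start: collecting user data to improve the models, which delivers a better user experience and creates a virtuous cycle. Over time we have kept expanding the product to cover the whole video production process, from script writing to video editing and distribution. Our AI tools include AI Creator and AI Edit, used for video creation and editing respectively, and both are very popular. We found that by expanding the product's use cases we could keep opening up new markets, and for a while we had no competitors. That led to rapid growth in our business. But we also recognize that as the technology develops, more and more competitors will appear. So we focus on improving model quality and user experience to maintain our competitive advantage. We think we have so far developed only 1% to 5% of video generation applications, and a huge market is still ahead. The video models we train are diffusion models, which start from noise and progressively remove it until a clear image emerges. Text conditioning helps the model determine the target of the final image. Video models are easier to optimize than text models, because video generation is a relatively bounded problem while text generation involves the unbounded problem of intelligence. In the future, we will be able to generate high-fidelity video of people interacting with objects. That requires collecting more data of people interacting with objects and using text and image conditioning to guide generation. As video generation improves, the cost of high-fidelity video will fall, which will change the value of video and may change the value of related things, such as a personal likeness. Our company has already been through the stage where other companies tried to imitate us or even compete maliciously, but we ultimately won on better product quality. We partner with social media platforms to supply them with high-quality original video content. We focus on our own product and technology rather than fixating on competitors. Building a top AI product does not require a large team; a few excellent people can do it. Our company focuses on developing talent and giving them ample resources. Pricing for AI software is still evolving; consumers and businesses are currently quite willing to pay for AI software, but prices may fall as competition intensifies. We believe our business model will ultimately look more like a traditional software company's, with high margins, thanks to our control over model training costs and the data flywheel we have built. After achieving our video generation goal, we will continue exploring applications of AI in other areas, such as social networks, film and television production, and education.

Dwight Churchill: In competing with rivals, our focus is not the technology itself but innovation in design patterns and user experience, which reshapes how people work. We focus on giving customers capabilities they do not even anticipate today, and on commercializing them. By deliberately segmenting the market and focusing on generating and editing communication-style video rather than other types of video, we have gained an edge in the market. We have already been through the stage where other companies tried to imitate us or even compete maliciously, but we ultimately won on better product quality. Building advanced AI products does not require a huge team; a few excellent people can achieve world-class results. Our company focuses on developing talent and giving them ample resources so they can concentrate on innovation and breakthroughs. On AI software pricing, I think it is still early and hard to predict the eventual pricing model. But we observe that both consumers and businesses are quite willing to pay for AI software. In the future, as the technology matures and competition intensifies, prices may fall. But high-quality, properly licensed models will still hold considerable value. Investors' understanding of AI is uneven; more attention is needed on applications of AI in different industries and on the use of AI tools inside companies. Many companies are exploring how to use AI to improve efficiency or reduce costs. I think that in the future, the companies with the best models that can keep improving them will be the winners. That requires continued data accumulation and effective use of the data flywheel.

Supporting evidence (each quote is truncated after its first sentence):
Gaurav Misra: "I also want to call out here, like there's a pretty fundamental difference between different types of AI companies that are out there. ..."
Gaurav Misra: "But what is going to make those models better and better? ..."
Gaurav Misra: "It has been a pretty interesting journey. ..."
Gaurav Misra: "That gives us significant advantage. ..."
Gaurav Misra: "It's interesting to think about because for the models that we train, they're diffusion models. ..."
Gaurav Misra: "You never know. ..."
Gaurav Misra: "I mean, it's definitely the most exciting that for anybody who's working on the engineering or product side, ..."
Gaurav Misra: "When you think about us, we actually divide the product in two areas. ..."
Gaurav Misra: "I mean, I think the most interesting thing that I've seen is a lot of new companies popping up. ..."
Dwight Churchill: "And that's the really exciting stuff. ..."
Gaurav Misra: "We've actually niched down quite a bit on purpose because as you said, like video is huge. ..."
Gaurav Misra: "Yeah, I think that will happen within six months. ..."
Gaurav Misra: "How do you think the value of these things will change over time as the cost and frictions to create them falls? ..."
Gaurav Misra: "I think we're kind of taking the unique angle on this generally, which is that we are training specifically on people. ..."
Dwight Churchill: "I'm really curious for like the sharper, rougher elbows part of building something so fast. ..."
Gaurav Misra: "Definitely. ..."
Dwight Churchill: "Can you talk through what you've learned about that, how it's changed? ..."
Dwight Churchill: "It takes a few good people. ..."
Gaurav Misra: "I don't know if we completely understand it yet. ..."
Gaurav Misra: "I think the entire world of investors, VCs, growth equity investors, public investors, ..."
Gaurav Misra: "It seems like this other category of more bounded problems have pretty normal, great business models. ..."
Gaurav Misra: "The way we think about it for our business specifically is that there is a bounded cost that actually solves this problem. ..."
Gaurav Misra: "So I think today we're really excited about accomplishing this particular mission, but I think the possibilities beyond that are practically endless."
Gaurav Misra: "I mean, I think Snap had, as any company, a lot of good things and some bad things. ..."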

Deep Dive

Key Insights

What distinguishes AI companies solving bounded problems like video generation from those tackling unbounded problems like general intelligence?

AI companies solving bounded problems, such as video generation, focus on rendering solved problems like CGI, making them more accessible and efficient. In contrast, companies tackling unbounded problems like general intelligence are solving an unsolved frontier, which may require continuous investment in larger models with no clear endpoint.

Why is video data particularly challenging for AI models compared to text or audio?

Video data is heavier, rarer, and more expensive to train on compared to text or audio. It requires significantly more storage and processing power, and there is less video data available globally, making it a unique challenge for AI models.

How does Captions' data flywheel contribute to improving its AI models?

Captions' data flywheel allows the company to continuously ingest and grow its video data, which is used to train better models. This creates a feedback loop where user-generated content improves the models, enabling the company to stay at the forefront of video generation and editing technology.

When could AI-generated video reach Hollywood-quality production?

AI-generated video could reach Hollywood-quality production within 18 months, driven by advancements in diffusion models and the scaling of parameters similar to the evolution of text models.

How does the training process for video models differ from text models?

Video models, particularly diffusion models, start from noise and gradually predict layers of clarity based on text conditioning. This is different from text models like GPT, which predict the next word based on previous context. Video models require significantly more computational resources due to the complexity and size of video data.

What are the key use cases for Captions' AI video editing and creation tools?

Captions' AI tools include AI Creator, which generates videos of people talking, and AI Edit, which automates video editing tasks. These tools are used for marketing, sales, education, and social media content creation, allowing users to produce high-quality videos without extensive editing knowledge.

How does the competitive landscape for AI video generation companies look?

The competitive landscape is intense, with many companies attempting to replicate Captions' success. However, Captions differentiates itself by focusing on A-roll video generation and building a data flywheel, which gives it a significant advantage in training foundation models for human-centric video content.

What are the potential pricing strategies for AI software applications in the future?

AI software applications may adopt a mix of subscription-based pricing and value-based pricing, depending on the use case. While consumer pricing is evolving towards higher subscription fees, B2B pricing may align more with the value of replacing labor costs or improving operational efficiency.

What lessons did Gaurav Misra and Dwight Churchill learn from their time at Snap?

From their time at Snap, they learned the importance of innovation, product-centric culture, and the CEO's intuition in driving success. They also gained insights into navigating highly competitive markets and the challenges of maintaining product-market fit in a rapidly evolving industry.

What is the kindest thing anyone has done for Gaurav Misra and Dwight Churchill?

For Gaurav, the kindest thing was his parents ensuring he was born in the U.S., which provided him with opportunities he wouldn't have had in India. For Dwight, it was his wife's support in enabling him to take risks and start Captions, which significantly impacted his career.

Transcript

This year's presenting sponsor for Invest Like the Best is Ramp. Ramp has built a command and control system for companies' finances. You can issue cards, manage approvals, make vendor payments of all kinds, and even automate closing your books all in one place.

We did an incredibly deep dive on the company and its product as part of this new partnership. And what we heard and saw in customer surveys over and over again was that Ramp is the best product by far. We've been users ourselves since I started my business, since long before I was able to spend so much time with the founders of Ramp and their team. Over the holiday, I was with Ramp's founders. Those that listen know that I believe that the best companies are reflections of the people that started them and run them.

I've always loved the idea that Apple was really just Steve Jobs with 10,000 lives. Having gotten to know Ramp's founders well, I can tell you that they are absolutely maniacal about their mission to save people time and money. As far as I can tell, they do not stop working or thinking about the product and how to make it better. I'm sure they're proud of what they've built, but all I ever hear when I'm with them is them talk about what they can do to improve and expand what Ramp does for its customers.

I used to joke that this podcast should be called, This is Who You Are Up Against. I often had that same thought when I'm with Ramp's founders, Kareem, Zach, and Eric. I would not want to compete with these guys. I wish all the products I used had a team as hell-bent on making the product better in every conceivable way. I could list everything Ramp does here, but the list would be stale in a week. I highly recommend you just start using it to run your business's finances today.

This year, I'll share a bunch of things I'm learning from these founders and this company, and I think it'll make you realize why we are so excited to have this partnership with them and why we run our business on Ramp. To get started, go to ramp.com.

As an investor, I'm always on the lookout for tools that can truly transform the way that we work as a business. AlphaSense has completely transformed the research process with cutting edge AI technology and a vast collection of top tier reliable business content. Since I started using it, it's been a game changer for my market research. I now rely on AlphaSense daily to uncover insights and make smarter decisions.

With the recent acquisition of Tegus, AlphaSense continues to be a best-in-class research platform delivering even more powerful tools to help users make informed decisions faster. What truly sets AlphaSense apart is its cutting-edge AI. Imagine completing your research five to ten times faster with search that delivers the most relevant results, helping you make high-conviction decisions with confidence.

AlphaSense provides access to over 300 million premium documents, including company filings, earnings reports, press releases, and more from public and private companies. You can even upload and manage your own proprietary documents for seamless integration. With over 10,000 premium content sources and top broker research from firms like Goldman Sachs and Morgan Stanley, AlphaSense gives you the tools to make high conviction decisions with confidence.

Here's the best part. Invest Like the Best listeners can get a free trial now. Just head to alpha-sense.com slash invest and experience firsthand how AlphaSense and Tegus help you make smarter decisions faster. Trust me, once you try it, you'll see why it is an essential tool for market research.

Every investment professional knows this challenge. You love the core work of investing, but operational complexities eat up valuable time and energy. That's where Ridgeline comes in, an all-in-one operating system designed specifically for investment managers. Ridgeline has created a comprehensive cloud platform that handles everything in real time, from trading and portfolio management to compliance and client reporting. Gone are the days of juggling multiple legacy systems and spending endless quarter ends compiling reports.

It's worth reaching out to Ridgeline to see what the experience can be like with a single platform. Visit RidgelineApps.com to schedule a demo, and we'll hear directly from someone who's made the switch. You'll hear a short clip from my conversation with Katie Ellenberg, who heads investment operations and portfolio administration at Geneva Capital Management. Her team implemented Ridgeline in just six months, and after this episode, she'll share her full experience and the key benefits they've seen.

We were using our previous provider for over 30 years. We had the entire suite of products, from the portfolio accounting to trade order management, reporting, the reconciliation features. I didn't think that we would ever be able to switch to anything else. Andy, our head trader, suggested that I meet with Ridgeline. And they started off right away, not by introducing their company, but who they were hiring. And that caught my attention. They were pretty much putting in place a dream team of technical experts. Then they started talking about this single source of data. And I was like, what in the world?

I couldn't even conceptualize that because I'm so used to all of these different systems and these different modules that sit on top of each other. And so I wanted to hear more about that. When I was looking at other companies, they could only solve for part of what we had and part of what we needed.

Ridgeline is the entire package and they're experts. We're no longer just a number. When we call service, they know who we are. They completely have our backs. I knew that they were not going to let us fail in this transition.

Hello and welcome everyone. I'm Patrick O'Shaughnessy and this is Invest Like the Best. This show is an open-ended exploration of markets, ideas, stories, and strategies that will help you better invest both your time and your money.

Invest Like the Best is part of the Colossus family of podcasts, and you can access all our podcasts, including edited transcripts, show notes, and other resources to keep learning at joincolossus.com. Patrick O'Shaughnessy is the CEO of Positive Sum. All opinions expressed by Patrick and podcast guests are solely their own opinions and do not reflect the opinion of Positive Sum.

This podcast is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of Positive Sum may maintain positions in the securities discussed in this podcast. To learn more, visit psum.vc. My guests today are Dwight Churchill and Gaurav Misra, co-founders of Captions, which uses AI to generate and edit talking videos and has grown to significant scale at remarkable speed.

We explore a key distinction in AI, tackling bounded problems like video generation versus unbounded problems like general intelligence, and what this means for building sustainable businesses. We also explore their unique data flywheel, why video generation could reach Hollywood quality within 18 months, and why building advanced AI products doesn't require huge teams. Please enjoy this great discussion with Dwight and Gaurav. And a key side note, the first person you'll hear is Gaurav.

So guys, the topic on everyone's mind, I think, is this shift from AI as this incredible technology, where that's decided and everyone understands how amazing this is, to, okay, great, what are we going to do with it? And how can we build enduring generational businesses with this technology at the core?

You were very early in building a business that charged customers very early on using this technology. Maybe you can begin by just riffing on the lessons that you've learned so far about building an AI business that are maybe distinctive from a normal software business or something, and also get into some of the open questions that you yourselves have in trying to evolve your business model. I just think this is becoming the important question in the marketplace right now. And you're one of the earliest adopters. So you're the perfect people to answer.

Getting into it, I think the first question behind the question that comes to mind is what exactly did we actually achieve with this AI revolution? What is actually the difference? AI existed before and it exists today. Obviously, there's something magical about what is there today. I think when you get into it, you realize that it's really about the ability to train larger and larger models. Yes, that's actually a combination of we have better hardware to do it. We have better ML architectures, like there's transformers, there's diffusion models. There's all these new types of architectural unlocks that we've created. And then there's other techniques that we've created too, which just allow us to train larger and larger models. And turns out the larger and larger you make these models, the more problems they can solve, the better they can be at solving like text generation or like towards AGI or video generation or media generation in general.

I think when you realize that, what actually you get to is that what really matters is the data at the end of the day. A lot of companies are like scraping the internet and the internet is also limited in some ways. There's only so much information on the internet even, and that's growing every day. But I think at the end of the day, beyond that, we're going to have to find what are those sustainable sources of data that can continue to grow bigger and bigger models. And I think that's going to be the fundamental question behind who actually ends up winning in a lot of these different areas that AI is excelling in today. I think for us, being on the video generation, video editing side, it comes down to video data, which is actually much heavier, much rarer to find, not as common as text or even audio, and potentially much more expensive to train on as well, much more limited in terms of being created in the world. And so that tends to be a big challenge.

One of the big things that we're thinking about is how do we actually create a flywheel where we can ingest data on a continuous basis and a growing basis, and that data can actually create bigger and bigger models for us and keep us at the forefront.

I also want to call out here, like there's a pretty fundamental difference between different types of AI companies that are out there. I think if you look at a lot of the text generation companies, they're not solving text generation, like we don't call it text generation, they're actually kind of solving a totally different problem, which is intelligence.

Intelligence is an unsolved problem. No one's figured that out yet. And yes, we're achieving some levels of intelligence in these models. And there's a long way to go. It may not end at human intelligence. There's people in the world who are really smart. There's people in the world who are not so smart. They both exist. And clearly, there's a range of intelligence that's possible. There's not one value for like you're intelligent or not. So yeah, is there a chance that there's the ability to go smarter than the smartest human? It's possible. But that's a frontier that we've never reached.

And so it's kind of solving this unsolved problem. But I think if you think about audio generation or video generation or music generation or these types of things, right, it's, I think, a little bit less of solving an unbounded intelligence problem and a little bit more of solving actually rendering a solved problem. And video, for example, like CGI exists. We can make fake things. We can make fake humans. We can make fake sceneries and dragons.

And so this is a solved problem. We know that there are solutions to these. And with AI, we're actually just making it easier to solve these problems. Not just a little bit, but like 100 times easier, which in the end, that means more accessible, larger market, more people can use these types of technologies. So I think that's one of the fundamental differences there is if you look at business models for like artificial intelligence companies that are really working on AGI, then you kind of have to think about this unbounded problem of like, okay, we put a bunch of capital into it. We create a model only for that model to be beat by the next model and that model becoming essentially useless and obsolete. And then there's the next model after that. And how long does this go on for? Actually, we don't know. It may go on forever. There may be like no end to this intelligence race.

Whereas if you look at the media generation companies, it actually is creating an asset. And there might be very soon a point where, oh, wow, it's just really good. It's just perfect or close to perfect. And we've kind of solved it. And then it's an asset. And then after that, it's just a software company. And the asset's really expensive to create. But once it exists, it just generates value. And it doesn't lose value that easily.

So what is going to make those models better and better? I think it's going to be like fine tuning with more data, fine tuning for specific use cases, different types of things you want to generate, different types of visuals, whatever it might be. Use cases like, oh, it's going to be used in ads or movies or social media or something else. But there may be a point where it's like, wow, yeah, this is pretty good. It's realistic. I think that's a pretty important thing we're thinking about right now. How do we bootstrap that data flywheel to be able to reach that level?

What is it like to work with video data where I imagine like just the terabytes or petabytes or however you measure it of data that you have is sort of insane? How do you think about something that might just get as good as it can get? I love the point that if you give a Hollywood studio or Weta or something enough money, they can literally create any visual that you can imagine. The friction between imagination and output is already gone. It's just really, really expensive. So really what you're doing is just making something cheaper. When do you think that could be achieved?

I think it's pretty soon, honestly. I mean, at the rate at which video models are growing, I mean, you probably remember seeing like the Will Smith spaghetti thing. Everyone's seen this meme, right? And it went from like really horrible to like, wow, this is actually good. And I think really, really good is probably around a year, year and a half away. I only say this because if you compare, for example, like text models to like video models, text models are already like in the 400 billion parameter range.

People understand better how to scale LLM technology today just because more money has been put into it, more time has been put into it. Like diffusion models, still in the tens of billions. It's still early, not even close to the text models. So as that grows, there's just no doubt it's going to get better and better. And like the experts kind of know that this is all possible. It's just that very few companies in the world have the funding and the expertise to actually go after this. So like, it just takes some time. Like, it's not like some unsolved problem. People know what needs to be done. It's just, we're all getting there. We're all moving towards it. And we'll see those models getting better and better, especially on the video side. I could easily see within a year and a half or so, something coming pretty close to like indistinguishable, essentially from like a real recording, maybe even sooner. That's not like the worst case.

Yeah, I don't think people are entirely able to grasp that yet. I think the way that that influences how they do their work every day, the workflows that end up getting reinvented, new paradigms of all that, which is arguably part design problem, part just product problem in general. This is pretty close, the timelines that Gaurav was talking about. People are experimenting today. It's extremely early. And I think that, with companies' adoption and stuff around that, we're not far off at all of really reinventing a lot of how people end up doing their everyday work.

Can you describe the stages that you've gone through as a company? Maybe we'll use like the Tesla analogy.

One of the beautiful things about their model is the cars, by virtue of being driven, are gathering data all the time. The product itself naturally generates data exhaust. And I think you've had a somewhat similar story on the video side. And so you don't need YouTube or some massive proprietary library of video to do what you're doing. Can you just describe, take us back to the day one of the business, what it was to start, why you started there, and then how it's progressed since?

Yeah. It has been a pretty interesting journey. And like, we've been through some interesting twists and turns through it. But I think if you like, connect the dots end to end, it's interesting. When we started the company, the first app that we made was Captions, we launched it. And why did we make it? The goal was to get content creators to create content on a video creation platform of some sort. Not easy. I was at Snap before this and Snap had tried this many times. They launched apps. And I mean, video is kind of a commodity. Video editors are commodities. A lot of these companies are actually foreign. And that's because we're just trying to minimize costs at this point and it's really difficult to compete in.

Our thought was the way we're going to crack this is we're going to use AI to help create video somehow. That's going to be our differentiator. That's why people are going to come to us. And so we saw that there was a need around speech to text. It was a technology, by the way, at that point, that was pretty good. In tech circles, people were like, of course, speech to text, we understand that it's pretty good at this point. But I think the average person actually didn't understand how good the tech had gotten and how accurate it was with names and like obscure terminology and all kinds of stuff.

So when we built the first product where it was just like, hey, it's just literally put text on the videos. And by the way, this was built in like two days on a weekend, really just band-aided together. And we put it on the app store, went to sleep. The next morning, it was top of the app store. There's no explanation. We didn't do anything to make that happen. Somebody saw it. They posted it on something. It blew up, and then I woke up and I texted Dwight. I'm like, hey, I think there's like 600 videos per minute being created on the app, by the way.

And so that was kind of like an instant success. But even in that two days of work, we had already instrumented the app in such a way that we would be able to continue training better and better models so that we can deliver better value to the user. So the idea was like, the app is an AI app where people come in, they use the app, we use the data to make the model better and deliver even better experiences the next time the person comes back. That was done from day one, literally. That was the original plan.

Now, post the launch of the app, we've added so many more features over time, expanded the offering so much more. And we cover now the entire space of everything from like script writing to recording to video editing, distribution as well, and how AI can like transform each of these different areas, because there's applications in all of them. And there's data that can be collected across all of those that can improve those models. And that's what makes our offering really unique, because all the other companies are not really thinking about the data collection side and just generating outputs. And that's why they have to kind of scrape the internet to make their models better.

And for us, really, it's more about growing a user base so that the data can actually power better and better models. And a lot of that comes through like video. So video being funneled directly into video generation models.

That gives us significant advantage. That's potentially a possible way in which a future sort of business model could be set up. And actually, it's kind of familiar, by the way. It seems to me similar to the Facebook or Google business model, where you have a mass consumer free product, basically, and the data is used to power essentially like a B2B paid product.

If you think about the literal process of training, maybe you can explain it to people that are curious about how this actually works. So you have raw video. A lot of it has voice in it. You can start, obviously, by translating that voice into text. But let's say you're trying to train a model. I like how you guys referred to many people like Sora focusing on what we'll call B-roll, background video or just like a landscape of video. And your focus has been on A-roll, like a human being on an iPhone looking video talking. How do you train a model where the output is A-roll like that? Like just imagine a portrait video of someone reading an ad read or something like that that's indistinguishable from an actual live video taken on an iPhone? What is the literal training process? What is the target of the model as it's training?

How similar or different is this to just next token prediction? What's the mental model for next X prediction or something in a video? Like, how do you think about the literal actual training process of what's happening?

It's interesting to think about because for the models that we train, they're diffusion models. So they actually work by starting from noise. It starts from literal noise, like static you see on TV. At every step, based on text that's provided, it looks at the noise and it tries to like predict a layer of clarity in that noise. It says man wearing blue shirt. So it starts to like draw a little bit of man wearing blue shirt out of noise. And then every pass it's taking through it, it's discovering a little bit more of the man wearing blue shirt. So that's the text conditioning that's helping it decide how to reach the destination of what man wearing blue shirt looks like.
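Editor's sketch: the loop described above (start from static, then use the text prompt at every pass to uncover a little more of the picture) can be illustrated with a toy in plain Python. This is a minimal sketch under loud assumptions, not how Captions or any production diffusion model works. `PROMPT_TARGETS`, `toy_denoiser`, and `sample` are hypothetical names. The "denoiser" is a lookup table standing in for a trained network, the "image" is six numbers, and the blending rule is a simplification rather than a real noise schedule.

```python
import random

# Hypothetical stand-in for what a trained model has learned: the clean
# "image" (here just six numbers) that each text prompt should produce.
PROMPT_TARGETS = {
    "man wearing blue shirt": [0.1, 0.2, 0.9, 0.9, 0.2, 0.1],
}


def toy_denoiser(noisy, prompt):
    """Stand-in for a trained network: predict the clean signal for a prompt.

    A real denoiser would inspect `noisy`; this toy ignores it and relies
    only on the text conditioning.
    """
    return PROMPT_TARGETS[prompt]


def sample(prompt, steps=20, seed=0):
    """Start from noise and move toward the prompt's target a bit per pass."""
    rng = random.Random(seed)
    target = PROMPT_TARGETS[prompt]
    # Start from pure noise, like static on a TV.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    errors = []
    for step in range(steps):
        predicted_clean = toy_denoiser(x, prompt)
        # Blend in a growing share of the prediction so the picture
        # emerges gradually; the final pass lands on the prediction.
        alpha = 1.0 / (steps - step)
        x = [(1 - alpha) * xi + alpha * pi for xi, pi in zip(x, predicted_clean)]
        # Track how far the current sample is from the target (for inspection only).
        errors.append(sum(abs(xi - ti) for xi, ti in zip(x, target)) / len(x))
    return x, errors


if __name__ == "__main__":
    final, errors = sample("man wearing blue shirt")
    print([round(v, 3) for v in final])
    print([round(e, 3) for e in errors])
```

The error list shrinks with every pass, which mirrors the "discovering a little bit more" description. In a real model the prediction at each step comes from a large trained network, and that network is where the parameter counts discussed in this conversation live.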

So, that's how the diffusion models work, which is slightly different from like how a next token prediction model like GPT works, which is kind of just as you might think about it, just predicting the next word based on all the previous words that have been spoken, which are considered the context. So, these models are different. We are still earlier on in the diffusion model training path. We're still in that 10 billion, 20 billion, 30 billion. Meta's Movie Gen was, I believe, 30 billion parameters.
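Editor's sketch: for contrast, next-token prediction can be illustrated with a toy that counts which word follows which and then repeatedly emits the most likely next word. This is only a sketch with hypothetical names (`train_bigram`, `generate`). A model like GPT conditions on all previous tokens with a learned neural network, whereas this toy looks only at the single previous word and uses raw counts.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_words=5):
    """Greedily append the most frequent follower of the last word."""
    out = [start]
    for _ in range(max_words):
        followers = counts.get(out[-1])
        if not followers:
            break  # no known continuation for this word
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)


if __name__ == "__main__":
    corpus = "the man wears a blue shirt and the man wears a hat"
    model = train_bigram(corpus)
    print(generate(model, "the", max_words=3))  # prints "the man wears a"
```

The structural difference from the diffusion description is visible here: output is produced one piece at a time, left to right, rather than by refining a whole noisy canvas over repeated passes.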

People haven't really scaled this up. We actually don't know how big OpenAI Sora is. They didn't, I think, release that information. But a lot of the work is going to go into scaling up these things. Video obviously is really heavy. That's what makes it different from text. Consumes a ton of space, a ton of processing. For us, even if we were to download, just download all of our training videos, it would cost us a million dollars to download the training videos. That's a whole different regime than like text.

It brings different types of challenges to training these models, basically. What does that mean in terms of the sap on resources that video models will represent relative to text models? Like one of the big discussions in public markets and private markets is how big do the GPU farms need to get? Are these video models to get to that point of perfection necessarily more consumptive of GPUs than text models would be? What's your two cents on this big question of

Do we need to build nukes next to data centers to train the perfect Lord of the Rings model or something? You never know. But honestly, I think what will save us on the video model side is actually the fact that it is an easier problem than the text problem. The text problem is intelligence, as we're talking about. And the video problem is more rendering. We already know how much rendering costs. We already know, yeah, it's GPU intensive. If you were to literally CGI render a scene out, yeah, it will spend some time on the GPU. There's no doubt.

Can we be more efficient than that? It's possible. It may not be the most efficient today. Maybe there's better ways of doing it. Maybe AI will be cheaper and faster than regular rendering. And I think if that's the case, then that's a good thing. But I think we know that it shouldn't be worse than that. We should be able to solve it with fewer resources than that, potentially, or at least the same. We generally understand where it's going to fall. It's still early. Just like on the training side, we're still scaling up these models and it's still, oh, it's 10 billion parameters, 20 billion parameters, whatever.

On the inference side, it's similar learnings happening simultaneously. We don't need to do 100 steps of diffusion for inference, like 100 denoising steps to reach a clear picture. We can distill models and have them work with a few steps of diffusion now. I think we're definitely the most inefficient we'll ever be. And it's only going to get more and more efficient. It could be a factor of at least an order of magnitude like 10x or something.
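The arithmetic behind that efficiency claim can be sketched as follows. This assumes inference cost grows roughly linearly with the number of denoising steps, plus an optional fixed overhead; the step counts and cost units are illustrative assumptions, not measurements from any real model.

```python
def speedup(steps_before, steps_after, per_step_cost=1.0, fixed_cost=0.0):
    """Ratio of inference cost before and after reducing denoising steps.

    per_step_cost and fixed_cost are in arbitrary units (hypothetical values).
    """
    before = fixed_cost + steps_before * per_step_cost
    after = fixed_cost + steps_after * per_step_cost
    return before / after


if __name__ == "__main__":
    # Going from 100 steps to 10 steps with no fixed overhead.
    print(speedup(100, 10))
    # The same reduction when a fixed overhead equals 10 steps of work.
    print(speedup(100, 10, fixed_cost=10.0))
```

With no fixed overhead, cutting 100 steps to 10 gives the 10x figure mentioned above. Any fixed cost that distillation does not remove pulls the real gain below the raw step ratio.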

Can you talk about the felt experience of having a business? We won't quote how big the business is, but it's very big and it's grown ridiculously fast. One of the things you hear that's a common idea now is that a new technology like this unlocks distribution. Distribution used to be really expensive late in the last, mature SaaS cycle or something. People sort of had their tools. But when tools are just 10x better or more, 100x better,

distribution for a time becomes really easy. I think you've been beneficiaries of that unlocking of distribution. Just talk about what that is like. What is it like to see revenue and users and all this stuff scale at this pace? Because it seems like the revenue growth rates of some of these AI application companies are faster than anything we've ever seen. And I would just love you to riff on that a little bit and one, describe what it was like, but also just reflect on anything that it teaches us.

I mean, it's definitely the most exciting thing. For anybody who's working on the engineering or product side, I feel like there's nothing more exciting than seeing direct results: I did a thing and the next day it caused an impact, right? Like people cared. There's just nothing more exciting than that. And I think we see that, which is great, which is why we've been able to build a great team and

hire all this great talent, really set us up for success. But I think maybe the most interesting part of it for me is how you can almost see how as you're expanding the use case, it's actually growing the potential market. And that potential market has no competitors. As you expand the use case, you kind of see we're doing ads now, or we're doing like higher quality video, even on that axis.

And you see like entire new areas of market unlock where there's actually no competition. And actually, that's what causes the fast growth is actually nothing other than just we are the only company that can do something for a period of time. And that will change. And I think that's why it's going to be interesting to see as more and more use cases unlock at some point, all of it's going to be unlocked. All of it's going to be having competition in it. That'll be a different time. Might be years from now. I don't know when it will be.

But at least for now, what we're seeing is this ability to expand use case. And by the way, we really think that the use case unlocked so far is somewhere in the 1% to 5% range. We've barely scratched the surface of what's possible. As that grows, we see these entire new markets unlocked. Like, wow, this is a whole new set of people who can now do something actually useful with this. Yes, they're completely willing to pay. They're running to us. We don't even need to sell it. And we're the only option. It just makes growth really fast.

I think that's been probably the most exciting thing for me.

Can you level set what the platform can do today? The major use cases, like everyone can imagine feeding it a video, getting a captioned video back. That's very simple. Can you lay out the other ones and give us a sense for their relative popularity? What is the revealed preference of how people want to use a platform like Captions?

When you think about us, we actually divide the product in two areas. So there's the traditional video editing and video recording, which is just as you would expect it. It's a video editor and a video recording software. And this is built for consumers, completely free. The play here for us is to

provide a service to a large number of people, kind of a freemium business model in a way that they're already familiar with creating. But the goal is to actually upsell them into the AI use cases. You actually don't need to spend all this time video editing and recording. You can just generate it. So on the flip side of that, we offer the AI suite, which is two products, AI Creator and AI Edit. These are exactly mirroring recording and editing.

AI Creator literally just makes videos of people talking, whether that's you or an actor that we provided, or anybody you might choose that you have the license to use. We can make them say whatever you want, deliver whatever message you want. And we can even create people that don't exist. So in between that, you get a bunch of optionality of how you want your message to be delivered. A lot of the use cases like marketing and sales and these things are like very close to revenue.

And then there's AI Edit. Just a recorded video isn't exactly enough to create value. You want it to be edited in some way to tell a story that you want to tell. That's why we have AI Edit. The purpose of that is take a video in and edit it for you. You actually don't have to worry about keyframes and animation curves and timelines and all these concepts. Video editing is not easy. And a lot of people avoid it because they just don't want to deal with this complexity. And our thesis is

We have a foundation model that just does the editing for you. So you don't have to worry about actually editing anything. So that's the suite of products, basically. The traditional versus the AI. And in the AI, we have AI Creator and AI Edit.

Just to clarify, like in something like edit, am I prompting it to say I want it to do this specific thing? And then is it sort of like prompting? That's where it will go in the future. Currently, it's more style preferences that you provide to it. So it's in the early days of that. A lot of what will happen in the future is as we get more video editing data from our traditional products, we're going to use that to funnel into our foundation model, the ability to essentially prompt with text, whatever you want to say, say things like,

I don't like these images. We want like different images with a better vibe or let's cut it down to like 30 seconds, like 45 is too long or it sounds a little slow. We want to tell the story a little faster pace, general prompts, what you might actually say to an actual video editor and the type of thing that someone who doesn't have the detail and intricate knowledge of video editing might say. So what is like the relative breakdown of what tools people use?

the free versus the paid and the editor versus creator? Like, how does it shake out?

Yeah, so today, like a vast majority of our users are paid users. In between AI Creator and AI Edit, they're both about equally popular. There's some people that just use AI Edit. There's some people that just use AI Creator, depending on the use case. And then there's a bunch of people who use both one after the other. So using both lets you basically get from absolutely nothing to a fully edited video with just a couple of words typed.

which is a great first time experience. Now, some people might want to just record their own video, or they might be editing on somebody else's behalf or something like that. So they might actually take a real video and pass it to AI Edit to be like, okay, I already have a video, edit this for me. And on the AI Creator side, some people don't want the editing, or they have a very specific use case for what they're trying to do with it and they want to just figure it out on their own. So they just do the AI Creator part. A lot of times it's using their own likeness.

So they can just mass produce videos of different types. But it also can be like using one of our actors. A lot of that is marketing content and things like that. Things that go on social media, but also like ads and anything that might be marketing related. So those are sort of the relative popularities. I would say like they're about equal. Does it feel like you're in...

an arms race right now with other companies? To an extent, yeah. I mean, I think the most interesting thing that I've seen is a lot of new companies popping up. All of them are trying to do the same thing. Like, I'll give you an example. I was at Snap before this. Literally five other people have left Snap and tried to start the exact same company. Yeah, it's working. We should be doing that thing. It makes sense. I don't blame anybody. Like, I think it's great that they're doing it. But I think what I...

like about it the most, in a way, is that people are copying us. I think it's a great sign. It means that we're doing the right things, and we kind of avoid looking at other companies too much. Our product strategy and what we build and what we do is really decided by our mission and vision and where we see the future being. It shouldn't be decided by what somebody else is doing because they may not have a strategy at all. We don't know. Their strategy might just be looking at us. So a lot of times we'll look at competitors only to the extent of understanding,

okay, this is what they're doing. What we really focus on is thinking about our North Star and where do we see the future being? And are we building towards that future? Not just from a technology perspective, from a product perspective and a user experience perspective. And I think that's the fun part, right? I think that's so much fun. When do we get a chance in history to actually invent the entire stack from the bottom to the top, all the way from the hardware level? Like there's bugs in the NVIDIA drivers. There's bugs in the hardware level. Like it's crazy. And we get a chance to like literally invent

The UX, how are people going to interact with these things? Like, I think people are not even thinking enough about this yet. They're just literally taking models and throwing it on UI and be like, press button, output. What if it was more interactive? What if you could see the steps of diffusion? Or you could like preview things in the middle of the diffusion process, change things according to like what you want it to generate. There's just so much that's still to be unlocked.

every function, whether it's design learning about how the technology works or technology people learning about how the marketing is going to work. This is going to get so much more evolved and so much more integrated. And that's what we focus on. I think the arms race is ensuring that we're always delivering way in front of what our customer even needs today. Whenever we release something, it gets commercialized on day zero, immediately.

We're not like testing it with a bunch of people and seeing what they need and seeing if we're actually solving anything. No, no, no. We're building this for their work. We're incredibly ingrained in how they do their work, whether you're a large enterprise or all the way down to the free consumer. Ultimately, to Gaurav's point, by inventing those design patterns and the way someone can interact with these new models, we're literally paving the way for how people interact,

even think about doing their work. And that's the really exciting stuff. That is the arms race in my mind, but that's not necessarily against another company. What are the trade-offs that you've had to choose one way or another as you build? Video is a big category. That could mean I get to make a Lord of the Rings quality movie, or it could mean something much more provincial than that. We've actually niched down quite a bit on purpose because as you said, like video is huge. It's like a massive market and it's almost too many problems to solve.

I don't think if we tried to focus on everything, we would solve all of these things. So our focus is very much on videos oriented around communication. These are talking videos, people saying stuff. A lot of it tends to be marketing, sales, education. These are the big categories, or maybe communications to some extent. And

It's about generating those types of videos. It's about editing those types of videos. But I think generating stock video is fine. I think that's a great thing to solve. But our goal isn't to create stock video. It's actually to create a real video telling the actual story of whatever it is you're trying to convey. So not just bunnies jumping around on Mars type of thing, more like telling a story, pitching a product or whatever that might be.

something really communicative, informative. And that's where we've seen a lot of our product market fit. We're actually the only company training a foundation model to do this type of thing today, to generate A-roll. There's a couple of technical reasons why that's the case. There's other companies in the space, but they're not training foundation models. So we'll see how the space evolves in the future. I think it will actually tend more towards what we're doing. What are the surprising...

hard limitations of what the models can do today or might be able to do in a year. Like I'm imagining we're sitting at this table, there's a bunch of stuff on the table, my specific brand of water bottle or something. I want to tell the thing to be able to like hold it like this certain way. And like, I want to be able to sort of direct an object that's not the person, but that interacts with the person. Is something like that relatively straightforward? Yeah, I think that will happen within six months. Guaranteed, essentially.

We'll probably start seeing the first versions of this coming out within months of now. How does that work? Are you creating like a 3D representation of this thing somehow? What are the steps that go into the ability to create something like that?

You have to find training videos where people are already interacting with objects, you drinking a can of Coke or whatever it might be. And then you have to be able to identify those objects and then provide them as conditioning. So for example, it might be text conditioning. So if you can like adequately describe this particular can of Coke in text,

That might be enough, but it may also not be, right? Like a Fiji water bottle has a very particular design. Unless the model has seen one before, it may not be able to precisely recreate it. And text might not be enough to describe what it looks like. So you might imagine image conditioning. Here's a picture of a Fiji water bottle and then text that says,

man in blue shirt holding Fiji water bottle. And then it'll be able to figure out the rest from there because it's seen bottles in general and understand what bottles look like. If it sees it from one angle, it can predict what it looks like from the other. So if you're like rotating it around and moving it around, it'll guess essentially what it probably looks like on the other sides, but it'll be pretty accurate because you can see the bottle from one angle. You could imagine a world in which we provide multiple angles of the bottle just to make it a little bit more accurate. Maybe there's something on the other side that isn't visible in one image that you want to make sure is like clear to the model.
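One way to picture the combination of text and image conditioning described here is as a single request that carries both kinds of input. The sketch below is purely illustrative: `GenerationRequest`, its fields, and the file names are hypothetical and do not describe the Captions product or any real API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GenerationRequest:
    """Hypothetical request shape that bundles conditioning inputs."""

    prompt: str
    # Zero or more reference pictures of the object (hypothetical paths).
    reference_images: List[str] = field(default_factory=list)

    def conditioning_signals(self) -> List[Tuple[str, str]]:
        """List every signal the model would be conditioned on, text first."""
        signals = [("text", self.prompt)]
        signals.extend(("image", path) for path in self.reference_images)
        return signals


if __name__ == "__main__":
    request = GenerationRequest(
        prompt="man in blue shirt holding Fiji water bottle",
        reference_images=["bottle_front.jpg", "bottle_back.jpg"],
    )
    for kind, value in request.conditioning_signals():
        print(kind, value)
```

Passing more than one reference image corresponds to the multiple-angles idea: each extra view removes some guessing about the sides the model cannot see.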

So those are the types of things that are just obvious. This will be the first of what's going to happen.

How do you think the value of these things will change over time as the cost and frictions to create them falls? Humans are really good at scarcity and assigning value to scarce things.

And so a beautiful video that shows a product was valuable because it's costly to create in some sense, probably. How does the availability of perfect, high fidelity, unbelievable quality video at a moment's notice? How do you think that that changes the value of the video itself? And I'm just curious of like other knock on effects of what you're doing that you've thought about.

I mean, I think it's interesting. One comparison you can kind of draw with this is if you think about the 2010s, generally, it was a phase of design really taking off. Companies like Canva and Figma were created in this decade. And not just that, but there were a lot of companies that were doing "make a website with a few clicks." It looks awesome. Great design websites, just one click away. This wasn't AI. There was a huge movement to just like

If you want to sell something on the internet, if you want to have a business of any sort, you need a great design website. If your website looks like it's from the 1990s, no one's going to buy anything from there. I think that's cool again now. Yeah, it is cool now. Yeah. Which is crazy how fashion moves, right? It all moves in cycles. That's right. Yeah.

There's almost nobody that has a bad website anymore. But that doesn't mean that having a good website is not valuable. It's still valuable. If you don't have a good one, then you might still suffer today, even though it's a commodity, essentially, and everybody should have it. Video is what's really taking off this decade. I think we'll see more and more people adopt it. It feels like there's a lot of people adopting it today, but I think it'll be even much larger than that because the portion of creators within the video ecosystems will grow. More people will be creating it and potentially even more people consuming it.

So I actually think that the value of the video will not shrink exactly. High-quality video will still be high-quality video. And it'll be a requirement if you want to market, sell or whatever you're doing. But I do think that there's going to be other things about video that are going to become more valuable. So for example, if you think about likeness, if models can just generate likenesses of people that don't exist at a whim,

and they look like great people, people you would want to represent your brand. You could even own a likeness as an IP of your company of a person that doesn't exist and have them be the spokesperson of the company. That sounds awesome. That sounds great. But that means that the value of the likeness is just going to zero. The average likeness is not worth anything because anyone can make one out of nothing. And what does that mean for the

The cost of likenesses in general, or on the high end, I think is going to be determined by who's known. A likeness that is actually known by people, trusted, understood by thousands, hundreds of thousands, millions of people is valuable now. It's much, much more valuable all of a sudden. And by the way, that person may not have existed either. Someone might create a completely fabricated person,

post videos and stuff, become famous.

Yeah, Lil Miquela was way ahead of... Exactly. Shout out Trevor McFedries, way ahead of the times. Yeah.

So that doesn't sound crazy in that world. I mean, I think you can go crazy with this stuff.

What are the surprising limitations of these things? What would people be surprised that they have an especially hard time doing?

We've all seen video models struggle with people, at the end of the day. Fingers. Yes, fingers, arms, drinking. Yeah, and

Olympics. Spaghetti. Yeah. I think we're kind of taking the unique angle on this generally, which is that we are training specifically on people. Our data is all people. And we are specifically generating people. We also are going to have conditioning, the ability to provide like a skeleton, for example. This is the exact animation I want to play out. This is the exact TikTok dance I want you to do, for example. It'll just make it happen.

And that actually makes the model much more likely to learn what human anatomy looks like, and what's normal and what's abnormal. People do have six fingers. It does happen. The model doesn't know that. Obviously, it's not that the training data is causing it, but the model may not fully realize what's normal if not enough training data has been given to it

that shows hands in all kinds of configurations and doing all kinds of things. So our goal is to solve that human generation problem, like just actors essentially in general. The scarcity aspect too is that some of these are not new problems. The corollary around movies is that a Michael Bay film, $250 million budget or something like that blows up half LA, Transformers or something, I don't know.

Tons of people go out and see it, blockbuster film, but all those people are paying $25 per ticket or something. The same thing happens for a low-budget film if they can get into the box office, but the ticket price is the exact same. I'm actually very excited about a world in which lower-budget filmmakers and video creators in general can just create more and do more complex things without necessarily the budget constraints. That's a massive hurdle for film creators and just creators in general.

I think it just up-levels everyone. I think the craft maybe shifts a little bit or this and that, but those high-budget films, as Gaurav mentioned, technically it's generated. It's not synthetic or it's not real. Some of those things are real, I realize, but it maybe even creates more premium on some of those aspects. What does it feel like in the competitive landscape to have established something so successful and important? I love the idea that companies pass a level of maturity when someone else tries to kill them for the first time. Have you had that experience yet?

I'm really curious for like the sharper, rougher elbows part of building something so fast. Any experiences like that that are interesting?

Definitely. I mean, I think with all these types of things, we're always let's go with our mission and like not worry about what others are doing. But yes, a lot of people care about what we're doing. In fact, I think I would say in terms of bigger companies, I think we're seeing an interesting evolution. Like we kind of fall in an interesting spot where we semi collaborate with a lot of social networks because we're beneficial to their growth. We create content and all social networks need content. And we have un-watermarked content, content that is original.

And this was a big problem for Instagram. Remember, like when they launched Reels, everything had like a TikTok watermark on it. And it was recycled TikTok, basically. But we have a lot of that type of good content that's being generated on our platform, by the way, like hundreds and hundreds of thousands a day that's going to social media. And so we end up being a valuable partner for a lot of social networks. And we've seen like the social network landscape evolve in that sense. A

lot of VCs ask the question, like, what if Facebook copies you? What if Google copies you or something like that? And I think what we're starting to see is like, Google and Facebook are not the copying companies anymore. They're not copying anything. They're just doing their own thing. And the copying company actually is TikTok or ByteDance more generally. I don't know how this shift exactly happened. Facebook suddenly became the good guys. I think Mark Zuckerberg is a hero now for

putting all these models out, making all this open source stuff. Suddenly his vibe has completely shifted. And then I think TikTok has become essentially what Facebook was. Capture, kill, destroy everything that exists in every market that exists. Don't collaborate with anybody. And I think it'll be interesting to see like how that plays out. Obviously, there's many talks happening about a ban and all this kind of thing. We'll

see where all that goes. But their leadership is very well aware of our existence. And they have tried many, many times to kill us. To their credit, they were aware of our existence before anybody else.

What does that look like, them trying to kill you? Literally just copying the product?

Blatant copying. They literally went to the extent of copying our app store description and our website, putting that in their press release word for word, copying our brand colors, the exact, precise brand colors, pretending to be us,

beyond anything you would imagine. And just kind of crazy that like a company of that size would even try these types of tactics. But at the end of the day, like the software that they just create is just very mediocre. And it just works because they have great distribution through TikTok. And I think we win because we just have better product.

It seems like early days of all these models getting built that research talent was one of the most important scarce resources in extremely short supply. Can you talk through what you've learned about that, how it's changed? Is it still a handful of people that you really need a couple of them to be able to build the cutting edge thing? What is the role of extreme research talent in building the models that fuel all this great product?

So the talent side is still, I think, evolving. I don't think it's completely solved, for what it's worth. As the use cases are growing, as more and more people are realizing what's possible, as more companies are getting started trying to solve similar problems, there's only going to be more and more of a shortage of talent. Talent isn't created overnight, right? It takes years and years of experience before someone can be considered experienced in an area.

And I think we will still see continued pressure on that talent side, especially for building

generative models and foundation models and things like that. Obviously, the more VC dollars that are poured into this area, that'll have an effect. But I do think, interestingly, it doesn't take an army to build this type of stuff. It takes a few good people. And it might take an army to scale it and really make it big. But to deliver world-class results can be done with maybe a dozen people or less. And you can beat everybody in the world with a team that small,

if you had the right ingredients in place. A lot of the challenge becomes finding those people. You can't get it wrong. You want a very specific set of people. You want a specific skill set. This is all new. So very few people have experience in it. It's all cutting edge. There's discoveries and inventions happening every day, every week. So the more you care about it, you want to find people really close to the cutting edge who really know what's happening today.

And what are the small little wins and techniques that will get us the edge over everybody else? So it still is a challenge.

What our duty ends up being then is we have to give them all the resources and such to be able to do their work. There are a lot of these folks, you know, in AI labs that can't release anything they're working on, for better or worse, at least from that place's opinion. And when you're able to

bring some of the ingredients that we have, whether it be compute, data, the environment, it ends up not being that complicated in terms of recruiting, negotiation and stuff.

How do you think these products will price over time? This is like always the weird question with software where the marginal delivery of it costs nothing. A lot of people have been talking about, let's look at, let's say, Accenture's market cap or something like that. And it's a $250 billion company basically selling labor,

very expensive labor, important labor. Do you think that AI applications will take labor budgets and be priced like heavily discounted labor because that's what they're doing and replacing? Or is it just going to end up pricing like all software does? We've run these playbooks for 20 years and we kind of know how to do it. What's your sense of how people should think about pricing AI software applications and what its equilibrium state will be in the future?

I don't know if we completely understand it yet. Basically, I think it's almost too early to tell, in a way, because we aren't completely able to replace labor in all its different aspects. So we don't know what people will be willing to pay for it yet. In the use case graph, we're at like three, four, or 5%, whatever, something in that range. It's just early. And we aren't able to fully replace certain workflows or very operationally heavy processes that might exist in companies and stuff.

And we will get there slowly and steadily. We're moving towards that. And I think we'll see what people will be willing to pay for that. So I think we'll figure it out. One of the big questions there is like, how does that split between consumer and B2B? I think consumer pricing is evolving pretty clearly. I think we're starting to see what that looks like. It seems like it's coming down to consumer subscription.

And it also seems like people are willing to pay a little bit more than they would have otherwise. So for example, traditionally for like video related apps on the App Store, for example, web apps and Android, whatever, the standard price is somewhere in like the $7.99 to $12.99 range. That's just considered normal. And there's a freemium business model to it. I think what we've seen different is like,

For us, for example, we have been for a long time, like we were completely premium. There's no free product. You cannot even use it once for free. And that worked just fine. People were like, okay, whatever. Here's money. Let's move on. And so that wouldn't have worked in an older world. Without the newer technologies and stuff, people would have been like, yeah, I'm not paying for this, right? I'm moving on to the next one. I think the other thing we're starting to see is can we charge $25 a month? Yes, we can. People are paying that.

And so people are clearly willing to pay much higher prices. Like if you look at a lot of different AI companies out there, the video generation companies and stuff, they're going across this range too. And people are paying all these prices.

People are paying up to like $2,000 a month consumer subscription. I think there's a lot more ability to go higher on the subscription pricing than there was previously. Now that might change. I think a big factor of that might be like, there's just not enough competition still. Still might be like, there's only maybe one or two models in the world that are like,

of that quality and people care about that quality. So you really don't have a lot of choice. Maybe if there's like a ton of these Tesla models floating around, maybe the price comes down in the future. So that's what we're seeing on the consumer side. And then on the B2B side, I think that's where we will figure out a lot of it, right? I think

Some of the big things that need to be solved there is will businesses buy models that are trained on unlicensed data? That's like an open question. And yeah, they are to an extent. We'll see kind of how all that plays out. We're planning to go much more on the fully licensed side. That's going to be one of our main differentiators because we are uniquely positioned for that. We actually collect data at like a massive scale so we can actually train fully licensed models.

My feeling is that towards the end game, not today, but as this area gets very saturated, so maybe many years from now, I think things like having fully licensed models will factor in because you'll be able to win on that very easily in like a competitive deal. And people will care about those types of things. And people might even be willing to pay more for that type of guarantee or just like the reps that it's licensed.

And then I think besides that, it really just comes down to like how much of the use case we'll be able to cover. And that's the big question. Okay, we're at 5% today, but is the limit 100%? Is it 75%? Is it 50%? Where does this stop? My guess is we can go all the way to 100%, or at least very close to, just because it's a solved problem. We know that this is solvable. And I think if we can get there, I think a lot is going to change about how video workloads work in the world.

And the pricing around labor, the hot topic right now is seat licensing versus labor, or like aligning to labor costs or something. I think people are maybe rushing into the labor argument that it actually has a very similar path or has had a very similar path. Turns out the CFO would like that number to go down. It's not some special number or something. And if you remove the human element to it, my guess is that that probably only puts more downward pressure on it. It's like, great, we can do more with less. And it's like, perfect.

whatever the software is doing, if it's writing code or if it's the automated SDR, whatever it might be, there is downward pressure against those things. I think people are getting a little excited about running towards that. And don't get me wrong, pricing towards output and such is pretty cool. I'm sure there is something there and there's some equilibrium we'll find. But I do think people are rushing to it maybe a little faster than they should be in that there actually might be more continuing alpha in the typical subscription. Yeah, sure. Maybe that's not the Salesforce seat that's

the classic comparable, but there's just some market exploration that needs to happen. And we probably haven't fully seen that yet.

I think the entire world of investors, VCs, growth equity investors, public investors, pretty much every single one of them is trying to figure out how to think about AI and its implications on their companies, prospective companies, equity valuations, all the normal important questions. How would you advise them from the other side of the table? You've talked, I'm sure, to a lot of the great investors. You have several of them that have invested in your company. What do you think investors understand

about AI well? What do they feel like categorically? They're not understanding as much detail as you do from the builder side. Give us your lay of the land of how you think investors are doing. Give them a grade or something at understanding. Yeah, if I have to grade it, maybe from a public equity side, there are a lot of smart people out there, so I'm not going to grade them too hard. I don't think it's fully being appreciated how much this is changing. Everyone's saying that and all that, but I think there's continued talk of

oh, there's all this R&D spend or capital expenditure. And it's like, where's the value and stuff? And I think there's just so much attention on the large AI labs that are effectively what Gaurav was talking about before is that solving intelligence. That's a very, very different mission than the company who maybe is creating the automated software developer.

two very different worlds. And so I think paying more attention to things outside of that is pretty important for it to really understand how this is changing inside of their companies. I think it would be hard to find a company today, a successful company today that hasn't and or isn't exploring an AI tool of some sort to either completely replace an activity inside the company

or, quote unquote, do more with less in another capacity. I think that's true of every function of essentially every successful company today. And that's where you're seeing a lot of the adoption. So even discussing the foundation model versus some of these companies who are just fine-tuning the model as something open source or something, there's a ton of alpha in getting these tools inside of their company. So if you're talking to someone who's, how do we do this and roll it out as a larger enterprise?

I think there are already examples. There are massive enterprises that you go in. Someone was telling me the other day that L'Oreal, the beauty company, you can go in and they have like an internal GPT, basically, an internal LLM of some sort. Any employee can ask any question. I don't know how much that's really being baked into their thought. I think there's just so much attention towards these particular AI labs and the way that they're running their businesses, which is extremely different than some other companies, in particular, if they have the backing of Microsoft or something.

It's inherently just being driven differently. And yeah, I think if you were to go into those, I think your viewpoint would potentially change. And that could definitely inform a better understanding on how this is actually going to change work.

I love this framing of the unbounded problem nature of intelligence versus the bounded problem nature of video or some of these other things. Kind of a fascinating bifurcation. I actually think that that applies to the text side too. Even on text, we already have created essentially what is a tool for intelligence. It's like intelligence in a box. Intelligence you can just apply onto something to solve a bounded problem. So

whether that's coding now, think of it in the coding context. I think as Dwight was saying, engineers are smart people. Does that mean we need AGI to solve coding? Not necessarily because...

Essentially, what it's doing really is just translating. Think of how computers evolved over time. We used to literally do the punch card thing. Then we were writing assembly language. Who knows that anymore? Then we were doing C++. Right, exactly. Just you. Then we were writing C++. And then there's these higher level languages like Python coming to the modern era. Scott from Cognition was the guest today. So he's building the next layer. Yeah.

Perfect. Yeah. And then we're kind of just saying like, hey, the new programming language is English. That's not a crazy job. It's actually a very bounded problem. It's a problem of like inventing a new programming language, essentially. Like a programming language that is even more understandable to people because they already know it. It's a language that we already know. Intelligence is a special case. Exactly. Like the general intelligence idea of, oh, we're like creating consciousness.

Oh, it's like a thing that's going to exist, go around, do things, like have its own thoughts and have its own dreams and hopes and stuff. And maybe it'll start a company at some point. That's a whole different mission than like solving intelligence in a box, which essentially already exists and it's getting better and better. I'd love to just extend the analogy one step further to business model. Most of the commentary on AI businesses has been, again, focused on foundation model companies that have, well, have had two problems.

huge capex outlays to train the models and then huge inference bills. So often early on, really negative gross margins just to service their $20 a month subscription product. Inference has fallen 100x in cost in the last 18 months or something crazy. These costs are going down. But those were the two criticisms of the business model was, oh my God, this unbounded race, I got to spend 10x every time to build the next thing. When am I ever going to make some money? It seems like

this other category of more bounded problems have pretty normal, great business models. Is that right? Is your sense that you guys are just going to have really high gross margins like a normal software company? And yeah, you have to spend money training your foundation models, but it's not $10 billion. And walk me through the business model expectation, margins, CapEx, things like that, what the J curve looks like in these businesses. Educate us a

The way we think about it for our business specifically is that there is a bounded cost that actually solves this problem. That bounded cost is probably in the hundreds of millions of dollars, but it actually gets us to a solution. It gets us to something that, hey, this is actually reasonably good at generating anything that a CGI studio might be able to do.

And that is the level that we need to be at. Now, will that evolve? Yes, we will need to fine-tune it. But fine-tuning is generally cheap. It's actually not even close to as expensive as training a foundation model from scratch. And yeah, new data will come in, which we already have a flywheel we're building for. And it's going to be massive amounts of data. We're going to be continuously training the model and making it aware of what's happening today and what things people might want to generate today. But that's just incremental fine-tuning. It's going to be a low cost that's underlying the business.

On top of that, inference costs are going down. So I think it's going to start looking more and more like a traditional software business. I think what's going to happen is initially with these Tesla models existing, whoever truly solves this problem will have a moat for a while, as long as they are ahead. I think for us, we're also trying to build that data moat simultaneously so that we are permanently ahead. And then once enough data is out there, enough people have raised enough money and have tried the exact same playbook and built these models,

And this could be many, many, many years in the future. It's going to become a software race, building the workflows, building all the traditional stuff that we know about. Pricing and packaging, like all this stuff is going to become really important. We've seen it all. People are going to do APIs, they're going to do like B2B consumer, all this stuff. There's going to be all these use cases. And I think that's where the real competition will happen. And there's going to be winners in that. I think our theory and strategy on this is the winners are going to be really determined by

who has the best model that's consistently outperforming everybody else. All that comes down to like data acquisition, flywheel, essentially, and the ability to constantly improve the model. I do think this won't be the end, though. I think new problems will get unlocked. And we already have line of sight into that, what those other problems look like. And those problems will have their own foundation models and their own data to be collected.

And essentially, you could imagine a series of foundation models that are solving like a family of problems across a whole set of a workflow that's broad across like video and maybe even other types of media, different types of use cases like film, TV, whatever you want, basically. Maybe it's dubbing, maybe it's hands, post-production, like I don't know, right? Lots of different possible use cases. So as always, that will happen. No doubt about that.

you actually can see that these models will reach a point of maturity. And I think on the like, what does this end up looking like as a mature business at, call it, some threshold? I genuinely believe that these can look like very high margin businesses, whether that be the deflationary behavior of GPU and just compute in general. Like it's incredibly early. Like we're talking about NVIDIA's latest chip, et cetera. You're already seeing the costs come down

on H100s as the H200 architecture and stuff is being rolled out. Throughout history, these prices have never gone the other way. It's highly deflationary as the next one is rolled out, because ultimately that is their business model. They make them more efficient, they make them more powerful, whatever it is. And so I think generally speaking that that is 100% guaranteed, at least from my perspective. I think the interesting thing though, is that when you're earlier stage and companies are earlier stage, and just talking about startups in general, is that

Higher margin businesses actually sound to me like perfect attack vectors for another entrepreneur. And I think you should be very wary of companies that are operating at really high margins in the particular earlier stage. For these types of businesses, there's a ton of margin expansion opportunities. And then that also goes for the later stage companies, though, too. I think you're seeing it right now. The CRMs and companies operating at 80, 90% margins. It's great. Typical SaaS kind of stuff.

They seem like great opportunities for companies to essentially go right after in reinventing some of this stuff. Those companies don't have the same pricing power that they did 15, 20 years ago. At the same time, though, the great ones are thinking about that right now and reinventing themselves. And so it does feel like a little bit of, you know, as we discuss some of these business model changes, I think it is a bit of a shifting ground. If you think about the future now, what is on the other side

of the mission accomplished banner of you just did all video and you can create anything you can imagine in CGI with $100 million budget. Now you can do it in Captions. Then what? What do you think you would do then? I mean, I think if we actually achieve that within a reasonable timeframe, I think that would be just the beginning. Because I think you could go

so much beyond that. I think these industries are massive. Like you could imagine a social network based on something like this. You could imagine film and TV and stuff being dominated by these types of technologies. You could imagine education being completely transformed. The list is essentially like endless. This would be the starting point of a potential complete transformation across like multiple industries.

So I think today we're really excited about accomplishing this particular mission, but I think the possibilities beyond that are practically endless. Any major lessons from your time at Snap, which strikes me as a very unique culture and an extremely product centric, like a good place to train for product maybe. What lessons do you take from your time there and what lessons do you leave behind?

I mean, I think Snap had, as any company, a lot of good things and some bad things. I think the great things that I was able to get from Snap was the ability to work with a lot of great people. I think Snap was in a tough spot in many ways. They were in one of the most competitive possible businesses you can exist. Monopolistic by nature, where it's really difficult to get something started and very easy to get killed. And only the biggest one actually wins and survives.

And in that arena, they were able to make a place for themselves, mainly because of innovation. And this just comes down to the CEO. He was able to, out of all the random noise, see something and understand, yes, this will work and nobody will see it, but I know why it will work. And I think at the core of it was he had like an understanding of the product and the customer in a way that nobody did.

And nobody even came close to it. There were many moments in the company's history where he was like, we're going to do this. And everybody was like, no, we shouldn't do that. Like, this is a bad idea. And he'd be like, I don't care. We're doing it. We did it. And it was the best thing we ever did. That's the level to which his intuition was there. Snap was famous almost for like constantly innovating. Stories came out of Snap.

The old Maps location sharing product and idea came out of there. There's so many things that we're innovating on. I think they kind of lost a little bit on sort of the public TikTok thing. But that actually was something that didn't fit in their like pillar vision strategy, basically, because they're a private sharing platform. Their whole purpose was low abuse. People don't feel like you can't even reshare posts because that's a way to like embarrass somebody by like reposting that thing to other people who are not supposed to see it.

Everything was designed around feeling good, having fun, and sharing with friends, which was really everything that people cared about at that time. And I think they kind of missed the TikTok thing because it was the exact opposite of that. It was actually shared to everybody. And interestingly, it created similar dynamics where like sharing to everybody actually made you feel more private because there were so many people that people you know would never see it. Somebody else would see it. But that's the details. I think on the downsides of Snap...

One of the interesting things there, and one of the learnings there, is product market fit often doesn't have a lot to do with what people are doing day to day within the company. And once it exists, it can stay there despite the actions of the people. So I think what ends up happening sometimes in bad cases is like people think that the wrong actions that they're taking

are the contrarian right view because, well, the company is growing. So of course, whatever I did was the right thing. But actually, the company is growing despite the wrong thing that was going on at the time. And so it's difficult to tell what actually is causing the company to grow and what's the good thing and what's the bad thing. And a lot of people walk away from these types of high product market companies thinking that all the things they did were good things and there were no bad things because the company grew.

But the reality is the company is growing despite those actions. I think identifying those was a skill that I had to like really work on building to understand like how can we truly measure what we're launching, what we're building and understand what's a good thing and what's a bad thing.

So what I'm really grateful for from that time is the ability to work with the CEO there. He really brought me into the circle. Like he had a great design team that he had built. A lot of the decision making was driven through the design team. It was a small set of people like 10 to 12 people on that team.

even when the company was many thousands and thousands of people overall post IPO. So being a part of that team, learning from the great people on that team, I evolved my design career through this process. And I think his ability to like identify this is a person who will fit in well and will be able to learn and figure these things out. Props to him. Definitely doing something right. The closing question I ask everyone in this show, it's fun to get to do this twice today. What is the kindest thing that anyone's ever done for you?

I mean, it's hard for me to not say the kindest thing is probably my wife. And we started this company. We were already married. We had our first kid. Pretty hard not to call it that. It could have obviously not gone that way. I decided not to start the company, not to do a bunch of this stuff. And yeah, enabled me to take more risk. And yeah, yeah. Now I can't use that answer. Yeah, exactly. Yeah.

Yeah, I mean, I think it's a little unfair. Besides that, I would say likewise, by the way, for me, if I were to give you a different answer, I think it would be just my parents making sure that I was born in the US. Literally, because like I was only here for like the first couple years of my life. I was born when my dad was doing his PhD in economics at Northeastern University.

So he was there for like four or five years, basically, that I was born in the middle of that. They moved back to India after that. But I had the U.S. citizenship. Without that, I'd still be in India. Yeah. Simple and powerful. Guys, thank you so much for your time.

Thanks. If you enjoyed this episode, check out joincolossus.com. There you'll find every episode of this podcast complete with transcripts, show notes, and resources to keep learning. You can also sign up for our newsletter, Colossus Weekly, where we condense episodes to the big ideas, quotations, and more, as well as share the best content we find on the internet every week.

