Leading AI labs have hit scaling limits because synthetic data, generated by large language models, is not enabling further scaling in pre-training. In the pre-training world, scaling compute led to better models, but now labs are shifting to test-time compute, where models are asked to explore potential solutions and verify them iteratively.
The shift to test-time compute better aligns revenue generation with expenditures, making financial forecasting more manageable. It also shifts the focus from massive upfront CapEx to more efficient, usage-based inferencing, which can be more cost-effective and scalable.
Small teams and open source models are becoming more competitive because the shift to test-time compute reduces the need for massive pre-training compute and large datasets. Open source models like Llama provide a solid foundation that small teams can fine-tune and optimize for specific use cases with minimal capital.
OpenAI's strategic positioning is strong due to its brand and consumer mindshare, but it is vulnerable because it may struggle to compete with free models from Google and Meta. If pre-training scaling is plateauing, OpenAI's future success may depend on its ability to innovate and maintain consumer lock-in.
Google's AI strategy is disruptive because it leverages its existing strengths in deep learning and self-play, but it is also defensive as it tries to maintain its dominance in search and enterprise. The challenge is whether its AI innovations can replicate the success of its current business model.
Private market valuations for AI companies are high due to the excitement around AI applications and the dramatic drop in compute costs. While these companies show promise, the high valuations also reflect the potential for intense competition and the rapid pace of innovation.
Test-time compute leads to more efficient and predictable infrastructure needs because it aligns with the bursty nature of inference tasks, rather than the constant, high-utilization needs of pre-training. This allows for better resource management and cost savings, making it easier for hyperscalers to forecast and meet demand.
Recursive self-improvement is significant on the path to ASI because it suggests that AI systems could enhance their own capabilities without human intervention. Examples like AlphaGo and poker bots show that algorithms can sometimes operate outside the bounds of their initial training, which is a key aspect of achieving superintelligence.
The agglomeration of AI innovation in Silicon Valley is important because it fosters multidisciplinary collaboration and the rapid synthesis of ideas. Human network effects and the concentration of talent in one place can lead to significant breakthroughs and advancements in AI technology.
Two fun facts about our newest sponsorship partner, Ramp. First, they are the fastest growing fintech company in history, reaching a level of revenue in five years that I can't quote exactly, but is eyebrow raising. Second, they are backed by more of my favorite past guests, at least 16 of them when I counted, than probably any other company that I'm aware of. A list that includes Ravi Gupta at Sequoia, Josh Kushner at Thrive, Keith Rabois at Founders Fund and Khosla Ventures, Patrick and John Collison, Michael Ovitz, Brad Gerstner, the list goes on and on.
These facts demand the question, why? Having been personally obsessed with the great businesses through history, one clear lesson is that the best of them are run by disciplined operators. These operators manage costs with incredible detail, and they are constantly thinking about how they can reinvest every dollar and every hour back into their business. This is Ramp's mission, to help companies manage their spend in a way that reduces expenses and frees up time for teams to work on more valuable projects.
First, on expenses, the average American business has a profit margin of 7.7%. This means saving 1% on costs is the equivalent of making 13% more revenue. The average Ramp customer is able to save 5% on their expenses each year. Of course, every entrepreneur is looking for ways to grow revenue by 50%. They should just as seriously seek to save 5% on their expenses.
Second, on time. Unnecessary complexity is why most finance teams spend 80% of their time doing operational work and only about 20% of their time on strategic work. Ramp makes spend management very simple by handling your company's expenses, travel, bill payments, vendor relationships, and even accounting.
It's notable that some of the best-in-class businesses today, companies like Airbnb, Anduril, and Shopify, and investors like Sequoia Capital and Vista Equity, are all using Ramp to manage their spend. They use it to spend less, they use it to automate tedious financial processes, and they use it to reinvest saved dollars and hours into growth. At both Colossus and Positive Sum, my businesses, we've used Ramp for years now for these exact reasons.
Go to ramp.com/invest to sign up for free and get a $250 welcome bonus. That's ramp.com/invest. As an investor, I'm always on the lookout for tools that can truly transform the way that we work as a business. AlphaSense has completely transformed the research process with cutting-edge AI technology and a vast collection of top-tier, reliable business content. Since I started using it, it's been a game changer for my market research. I now rely on AlphaSense daily to uncover insights and make smarter decisions.
With the recent acquisition of Tegus, AlphaSense continues to be a best-in-class research platform delivering even more powerful tools to help users make informed decisions faster. What truly sets AlphaSense apart is its cutting-edge AI. Imagine completing your research five to ten times faster, with search that delivers the most relevant results, helping you make high-conviction decisions with confidence.
AlphaSense provides access to over 300 million premium documents, including company filings, earnings reports, press releases, and more from public and private companies. You can even upload and manage your own proprietary documents for seamless integration. With over 10,000 premium content sources and top broker research from firms like Goldman Sachs and Morgan Stanley, AlphaSense gives you the tools to make high conviction decisions with confidence.
Here's the best part: Invest Like the Best listeners can get a free trial now. Just head to alpha-sense.com/invest and experience firsthand how AlphaSense and Tegus help you make smarter decisions faster. Trust me, once you try it, you'll see why it is an essential tool for market research.
Hello and welcome, everyone. I'm Patrick O'Shaughnessy, and this is Invest Like the Best. This show is an open-ended exploration of markets, ideas, stories, and strategies that will help you better invest both your time and your money.
Invest Like the Best is part of the Colossus family of podcasts, and you can access all our podcasts, including edited transcripts, show notes, and other resources to keep learning at joincolossus.com. Patrick O'Shaughnessy is the CEO of Positive Sum. All opinions expressed by Patrick and podcast guests are solely their own opinions and do not reflect the opinion of Positive Sum.
This podcast is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of Positive Sum may maintain positions in the securities discussed in this podcast. To learn more, visit psum.vc. My guests today are Chetan Puttagunta and Modest Proposal. If you're as obsessed as I am about the frontier in AI and the business and investing implications, you will love this conversation.
Chetan is a general partner and investor at Benchmark, while Modest Proposal is an anonymous investor who manages a large pool of capital in public markets. Both are good friends and frequent guests on the show, but this is the first time that they have appeared together.
The timing could not be better. We might be witnessing a pivotal shift in AI development as leading labs hit scaling limits and transition from pre-training to test time compute. Together, we explore how this change could democratize AI development while reshaping the investment landscape across both public and private markets. Please enjoy this great discussion with my friends Chetan Puttagunta and Modest Proposal.
So Chetan, maybe you can start by just telling us, from your perspective, what is going on right now that is most interesting in the technology part of the story of LLMs and their scaling? Yeah, I think we're now at a point where it's either consensus or universally known that all the labs have hit some kind of plateauing effect on how we've perceived scaling for the last two years, which was specifically in the pre-training world. The power laws of scaling stipulated that the more you could increase compute in pre-training, the better the model you were going to get. And everything was thought of in orders of magnitude: throw 10x more compute at the problem and you get a step function improvement in model performance and intelligence. And this certainly led to incredible breakthroughs here. And we saw, from all of the labs,
really terrific models. The overhang on all of this, even starting in late 2022, was at some point we were going to run out of text data that was generated by human beings. And we're going to enter the world of synthetic data fairly quickly. All of the world's knowledge effectively had been tokenized and had been digested by these models.
And sure, there were niche data and private data and all these little repositories that hadn't been tokenized. But in terms of orders of magnitude, it wasn't going to increase the amount of available data for these models particularly significantly. As we looked out from 2022, there was this big question of whether synthetic data was going to enable these models to continue to scale.
Everybody assumed, as you saw that line, that this problem was really going to come to the forefront in 2024. And here we are: all the large model providers are trying to train on synthetic data. And now, as has been reported in the press and as all these AI lab leaders have gone on the record, we're hitting limits, because synthetic data as generated by the LLMs themselves is not enabling the scaling in pre-training to continue. And so we're now shifting to a new paradigm called test-time compute. What test-time compute is, in a very basic way, is you ask the LLM to look at the problem, come up with a set of potential solutions to it, and pursue multiple solutions in parallel. You create this thing called a verifier, and you pass the solutions through it over and over again, iteratively. In the new paradigm of scaling, if you will, the x-axis is time, measured on a logarithmic scale, and intelligence is on the y-axis. And that's where we are today: it seems that almost everybody is moving from scaling on pre-training and training to scaling on what's now being called reasoning, that is, inference time, test time, however you want to call it.
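The propose-and-verify loop described here can be sketched in miniature. This is purely a toy illustration, not anything a lab actually runs: the "model" proposes random integer candidates and a hand-written verifier scores them. But the shape of the loop is the same: sample many candidates in parallel, score each with a verifier, and spend more rounds (more test-time compute) to improve the best answer.

```python
import random

def propose(rng, n):
    # Stand-in for an LLM sampling n candidate solutions in parallel.
    return [rng.randint(-100, 100) for _ in range(n)]

def verify(candidate, target=49):
    # Stand-in verifier: higher is better, 0 means candidate**2 == target.
    return -abs(candidate * candidate - target)

def test_time_search(rounds=200, per_round=8, seed=0):
    """Spend more 'test-time compute' (rounds) to refine the best answer."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(rounds):
        for candidate in propose(rng, per_round):
            score = verify(candidate)
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score
```

Raising `rounds` is the logarithmic x-axis in this framing, and the open questions raised in this conversation map directly onto the sketch: the proposer can exhaust the useful search space, and everything hinges on whether the verifier keeps discriminating good candidates from bad ones as the budget grows.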
And that's where we are as of Q4 2024. Just a follow-up question on the overall picture. So setting aside CapEx and all this other stuff that we'll talk about with the big public tech companies in just a moment, is it reasonable to say based on what you know now,
that the switch to test-time scaling, where time is the variable, is like a who-cares? As long as these things keep getting more and more capable, isn't that all that matters? And the fact that we're doing it in a different way than just based on pre-training, does anyone really care? Does it matter? There are a few things that come up pretty quickly in the test-time or reasoning paradigm. As LLMs explore the space of potential solutions, you realize very quickly, as a model developer or somebody working on models, that the algorithms used for test-time compute might exhaust the useful search space for solutions quite quickly. That's number one. Number two, you have this thing called a verifier that's looking at what's potentially a good solution, what's potentially a bad solution, and what you should pursue. And the ability to distinguish a good solution from a bad one, or an optimal path from a suboptimal one, it's unclear that that scales linearly with infinite compute. And then finally, tasks themselves
can be complex, ambiguous, and the limiting factor there may or may not be compute. So it's always really interesting to think of these problems as if you were to have infinite compute to solve this problem, could you go faster? And certainly there's going to be a number of problems in reasoning where you could go faster if you just scaled compute. But oftentimes we're starting to see evidence that
It's not necessarily something that scales with compute linearly with the technology we have today. Now, can we solve all of that? Of course. There's going to be algorithmic improvement. There's going to be data improvement. There's going to be hardware improvement. There's going to be all sorts of optimization improvements here. The other thing we're still finding is...
The inherent knowledge or data available to the underlying model that you're using for reasoning
still continues to be limited. And just because you're pursuing test time, it doesn't mean that you can break through all previous data limitations by just scaling compute at test time. So it's not that we're hitting walls on reasoning or we're hitting walls on test time. It's just the problem set and the challenges and the computer science problems are starting to evolve. And
As a venture capitalist, I'm very optimistic that we're going to be able to solve all of them, but they're yet to be solved. So if that's the view from the research labs out, Modest, I'm curious for you to give us the view from the big public tech companies down, because so much of this story has been the spend, the CapEx, the strategic positioning, the quote-unquote ROI on all this spend, and how they're going to earn a return on this insane outlay of capital.
Do you think that everything Chetan just said is well reflected in the stance, the pricing, and the valuations of the public tech companies? I think you have to start at a macro level and then get to a micro level. At a macro level, why this is so important is everyone knows the Mag 7 and what a large percent of the S&P 500 they represent today.
But beyond that, I think thematically, AI has permeated far broader into industrials, into utilities, and really makes up, I would argue, somewhere between 40% and 45% of the market cap as a direct play on this.
And if you even abstract to the rest of the world, you start bringing in ASML, you bring in TSMC, you bring in the entire Japanese chip sector. And so if you look at the cumulative market cap that is a direct play on artificial intelligence right now, it's enormous. And so I think as you look across the investment landscape, you almost are forced to have an opinion
on this because most people in some form or another are benchmarked against an index that is going to be a derivative play on artificial intelligence. At the micro level, I think that this is a fascinating time because all of public market investing is scenario analysis and probability weighting different paths. And if you go back to when we talked probably four months ago, I would say that the distribution of outcomes has shifted.
And at that point in time, pre-training and scaling on that axis was definitely the way. And we talked about what the implications were at the time. We've talked about Pascal's Wager. We've talked about Prisoner's Dilemma.
And in my mind, it was easy to talk about that when the cost of anteing up was a billion dollars or $5 billion. But we were rapidly approaching the point in time where the ante was going to be $20 billion or $50 billion. And you can look at the cash flow statements of these companies. It's hard to sneak in a $30 billion training run.
And so the success of GPT-5, and let's apply that broadly to all the various labs, I think was going to be a big proof point as to whether or not that amount of capital got committed, because these are three- and four-year commitments. If you go back to when the article was written on Stargate, which is the hypothesized $100 billion data center that OpenAI and Microsoft were talking about, that was a 2028 delivery.
But at some point here in the next six to nine months, it's a go-no-go. We already know that the 300,000 to 400,000 chip supercluster is going to be delivered end of next year, early 2026. But we probably need to see some evidence of success on this next
model in order to get the next round of commitment. So all that is a backdrop. I think at the micro level, this is a really powerful shift if we move from pre-training to inference time. And there are a couple of big ramifications. One, it better aligns revenue generation and expenditures.
I think that is a really, really beneficial outcome for the industry at large, which is in the pre-training world, you were going to spend $20, $30, $40 billion on CapEx, train the model over 9 to 12 months, do post-training, then roll it out, then hope to generate revenue off of that in inference. In a test time compute scaling world, you are now aligning your expenditures with the underlying usage of the model.
So just from pure efficiency and scalability on the financial side, this is much, much better for the hyperscalers. I think a second big implication, and again, we have to say we don't know that pre-training scaling is going to stop, but if you do see this shift toward inference time, I think you need to start to think about how you re-architect the network design.
Do you need million chip superclusters in energy low-cost land locations? Or do you need smaller, lower latency, more efficient inference time data centers scattered throughout the country? And as you re-architect the network, the implications on power utilization, grid design, a lot of the, I would say, narratives that have underpinned
Huge swaths of the investment world, I think, have to be rethought. And I would say today, because this is a relatively new phenomenon, I don't believe that the public markets have started to grapple with what that potential new architecture looks like and how it may impact some of the underlying sectors.
Chetan, I'm curious maybe to tell the story of DeepSeek and other things like it, where you see new models being built by small teams for relatively small dollars that are competing in performance with some of the leading edge models. Can you talk about that phenomenon and what it makes you think about or the implications for the landscape? Yeah.
It's really amazing, in the last call it six weeks, the number of teams we've met here at Benchmark that are two to five people. And Modest has talked about this on your podcast before: the story of technology innovation has always been two or three people in a garage somewhere in Palo Alto doing something to catch up to incumbents very, very quickly. I think we're seeing that
now in the model layer in a way that we haven't seen, frankly, in two years. Specifically, I think we still don't know 100% that pre-training and training scaling isn't coming back. We don't know that yet. But at the moment, at this plateauing time, we're starting to see these small teams catch up to the frontier. And what I mean by frontier is where are the state-of-the-art models, especially around text, performing?
We're seeing these small teams of quite literally two to five people jumping to the frontier with spend that is not one order, but multiple orders of magnitude less than what these large labs were spending to get there. I think part of what's happened is
the incredible proliferation of open source models. Specifically, what Meta's been doing with Llama has been an extraordinary force here. Llama 3.1 comes in three flavors: 405 billion, 70 billion, and 8 billion parameters. And Llama 3.2 comes in 1 billion, 3 billion, 11 billion, and 90 billion. And you can take these models, download them,
put them on a local machine, put them in a cloud, put them on a server, and use these models to distill, fine-tune, train on top of, modify, and so on, and catch up to the frontier with pretty interesting algorithmic techniques.
And because you don't need massive amounts of compute or you don't need massive amounts of data, you could be particularly clever and innovative about a specific vertical space or a specific technique or a particular use case to jump to the frontier very, very quickly. I think that
is largely changing how I personally think about the model layer and potential early stage investments in the model layer. There's a lot of ifs here and a lot of dependent variables. And literally in six weeks, none of this could be true anymore. But if this state holds, which is that pre-training isn't scaling because of synthetic data, it just means that you can now
do a lot more, jump to the frontier very quickly with a minimum amount of capital, find your use case, find where you're most powerful. And then from that point onward, the hyperscalers, frankly, become best friends. Because today, if you are at the frontier, you're powering your use case,
you're not particularly GPU constrained anymore, especially if you're going to pursue test time inference or test time compute or something like that. And you're serving, let's say, 10 enterprise customers, or maybe it's a consumer solution that's optimized for a particular use case.
The compute side of it just doesn't become as challenging as it was in 2022. In 2022, you would talk to these developers and it just became a question of, well, could you get 100,000 GPUs clustered together, because we need to go train, and then we have to go buy all this data, and then...
Even if you knew all the techniques, all of a sudden you would pencil it out and say like, "I need a billion dollars to get the first training run to go." And that just is not a model historically that's been the venture capital model. The venture capital model has been, could you get together a team of extraordinary people, have a technology breakthrough, be capital light, and jump way ahead of incumbents very quickly, and then somehow get a distribution foothold and go.
At the model layer for the last two years, that certainly didn't seem like it was possible. And literally in the last six, eight weeks, that's definitively changed.
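As a rough intuition for the distillation technique mentioned here: at its core, it means training a smaller student to match a larger teacher's output distribution (soft targets) rather than training from scratch on raw data. The following is a deliberately tiny sketch, with a hypothetical four-way "vocabulary" and a single logit vector standing in for the student model; no real LLMs or libraries are involved.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill(teacher_logits, steps=2000, lr=1.0, seed=0):
    # Fit "student" logits to the teacher's output distribution by
    # gradient descent on cross-entropy against the teacher's soft targets.
    rng = random.Random(seed)
    student = [rng.gauss(0.0, 1.0) for _ in teacher_logits]
    targets = softmax(teacher_logits)  # the teacher's soft targets
    for _ in range(steps):
        probs = softmax(student)
        # Gradient of cross-entropy w.r.t. logits: (student probs - teacher probs).
        student = [z - lr * (p - t) for z, p, t in zip(student, probs, targets)]
    return softmax(student)

teacher = [2.0, 0.5, -1.0, 0.0]   # hypothetical teacher output logits
student_probs = distill(teacher)  # closely matches softmax(teacher)
```

In real distillation the student is a full network trained over many teacher outputs, but the objective is the same, which is why a small team can transfer much of a large open model's behavior into a cheaper, specialized model without repeating its pre-training.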
I think it's important, the point about Meta, open source, and the hyperscalers. Open source pushing the frontier, and smaller models being able to scale to very successful points, is enormously beneficial, particularly for AWS, which doesn't have a native LLM.
But if you just take a step back and think about what historically cloud computing was, it was providing a set of tooling to developers and builders. AWS first articulated this vision; I heard it publicly in September when Matt Garman was at the Goldman Sachs conference. But their view clearly has been that
LLMs are just another tool. Generative AI is another tool that they can provide their enterprise customers and their developer customers to build the next generation of products. The risk to that vision was an all-powerful, generalizable model.
And so again, this is where you sort of have to rethink if we're not going to build these massive pre-trained entities where you drive training loss down to near nothing and that in some form or another builds the metaphorical God. If instead the focus of the industry is at test time, at inference time and trying to solve real problems,
at the point of need for a customer, I think that, again, re-engineers and re-architects the entire vision of how this technology rolls out. And I think we need to be humble: we don't know what Llama 4 is going to come out with. We don't know what Grok 3 is going to come out with. Those are the two models currently being trained on the largest clusters ever. So everything we're saying right now may be wrong in three months.
But I think the entire job right now is to ingest all the available information and replot the various scenario paths. With what we know today, I do not feel like people have updated their priors as to how these paths may go forward if this is correct.
I'm curious, Chetan, about the idea that maybe now you would invest in a model company because of this change. I remember you telling me over dinner two years ago that as a firm, you had decided: we're just not investing in these companies. Like you said, it's just not the model that we do. We don't write billion-dollar checks for the first training run. And so we're not investing in that part of the stack; we're investing more in the application layer, which we'll come back to a little later in this discussion. But maybe say a bit more about this updated view on how that could work out,
what a sample investment could look like. And whether, even if Llama 4 shows the pre-training scaling laws hold, that even changes things, because it would seem like something like DeepSeek only benefits: okay, now instead of 3.2, it's 4, and we're still doing our thing, and it's still better and cheaper and faster and whatever. So yeah, what do you think about this new view you have, potentially investing in model companies, not just application companies? In Meta's last earnings call,
Mark Zuckerberg talked about them starting Llama 4 development. And he said that Llama 4 is being trained on a bigger cluster than anything he's ever seen out there. The number that was quoted was that it's bigger than 100,000 H100s, or bigger than anything I've seen reported for what others are doing.
And he also said the smaller Llama 4 models should be ready in early 2025. What's really interesting is that regardless of whether Llama 4 is a step function from Llama 3, it almost doesn't matter. If they push the boundaries of efficiency, even if it's only incrementally better, what it does to the developer landscape is pretty profound. The force of Llama today, and I think this has been very beneficial to Meta, is that the transformer architecture Llama uses is a sort of standard architecture, but with its own nuances. And if the entire developer ecosystem building on top of Llama starts to assume that the Llama 3 transformer architecture is the foundational, standard way of doing things, it standardizes the entire stack toward this Llama way of thinking, all the way from how the hardware vendors support your training runs to the hyperscalers, and on and on. And so standardizing on Llama itself
is starting to become more and more prevalent. And so if you were to start a new model company, what ends up happening is that starting with Llama today is not only great because Llama is open source, it's also extraordinarily efficient, because the entire ecosystem is standardizing on that architecture. And so you're right, as an early-stage fund with $500 million of capital, trying to make 30 investments every fund cycle,
a billion-dollar training run essentially means committing two funds to one training run that may or may not work.
And so that's an extraordinarily capital-intensive business. And by the way, the depreciation schedule for these models is frightening. Distillation as a technique makes the defensibility of these models, and the moats around these models, extraordinarily challenging. And it really comes down to: what's your application on top of it? What are your network effects? How are you capturing economics there, and all of that?
And I think what is now the case, as of today, is that if you're a two to five person team, you can take something like coding as an example. You could push your way into a model that generates better coding answers faster by fine-tuning and training on top of Llama, and then offer an application with your own custom models that really produces extraordinary results for your customers, whether they're developers or something like that. And so our particular approach and strategy here has been to invest heavily in applications. Starting when we saw OpenAI APIs take off, we started to see developers talk
about these OpenAI APIs in the summer of 2022. And a lot of our effort starting then was to just find entrepreneurs that were thinking about leveraging these APIs to go after the application layer and really start thinking about what are applications that simply could not exist before this current wave of AI.
Obviously, we've seen some really incredible, successful companies come out of that. They're still early, but the kind of traction they're seeing, the kind of customer experience they're providing, the kind of metrics, all of it has been extraordinary. You had Bret Taylor on your podcast a couple of weeks ago, so Sierra is an example of this. In procurement, we have a company called Levelpath. There are many other examples across the portfolio at the application layer, where you can go through every single large SaaS market and go after it with an application-layer investment, and start to really think about what's now possible that wasn't possible two, three, four years ago.
I'm curious to talk a little bit about the big foundation model players. We talked about Llama, but less about xAI, Anthropic, and OpenAI. Maybe, Modest, starting with you, I'm curious for your thoughts on their strategic positioning and the things that are important for each. And maybe OpenAI is an example. Maybe the story here is just what a great brand they've built: they have so much distribution, they have all these great partnerships, people know it and use it, and they have lots of people paying them 20 bucks a month or whatever. Maybe the distribution is more important than the product and the model. I'm curious what you think about these three players that have so far dominated, but seem, through this analysis so far, like it's important that they keep innovating.
So I think the interesting part for OpenAI was, because they just raised their recent round, there was some fairly public commentary around what the investment case was. You're right, a lot of it oriented around the idea that they had escape velocity on the consumer side and that ChatGPT was now the cognitive referent.
And that over time, they would be able to aggregate an enormous consumer demand side and charge appropriately for that. And that it was much less a play on the enterprise API and application building.
And that's super interesting. If you actually play out what we've talked about, when you look at their financials, if you take out training runs, if you take out the need for this massive upfront expenditure, this actually becomes a wildly profitable company quite quickly in their business.
projections. And so in a sense, it could be better. Now, then the question becomes, what's the defensibility of a company that is no longer step function advancing on the frontier?
And there, I think this is ultimately going to come down to one, Google is also advancing on the frontier and they most likely will give the product away for free. And Meta, I think we could probably spend an entire episode just talking about Meta and the embedded optionality that they have on both the enterprise side and the consumer side. But let's stick to the consumer side. This is a business that has over 3 billion consumer touchpoints.
They are clearly rolling Meta AI out into various surfaces. It is not very difficult to see them building a search functionality. I joke they should buy Perplexity, but you've also just had the DOJ come out and say that Google should be forced to license its search index. I can think of no bigger beneficiary in the world than Meta having the opportunity, at marginal cost, to take on Google's search index. But the point is that I think there will be two very large-scale internet players giving away what essentially looks like ChatGPT for free.
So it will be a fascinating case study in can this product that has dominant consumer mind share. My children know what ChatGPT is. They have no idea what Claude is. My family knows what ChatGPT is. They have no idea what Grok is. So I think for OpenAI, the question is, can you outrun free?
And if you can, and training becomes less of an expense, this is going to be a really profitable company really quickly. If you go to Anthropic, I think they have an interesting dilemma, which is people think Sonnet 3.5 is possibly the best model out there.
They have incredible technical talent. They keep ingesting more and more of OpenAI's researchers, and I think they're going to build great models, but they're kind of stuck. They don't have the consumer mindshare. And on the enterprise side...
I think that Llama is going to make things very difficult for the frontier model builders trying to capture great value creation there. So they're stuck in the middle. Wonderful technologists, great product, but not really a viable strategy. And you see they raised another $4 billion there.
To me, that's indicative that pre-training is not scaling so well, because $4 billion is not anywhere close to what they're going to need if the scaling vector is pre-training. I don't have a good sense for what their strategic path forward is. I think they're stuck in the middle.
xAI, I will plead ignorance on that one. Elon is a one-of-a-kind talent, and they're going to have a 200,000-chip cluster, and they have a consumer touchpoint. They're building an API. But I think if pre-training is the scaling vector, they're up against the same math problem that everyone else has, only possibly mitigated by Elon's unique ability to raise capital.
But again, the numbers get so big so quickly in the next four or five years that that may even be greater than him. And then if it's test time compute and algorithmic improvements and reasoning, what is their differentiation? What is their go to market when you have people who have staked their claim on the consumer side and then you have an open source entity on the enterprise side that's every bit as formidable?
So when you look at those three, I think it's easiest to see what OpenAI's path forward is. One thing I will say about OpenAI, though, is that Noam Brown, who I find to be one of the most effective communicators in the research world, was on Sequoia's podcast recently, and he was asked about AGI. And he said, look, I think when I was outside of OpenAI,
I was skeptical of the whole AGI thing, that that was actually what mattered to them. And when I got inside of OpenAI, it was very clear to me that they are very serious about AGI and that that is their mission and everything else is in service of AGI.
It's easy for us to sit on the outside and articulate the strategy that we might pursue if we were in charge there. But I think we need to be cognizant of the fact that part of the reason they've gotten to where they are today is because they are on a mission. That mission is to develop AGI, and we should be very humble about ascribing any other endgame for them than that. And my personal belief is that AGI is very close by.
Say more. And why is it not already here? These things are smarter than almost everyone I deal with. Yeah, I think so. AGI as narrowly defined or maybe expansively defined, depending on your viewpoint, is a highly autonomous system that surpasses human performance in economically valuable work.
In some cases, it's very easy to argue AGI is here using that lens. I think what is pretty clear is that if you look at the announcements made by OpenAI and their execs that have given interviews in recent weeks, an example that's brought up is end-to-end travel booking as something where...
That's something we can expect to see in 2025, where you can prompt the system to book travel for you and it'll just go do it. And that is a new way of thinking, which is end-to-end task completion or end-to-end work completion.
That involves, obviously, reasoning. That involves agentic work. That involves computer use, as Claude has come out with. And you're combining multiple ways of these large language models interacting with the ecosystem itself, putting it into a very nice package that then is just able to do the end-to-end work, fully automate it, and do it better than humans. And
In my view, from that lens, we're very, very close to it. And I imagine that we'll be pretty close to or at AGI in 2025. I don't see how, given the current progress and the current innovation, and now moving to test-time compute and reasoning, AGI is not around the corner with that lens. And it's funny, because we've sort of become the frog boiling in water where we passed the Turing test pretty quickly and easily. And yet nobody sits here anymore and talks about, holy crap, we passed the Turing test. It just came and went. And so it could be that this declaration of AGI is something along the same lines, where it's like, yeah, of course the model can book end-to-end travel. That's not actually that difficult. Whereas two and a half years ago, if you had said,
hey, there's an algorithm that you can tell them what you want to do. It books it end to end and sends you a receipt. You would say, no way. So there may be some of this boiling frog to it where all of a sudden you wake up one day and a lab says, hey, we've got AGI and everyone's sort of like, cool. There is one particular reason, though, that lab declaring AGI is interesting in a broader sense, which obviously is the relationship with Microsoft.
And Microsoft first disclosed last summer that they have the full rights to the IP of OpenAI up until AGI is achieved. And so if OpenAI elects to declare that AGI is achieved, I think then you have a very interesting dynamic between them and Microsoft, which will compound an already very interesting dynamic there.
which is at play right now. So that's something to watch next year, certainly for public market investors, but also for the ramifications of the broader ecosystem. Because I do think, again, if we're right about the path that we're pursuing now, there will be a lot of reshuffling of relationships and business partnerships as we go forward.
Chetan, was there anything else in Modest's assessment of the big players? And we'd love to hear your thoughts on Google since we didn't talk about them as specifically. Anything that he said that you disagree with or would press further on? No, I think we just don't know the underlying discussions in all of these rooms. And we can speculate and understand what we might do. But
I think ultimately every internet business or technology business ultimately has come down to either on the consumer side distribution, then combines with some kind of network effect and lock-in effect. And then you're able to just run away with that and separate from the field. And then on enterprise, it's largely been a business that's driven by technology differentiation and delivery of that technology effectively.
with great SLAs, with great service, with very unique approaches to solution delivery. And so Modest's comments on consumer and how consumer is going to evolve, I think, are exactly right. You have Meta and Google products
and xAI with consumer touchpoints. You have OpenAI with an extraordinary brand today with ChatGPT and a ton of consumer touchpoints already. On the enterprise side, the challenge has been that these APIs have largely to date not been as reliable as what developers expect. Developers have gotten used to, because of the excellent work of hyperscalers, the idea that if you are out there with APIs for a product,
that product should be infinitely scalable and available 24/7, and the only reason the API ever goes down is because some giant data center lost power or something. There are very few reasons why an API should fail. That has become the developer mindset for enterprise solutions. And over the last two years, the quality of AI APIs has been a huge challenge for application developers. And so what's happened as a result is people have figured out workarounds and have solved all those problems with pure innovation. But going forward, and again we keep going back to this, if pre-training scaling is not the way to do it and it's all about test-time compute,
This is where, again, we go back to the traditional way of hyperscalers. And I think this is where AWS is extraordinarily advantaged, because Azure and Google have great clouds, but AWS has the biggest cloud. It has really built for resilience in a way that's very, very differentiated. And even today, if you're running Llama models, you want to run them on AWS. Or if for some reason you have a very specific use case and need to support on-prem customers, you can. At very large financial institutions that have complex regulatory environments or compliance reasons, you can run these models on-prem if you choose to. And AWS has even gone there with VPCs and GovCloud and all this kind of stuff. And so
if we assume that pre-training scaling is done, then all of a sudden AWS becomes extraordinarily powerful, and their strategy over the last couple of years to just be friends with everybody in the developer ecosystem and not pursue their own LLM efforts the way others have (they are pursuing them, just not in the same way) will likely end up being a pretty good strategy, because all of a sudden you have the best API service. The other part I think is Google, which we haven't talked about yet. Their cloud is very good at certain things. So they have an enterprise business. That enterprise business is actually pretty scaled now if you look at the latest earnings. And obviously their consumer business is dominant. And there has been a perception that
they're getting disrupted today. I think these forces are very disruptive to them, but it's unclear that the disruption has already happened. What are they doing about it? Obviously they're trying, and it's pretty clear they're trying very hard. But I think it's an interesting one to watch, and the one that I like to watch, because it's the classic innovator's dilemma, and they're clearly trying to be on the right side of it, not being innovated away as an incumbent.
They're trying very hard. And there are very few cases in business history of the incumbent repelling the innovator's attack. If they do defend their business through this era, that'll be an extraordinary achievement. Yeah, Google is so fascinating because...
You had a brilliant sell-side analyst, Carlos Kirjner, who unfortunately passed away. But in 2015 and '16, he spent...
many, many reports writing about Google's progress towards artificial intelligence and the underlying work that they were doing at DeepMind. Actually, he was so fond of it, he went to go work at Google ultimately, but sort of first exposed this idea of the underlying work that they were doing there in neural nets, in deep learning. And
It's clear they were caught off guard by the brute-force scaling of the transformer, that what advanced this wave of technology was literally throwing compute at it. But if you read any of the interviews with the people who foreshadowed this data wall, one of the things they talked about was that self-play might be a way to overcome the lack of data. And who is better at self-play than DeepMind?
And if you look at the pieces that DeepMind brings from before the transformer and what they bring together with the transformer and scaling of compute, it seems as though they have all the pieces to win.
Now, the question I have always had is not, can Google win at AI? It's, is winning whatever that looks like ever going to possibly replicate how good winning was in the current paradigm? That's really the question. To Chetan's point, it would be amazing if they overcome the dilemma and win, but I think they have the pieces there. The question really is, if they can build a business out of the assets that they have,
that in any way looks as good as what is arguably the greatest business model we have ever seen, which was internet search. So I'm equally as fascinated to follow them. I think on the enterprise side, they have incredible models and incredible assets.
I think they have a lot of trust to earn. I think that over time, they've come and gone in that world. And so I think that's a harder axis of attack for them. But certainly on the consumer side, and certainly in the model building side, they have all the assets in place to win. The question is just, what does that prize look like? Particularly now, if it doesn't look like there may be one or two models to rule them all.
Chetan, I'm curious, as an investor seeking a return, what path you personally hope for?
I personally hope for AI to continue for a really long time. You need big disruptions as a venture investor to unlock distribution. And if you just look at what happened with the internet or with mobile and where value accrued, value predominantly accrued at the application layer in those two waves. Now, I'm not going to go deep into that.
Obviously, our hypothesis and my hypothesis was that this layer, again, was going to be very receptive to distribution unlock because of innovation at the AI application layer. I think that's largely played out so far. It's still early days, but the application vendors that have come out with production AI applications for both consumer and for enterprise,
have found that those solutions, which can now only exist because of AI, are unlocking distribution in ways that were frankly not possible in the world of SaaS or prosumer SaaS or whatever. I'll give you a very specific example. With an AI-powered application, we're now going to CIOs at Fortune 500 companies, showing these demos, and
Two years ago, there were really nice demos. Today, it's a really nice demo combined with five customer references of peers that are using it in production and experiencing great success. And what becomes very clear in that conversation is that what we're presenting is not a 5% improvement over an existing SaaS solution. It's about we can eliminate significant amounts of software spend and human capital spend
and move it to this AI solution, and your traditional 10x ROI definition of software is easily justified, and people get it within 30 minutes. And so you're starting to see that what used to be a very long sales cycle for SaaS is, for AI applications, 15 minutes to a yes, 30 minutes to a yes. And then
the procurement process for an enterprise completely changes. Now the CIO says something like, let's put this in as quickly as possible. We're going to run a 30-day pilot. The minute that's successful, we're signing a contract and we're deploying right away. These are things that three or four years ago in SaaS were just completely out of the realm of possibility, because you were competing against incumbents, you were competing against their distribution advantage, their service advantage, and all this kind of stuff. And it was very hard to prove
why your particular product was unique. And so since 2022, and I'll call it since ChatGPT, November 2022, that seems like a really good line of pre and post in this world. We've made 25 investments in AI companies. And for a $500 million fund with five partners, that's an extraordinary pace. The last time we had that kind of a pace was...
when the App Store came out in 2009. And before that, the last time we had that kind of pace was in '95, '96 with the internet. In between, you see our pace being pretty slow. We average maybe five to seven investments a year in non-disruptive times. And clearly now our pace has dramatically increased.
And if you just look at, of those 25 companies, four have been infrastructure companies and the rest have been application companies. And we just invested in our first model company, which hasn't been announced yet, but it's two people, two extraordinary, brilliant people that are jumping to the frontier with very little capital. And so...
We've clearly bet on and anticipated dramatic innovation and distribution unlock happening at the application layer. We've seen that happen already. These products, as a software investor, are truly amazing. They require a total rethinking from first principles of how these things are architected. You need unified data layers, you need new infrastructure, you need new UI, and all this kind of stuff. And it's clear that the startups...
are significantly advantaged against incumbent software vendors. And it's not that the incumbent software vendors are standing still. It's just that the innovator's dilemma in enterprise software is playing out much more aggressively in front of our eyes today than it is in consumer. I think in consumer, the players recognize it, are moving, and are doing something about it.
Whereas I think in enterprise, it's just even if you recognize it, even if you have the desire to do something, the solutions are just not built in a way that is responsive to dramatic re-architecture. Now, could we see this happening? Could a giant SaaS company just pause selling for two years and completely re-architect their application stack? Sure, but I just don't see that happening. And so...
If you just look at any analysis of what's happening in AI software spend, it's something like 8x year-over-year growth between 2023 and 2024 in pure spend. It's gone from a couple of hundred million dollars to well over a billion in just a year's time. And you can see this pull. You can feel this pull if you're in any one of these AI application companies. It's like
More of these companies are supply constrained than demand constrained. We talked to the CEOs of these application companies and they just say things like, well, I see demand as far as I can look out. I just don't have the capacity to go service all the people that are saying yes to me. So I'm going to segment it and go to where they are and sell.
My hope as an investor is that it continues to play out this way and that we have stability to just pursue these angles. And frankly, the model layer stabilizing is a huge boon for this application layer. Primarily because as an application developer, you were sitting there watching the model layer take step function leaps every year. And you kind of didn't know what to build and what you should just wait on building. Because obviously you wanted to be completely aligned with a model layer.
Because the model layers are now moving to reasoning, this is a great place for an application developer. One thing you know as an application developer is that humans are not patient. And so you need to always build solutions that optimize for performance and quality. You cannot go to a user as an application developer and say, I'm going to deliver a high-quality response, it's just going to take longer, come back in 30 minutes.
That's not been a winning argument. Now, for certain use cases, is that possible? Could you have it run in the background for 24 hours? Absolutely. But those use cases are not widespread and predominant. And people aren't going to be willing to buy that kind of stuff. And so...
As an application developer today, all of my board meetings over the last couple of weeks have been these companies saying: in this new reasoning paradigm, we're really confident that we can invest in these four things that we've been super hesitant about for the last year and a half. But now we're going to go all in on these bets, and the kind of performance gains you're going to see out of our systems is going to be huge.
Sorry, why is that the case? Why does the reasoning thing make it so that their confidence goes up? Just like spell that out. Well, if you are an application developer and you're looking at the models today and you're saying, I can see clear efficiencies for my use case, but I have to invest in these five infrastructure layer things and these UI things. But if a new model comes out in six months and blows all that investment away just because the model itself can do it,
then why would I ever invest in these things? I'm just going to wait for the model to do it and then bet on that. But in this reasoning paradigm, if all the labs pursue reasoning and reasoning is intelligence on the Y-axis, time on the X-axis, and that's where we're going, then any improvement that I can make in my own tool to make either that reasoning time dramatically compressed because of the way algorithmically I'm feeding reasoning and I'm able to take the data and manipulate it and all that kind of stuff,
I should invest in it now if reasoning is the new paradigm. And the last-mile delivery at the application layer against these reasoning models means that I'm building technology and tooling that model companies are very, very unlikely to build. And as those reasoning systems continue to get better, my last-mile delivery systems are still advantaged and defensible.
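Chetan's framing of reasoning as intelligence on the Y-axis and time on the X-axis can be made concrete with a toy model. Everything here is an illustrative assumption, not measured data: the functional form, the constants, and the idea that quality rises roughly logarithmically with effective reasoning time. The point is only the shape of the argument, that tooling which compresses reasoning moves you along the same curve rather than being obsoleted by the next model.

```python
import math

def answer_quality(reasoning_seconds: float, tooling_speedup: float = 1.0) -> float:
    """Toy model: quality rises roughly logarithmically with reasoning time.

    `tooling_speedup` stands in for application-layer work (better retrieval,
    data shaping, routing) that makes each second of reasoning go further.
    All constants are hypothetical, purely to illustrate the curve's shape.
    """
    effective = reasoning_seconds * tooling_speedup
    return 1.0 - math.exp(-0.5 * math.log1p(effective))

baseline = answer_quality(10)            # raw model, 10 seconds of reasoning
with_tooling = answer_quality(10, 4.0)   # same 10 seconds, tooling makes it 4x effective

# The tooling advantage persists across model generations: a better model
# shifts the whole curve up, but compressed reasoning still reaches any
# given quality level in less wall-clock time.
assert with_tooling > baseline
```

If every lab is climbing the same time-versus-intelligence curve, investments that compress that time retain their value no matter which model ships next, which is the confidence Chetan describes.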
Do you both have favorite examples of this beyond coding and customer service, which seem to be the two dominant and incredibly exciting use cases, with lots of companies chasing after versions of that? Do you have other favorite examples that would fit the CIO of the Fortune-whatever company saying, we need this in our company now? Chetan loves all his children, so he's not going to be able to give you specific examples. I can give you 20 of them. Yeah.
Maybe like categorically is my question. There's coding, there's support. Essentially top down, look at the biggest spends of enterprise software and you could attack that with an AI powered AI first solution. And so-
We've got a great company called 11x that's going after sales automation. We've got a great company called Leya that's being used by lawyers to dramatically increase the efficiency of their work. I think legal has been a very interesting question, because people assume that lawyers work on billable hours. If you're automating billable hours, aren't their economics going to change? Well, the evidence two years in is that lawyers actually end up becoming way more profitable by using AI. And the reason is that a lot of the work that was rote, repetitive, and hard, and done by junior people inside of law firms,
law firms weren't able to bill for anyway. And so if you can take down the time to do document analysis from three or four days to 24 hours, all of a sudden you free up all your lawyers to do the strategic work that they can bill for, stuff that is extremely valuable for clients. We've got a company that's automating accounting and financial modeling, as an example. We've got a company that's
changing how game development works. We've got somebody that's going after circuit board design, which has been a hugely manual and human-intensive thing that computer systems are particularly good at. And we recently invested in something going after an ad network. Now, that's something that hasn't been touched by startups for a long time, but it turns out matching the people that have
inventory with the people that want to do advertising in the AI world is just way more efficient. And we invested in a company that's got a new document processing model, and they're going after OpenText.
When was the last time a startup thought about OpenText? It's been a long time since these huge incumbent SaaS markets were thought to be open to new startups. So you had to pursue more niche, more vertical ideas. I often joke that payroll for field workers in Eastern Europe was a SaaS company you legitimately had to think about in 2019. And now we're back to large swaths of horizontal spending again, saying, hey, there's an incumbent here that's worth $10 billion plus, the market here is $10 billion of annual spend, and AI makes a product here easily 10x better, faster, and all the things that users want when they see it. And you need a new platform to come out with that kind of advantage. And that's what this is.
Patrick, you asked back at the beginning about the big debate on ROI and CapEx and all this. When you listen to Chetan and when you listen to other investors in the application layer, when you listen to the hyperscalers, the big takeaway over the last three months is the use cases are coming. Yes, everybody knows about coding. Everybody knows about customer support. But this is really starting to permeate everything
and get out into the broader ecosystem and the revenues are becoming real. The challenge on the ROI question always was, okay, you put the capital in here, you then amortize it over the inference period. But meanwhile, you're then stacking the next quantum of capital for the next model.
And so everybody could draw those extrapolations and say, oh my God, it's not just that Microsoft is going to spend $85 billion in cash CapEx, inclusive of the leases in 2025. It's what does it mean for 26, 27, 28? Because the pre-trained models were getting so big.
If, and again it's an if, we are plateauing and we're spending less money on pre-training and moving that capital towards inferencing, we know that spend is coming. We know the revenue generation of the customer is coming. And so it becomes much easier to say this spend is warranted.
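The amortization worry described here can be put in rough numbers. A hypothetical payback calculation (every figure below is invented for illustration) shows why stacking ever-larger pre-training outlays made the ROI math alarming, and why usage-based inference spend is easier to justify:

```python
def payback_years(training_capex_bn: float, annual_inference_revenue_bn: float,
                  gross_margin: float = 0.6) -> float:
    """Years to recoup a training-run outlay from inference gross profit.
    All figures are hypothetical, purely to illustrate the framing."""
    return training_capex_bn / (annual_inference_revenue_bn * gross_margin)

# Pre-training regime: each generation stacks a bigger outlay before the
# previous one has paid back (numbers invented).
gen1 = payback_years(training_capex_bn=10, annual_inference_revenue_bn=4)   # ~4.2 years
gen2 = payback_years(training_capex_bn=50, annual_inference_revenue_bn=8)   # ~10.4 years
assert gen2 > gen1  # payback stretches out as model capex escalates

# Inference-led regime: spend scales with usage, so it arrives roughly in
# step with the revenue it serves instead of years ahead of it.
```

Inference capacity can be added incrementally as demand shows up, which is what makes the spend "warranted" in the sense described above.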
I think it is important that people remember the underlying clouds of these companies, meaning just the normal storage and compute, are still growing high teens to low 20s. So there's some capital that needs to be allocated towards that when you're a $100 billion business growing 18%. You're a $60 billion business growing 25%. It's the incremental capital above that everybody was very concerned about six, nine months ago.
My personal takeaway coming out of Q3 was, okay, I see it. There's use cases here. The inferencing is happening. Technology is doing what it's supposed to. The cost of inferencing is plummeting. The utilization is soaring. You put that together, you get a nice rising pot of revenue and everything's good. Satya Nadella talked about this. The challenge is you spend the money for the model, you get it on inferencing, but then we're spending on the next model.
If we can start to say, hey, maybe we're not going to spend the next $50 billion on the model, the ROI calculation looks a lot better. And something that you asked, Chetan, was why is stability in the model layer important? I think Sam Altman gave the right answer on this. Six months ago, he was on a podcast and said,
If you're scared of our next model being released, we're going to run you over. If you're looking forward to our next model coming out, then you're in a good position. Well, if the actual reality is the next model is going to be at inference time and not pre-training, you probably have less worry about them steamrolling.
So I think everything that we're talking about in this one path is very conducive to a favorable economic reality for the entire ecosystem, with all the attention and capital being put towards inference. The real concern was, do we need to spend $150 billion, $200 billion to build these ever more accurate models in pre-training? Where do prices most reflect extreme optimism or hype still? I've certainly seen my fair share of private-market companies, let's say Series A-type companies, that price at extremely high valuations. They're often incredible teams and very exciting, but they're also playing in spaces where, if something works, you could imagine lots of other very smart investors funding some competitor. So you see these scenarios where it's like: great team, high price, high potential competition, really exciting. Everything's moving fast.
I'm curious what signals you both read from valuations and or multiples right now.
In the private markets, one of the things that's happening is just the dramatic drop in prices of just compute, whether it's inference or training or whatever, because it's just becoming way more available. If you're sitting here today as an application developer versus two years ago, the cost of inference of these models is down 100x, 200x. It's...
frankly outrageous. You've never seen cost curves that look this steep, this fast. And this is coming off of 15 years of cloud cost curves, which were amazing and mind-blowing by themselves. The cost curves on AI are just at a completely different level. We were looking at cost curves in the first wave of application companies that we funded in 2022. You'd look at the inference costs, and it would be like $15 to $20 per million tokens on the latest frontier models. And today, most companies don't even think about inference costs, because it's just like, well, we've broken this task up, and we're using these small models for the tasks that are pretty basic. The stuff we're hitting the most frontier models with is these very few prompts, and for the rest we've created this intelligent routing system. And so our cost of inference is essentially zero, and our gross margin for this task is 95%. You just look at that, and you're just like, wow, that is a totally different way to think about
application gross margins than what we've had to do with SaaS and what we've had to do with basically software for the last decade plus.
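The intelligent-routing pattern described above can be sketched in a few lines. Everything here is a placeholder: the model names, the per-token prices, and the complexity heuristic are invented for illustration, not taken from any real API.

```python
# Route cheap prompts to a small model and reserve the frontier model for
# the few that need it. Names, prices, and the heuristic are hypothetical.
SMALL_MODEL = ("small-model", 0.10)        # ($ per million tokens, illustrative)
FRONTIER_MODEL = ("frontier-model", 15.00)

def route(prompt: str) -> tuple[str, float]:
    """Crude complexity heuristic: long or multi-step prompts go to the
    frontier model; everything else stays on the cheap one."""
    hard = len(prompt) > 500 or "step by step" in prompt.lower()
    return FRONTIER_MODEL if hard else SMALL_MODEL

def blended_cost_per_million(prompts: list[str]) -> float:
    """Average $ per million tokens across a workload after routing."""
    costs = [route(p)[1] for p in prompts]
    return sum(costs) / len(costs)

# If 95% of traffic fits the small model, the blended rate lands near the
# small model's price, which is how margins like the ones described above
# become possible.
workload = ["summarize this ticket"] * 95 + ["walk through this step by step"] * 5
blended = blended_cost_per_million(workload)   # about $0.85 per million tokens
```

In production the heuristic would be a learned classifier or a cascade (try the small model, escalate on low confidence), but the economics are the same: blended cost tracks the cheap model, not the frontier one.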
And so I think that's where you're starting to look at it and say the entire application stack for these new AI applications is exciting, and it starts with the people that provide inference, with the tooling and the orchestration layer. We have a portfolio company that's extremely popular called LangChain, and at the inference layer we have Fireworks. These kinds of companies are seeing extraordinary usage by developers. And then all the way up the stack to the applications themselves,
I think just the pace of innovation, pace of commercial success is driving a lot of excitement.
with private investors. What is also appealing about model stability is that we can now finally assume, if this sticks, that all these companies are going to be fairly capital-light. Because you're not having to spend a lot on pre-training, and you're not going to have to spend a lot on inferencing, because most of the hyperscalers are now going to present you with really reliable APIs at these kinds of costs.
It's a great time to be in the application development business, and it's a great time to be in the application development stack. Modest, what do you think on valuations? I think you have to start in general with animal spirits. If you go back to the week before ChatGPT was released, to the fall of 2022, tech had probably just suffered its most brutal bear market
since the dot-com collapse. It was arguably worse for the median tech stock than even the financial crisis. You had some of the very large growth funds down 60, 70%. You had the hyperscalers laying off people for the first time ever. You had CapEx cuts. You had OpEx cuts. It was a very different vibe in the tech world and in the public markets at large.
The release of ChatGPT catalyzed the reemergence of animal spirits, and it's been a progressive process. But I think where you are today, you have the public markets trading at 24 times earnings. And again, this goes beyond just the Mag 7 at this point. I mean, Google trades at, I think, 19 or 20 times, so they're not one of the offenders here.
And so I think in general, there is a lot of optimism baked into the public markets, a lot of which is tied thematically to this idea that we're in a new platform era and that the sky is the limit for a lot of various new concepts. So there's that global overhang.
If we are right, I think that what it really comes down to is understanding what this new path forward looks like, if CapEx and hyperscaler OpEx are more closely tied to revenue generation. If you listen to AWS, one of the fascinating things they say is they call AWS a logistics business. I don't think anyone externally would look at
cloud computing and say, oh yeah, that's a logistics business. But their point is essentially what they have to do is they have to forecast demand and they have to build supply on a multi-year basis to accommodate it. And over 20 years, they've gotten extraordinarily good at it. What has happened in the last two years, and I talked about this last time, is you have had an enormous surge in demand hitting inelastic supply because you can't build data center capacity in three weeks.
And so if you get back to a more predictable cadence of demand where they can look at it and say, okay, we know now where the revenue generation is coming from. It's coming from test time. It's coming from Chetan and his companies rolling out.
Now we know how to align supply with that. Now it's back to a logistics business. Now it's not grab every mothballed nuclear site in the country and try to bring it online.
And so instead of this land grab, I think you get a more reasonable, sensible, methodical rollout of this. It may be, and I actually would guess that if this path is right, that inference overtakes training much faster than we thought and gets much bigger than we may have suspected. But I think the path there in the network design is going to look very different
And it's going to have very big ramifications for the people who were building the network, who were powering the network, who were sending the optical signals through the network. And all of that, I think, has not really started to come up in the probability weighted distributions of a huge chunk of the public market.
And look, I think most people overly fixate on NVIDIA because they are sort of the poster child of this. But there are a lot of people downstream from NVIDIA that will probably suffer more because they have inferior businesses. NVIDIA is a wonderful business doing wonderful things. They just happen to have seen the largest surge in surplus. I think that there are ramifications far, far beyond
who is making the bleeding edge GPU, even though I do think there will be questions about, okay,
Does this new paradigm of test time compute allow for customization at the chip level much more than it would have if we were only scaling on pre-training? But whenever I have this question in normal conversations, people overly fixate on NVIDIA. I think people like to debate that particular name. But I think there's a lot of other derivative plays on the AI build-out where...
The distribution of outcomes has shifted, and that has not been reflected. I just think it's really important to think about, in the test time and reasoning paradigm, from an application layer: how many of your prompts actually utilize reasoning as a way to respond to those prompts? And
Yes, application developers, as the technology becomes more available and usable, will use way more of it than they are today. But if you just look at the current techniques and the wows you're getting from the application layer already, what percent of prompts or what percent of queries are going to use reasoning effectively?
it's very hard to squint and say it's going to be 90% of queries. That doesn't seem like it's going to go that way because, again, your users are not going to wait. Humans are inherently impatient, and if you have a solution that's just spinning and thinking, your users are gone. It doesn't matter what sector they're in. They're just gone. And so, yes, you can have a certain set of tasks that take a long time and deliver great accuracy, but speed is by far the most important consideration for these application developers. And so, yeah,
Are we just going to have a system that continues to go back and forth and back and forth and utilize all this compute? And what share of queries use that? It's hard to imagine that being a supermajority of queries. And so then the implication, at least from a private market perspective,
early stage investor, so take huge grains of salt on what it means for anything other than my world. But the implication there is simply that you just don't need as much compute
as you did with training. Training is just a constant exercise. You're scaling and you're just really hitting all your compute power all the time and just going. At the application layer, it's extraordinarily bursty. You're going to have certain tasks that need a lot right away. And for a lot of it, you just don't need a lot. And so this is where, again, like hyperscalers and things like EC2 and S3 were incredible. And now in this new world,
The solutions from hyperscalers are really terrific. I think AWS's Trainium and the TPUs from Google are really, really terrific. And they offer a great developer experience. I think part of what has been painful for application developers is that GPUs are really tough to use
For this use case, getting max utilization out of GPUs chained together, whether you're buying that from Dell or whether you're buying it from a hyperscaler, is just really hard to use. But with new software innovations, that's obviously going to get better. And then the stuff that's coming out from the hyperscalers themselves, they're really, really terrific. And you just don't need to hit them as hard as you did when you were doing training, when you're doing test time compute.
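The question raised earlier, what share of queries will actually invoke reasoning, can be sketched as back-of-envelope arithmetic. Everything in the snippet below is a hypothetical assumption for illustration (the token counts and the shares), not a figure from the conversation:

```python
# Illustrative only: how the share of queries routed to a reasoning mode
# changes expected inference compute. All numbers are hypothetical.

def expected_tokens_per_query(reasoning_share: float,
                              plain_tokens: int = 500,
                              reasoning_tokens: int = 10_000) -> float:
    """Blended generated-token count across plain and reasoning queries."""
    return (1 - reasoning_share) * plain_tokens + reasoning_share * reasoning_tokens

# Even a modest 10% reasoning share roughly triples compute per query
# versus an all-plain mix; a 90% share would be roughly 18x.
for share in (0.0, 0.1, 0.5, 0.9):
    print(f"{share:>4.0%} reasoning -> {expected_tokens_per_query(share):>8,.1f} tokens/query")
```

The only point of the sketch is that the compute bill is very sensitive to that share, which is why it matters that most user-facing queries may never route through reasoning at all.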
I think it's a really important point on the utilization of the GPUs. If you think about a training exercise, you're trying to utilize them at the highest possible percent for a long period of time. So you're trying to put 50,000 or 100,000 chips in a single location and utilize them at the highest rate possible for nine months. What's left behind is a 100,000-chip cluster that, if you were to repurpose it for inferencing, is arguably not the most efficient build, because inference is peaky and bursty and not consistent. And so this is what I'm talking about: I just think that, from first principles, you are going to rethink
how you want to build your infrastructure to service a much more inference-focused world than a training-focused world. And Jensen has talked about the beauty of NVIDIA is that you leave behind this in-place infrastructure that can then be utilized.
And in a sunk cost world, you say, sure, of course, if I'm forced to build a million chip supercluster in order to train a $50 billion model, I might as well sweat the asset when I'm done. But from first principles, it seems clear you would never build a 350,000 chip cluster with two and a half gigawatts of power in order to service the type of requests that Chetan's talking about.
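The sunk-cost point can be made numeric with a hedged sketch. Both inputs below (the hourly cluster cost and the utilization rates) are invented for illustration, not figures from the discussion:

```python
# Illustrative only: the same cluster amortized over steady training load
# versus peaky, bursty inference. All figures are hypothetical.

HOURLY_CLUSTER_COST = 10_000.0  # assumed fully loaded $/hour for the cluster

def cost_per_useful_hour(avg_utilization: float) -> float:
    """Amortized cost of an hour of actual work at a given average utilization."""
    return HOURLY_CLUSTER_COST / avg_utilization

training = cost_per_useful_hour(0.95)   # training: near-constant load
inference = cost_per_useful_hour(0.20)  # bursty inference on the same hardware

print(f"training:  ${training:,.0f} per useful hour")
print(f"inference: ${inference:,.0f} per useful hour, "
      f"{inference / training:.2f}x worse per unit of work")
```

Under these made-up numbers, the repurposed training cluster is 4.75x more expensive per unit of useful work, which is the first-principles reason to design inference infrastructure differently from a training supercluster.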
And so if you end up with much more edge computing with low latency and high efficiency, what does that mean for optical networking? What does that mean for the grid? What does that mean for the need for on-site power versus the ability to draw from the local utility? I think these are the types of questions I would be very interested to read about.
But to date, a lot of the analysis is still focused on what's going to happen when we light up Three Mile Island, because it's really too soon for the new paradigm to have changed the analysis.
Do you think that we still need, and will see, tons of innovation in the semiconductor world and layer, whether it's networking, whether it's optical, whether it's the chips themselves, different kinds of chips? I would imagine this would accelerate it even more, because it was very difficult to foresee a world where you took on Big Green in training.
The way I think about this, over hundreds of years, is you have a gold rush, a land grab, and everybody's just doing whatever they can. But in technology, then, as some stability sets in, you get an optimization period. You've already had that on the inferencing side. It's what Chetan referenced: people had time to optimize the underlying algorithms and compute, and the cost of inference has fallen 99%.
It's the same thing that happened with internet transit at the end of the bubble, which was people said, no, you can never stream a movie, do you have any idea how much that would cost? And the cost of transit has fallen 25% a year, like clockwork, for 20 years. The literal profit pool of that business has been static for 20 years. And so I think we've had this mammoth demand surge.
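The transit arithmetic is worth doing explicitly; a steady 25% annual decline compounds to a near-total collapse in cost over 20 years:

```python
# Compounding a steady 25%/year cost decline over 20 years.
remaining = 0.75 ** 20
print(f"cost remaining after 20 years: {remaining:.2%}")  # about a third of a percent
print(f"cumulative decline: {1 - remaining:.1%}")         # roughly 99.7%
```

A static profit pool on top of a roughly 99.7% unit-cost decline loosely implies volume grew by about the same factor the cost fell, which is exactly the streaming-video outcome.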
And I think that if we get a little bit of stability and everyone can take a breath, there will be the two guys in the garage optimizing every single thing possible. And that's the beauty of technology over the long term is it is deflationary because it's an optimization problem. But you don't have time to optimize when you're in land grab mode.
I quoted this to you last time. The data center industry was power neutral. There was no demand growth in power for the entire data center business for five years. That was because you were in the fully mature stage of the cloud data center buildup. I don't know when you'll reach that point. I mean, we know that these guys, at least through 2026 or 2027, are committed to their buildup.
When in that path will everyone have time to take a deep breath and say, okay, now let's figure out how to run these more efficiently? That's just the nature of things. Same thing on the compute side. I just think we haven't yet gotten to the point where technologists have been able to apply their optimization. They've been in implementation mode.
And I'll give you a couple of data points from my end. So my partner, Eric, is on the board of a great semiconductor company called Cerebras, and they recently announced that for inference on Llama 3.1 405B, Cerebras can generate 900-plus tokens per second, which is a dramatic, order-of-magnitude increase. I think it's like 70 or 75 times faster than GPUs for inference, as an example. And so as we move to the inference world, at the semiconductor layer, the networking layer, et cetera, there are tons of opportunities for startups to really differentiate themselves.
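To translate those throughput numbers into user-facing latency: the 900 tokens-per-second figure and the roughly 70x multiple are quoted in the conversation, while the 1,000-token response length and the implied GPU baseline are assumptions for the sketch:

```python
# What the quoted throughput figures mean for a single long response.
response_tokens = 1_000          # assumed response length
cerebras_tps = 900               # quoted tokens/second for Llama 3.1 405B
gpu_tps = cerebras_tps / 70      # baseline implied by the ~70x claim (~13 tok/s)

print(f"Cerebras: {response_tokens / cerebras_tps:5.1f} s per response")
print(f"GPU:      {response_tokens / gpu_tps:5.1f} s per response")
```

Roughly one second versus over a minute for the same answer, which is why speed keeps coming up as the binding constraint for application developers.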
And then the second thing I would bring up is I was just recently talking to the CIO of a large financial services institution, who said that over the last two years, they were pre-buying a lot of GPUs because they assumed that they were going to have lots of AI workloads, and, who knows, maybe they needed to do some training themselves. And so those systems are now being installed into their data centers and they're now online.
And we're in this world where you don't need to create your own model. And even if you did, you just fine tune an open source model. It's not that heavy. And so his view is like, look, if you have AI applications that run on-prem, like it's essentially free. I have all this capacity. I'm not using it for anything. Inference is light. And so I have at the moment infinite capacity to run AI applications on-prem and
And it'll cost me zero marginal dollars, because all the stuff is up and running and I'm not using it for anything. So I'm ready to buy. So not only are all these application things that you're talking about hugely exciting because they unlock ARR and all that stuff,
But the minute you can run any of this stuff on-prem on our stuff, that dramatically decreases the cost for us. And so it's just like win-win-win all over the place when you have something like that. And that's the current state of play. Now, how long does this overcapacity last? Application developers are famous for using all the capacity and pushing the limits. And all of a sudden, what used to be overcapacity ends up becoming...
under capacity, because all of a sudden we have all this broadband build-out and we decide to stream video on it. And so of course, AI applications are going to get more sophisticated and swallow up all this capacity. But that is just a much more predictable world and much more sane world from an investment perspective than scaling to infinity on pre-training. The one thing I am curious to monitor is
It's important to remember that the reporting was not that the models weren't getting better. It's that the models weren't getting better relative to expectation or the amount of compute applied to them. So I think we do need to be cautious to conclude that the labs are not going to keep trying to figure out the unlock on the pre-training side. I think the question there is, one, what should we be looking for?
But then two is, if they continue to push on that vector, do we believe, and this was the question I always wrestled with was, if scaling laws held in pre-training, would people be willing to spend $100 billion? And I know that everybody says, if you're playing for the ultimate prize, you would. But has enough doubt been cast that simply brute forcing pre-training
is the path to that ultimate unlock? Or is it now some combination of pre-training, post-training and test time compute? In which case, again, I think that the world is just the math is much more sane.
And I've seen a lot of notes coming out saying people are declaring the end of AI progress and all that. And hopefully the takeaway from today is none of that is what I think people really in the weeds looking at this are saying. People are saying AI is full speed ahead. I think the question is just what the axis of advancement is. And from my seat,
The math seems much more sensible. Everything seems much more rational pursuing this path rather than the upfront cost being spend any amount you can to build this hypothetical God. So I think this is a much better outcome if this is the path that we end up going down.
I'm curious what you think, if anything, is the most under-discussed part of this whole story. Are there things that you find yourself thinking a lot more about than you hear discussed from your friends and colleagues? On the public investor side, just reading sell-side reports, we haven't seen analysis on what this new paradigm of test time compute means and how things change. And so I'm really looking forward to way more sell-side analysis on this paradigm shift. I think there's also little coverage in the private markets. I think what's known to people that are meeting these entrepreneurs is just how capital-efficiently these entrepreneurs are getting to the frontier today. And this is just a shift that's happened very, very recently.
And you're seeing people just show up having spent under a million dollars to match performance, not broadly, but in specific use cases, with the frontier models. And that's just not something that we were seeing two years ago, or even a year ago. And so I think that's pretty dramatically undercovered. Pre-training is a big test of capitalism. If we pursue down this path,
I feel much better with a microeconomic background analyzing what's going to happen because you don't have to put in the NPV of God. And I just think that that's much better. In terms of what I'm looking forward to reading and hearing, yeah, I'd love to see some thoughtful analysis really wrestle with. Right now, I feel like it's a little defensive. People are defending the fact that scaling is not done. It's just moved.
So that's great. But let's now work through the second order effects, the third order effects, and how does this really manifest itself? I think it's very good for the overall ecosystem, the overall economy. But I think there's going to be a lot of surplus shifting between pockets that looked like winners before and pockets that look like losers. What outcome in the next six months would most disorient you?
Well, two dramatic examples. On the positive side, if somebody came out with results that pre-training was back on and there was a huge breakthrough on synthetic data, and all of a sudden it's go-go again and $10 billion and $100 billion clusters are back on the table, the paradigm shift would be wild. All of a sudden we would be talking about a $100 billion supercluster that was going to pre-train. And then obviously, if the expectation becomes that next year we're going to have AGI, and we're building a $100 billion cluster because we had a breakthrough on synthetic data and it all just works and we can just simulate everything, that would be pretty dramatically disorienting. I think another scenario is, it's pretty clear now that while we've exhausted data on text, we are not close to exhausting data on video and audio.
And I think it's still TBD what these models are capable of in new modalities. And so we just don't know, because the focus hasn't been there. But now you're starting to see large labs talk more about audio and video. What these models will be capable of from a human interaction perspective is
I think it's going to be pretty amazing. I think you've just seen already how much leaps have gone into image generation and video generation. And what does that look like in a year's time, in two years' time? Could be pretty dramatically disorienting. Yeah, I think the hard part as a non-technologist is for the last year, year and a half, the question has been,
what would GPT-5 bring if it adhered to the scaling law? And no one could really articulate it, because all we know is, okay, training loss would be lower. So you'd say, okay, this thing's more accurate at next-token prediction
But as far as what does that actually mean from a capability standpoint, what's the emergent capability we were unaware of before it was released? So I think it's really hard to know ex-ante what you're looking for, other than the labs coming out and saying,
This is so good in its accuracy that it warrants staying on this log-linear trajectory of spend. And if someone comes out and says that, I think irrespective of this entire conversation or what you may believe, you have to say, okay, that's happened. Again, I just think you have to have a super open mind. And if we were having this conversation three months ago, there were whispers, but it wasn't in the open.
I just think you have to be updating your priors constantly. So clearly, like Chetan said, I'd be looking for that. Personally, I watch Llama closely. There's clearly a risk at some point that they decide not to keep open sourcing.
And if I were other players in the ecosystem, I would be doing my damnedest to make sure that Llama stays open. And there are certain ways you could go about doing that. But I think that's one thing, because their willingness to spend at the frontier and make those models available the way they do, I think, has completely changed the strategic dynamic
in the model industry. So that's another one that I would be paying attention to. I have a philosophical question as we near the end of the discussion, which is around ASI. So if AGI is here or coming next year,
what do the both of you even think about that? I guess it builds on that point about what we even expect from a GPT-5 that stays on the scaling law. What does it mean? Because there are fewer and fewer things, at least in a simple chat interaction, that I could imagine it doing a much, much better job on, or even what that would look like. And again,
Again, we're probably just in the early innings of application development and fine-tuning and improvement and algorithmic updates and blah, blah, blah. So I'm curious, just philosophically, what you think the litmus test could or might be for something beyond what we have naturally as the existing models get tweaked and tuned and better. What does ASI even mean? Does it mean it solves previously impossible math or physics challenges, or something else? What does that idea mean to you both?
These aren't my words, and I don't remember who originally said them, but...
Humans are really good at moving the goalposts on expectations. And AI in the 1970s meant something different than what it meant in the '80s, '90s, 2000s, and in 2024. And so if a computer can do it, humans have a really good way of describing that as automation. And whatever a computer can't do, that becomes the new goalpost for AI. And
And so I think that these systems are already extraordinarily intelligent and are extraordinary at replicating human intelligence and sometimes exceeding human intelligence. I think if you just look at the path that some model developers like DeepMind and several startups are pursuing with things around math and physics and biology...
It's very clear that there's going to be applications and outputs of these models that are going to be things that humans were simply not capable of doing before. We already have seen that in things like protein folding today. We're starting to see a little bit of that as it relates to math proofs. I'm confident we're going to start seeing that as it relates to physics proofs.
And so my optimistic hope for humanity is that, I don't know, we'll be able to open wormholes or something. We're going to be able to study general relativity at a scale that we haven't been able to before, or study black holes or simulate black holes in a way that we haven't been able to. All of that sounds a little bit ridiculous at the moment, but certainly, the way things are progressing and the way things have progressed, we don't know what is possible and what's not. And to bring it back to an investor point of view,
When you have the unknown future where the possibility is up to your imagination, that's usually a really great time to be an early stage investor. Because that means that technology has unlocked. And usually when technology unlocks in a dramatic fashion, distribution also then unlocks. And you can now go get customers that were very expensive to get.
And so previously, if you wanted to build a consumer application, you then had to factor in the tax of the app stores, search, ad networks, and all that kind of stuff. And all of a sudden, it just became a very quick
exercise in unit economics. And similarly in SaaS, it was like a productivity, gross margin and infrastructure costs. And you just tried to do a spreadsheet exercise and early stage investing started to become more spreadsheet like than true technology innovation. I think when you have big breakthroughs like this, everything sort of changes again. Like distribution is nearly free if you have something unique and there's like a word of mouth and virality factor to it. The technology spend is
really, again, about going back to just investing in your developers and your research scientists. And the ROI on R&D starts to become remarkable again. That's what's most exciting as an early-stage investor: we just don't know what the future holds, and therefore it's back to human ingenuity and people being able to push these boundaries.
What's exciting for an early stage investor, I think, is terrifying for a somewhat skeptical public market investor, because prices are based on vibes and not math in the spreadsheet. With ASI, I think we've talked about this before. The reason people spend so much time on this whole concept is because it is so profound, ultimately. You have people who are invoking quasi-religion in their view of what we're building. Anytime that...
comes into play, I think the stakes are just higher. It's kind of unknowable and it's super complicated. So we all love to debate it. But I think the one thing we haven't touched on here, which is there's a pretty fervent belief amongst a group of people that there will be recursive self-improvement at some point in time.
And I think a big unlock on the path to whatever ASI hypothetically means is when the machines are smart enough to learn and teach themselves. On a less dramatic view, the way I think about this: there's AlphaGo, which famously did that move that no one had ever seen, I think it's move 37, that everybody was super confused about, and it ended up winning.
And another example I love is Noam Brown, because I like poker, talked about his poker bot. It was playing high-stakes no-limit, and it continually overbet, with dramatically larger sizes than pros had ever seen before.
And he thought the bot was making a mistake. And ultimately, it destabilized the pros so much. Think about that: a computer destabilized humans and their approach, and they have, to some extent, now taken overbetting into their own game.
And so those are two examples where, if we think about pre-training being bounded by the data set that we've given it, if we don't have synthetic data generation capabilities, here you have two examples where algorithms did something outside of the bounds of human knowledge.
And that's what's always been confusing to me about this idea that LLMs on their own could get to superintelligence. Functionally, they're bounded by the amount of data we give them up front. And so if you have examples like this where algorithms are able to
get outside of what they're initially bounded by, that's super interesting. I'm not smart enough to know where that leads us, but that's the kind of thing that I feel like is the next thing to come is how do you escape the bounds of what you're given upfront? I think what's remarkable from my perspective is how much of this innovation is happening in the United States and how much of it is happening in Silicon Valley. We've had a rough couple of years since the pandemic and
It's really amazing. There was an investor friend of mine who's now based in Silicon Valley, and he was just saying, I can't believe it's happening in Silicon Valley again.
And it's just become this beacon where all of the labs are based here. A lot of the people that are working on these applications, these infrastructure companies, et cetera, are here. Or even if they're not here, they're somehow connected to being here and are often visiting here a lot. And I would say that the focus on the innovation here is really extraordinary on AI. The progress being made in the United States, in Silicon Valley specifically, is extraordinary.
I do think that there's a level of attention that investors and entrepreneurs now have to how fragile the system is and how much we need to protect it and continue to invest in it. And I think there's a lot of focus now on the idea that innovation is something that needs to be protected. And I think a lot of people are now paying a lot of attention to make sure that all of this innovation that's happening in the United States continues to benefit
everybody. And I think that's the really optimistic and cool thing to recognize.
The agglomeration effects are real. If the reporting is right, the way the transformer paper came to pass is that someone was rollerblading down the hall and heard two guys talking about something and went in and whiteboarded. Two more people came by and who knows how much of that is apocryphal or not. But it is fascinating to see from an economist standpoint that these human network effects are real.
and that COVID did not destroy them, and that work from home did not destroy them, and that there really is something tangible to being together and the synthesis of ideas, multidisciplinary minds coming together to build this world-changing architecture.
Guys, it's always such a blast talking to you both. I'm lucky to get to do this in private. It's fun to do it in public. Thanks for your time. Of course. Thank you. If you enjoyed this episode, check out joincolossus.com. There you'll find every episode of this podcast complete with transcripts, show notes, and resources to keep learning. You can also sign up for our newsletter, Colossus Weekly, where we condense episodes to the big ideas, quotations, and more, as well as share the best content we find on the internet every week.