Dylan Patel has become one of the go-to thinkers on all things hardware and AI with his writing over at Semi Analysis. Today on Unsupervised Learning, I had an awesome time chatting with him on a host of different things. It was particularly fun because we talked right after the AI diffusion rule came out. So I got his live reactions to what it means for hyperscalers, what it means for China, the UAE, and others, and what the future of geopolitics looks like given these regulations. We also talked a lot about some of the key questions in AI.
Here's Dylan.
Dylan, thanks so much for coming on. Yeah, thank you for having me. It's a very cozy, comfortable place. I can't believe we got you like during AI Diffusion Rule Week. I feel like this must just be madness right now. Yeah, I mean, the fun story was it was supposed to be originally Monday. They dropped.
On Sunday night slash Monday, right? And so I was up all night working on it and had to cancel, and we pushed it back a few days. So thanks for being flexible. And you shipped a pretty long piece in the interim. Yeah, yeah, it was great. I'm curious if you could just give our listeners some quick context on what's going on on the regulatory side with the AI diffusion rule. And then maybe, given the goals of what they're trying to do, what kind of grade would you give it?
Yeah. So I think originally, right, the October 2022 regulations were primarily on the semiconductor industry. But the wording there was not minced at all, right? It was: we want to regulate AI, we believe it's going to advance rapidly. It's October '22, which is like- They were scaling-pilled early. Pre-ChatGPT, but they were scaling-pilled, right? Exactly. And like, you know, if you get a chance to talk to some of the people like-
you know, Ben Buchanan, who's like the special advisor in the White House, like, he's like, no, no, no, this is exactly why we did it. We knew it was coming. And it's like, oh, so the government is actually quite competent. And it's quite interesting because, you know, they hit really hard. They have the goal of like, the US must be ahead of China on AI, right? Because the next five years of progress, next three years, next two years of progress are going to shape the next, you know,
century of hegemony for the world. This is taking an axe to try and stop Chinese progress. As far as the grading, October 22 was really well-intentioned. If you have this worldview that AI in the next five years is going to transform the world...
you know, if you have the view that it's going to take 20 years, then it's a very different story. And actually these regulations are pretty bad because they're going to limit US competitiveness long-term. But at least over the next five years, they definitely will keep the US further ahead. So October '22 is really well intentioned, but there were quite a few loopholes. So they did another round in '23. Then they did another round in December and
And all of these, like, have, like, slowly patched over loopholes. There's still some, you know, major ones, right? Namely, Chinese companies can get GPUs from, you know, foreign firms, right? So if you've seen the fuss that Oracle's been throwing, they've just been, like, ranting about it. Well, it's like, look at who their largest, one of their largest cloud customers is. It's ByteDance, right? Right.
or, you know, hey, you can build a data center in Malaysia. Now, all of a sudden, it's fine, right? And so in Malaysia from 2024 to 2027, not the country itself, but companies operating there, mostly Chinese companies, many of them claiming they're now Singaporean companies, right? Like the largest data center operator in China, GDS, moved to Singapore and says they're Singaporean now. But now these companies are building three gigawatts of data center capacity, right?
To put that in context, at the beginning of 2024, that was roughly Meta's global footprint. So it's like, oh, wow. The entire world's social media is running off of what Malaysia is putting up in three years. So what happens to all these Malaysian data center builds, given obviously a big part of this regulation is both tiering countries and then also probably much more regulatory oversight of who's actually training models within your data centers? Obviously, all this
planning has gone into building these things, but now it seems like the rug's kind of been pulled out in some way. Yeah, I mean, this is definitely the most far-reaching regulation that I've ever seen, right? Probably there were bigger things during the Cold War or World War II, I'm not sure. But like,
This is very far reaching in that like it's regulating clouds overseas, right? It's regulating foreign companies. It's limiting what they can buy to a huge extent. And so Malaysia has a huge amount of data center capacity that's going to be, that's being built. We'll see if they continue building it, right? You know, I think most people still continue building it. Companies like Microsoft and Oracle have significant assets being built in Malaysia. They probably won't stop. But other companies, you know, what happens now?
now. You know, at the end of the day, the U S is still very limited on data center footprint as well. We're building as fast as we can. There's a lot of regulatory bottlenecks. There's, there's a number of different, uh,
things in that nature. But what happens as you deploy, American companies deploy AI globally, right? It's like, is OpenAI going to serve GPT in Japan and Taiwan and South Korea? Or do they just run it out of Microsoft data centers in Malaysia maybe, right? This could be a reasonable way to subsume this. But the really important thing here and the sort of
Even if you're very pro-America, anti-China, there's still huge negatives to this regulation in that a ton of buyers...
can't buy that data center capacity in Malaysia and actually deploy. American companies can't do that. Oracle is a good example. One of the parts of the rule is you can only have 7% of your data center capacity in any country that's not a US ally. These tier one countries. Oracle was planning to have 20% of their capacity in Malaysia. Now it's like, what do I do with these data centers that I was going to build? I'm obviously an American company. Fine, I won't rent it to Chinese companies, but like
What do I do now? So there is a bit of like, you know, the companies that have so much capacity in the U S and
are the only ones that can now take over these data centers in Malaysia and build them up, right? Because of the 7% rule. So it's like Microsoft, Meta, Amazon, Google, right? These four companies have 70-plus percent of their AI data center capacity in the US. And so they can deal with it, and they're building super fast. They can deal with, hey, I'll take 500 megawatts, I'll take a gigawatt in Malaysia, and that won't break me out of the ruling, right?
everyone else is kind of like SOL. So now you've like reduced competition massively in the market. Right. And you've created sort of a monopoly. And so this is sort of like why these regulations, while they are like, yes, if your belief is like this whole AI, like,
acceleration and you need to combat China. This is very important. US must maintain hegemony. That's one side of the fence. The other side is like, well, if you want to maintain hegemony, it turns out like, what did we do during World War II? We're like, Henry Ford, here you go. You're going to be absolutely freaking rich because you're going to make all the tanks or whatever, right? Trucks. This is sort of the same thing they're doing with AI is like, here you go, Satya Nadella. You get to
Do this in AI, right? And so there's a bit of like antitrust, like, you know, I like decentralized power. You like decentralized power. You like innovation. And does this like stamp out innovation in some ways? You know, probably not for American like startups, but for like infrastructure hardware, it definitely is like pretty, pretty bad.
obviously there's been, you know, like proliferation of players like CoreWeave and these folks. Like how do these regulations, like, I mean, ultimately, you know, I've heard you say it obviously favors, you know, the giant tech companies, but presumably like they can have footprints in these tier one countries as well. Like what happens to the kind of like, you know, the mini cloud ecosystem? So I think CoreWeave has rapidly like become
quite close to the hyperscale level, right? And then their next set of build-outs, like large build-outs are continued in the US, of course, but then also Europe. I think they could expand overseas for a while and still be fine. They're going to have to hit certain regulatory levels for being able to monitor their customers' workloads and all these. Just when you throw up- The more requirements you add, it's like-
The best people who can do it. Yeah, exactly. The rest of the cloud ecosystem though, right? Excluding CoreWeave like is very much hit hard, right? There's a lot of clouds that are in foreign countries that are trying to build up their sovereign AI, which is like, you know, a lot of startups are in the Bay are serving sovereign AI firms in Malaysia and Singapore and India and the Middle East, whatever. That's their biggest customer. And it's like, these guys are heavily, heavily impacted by these regulations. There's a lot of...
you know, cloud companies that are, I guess, like tier one is the hyperscalers, tier two is like CoreWeave, even Oracle. And then below that, everyone is just kind of hit really badly. And so, you know, I think the startup ecosystem is going to have to learn, what do I do if potentially my big Middle East customer or my big Singaporean customer or my big Korean customer, maybe not Korea because Korea is allowed as tier one, but one of these sovereign AI countries, are they...
Am I now regulated? Like this is actually like potentially bad for startups in some way. It also is diversity on the foundation model side, right? I guess like on the sovereign AI point, like maybe contextualize before this regulation, what was happening both in kind of the data center build out world, you know, that's government fueled as well as like the kind of training core models like for, you know, for a specific country or geography.
It was basically the wild west, right? The US had an executive order a while back. It was basically just, if you're building a model that's 1e26 FLOPs, which is very large, that's like twice as large in terms of FLOPs as Llama 405B, right? So presumably very few people have done that yet. But it's just a notification requirement. That's it.
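For a rough sense of where that 1e26 FLOP threshold sits, here's a back-of-the-envelope sketch using the standard 6 x parameters x tokens approximation for dense pre-training compute and Llama 3 405B's publicly reported figures; the numbers are illustrative assumptions, not anything from the conversation.

```python
# Back-of-the-envelope check on the 1e26 FLOP notification threshold, using the
# standard ~6 * params * tokens approximation for dense pre-training compute.
# Llama 3 405B figures are the publicly reported ones; treat this as a sketch.
params = 405e9      # Llama 3 405B parameter count
tokens = 15.6e12    # ~15.6T training tokens reported for Llama 3

train_flops = 6 * params * tokens
threshold = 1e26    # reporting threshold from the 2023 US executive order

print(f"Llama 405B training compute ~ {train_flops:.2e} FLOPs")
print(f"threshold / Llama 405B      ~ {threshold / train_flops:.1f}x")
# => roughly 3.8e25 FLOPs, so 1e26 is about 2-3x a Llama-405B-scale run.
```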
But that's in the US, right? So if you're in China and you're doing this, or if you're anywhere else, it's fine. Of course, there were not really data center buildouts of the scale to hit that quite yet, but they were coming next year and the year after, right? Or 2025, 2026, these buildouts were coming. And so there was not much regulating foundation models, regulating other countries from building foundation models. Obviously, OpenAI and other companies were told like, hey, I know the
the UAE is telling you they're going to build you a massive cluster, you're not allowed to do that, right? Like there's obviously things like that tacitly, under the table, happening, but there were no, like, here are the regulations. So, you know, if you were a company in China, you would do whatever you want. And if you're a company elsewhere, you do whatever you want, and you can rent GPUs from wherever you want and you could build whatever you want.
And this smacks it down, right? You can't export model weights outside the US, or outside of trusted clouds, which is again the hyperscalers, for foundation models that are really the biggest ones, right?
you know, like think Llama 4 and GPT-5 and that kind of stuff. Or, there's various things around protecting against synthetic data generation, because all the Chinese companies are doing a ton of synthetic data generation off GPT-4 so their models can be good, right? Like this is part of the dirty secret about why, yes, Alibaba and DeepSeek are really good, but they're also just generating a bunch of data from GPT-4 and using that to
post-train, right? Partially. There's a lot of regulations around, yeah, accessing clouds. And so a lot of clouds, you know, they didn't have a core partner. Like, when you look at
the big tech companies, they've sort of all paired off, right? You know, Meta's by themselves, Google's by themselves, but then there's Microsoft and OpenAI, Amazon and Anthropic, and Elon's world is xAI, Tesla, X; the lines are very blurry between those companies. But these are sort of the hyperscalers in the US. But in China, Alibaba was doing good at AI. Yeah.
and they were building out. But besides that, there's a bunch of new players: there's Moonshot, there's DeepSeek, and there's all these other players that are popping up, but they don't have a hyperscale partner. ByteDance, of course, right? And ByteDance does run a lot of their own infrastructure, but a lot of it they rent. So what do they do? Well, ByteDance, not DeepSeek, decided, oh, I'll just rent from any GPU cloud out there, right? So there are random clouds, like
In Europe, there's random clouds all over Asia, right? And it's like their business case is like, you're going to make...
a decent profit, right? And you're going to build GPUs and rent them to me, right? And I'm not, you're not going to have any observability in my workload. I'm just going to do whatever I want. Most of it is just serving TikTok, right? Let's be clear. It's not actually like illicit, but you know, the potential is there and they're obviously working on language models and generative video models. And there's always the like fear that they're going to manipulate our teenagers' minds and, you know, destroy us all. Not after Sunday, I guess. I mean, they're still going to be able to do the latter thing, right? But they're heavily, heavily, heavily limited on, you know, the size of clusters they can get.
now and who and what and the notification requirements and like the companies have to be able to observe their workload, which is like bad security. So it's a really, really big blow to, of course, ByteDance and many Chinese AI players, but it's also a huge blow to all these random cloud companies whose business case was like, I'm going to sell to ByteDance, right? Or I'm going to sell to this Chinese company.
So where does Chinese AI go from here? And like, what, you know, do you see any kind of like, if they're looking around for loopholes, like what, you know, what is the path forward? There is, there's one obvious loophole, which is, well, there's like strict caps, right? Like each country can only buy 50,000 GPUs for the next four years. And it's like,
that's kind of nothing when Nvidia is making, you know, 6 million plus this year, right? It's like, huh? So each country can only buy 50,000 GPUs, but then there are still loopholes, which is if you buy 1,700 or less, that doesn't count toward the 50K. So the obvious thing is you just spin up a bajillion shell companies and you buy 1,700, you buy 1,600 GPUs, and then you route them to China or whatever, right? So there are still some loopholes.
but generally, you know, this is much harder to do. I think more so it's just, you know, China has a ton of amazing engineers and they will have to innovate, right? Like DeepSeek is amazing at engineering, right? So it's like, you know, they're crushing companies with
similar levels of compute, right? Because they're just engineering better, but they're not quite at the OpenAI, Anthropic level. Now they need to take that same approach, but compute for the US labs continues to scale like crazy and theirs cannot scale nearly as fast. So it's like, we not only have to be better at engineering, we have to be way, way, way better at engineering. With this new push to test time compute, is that feasible with a huge compute deficiency? Yeah. So I think there are
some interesting aspects of test time compute, right? It is way more compute intensive than people think. People are like, oh, training is dead. And it's like, no, building a test-time-compute-capable model requires tons of training. It's just that it's not just pre-training, it's post-training, right? So you have to generate a ton of data, you have to throw most of it away, you have to verify the data to make sure it's actually accurate, right? The chain of thought or the reasoning chain.
And then you have to train the model, right? And you have all these reward models. It's very complicated post-training stuff, and it's very compute intensive, actually. So you can make reasoning models today, right? Like o1, and DeepSeek's R1 I think it is, and Alibaba has a reasoning model, I can't remember the name off the top of my head.
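To make the shape of that generate, verify, throw away, train loop concrete, here's a minimal toy sketch. The function names and the trivial arithmetic verifier are stand-ins chosen so the script actually runs; this is not any particular lab's pipeline.

```python
# Toy skeleton of the generate -> verify -> filter -> train loop described above.
# generate_chains() and train_on() are stand-ins for calls into a real model stack;
# the verifier here only handles "a+b" style problems so the script is runnable.
import random
from typing import List, Tuple

def generate_chains(problem: str, n: int) -> List[Tuple[str, str]]:
    """Stand-in for sampling n chains of thought plus final answers from the current policy."""
    return [(f"reasoning #{i} for {problem!r}", str(random.randint(0, 10))) for i in range(n)]

def verify(problem: str, answer: str) -> bool:
    """Cheap verifier: for 'a+b' problems, check the final answer exactly."""
    a, b = problem.split("+")
    return answer.strip() == str(int(a) + int(b))

def train_on(samples: List[Tuple[str, str, str]]) -> None:
    """Stand-in for the SFT/RL update on the verified chains (plus reward-model scoring)."""
    print(f"training on {len(samples)} verified chains")

problems = ["2+2", "3+5", "7+1"]
kept = []
for p in problems:
    for chain, answer in generate_chains(p, n=16):  # sample many candidates...
        if verify(p, answer):                       # ...throw most of them away...
            kept.append((p, chain, answer))
train_on(kept)                                      # ...and train on what survives.
```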
The way you should think of it is: scaling laws are log-log, right? So you can scale data, you can scale parameters. And it's like, well, both of those are sort of petering out, right? Because data is not scaling fast enough, and just infinitely growing parameters doesn't give you anything, right? Or it gives you diminishing returns.
But we've climbed this ladder many times already, right? The training runs that these frontier labs are doing are billions of dollars, right? And next year, probably tens of billions of dollars. It's scaling very fast, and you only get a logarithmic improvement on that front every so often, right? Whereas with test time compute, we're at the bottom rung of the ladder. And it's like,
Like, oh, I can climb up this ladder really rapidly, because right now we're spending hundreds of thousands, millions, tens of millions, hundreds of millions, billions, right? And the compute deficiency doesn't matter until you get to the billions-plus scale, right? So there are many rungs that they can out-engineer with their limited compute resources and sort of catch up.
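A hedged way to see that ladder framing in numbers: if scaling is roughly log-log, each comparable jump costs about 10x more spend, so what matters is how many 10x rungs sit between today's spend and the level where a compute-constrained lab hits its ceiling. The dollar figures below are just the rough orders of magnitude from the conversation, nothing more precise.

```python
# Illustration of the "rungs on the ladder" framing: with roughly log-log scaling,
# each comparable capability jump costs ~10x more spend. Dollar figures are rough
# orders of magnitude for illustration only.
import math

def rungs_left(current_spend: float, ceiling: float) -> int:
    """How many ~10x jumps fit between today's spend and a compute ceiling."""
    return max(0, round(math.log10(ceiling / current_spend)))

ceiling = 1e9  # ~$1B: where the caps start to bite for a compute-constrained lab

print("pre-training rungs left:", rungs_left(1e9, ceiling))  # 0: already at the ceiling
print("test-time rungs left:   ", rungs_left(1e6, ceiling))  # 3: several cheap 10x jumps remain
```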
But there's also the flip side, which is, well, Anthropic, OpenAI, DeepMind, et cetera, xAI, and many startups in fact, like Mira's startup and so on and so forth, they're all scaling up this ladder really quickly as well, as fast as they can. And so it's interesting to see what's going to happen. I think that, you know, that's on the training side. And on the inference side,
you know, if you actually do a query of o1 versus GPT-4o, it's like, yeah, GPT-4o costs like 20 cents for a query and o1 costs like $6 for a query, right? It's a dramatic difference. Now, obviously the quality is better, and that $6 is still way cheaper than paying someone who makes way more than $6 for that work. But it is like,
And it's scalable, right? Whereas human resources are less scalable. It's kind of like this is...
a huge difference in cost. And that $6 is all inference. And it's like, well, you're limited on GPUs you get into China. You're limited to 50,000 in all these countries that are friendly to China or even neutral to China, right? So it's like, it's going to be very hard to get that scaling of compute on inference, right? And, you know, you can throw as much money as you want on training, but to actually like change the world, right? Again, like if you believe AI is going to change the world, then you have to spend...
a ton on inference, right? Totally. Even if the margins are low, right? You know, even though Anthropic's and OpenAI's gross margins on inference are like 75%, a lot of these other companies' gross margins on inference are like 10%, 20%. Even if it's 10%, 20% gross margins...
you know, to amortize the cost of a $5 billion model, okay, you need to do something like $10 billion of inference revenue. And then that's like $8 billion of hardware cost, and that $8 billion of hardware cost is actually depreciated over multiple years. So in that year you bought something like $30 billion of hardware. Congrats, right? And that's way too much; that dramatically exceeds the caps.
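Here's that amortization math written out. All the inputs are the round numbers from the conversation except the four-year depreciation schedule, which is an assumption added so the implied hardware figure comes out near the $30 billion he mentions.

```python
# Rough reconstruction of the amortization math: a $5B model, ~$10B of inference
# revenue to pay it back, low gross margins, and GPUs depreciated over several years.
# Order-of-magnitude sketch only; the 4-year schedule is an assumption.
model_cost         = 5e9     # ~$5B to develop and train the model
inference_revenue  = 10e9    # revenue needed to amortize it
gross_margin       = 0.20    # the low-margin case (vs ~75% for the top labs)
depreciation_years = 4       # assumed straight-line depreciation of the hardware

cost_of_revenue    = inference_revenue * (1 - gross_margin)   # ~$8B/yr, mostly hardware
hardware_purchased = cost_of_revenue * depreciation_years     # ~$32B of GPUs bought up front

print(f"annual cost of revenue: ${cost_of_revenue / 1e9:.0f}B")
print(f"implied hardware fleet: ${hardware_purchased / 1e9:.0f}B")  # matches the ~$30B figure
```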
So I think with test time compute, they'll be able to deliver impressive results, probably. The question is whether they'll be able to scale it and sell it. That's the issue. And on the flip side, it's also like, the cost of developing software in China is
way cheaper than in the US, right? They just have way more talented software engineers, and their development practices and all these things are different as well. But, like, platform SaaS doesn't work in China, right? This is pretty well understood. In the VC world it's been amazing for VCs that platform SaaS type stuff has been awesome, right?
Because you can sell to all these companies; you're developing it in one place and you're selling it to everyone, right? Whereas in China, it's like, okay, you're developing it in one place, but everyone can just develop it internally. It won't be maybe as good, it won't be best in class, but whatever, devs are cheap, right? Which kind of seems like the future we're headed toward with all this AI coding progress. This is what I'd be curious about, your take. Yeah. No, I mean, I definitely think it's a huge risk for these SaaS providers, right?
I mean, obviously, Klarna and folks have talked about it very publicly, but I think the kind of moat that came from spinning up these applications in the past is certainly a lot weaker now. And that's combined with the ability to more easily migrate data out of whatever system of record you're in; basically all the painful stuff that probably stopped someone from ever really upgrading their software in the past is now a relatively simple thing to do with a bunch of AI agents. You throw them at it, go away for a few days, and there it is.
Yeah, it's a real threat to SaaS going forward. But a lot of countries have, I think clearly from the beginning, realized the importance of having this expertise and these models in their own country. And if anything, I feel like this regulation makes it incredibly clear they could at any point be cut off. I think if you look at the UAE, this is exactly what's happened. And I think it's been
pretty well reported in the media. At one point, G42, which is the UAE sort of champion of sovereign AI, funded by all sorts of UAE money, and it is based in UAE. They were at one point threatened of being cut off, and they were like, the CEO was a Chinese guy from China, and it was like CCP links potentially, blah, blah, blah, all this stuff.
And then, you know, all the media reporting on this goes silent for like six months. And then the next thing to come out, whether it was Bloomberg or Reuters or the Wall Street Journal, whatever, was like, yeah, G42 and Microsoft are going to do a partnership. And it's like, huh, I wonder what happened here? Is it purely organic or was there some sort of tacit, like, well,
if you're just going to run all your own infrastructure, you know, maybe there's something bad here, but now that we know Microsoft has partnered with you, you know, it's good. And now when you look at it, G42 and the US government are actually talking all the time. And G42 will be a big cloud player, and they're deploying GPUs not just in the UAE, but in Europe and the US. So it's like, through these hyperscalers, I guess you can enforce the American sphere of influence, you know, both obviously
of what models are out there. But also, ultimately, who gets that? I think right now there's some human rights terms involved in the tiering of the countries. And you could imagine all sorts of ways that administrations use this stuff going forward. Yeah.
Yeah, I mean, like, the Biden – the regulation itself does mention human rights. Obviously, the new administration is coming in. If you had me guess, I'd say, like, 95% probability the Trump administration just keeps these rules and maybe, like, obviously tweaks them to some extent. But the human rights thing, they're probably going to throw out and, like, you know, and switch for, like –
you know, American economic influence or like industry or like whatever, whatever the priority of like, you know, purchasing energy and weapons from the U S government. Right. Like these are the sort of things that might matter more for the Trump administration versus like human rights policies might matter more for the Biden administration. But at the end of the day, yeah, it's like a weapon for sort of the arsenal of democracy. You know, this is like a regulation. So I don't know if it's like, yeah,
But I mean, taking a step back, I feel like you've talked before, and I think it was maybe in the context of the chip bans, about this Goldilocks zone that you're trying to strike of preventing China from training their own cutting edge models, but not making it so barren that everything gets built up within the country itself. And I wonder, back to the grade question, does it feel like this
heavy-handed approach, like, is that the right way to further these policy objectives? So I think October '22 and then the 2023 regulations, they were more like, you know, we're going to try and limit you, but they didn't fully limit China building their own domestic supply chain and all these things, right? Because the intention is, you can't build your own domestic supply chains and you can't access it, but you can access it at some tier. Now, with this most recent regulation, it's like,
no access at all to this stuff, right? Like, you have an order of magnitude less compute than any of the US labs, right? That's what we're trying to enforce on you. In which case, obviously there's going to be some leakage, so maybe it's not a whole order of magnitude, maybe it's like 75% less compute, whatever, right? It's still a humongous amount. So
Now it's like China only has two options, which is build your own supply chain, or... and if we limit that effectively, right? So Ben Thompson has this take, which is, if the regulations are so effective at preventing China from building their own semiconductor industry, or, on the flip side, if they do build their own semiconductor industry and it's competitive,
then like action on Taiwan is much more likely, right? Whereas like if they feel that they're still benefiting from Taiwan, right? They feel that they still can build it up and get access through Western sphere, right? This is like the sort of like geopolitical stuff I have no idea about, by the way, right? Talking out my butt, but like there's sort of the like, you know, the Goldilocks zone of like, yeah, you don't want to ban them too hard. You don't want to let them just have everything because then that like induces risk, right? But if there's like that Goldilocks zone of like,
You just can't, right? You're behind, but not so far behind; there's hope of catching up. Then it's maybe fine. Well, I guess switching gears to, you know, the cluster buildups that we're seeing today, maybe just give some context for our listeners. You're the guru of tracking all this: the size of cluster buildups we have today, those being planned, and maybe just talk a little bit about the biggest blockers we have to larger clusters today in these tier one countries. Yeah. So, I mean, when you take a step back to GPT-4, that's 2022, roughly 20-something thousand GPUs, A100s, right? And they train this model, change the world, right?
what's the next sort of step up, right? You know, you got to look at like from two to three to four, right? There's, you know, a couple orders of magnitude of compute increases. Now, you know, with GPT sort of five, quote unquote, or Orion or whatever's coming out there, you know, working on it. Um, this is a scale of, you know, Hey, well we can't necessarily go for, you know, order magnitude more GPUs because it's just too expensive. And there's a lot of, uh,
with building that out. So we go up from 20,000 GPUs to a hundred thousand, right? And so xAI has built a hundred thousand, Meta's built a hundred thousand, and both of them are actually building more than that, right, in one spot to do training. OpenAI's plans are similar, and Anthropic's, so on and so forth. Anthropic has 400,000 Trainium, uh, GPUs coming up this year.
Or Trainium chips, not GPUs. People are in the hundreds of thousands range for this year. But what's built today and what's training models today is on that order of magnitude of 100,000, and the GPU that's being used is the H100. So you have 5x the GPUs and you have 3x the performance per GPU, roughly. So then you're at 15x more compute. Yeah.
And we'll see what comes of this, right? Like, you know, the thing is like, you don't build a cluster or run a model and immediately get like, you know, something great out of it, right? You have to like do tons of experimentation and, you know, training takes months and post-training takes months and safety takes months, right? It's not like...
snap of the finger, you get the model right now that you've built the cluster. And so there is a bit of a lag, right? And so, you know, this year we'll see models come out that are from that sort of quote unquote last generation cluster, but the next generation one is being built now. And what's this next generation one is like hundreds of thousands, right? And for context, right? Like
you know, for each GPU, once you include networking, the building, all this sort of stuff: the H100 GPU itself is like $24,000, but once you add everything else up, it's like $40,000 to $45,000 per GPU, all in.
So the cost of a hundred thousand is a lot, right? It's like $5 billion for 100,000 GPUs. And Elon's round was 6 billion, and he built a 100,000 GPU cluster, right? Obviously there are other costs: they had to buy data, they had to pay employees, they had to convert a factory into a data center and set up all this crazy shit that they did, right, which I'll talk about in a second. But there's a lot of stuff that needs to be done to get a model trained.
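For reference, the cluster cost arithmetic he's gesturing at, using the round numbers from the conversation rather than any real bill of materials:

```python
# Back-of-the-envelope cluster cost: ~$24k for the H100 chip itself, ~$40-45k per GPU
# all-in once networking, the building, power, etc. are included. Round numbers only.
num_gpus       = 100_000
gpu_price      = 24_000   # H100 chip alone
all_in_per_gpu = 45_000   # chip + networking + datacenter + power infrastructure

print(f"chips only: ${num_gpus * gpu_price / 1e9:.1f}B")       # ~$2.4B
print(f"all-in:     ${num_gpus * all_in_per_gpu / 1e9:.1f}B")  # ~$4.5B, i.e. roughly $5B
```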
And now that Elon's built the cluster, it doesn't mean a model is going to come out immediately. I'd be surprised if they have a leading model even this quarter, right? It's going to take some time. And so that's on the training front. And these next generation clusters being built this year are not $5 billion, they're like 15, right? Obviously you can't scale the money up by insane amounts, but 3x, right. And obviously the context of that: Anthropic was raising like $2 billion in this current VC round, it seems,
Well, so what's nice is, you know, Anthropic especially, and OpenAI, and some of these companies that are the startups, right, can raise a round, and that money only needs to be enough for maybe a year, year and a half of GPU rental, right? And that's rental, right?
They're not the ones buying the GPUs, right? So Amazon might be the one spending five, $10 billion, but they might rent it for one year for three, four, right? And when you look at the revenue, the credits they get from Amazon, plus the round they raise, that's enough to pay for something like that, right? Now, the Trainium server is not $10 billion because it's a cheaper internal chip for Amazon, but
you kind of get this benefit if your cloud partner really loves you, right? And in the case of OpenAI and Microsoft, kind of, but they were able to strike a deal with Oracle, right? Oracle is spending like $10 billion plus for them this year to build out data centers and GPUs, right,
for OpenAI, not Microsoft. And interestingly enough, Microsoft is doing stuff as well, but they're not charging $10 billion this year; they're charging a few billion dollars this year. So there is that benefit, right, that you don't necessarily need to... you might have signed the dotted line and Oracle may have taken the credit risk, right, of, will OpenAI and this contract exist in three years? I mean, you and I believe yes, but this is something that you have to consider as a possibility, no?
And so there is that risk. And the numbers that companies are raising are not enough to pay for the whole cluster, but that just means, you know, Anthropic's going to raise even more nine months from now or a year from now, whenever they release their reasoning model. It's a constant fundraising thing, right? That's Sam's strength: he's a God-tier fundraiser, right?
And so that's one aspect of it. The other aspect is when you look at the hyperscalers, right? Meta, Google, Amazon, xAI, all these guys are building their own data centers. They're the ones deploying the CapEx. It's really hard to get the electrical infrastructure, the substation, build it all out, get it to work, get all the chips to be networked properly, deal with all the failed chips. Because when you're talking about this number of chips, actually quite a few fail, and a lot of them are silent failures. There's a lot of difficulty in getting a cluster to work.
And so, you know, the big data center guys, the hyperscalers, have sort of known this, right? xAI kind of came in and Elon's like, well, we're just going to do it. And it's like, well, we can't find a data center anywhere because all of them are already taken, right, for the timeframes we need. So what do we do, right? And like...
What they did is like just like the most like Elon awesome thing ever, right? Which is like they found a random appliance factory – closed appliance factory in Memphis, Tennessee, right? Located next to like a giant power plant, a water treatment facility, like a garbage dump. Like just like literally like what most people in SF would be like, this is the worst place on fucking earth, right? Like not really but like –
You know, you talk to people at XAI that actually go to the data center, like this place kind of sucks, right? Like Memphis has really good wings, by the way, you know, like there's good food across the board. Yeah. Great food in Memphis, but like wings, barbecue, et cetera. But like,
You know, it's like, okay, we bought this appliance factory, but why did we buy it? Well, one, there's a gigawatt natural gas plant right next door, right? Two, there's a natural gas line, a main that they tapped and they set up their own generation capacity on site. You know, three, they're upgrading the substation, take more power from the grid, right? So they're like doing, they set up their own like fricking mobile generators to like create power, setting up Tesla battery packs, right?
they're now planning, they've filed permits to build their own fucking power plant, right? Like on site, not just mobile generators, but a massive natural gas combined-cycle plant, so they can do a million GPUs, or that's what he claims, right? Or what he says he's going to do, which, you know, we'll see. I won't bet against Elon, right? He's doing all these things to
Get the data center now, right? Because there's no power. The substation wasn't ready, right? The power's right there, but the substation was too small and it was being upgraded too slowly, because US power grid upgrades take a long time, at least six months to a year. They had already thought about it, but it was still like six months to a year out. So, okay, we'll just generate power on the site, right? So they tap the natural gas line that's next door and they set up all these mobile generators, right? Oh, but the power from these generators is pretty dirty, and with GPU training, the moment I stop training, power just goes to zero. Yeah.
Or, you know, maybe if I'm doing a gradient update, now all my GPUs go idle and like they're exchanging weights and it's like, oh shit, power went to zero. Then it's like, oh shit, power went to 100 megawatts. And like this sort of stuff can blow up grids. And so, you know, Elon's like, oh, we'll just throw a bunch of Tesla battery packs. There's doing all sorts of crazy things. Oh, how do we, you know, how do we cool this facility? Because it wasn't,
ready to cool this much stuff. Oh, we'll water cool everything and then we'll rent a bunch of water chillers, many of them restaurant grade, like freaking container things, and just place them outside, and cool, right? They're doing all this crazy stuff, and it's way more complicated than this, right? And it works, right? It's very impressive how they deal with all the problems. I guess one more funny story is this power problem I mentioned. Meta accidentally open sourced this code; it's literally called power plant no blow up. It's a flag.
And so when you're doing the gradient update, when you're exchanging weights, instead of having the GPUs go idle while all the weights are being sent over the network, they just have them do fake matrix multiplications so the power stays stable, right? Because otherwise the power plant might blow up if power swings like that. And it's like, what the hell? So this is the kind of crazy stuff that's happening.
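As a rough illustration of the idea (and only that: a sketch of the concept, not Meta's actual implementation behind that flag), the trick is to overlap throwaway compute with the communication window so facility power draw never craters:

```python
# Sketch of the power-smoothing idea: during the communication phase of a training
# step, keep GPUs busy with throwaway matmuls so power draw stays flat instead of
# collapsing to idle. Conceptual illustration only, not the real production code.
import torch

SMOOTH_POWER = True  # stand-in for the real "don't blow up the power plant" flag

def dummy_burn(device: str, steps: int = 8, size: int = 4096) -> None:
    """Run matmuls whose results are discarded, purely to hold power draw steady."""
    x = torch.randn(size, size, device=device)
    for _ in range(steps):
        _ = x @ x  # result is never used; it only keeps the SMs busy
    if device.startswith("cuda"):
        torch.cuda.synchronize(device)

def gradient_sync(grads, device: str) -> None:
    """Placeholder for the all-reduce / weight-exchange phase of a step."""
    if SMOOTH_POWER and torch.cuda.is_available():
        dummy_burn(device)  # overlap fake compute with the communication window
    # ... the real all_reduce(grads) would run here ...
```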
You know, there are so many complicated things going on in these $10 billion buildouts that people are doing. It's fascinating. If you were in government right now, what, like, obviously it seems like the energy side is a massive policy and bureaucratic blocker to more of these data centers happening. What's the best thing that government can be doing to unblock this stuff? And is every future data center buildout going to have to hack it together Elon style, you know? I mean, to be clear, right?
The number of normally built, conventionally built data centers is like this. It's just like what we want is straight line up, right? Totally. And so, for example, gas generators are sold out for four years from GE, right? And it's like, oh, wow. Vernova, I guess now, right? Substation equipment, same thing, right? It's sold out for four years. So now you have to get creative, right? Yeah.
And so even though it's growing really rapidly, some of the stuff is held up. Like, I think in one part of the Midwest, it might be Ohio or Indiana, if you build a new power plant, the cost to transport that power on the grid to a customer costs more than the cost of actually generating the power. It's like, what the flip, right? So the grid needs huge investments, right? There needs to be a removal of a lot of the environmental regulations.
But also to an extent, there needs to be more of a vibe shift on the ESG type stuff, right? And this is the attitude of people at AI labs. I don't know if it's actually true or if it's the correct one, but, like,
Maybe you just say, screw it. To build AGI faster, we do it with natural gas. And then because we're doing it with natural gas, and then AGI will create enough economic wealth and prosperity that we can just do carbon sequestration and it will all be good. It's like this is a pretty wild take, but this is what some people believe. And to some extent, meta has kind of...
you know, they laid off their whole DEI thing, they uncensored politics. There's been a massive vibe shift. One of the things they're doing is they're setting up two gigawatts. Right, I told you earlier, sorry, three gigawatts was roughly their total global capacity at the beginning of '24. They're setting up two gigawatts in Louisiana alone,
basically all powered by natural gas. And it's like, yeah, now they can get in two years, right? And they've like figured out how to get the substations and all that, whatever. They've gotten all the supply chain stuff. And it's like, yeah, if we just throw ESG out the window, can go faster, right? And like you're doing it in states that don't have as much environmental stuff, Texas and Louisiana and so on and so forth, right? Many other states, you know, all over the country, but it's like,
you know, that's part of it, right? Now, to this point, Google and Amazon are still quite committed to their green pledges, Microsoft is sort of in the middle, and Meta and xAI obviously don't give a fuck, right? Or, not that they don't give a fuck; obviously xAI, like, Tesla does way more for green than what they're destroying, right? So it's all sorts of, you know, it's complicated, but
Is there a way to like change the vibe on those two and therefore they move faster, right? Or is there a way to like make it so you don't have to do crazy stuff like grids are upgraded, power generation is, you know, like on-site power generation, like all these different things like help build solar and wind and batteries faster and cheaper in addition to doing all the gas, right? There's a lot of complicated things to do, right?
on the energy side, and there are ways to still make it green and fast, right? Like, you could throw up a lot of solar, you could throw up a lot of wind. The power generation of renewables is like this, but wind and solar don't actually correlate that well, so most of the time you have power, and sometimes you obviously have excess power. Then you throw on a small amount of battery and now almost all the time you have power. But there are still times when the sun isn't shining and the wind isn't blowing for a long enough period that your batteries are drained, so now you have some backup gas.
Right. And it's like these are ways to like make it greener and still do it fast. But yeah, I think I think like grid and industrial policy around that is like the biggest like thing that needs to be done to accelerate this. Kind of like the domestic version of now that we've done the the international regulations, it's the next next area of focus. You would think, right. It's like like I myself have like far more of like an attitude of like.
we should win harder rather than we should try to make them lose. Yeah. Right. I think most people think that's a better ideal and more what America
stands for. But, you know, we kind of... yeah. So we have these hundred thousand GPU clusters now, and we're going to see the output of these models at some point in 2025. And I assume that will obviously impact a ton of the future investments based on what we see there. But obviously people have to invest way ahead of that, and so we already kind of know what we're getting into for buildouts in this next year. As you think, putting your betting hat on, for like two years from now, five years from now, how big do you think the clusters will be? Two years from now? Yeah.
I would imagine it'd be like, you know, like there's a couple of things, right? Like, you know, you could say it by number of chips, but it's not fair, right? A100s were like 400 watts. H100s are 700 watts. Blackwell is 1200 watts, right? You know, sort of the power per chip is scaling rapidly. So saying the number of chips is not fair. Yeah, I guess energy.
Right. So energy is a good metric, because how much useful work you get per unit of energy is growing rapidly, right? But then there's the amount of energy you're devoting to this, right? So these 100K clusters are about 150 megawatts, right? We're seeing like 400, 500 megawatt clusters sort of being built out now, right? And then, you know, in 2026, I'd imagine it'll be gigawatt-scale clusters.
Right. Or that's what it looks like based on, you know, one gigawatt in 2026-ish, and Meta's trying to do like two gigawatts by early-to-mid '27. And others are all doing similar stuff in different parts of the nation. So these buildouts are rapidly escalating to huge,
earth-shattering amounts of power in one spot, right? And then, obviously, the dollars per chip and the chip's power, all of these are variables that change over time. But yeah, the buildouts are scaling, right? When you go from 20,000 GPUs at 400 watts each, which was on the order of 20 megawatts, to two gigawatts, from 2022 to 2026, '27, it's like, holy crap, you just scaled power two orders of magnitude in like five years.
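The rough power math behind those figures, with an assumed roughly 2x multiplier from chip power to all-in facility power (host servers, networking, cooling) chosen so the 100K-H100 case lands near the ~150 megawatt number he cites:

```python
# Rough cluster power math. The 2x chip-to-facility multiplier is an assumption
# (it covers host servers, networking, and cooling); the chip counts and TDPs are
# the ones mentioned in the conversation.
FACILITY_OVERHEAD = 2.0  # assumed facility power / total GPU TDP

def cluster_mw(num_chips: int, watts_per_chip: float) -> float:
    return num_chips * watts_per_chip * FACILITY_OVERHEAD / 1e6

print(f"GPT-4 era (20k A100 @ 400W):  {cluster_mw(20_000, 400):.0f} MW")      # ~16 MW
print(f"current (100k H100 @ 700W):   {cluster_mw(100_000, 700):.0f} MW")     # ~140 MW
print(f"1M Blackwell-class @ 1200W:   {cluster_mw(1_000_000, 1200):.0f} MW")  # multi-gigawatt
```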
So if we're sitting here at the end of '27, how much money has OpenAI raised? I think it'd be really difficult to say. I really do think it would be north of a hundred billion, right? It's pretty clear that they've been trying to convince all these sovereigns to invest, and they've been kind of passive on getting them, but
I think, you know, you're going to see o3 release, you'll probably see another reasoning model release, you might see GPT-5 release. How are they not going to raise a ton more money at a ton higher valuation as revenue skyrockets, right? I think that's what's going to happen. And similar stuff: obviously OpenAI will lead the pack, but others will also get similar trajectories happening, like Anthropic, xAI.
I think the amount of capital that will be deployed is going to be insane. On the model side, obviously we've seen this massive scaling of test time compute. It seems to work really well with easily verified data, coding, math. I think there's this big question of what other tasks will fall into these test time compute models. What's your thinking on that? I really like... when OpenAI released the five levels thing, I thought a lot of people were puzzled, like, what the fuck is this? It's like reasoning and then agents. Because everyone's been, as a startup, building systems like
Chatbots, then agents, right? And I think it makes a lot of sense, because with test time compute, you're verifying your outputs and you're making sure they're accurate, right? And these test time compute methods, of synthetic data generation and then throwing a lot of it away, verifying it, and using it, this is where other modalities can come in, right? So computer use
is kind of predicated on all these synthetic data generation pipelines and test time compute working. Agents are all predicated on the reliability of a model getting high enough to where you can actually...
you know, get the answer reliability and chain multiple together. Right. And it's like these different paradigms I think will stack together. Right. Like, it's like, you can't do computer use without test time compute. You can't do agents without test time compute. And when you combine test time, you know, computer use agents, it's like, Oh wow. You know, it's like,
this now it's like autonomously working on the computer without even like, you know, having to like manually code in all the APIs that it's accessing and all this, right? It's like, I think like what tasks it can do will like branch out drastically once you get the core of like reasoning, like sort of not solved, but like at a really strong point.
and the pipelines for doing all this stuff, right? And so, as for what kind of use cases that is, it's whatever the SaaS business models are. Of course, coding is great, but what about software engineering? Because coding is very different from software engineering. Or customer service: chatbots do that, kind of, but not great, right? Or searching through documentation: RAG is okay, but actually
RAG is really bad because there's no way to verify that you properly pulled in the right stuff, that your retrieval actually worked. People can say, oh, our retrieval is really accurate, but it's still not accurate enough for an agent. And so when you stack these things together, I think it's really any information task; I don't see why it wouldn't work. Now, obviously what gets done in '25, '26 is very different than what might get done in 2030.
I don't know if I'm the best to say what the specific use cases are, though, right? I think I can just see that the writing on the wall is clear. What about the open source side? I mean, obviously open source models have been relatively quick to catch up to the GPT-4 level models.
I think there is a big question, with these test time compute models, even in the regulations themselves, right? I mean, I think the way they framed it is that open source models are all fair game; whatever the open source models' capabilities are up to that point is fine. And then it basically almost assumes there's going to be some massive gap
between closed and open models that just increases over the next few years. Is that how you see things too? I think so. I agree, right? Like, I think Meta will release Llama 4 and it'll be better than GPT-4, right? You know, obviously you can say open source models have closed the gap, but does anyone have a voice mode, right? Does anyone have
quite the level of inference cost that Anthropic and OpenAI do? No, not really, because their models are really efficient, right? 4o is much smaller than Llama 405B in terms of active parameters, right? Way smaller. So open models are still behind, maybe not in capabilities entirely, but in some ways. I think what actually ends up happening is that reasoning models are
not necessarily going to be fully open sourced. You see the DeepSeek model, you see OpenAI's model, right? Meta will release better models as long as they think it makes sense to, but will they only release chat and completion models? You know, internally they have their own code model that's trained on their own code database in addition to all the stuff in Llama, and that's not being released. And that one's far more capable, at least at coding, right? So there are things like this that
I think all point to, okay, Meta is not going to just open source everything, right? I don't think that necessarily means they'll give you the full reasoning system, right, or the agent system. It might mean, here's the base weights, congrats, we didn't do any of the reasoning training. Or, here's the instruct model. Or maybe we even do some reasoning. But it's like,
I guarantee you, if Meta has the best model in the world, they will not open source it. That's what I would bet. Does it feel like, hey, they're six, nine months behind with a reasoning model? Like, will there be an open source equivalent to o3 in six, nine months? Yeah, I mean, that's a great question. I'm not exactly sure. I don't think so, actually. I think with o3, especially depending on what level of compute you throw at it, like if you looked at the ARC challenge, it seems
unlikely to me. Yeah. But at the same time, Google's already got a reasoning model. Anthropic allegedly has one internally that's like really good, better than O3 even. But, you know, we'll see when they eventually release it. Like it's like what actually ends up happening is going to be
It feels like the whole question is how much CapEx is Meta willing to spend to keep up with these folks. I think they'll keep up on compute. It's like the science is hard too, right? It's not just like... Well, on some level of compute, I mean, the numbers you're counting, if OpenAI by the end of 27 has raised $100 billion, that's a lot of money to put toward free open source models. Yeah, I mean, but to be clear, most of Meta's GPU purchases are for recommendation systems on Instagram Reels. Yeah, that's a big benefit, obviously. Right? It's like...
you can say, like, Zuck can say, oh yeah, I bought this many GPUs, on some stream or whatever. But then when you dig into it, it's like, well, more than half of those are for recommendation systems, so chill. You're not spending $30 billion on LLMs; you're doing this across all your products, right? I think, yeah, I just think Meta won't open source everything they do, right? And I think that
Llama 4 will be open sourced, but like the moment that they have something at such a capability level, I don't see why like
they would open source it. Because the benefit they get is: they get to attract a lot of talent, and they get to have the whole community tell them what they're doing wrong with their models, right? Like, you look at Llama 2, Llama 3, people are like, oh yeah, you're doing your RL wrong in these ways, right? You're lobotomizing the model in these ways. And Meta is learning from that, right? And they're attracting all this talent that, you know,
If they have the best model, they're not going to do that. Or if they're six to nine months behind, maybe they're six to nine months behind on capabilities, but then the cost is way higher for inference. Or maybe the cost is the same on inference, but their capabilities are worse, right? And if you look at any specific benchmark, it's like, oh, five percentage points, great. But those benchmarks that are in the 70s and 80s are irrelevant. It's about, oh, you're at 10% on this benchmark versus 50%, right? And when you look at SWE-bench and things like that... Oh, the deltas are getting really big.
Yeah, the deltas are really large, right? Or like ARC, right? There are only like two models that even score any reasonable amount. These are the sort of things that I think matter: can you do a task or not, versus, oh yeah, it's 5% better at this task. As you think about all the kind of big questions you're thinking about, I'm wondering if there's a set of two to three questions that are, like,
most important, where if you could just zoom forward two or three years and get the answer to them... I assume one of them would be the performance of these models on the hundred-thousand-GPU clusters. But what else do you feel are the big questions that, if you had a time machine to go forward two, three years, you'd most want to know the answer to?
I think a lot of it would be how they do stuff at the model level. Because there are a lot of outstanding questions with model development: how that changes hardware development, how that changes networking development, how that changes data center build-outs. Do you need all the data centers to be right next to each other? Can they be geographically spread? What level of bandwidth do you need between them? What model research stuff comes out, right? Like, if a year ago you had explained reasoning and test-time compute and how that works
to me, I think that would have been huge, right? But I didn't know that was even the question to ask. Okay, I knew synthetic data was a thing, but I didn't think it was that huge of a deal; it was a big deal, but I thought it was more for distilling stuff. But it's not just that, right? There are a lot of other use cases. So I don't even know what question on the model side is the important one. What really gets agents to work reliably and well? What is the trick specifically
that gets computer-use models to work? Because Anthropic's computer-use model today sucks, right? But at least it is something; no one else has something. And now that you have something, it's clear to see how you can climb up and be like, oh, this is good. So what are the tricks people figure out, and how does that impact things? Like, when you're doing inference, what is the average sequence length now? What does reasoning, or test-time compute in terms of search, look like? These are questions where I don't even know what the answer would be.
And I guess I'm definitely not smart enough to know what question to ask in two years, besides, what is the secret? What is the biggest development? I guess with this development of more compute being used to create synthetic data and inference, that obviously has massive implications on the data center hardware side. It feels like we might be able to get to something potentially more distributed. Like, for example, when you're doing synthetic data generation,
You're verifying which of the reasoning chains you've generated are good, you're throwing away a bunch of them, you're grading them with all these reward models, and then you're training on them. How much of this is online versus offline, right? Can you fully separate the synthetic data generation from the post-training, the RL with all the reward models? No, you can't, because you need to update the model that's generating the reasoning data constantly. But then how often do you need to update it? Do you need to update it every
30 seconds, right? If you look at the training of Llama, I think the gradients got updated every 15 seconds or so, right? And the update took a few seconds, and then you're training for 10 or 15 seconds or whatever. And it's like, oh, okay, you're spending this much time in network versus compute; I can think about how you need to design your network now. But what is that ratio? What does that mean for
reasoning? When you're generating that data, how much time are you generating data versus doing post-training before you update the model and send it back to the data generation nodes to do it again and grade their outputs? This stuff is kind of unknown to me, and probably unknown to anyone besides OpenAI and Anthropic and SSI. What is actually the future and what matters there? What can you...
Like, how would you design a cluster to accommodate that? How would you design your network? Can you be not co-located? Maybe just really, really good fiber between two data centers is good enough, or maybe they do need to be co-located with super high bandwidth. These are the sort of questions I just don't know the answer to. The paradigm exists, but there are so many details.
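To make the trade-off being gestured at here concrete, here is a minimal sketch in Python, with entirely hypothetical timings, of how the generate, grade, train, and weight-broadcast phases of an online synthetic-data/RL loop might split wall-clock time between compute and network:

```python
"""
Toy model of the generate -> grade -> train -> broadcast loop described above.
Every number is a hypothetical placeholder, not a measurement from any real
training run; the point is only that the weight-refresh interval determines
how much of each cycle is spent on the network versus on compute, which is
what drives the co-location question.
"""

def cycle_breakdown(generate_s: float, grade_s: float,
                    train_s: float, broadcast_s: float) -> dict:
    # Time spent producing and scoring reasoning chains plus gradient updates
    compute = generate_s + grade_s + train_s
    # Time spent shipping fresh weights back to the data-generation nodes
    network = broadcast_s
    total = compute + network
    return {
        "cycle_seconds": total,
        "compute_fraction": compute / total,
        "network_fraction": network / total,
    }

# If weights only need to go out every ~30 s and the broadcast takes ~3 s,
# the inter-site links sit idle roughly 90% of the time, which is what makes
# a geographically spread layout at least conceivable.
print(cycle_breakdown(generate_s=22, grade_s=3, train_s=2, broadcast_s=3))
```

The longer the loop can run between weight refreshes relative to how long each broadcast takes, the less inter-site bandwidth matters, which is exactly the unknown being described.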
Totally. No, I mean, I think it obviously has implications. I feel like you must get asked all the time. And I think you've talked about very publicly on other podcasts. You know, I'm sure people are like, well, NVIDIA, like, you know, paint the likeliest picture where they lose or, you know, what are like the actual, you know, long term moats here. And I know you've talked about, obviously, the hardware, software, networking side of things. I'm sure you've seen a bunch of hardware startups and folks going after different parts of it. What of that is like most interesting to you?
I think there are a couple of classes. There are the old AI hardware startups, Cerebras, Groq, SambaNova, Tenstorrent, quote-unquote old; they've been around for a while. And then there's a new age of them. There's Etched and Positron and
MatX, and there are a number of other ones. There are a couple of pretty high-profile ones that are, I think, still in stealth, so I won't say anything. But there are quite a few AI hardware startups, and their approaches are quite different. The way they think is very different. What they're focusing on is very different. Each hardware company has a quote-unquote gimmick, right? Because
you're just not going to beat NVIDIA by engineering better than them on all axes while doing stuff the same way. So you have to do things quite differently. Meanwhile, NVIDIA every generation is making large enough changes to the architecture that it's not necessarily the same target you're aiming at. And so the question is,
what are the quote-unquote gimmicks people are doing? How are they solving the problem? And does that differentiate them enough? And then at the same time, you have the big problem that all the models are being developed on NVIDIA hardware,
cognizant of the drawbacks and benefits of what NVIDIA hardware does and what the next generation is going to do. So for all the model research ideas, it's like, hey, this research idea would work, but if it runs really inefficiently on GPUs, I'm not going to pursue it, because I don't actually care how many operations I did; I care about how much time it took, right? Even if it's theoretically 10x fewer operations, if it took 10x longer because it's just an algorithm that doesn't work well on GPUs, then
all of a sudden it's like, this is not worth pursuing. Yeah. So NVIDIA's dominance influences where the entire research landscape goes. And so you have to be different, but you can't be too different, because then models didn't develop the way you wanted them to and now you're SOL. Right. Like,
could you have a model with far fewer parameters that needs way more bandwidth and could potentially run entirely on chip? Sure, but how are you going to train that model? Well, I'm going to train it with NVIDIA GPUs, which undercuts the point, because I could do something better. And how am I going to deploy it? Well, I'd rather have it be deployable on all the GPUs I've already bought. So you've got the chicken-and-egg problem too.
I think some people are explicitly targeting inference rather than training, and that'll be interesting because...
When you talk to NVIDIA, or you talk to the chip teams at various hyperscalers, the TPU or Trainium teams, et cetera, and one of the labs has an AI chip team too, all these people believe that training chips are inference chips, which is very funny. And I think it's because they see the flexibility, they see the workload changing. So what does an inference chip materially mean? Because there are a few of those in the works. Right.
By the time you get your chip out, maybe you projected a 5x or 10x cost difference versus Hopper. They've already captured it. But now with Blackwell, NVIDIA is pitching a 10 to 15x improvement in cost. It's like, well, they're massaging the numbers for marketing. Actually, I think they said 30x at GTC, but even with massaged numbers, 10x, 15x could be reasonable. It depends on the workload. For standard chat inference it's probably not that, nor
for medium-sized and small models, but for really, really large models or reasoning models, I do think it's actually going to be like 10, 15x. So it's like, oh wow, okay, you've narrowed the delta massively, right? So that's like a big...
That's a big question, also. What is interesting is that a lot of these companies are taking approaches that are super cool. The three new-age ones I mentioned, MatX, Positron, and Etched, are taking approaches that are super cool and could work out. Or models could develop in a way that isn't good for them.
So that's the big question, right? I think you've called Trainium the Amazon Basics of chips. You're obviously the hardware guru, but one thing I've been curious about is to what extent Anthropic's use of them is a sacrifice, versus, hey, it doesn't really matter.
Yeah, so it's the Amazon Basics TPU, because its networking topology looks exactly like a TPU's, and a lot of their design choices are like, this is just TPU stuff. Obviously they're doing a lot of their own independent work; there are unique things they've implemented, like stochastic rounding and other features. But it is the Amazon Basics version of the Google TPU.
Because it's worse, but it's cheaper. Now, Anthropic barely uses Trainium today, right? They barely use Trainium and Inferentia. They've primarily used Hopper: Hopper at Google, Hopper at Amazon. They did use some TPUs at Google, of course. But this year, it's very clear they've put all of their eggs in the Trainium basket.
How much of this is forced, right? Like, did I need to do this to get Amazon to invest in me and make me their partner? Partially, yeah, absolutely. But did I do this of my own volition because it actually made sense? I think partially, yeah. Because the Amazon Basics TPU gives you, on some metrics, the best
performance per dollar, right? And those metrics are memory bandwidth and memory capacity per dollar. It's actually better there than any other chip on the market, including TPUs or GPUs, because of some of the supply chain choices Amazon made, and things around that. So it's like, what...
I don't think it's as simple as Anthropic was forced to. I think they had to weigh the benefits: we could partner really closely with Amazon, raise more money from them, and do this, or we raise a little less money and try to get GPUs from them. They had to partner with someone to do the CapEx, of course, because of the whole rental-versus-CapEx question for the cluster. And they needed the distribution channels of Amazon. They needed those things.
But I'm sure they could have struck a smaller deal with Amazon and gotten GPUs if they really wanted to. Now, was that better, or was it better to take more money, go Trainium, go all in, and try to optimize your future model architecture around the benefits that Trainium offers you versus GPUs? Because there are some. On the majority of metrics, no, they're worse, just straight up worse, but some benefits exist. So that's the sort of question that...
You know, it's a good question. And I think mostly it's hand-wringing and a bit of, yeah, we can engineer our way out of this. Does it make sense for OpenAI to build its own chip? I think it does, right? I mean, it doesn't make sense if you think
their growth is linear. Like, what do you think OpenAI's revenue this year is going to be? If it's less than $10 billion, and the year after that less than $15 billion, they should not make a chip. But if you think their revenue this year is going to be north of $10 billion, and by $10 billion I mean run rate, because...
We're at a VC, right? We don't think about things in annual terms. No, run rate is simply more important, right? Whatever the higher one is, we'll take. Yeah, exactly. It's more indicative of the current state of the business. So if they go well north of $10 billion on run rate, and the year after that they're at north of 20, the costs start to scale rapidly and it makes a ton of sense to make their own chip. And so,
just because they have the chip team doesn't mean they're not collaborating super closely with NVIDIA and working with them on new architecture. There's a wall inside OpenAI: the people working on the chip do not get to work with NVIDIA on what their next-generation architecture is. NVIDIA is no stranger to this competition; all of its major customers are doing it. Exactly. Meta's doing it. Google's doing it. Amazon's doing it. They're used to it. Their attitude is very much,
good luck, we're going to crush you, we're just better. And so far that's worked.
And I think their attitude toward OpenAI would be the same. And OpenAI's attitude is the same as the hyperscalers', right? Well, if we spend a few hundred million or a billion dollars a year on this, it gives us a call option. Also, we can always just use it as a negotiating chip: if I spend a billion dollars on my own chips in '26, '27,
but I'm not really deploying them at huge scale, I'm mostly still buying NVIDIA chips, then if I got a 10% discount on NVIDIA chips, it paid for itself. Yeah. And it's like,
does Trainium help Amazon get a 10% discount, or a 5% discount, on $30 billion of GPU purchases? Okay, great, it's worth it, right? You could just not do anything with it and it'd be worth it. Obviously, it's a game of chicken; Jensen knows how much you're making and all these things. But yeah, I think that's the big...
And from my perspective, it absolutely makes sense to do this even if it's not successful. And if it is successful, it's humongous. Yeah. There's this whole crop of companies too that popped up, the Fireworks and Togethers of the world, that are obviously focused on inference optimization today and on being places to run a bunch of models. What happens to that market long-term?
I think enterprise AI actually has really decent legs in some regards.
Because of, again, all the unique data that enterprises have and their unique use cases. Enterprises like to think their data is good; most of the time it's pretty dirty garbage, just to be clear. But now with synthetic data pipelines, reasoning, all this sort of stuff, people can actually figure out how to generate a bunch of data that's specific to a business and verify whether it's good or not. And then they can actually improve the models and they can do
reasoning on a much smaller scale for enterprises. I think this works hugely, right? This is massively good. Right now, obviously, beyond Fireworks and Together, there are like 20 inference API providers out there that are just taking open source models and serving them.
You obviously named the best two in class, right? Two companies that build their own inference engines, are very efficient, much more efficient than vLLM or TensorRT-LLM and those sorts of things, and have good go-to-market. These companies are good, and I think they will stay good because, A, Meta is going to keep open sourcing models, B, China will continue to open source models, and C, enterprises will be able to work with people to make
better reasoning-type models. Right now, Together helps people train models, and so does Databricks, and all these companies help train models; there are many other services, or people can partner with a startup to help them make a reasoning model. I feel like the whole Mosaic pitch was, you know, train your own models. That went out of fashion for a bit because the general purpose models were just getting better at everything way faster than a BloombergGPT was. But now, with these reasoning chains that need to kind of
verify this stuff, you actually might be back in that world. Yeah, yeah. I think Mosaic, and Databricks, kind of lost their footing for a bit. It'll be really interesting to see if they can get RL and reasoning to work, because, and I'm sure Ali at Databricks knows this, this is the thing that
would make training for your customers make sense again. And maybe not pre-training, right? Because that's what Mosaic was doing, really, really good pre-training. They had the best open source model in the world twice, for short periods of time: once with MPT and then once with DBRX. But, you know,
they've kind of fallen off, because they can't really compete with the CapEx that Meta is spending, right? Or even Alibaba. But now, again, I take the open source pre-trained model, I build all these pipelines for reasoning and synthetic data, and my customers' data, which kind of sucks, but now I can verify it in some way, figure out the recipe, and actually apply it to everyone's business use cases. I think that is...
That is potentially big, and these people need to serve models, right? So it's not just Databricks and MosaicML that are going to do this; everyone's going to do this. There are going to be startups that do this as a service, like Adaptive ML; they're going to do this as a service, and they've actually been doing it as a service for like a year now, which is really cool. These people can partner with enterprises. They can do white-glove services, consulting services, whatever it is, and help them build models that are uniquely customized to
their unique use cases. And then those will need to be deployed, or companies will want to do it on their own. There's always going to be security stuff. So I think Together and Fireworks will be the partners of choice for inference of these models, even if it's not necessarily on their GPUs. It's like, let me do a virtual private cloud on Amazon with your software
running the model, right? Because you've built all this variable batching and various other things, and it's way more efficient than the public software. I'm going to ask a very basic question about CoreWeave, just for my own edification: what made them so successful? From the outside, sometimes it feels like there was this decision from Jensen to allocate GPUs among a bunch of players, and they were kind of
rewarded in that. And I think it goes to, and you're an expert on this, the extent to which some of these mini clouds can differentiate beyond just access to the GPUs, which it felt like in the first phase was one of the big things.
So I think there are three factors, right? One is the allocation bit. Jensen is no idiot. His customers are all trying to build chips, and they also rent most of the chips to other people. You want to have more competition, right? So let me spark competition; let me make a small investment in four different clouds. I think NVIDIA has invested in four different clouds so far, and they haven't been large investments, they've been pretty small.
But it's sort of giving other people confidence, like, oh yeah, they're going to get allocation. And then those GPUs get allocated there, the world's so tight on GPUs that they're able to rent them for a lot of money, make money, and get the engine going on a cloud. So CoreWeave now has like 200K-plus GPUs, right? A lot of GPUs. They're doing billions of dollars of revenue. It's not a mundane business now.
And they're worth whatever, like $20 billion or something as of the last round, and they're going to go public this year, probably at well more than that. This is a real business now.
How do they differentiate? One is, they still continue to have access to the newest generation. Maybe for every ten GPUs that Microsoft gets, CoreWeave in the beginning can get three. Now, at some point the production capacity is not 13, the production capacity is 100, and, again, I'm just pretending there's only two buyers. Now Microsoft gets 90 and CoreWeave still only gets
three, right? So now it's 90 versus 3, right? Whatever. But at the beginning of a new GPU's useful life, it is worth the most, because at that point you can rent it for way more; over time that narrows as more and more competitors enter the market and there are more and more of the chips. That's one. Two is the speed of build-outs.
Microsoft's data center organization and rack and server organizations, same with Amazon, are custom. There are tens of thousands of people. They've always been used to modifying servers, customizing them to squeeze every penny out. And in the era of GPU compute, it's like,
Well, what if I just do what Jensen says, right? Like I build, you know, I might make some tweaks. I might put a little bit less memory. I might change the storage around. I might change the network a little bit slightly. But like in general, what if I just take the reference design and like modify it a little bit?
Now I am paying a little bit more, but then I have the time to market, right? The moment the GPU is being produced by TSMC and packaged, it can go to the server maker; the reference design is ready to ship to me. Whereas Amazon, especially Amazon, their Blackwell rollouts are much later than many of the other players because they have to do all this customization. Right. And then lastly,
I think the other aspect is data center capacity, right? We talked about this ESG stuff. CoreWeave has always been very aggressive on renting GPUs, getting credit at what at the time seemed like crazy terms, like double-digit-interest loans, to deploy GPUs. It was the right decision, right? Now they're established, so their loans are single digits again, higher single digits, but it's like,
getting data centers is very hard, and they've been very, very creative. Initially, they were just getting anything and everything. Then that capacity all dried up, and the hyperscalers woke up and went to try to get anything and everything. But then CoreWeave started going to crypto companies, right?
They went to a couple of different crypto companies, offered to buy one of them out, and eventually struck a deal where they're retrofitting all their data centers. Like, yeah, throw all this crypto crap away, I'm going to convert your thing into a real data center. And one of these crypto data centers has a 200-megawatt natural gas plant in the middle of like seven buildings, and each of these buildings is a data center. So they've got a natural gas plant right there. Compare that to Meta: what they're doing is, okay, the gas plant is
20, 30 miles away, and I struck a deal with a solar power company that's putting tons of solar panels a similar distance away, but I'm not necessarily consuming those electrons. That's not clean, right?
In the case of CoreWeave, it's like, no, no, no, I don't care. The power is here, right on site. I'm converting these crypto data centers into real data centers. There's a metric called PUE, power usage effectiveness, which is basically how much power you're pulling from the grid or power generation versus how much the chips, or the servers, are consuming. The ratio that hyperscalers try to design to is like 1.1.
Right, only about 10% of the power is lost in transmission and cooling. But this crypto data center, once they convert it, because it's a really dumb design, the crypto people didn't know how to build a data center properly, even after the retrofit it's still going to be like 1.4. So you go from 1.1 to 1.4. If the chips are consuming 200 megawatts,
what was 20 megawatts of power being burned on cooling and electrical transmission and conversion is now more like 50, 60, 70, right? And it's like, okay, this is a lot of inefficiency, but screw it, we need GPUs now. It was worth it. I guess, yeah, there's inefficiency both on the financing side and on the power conversion side. And then they do certain things really efficiently. They're just a small organization that's well run.
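As a back-of-envelope illustration of the PUE point, here is a small sketch; the 200 MW IT load is taken from the example above, and the exact overhead figures are just the implied arithmetic, not measured data:

```python
def facility_overhead_mw(it_load_mw: float, pue: float) -> float:
    """PUE = total facility power / IT power, so overhead = IT load * (PUE - 1)."""
    return it_load_mw * (pue - 1.0)

it_load_mw = 200.0  # chips/servers drawing 200 MW, per the example above
for pue in (1.1, 1.4):
    print(f"PUE {pue}: ~{facility_overhead_mw(it_load_mw, pue):.0f} MW "
          "lost to cooling, conversion, and transmission")

# PUE 1.1 -> ~20 MW of overhead; PUE 1.4 -> ~80 MW, i.e. the jump from
# roughly 20 MW of losses to the 50-70+ MW range being ballparked above.
```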
And their cloud software for GPU rental is literally better than Amazon's and Google's. Yeah, it's actually hilarious. We'll have a piece about what makes it better: the way they do their managed services, the way they manage their network, the way they do storage. There are a number of things here that are just objectively better with CoreWeave. And it's like, what?
How did you guys build a stack that's better? Well, the others have this legacy and the innovator's dilemma: you don't build a brand-new thing that's purpose-built, you morph what you have, and you have all these weird requirements for random customers, like a random defense company where you also need to make sure your GPUs are rentable to them. And it's like,
CoreWeave is starting the problem anew, and they have a brand-new team, a clean slate, really good engineers, and they're building what needs to be built, purpose-built for this business. They don't have a team of 2,000 people who work on a storage architecture that used to be targeted primarily at a different use case; accelerators are different, right? What you need for AI is not exactly analogous. It's similar, right?
But they're trying to morph this massive code base to that, rather than building something new from the ground up. And this is the case for every startup in the world, right? I think there is an innovator's dilemma with the big companies. Yeah. You obviously must also see kind of interesting investment plays on this massive hardware buildup going on. There's the obvious stuff, data centers, liquid cooling. Any second- or third-order effects you've seen where you're like, that's a clever way to play the massive data center buildup?
Yeah, I mean, some of the stuff is like, yesterday I had an hour-long discussion with a guy who's building a data center in Armenia because there's a nuclear power plant there. It's like, great, this is awesome. And he has a venture fund in New York, but he's kind of just like,
screw it, I'm going to work on this, pretty much. It's like, oh, this is awesome, right? So there's silly stuff like that. Not silly; it could work, although the regulations kind of screw him, and I talked to him about that. But I think there are a lot of interesting plays around networking and optics. There's a lot of stuff going on in networking and optics, which is a big bottleneck, especially as we continue to scale the density of GPUs and
the amount of data that needs to be communicated between GPUs, as, A, models get larger and, B, context lengths get longer and longer because of reasoning. And search: all these things require more interconnected GPUs. I would also say there's a lot of interesting stuff around transformers, the electrical kind; I told you they're sold out for like four years, but there are all sorts of interesting startups working on solid state transformers and partially solid state transformers and all sorts of hardware stuff. That's sexy, right?
There's funny stuff. There are cool companies in carbon sequestration, because, yes, Meta especially, and Microsoft, and xAI have said, not exactly screw the environment, but we're going to build AI as fast as we can. At some point they're going to be like, yeah, let's be green again. So carbon sequestration stuff wrapped up in a data center play is actually pretty interesting. There's all sorts of
interesting stuff around storage and what companies are doing there, because video models are going to require a lot of interesting work there, purpose-built solutions for the plethora and sea of innovation that's happening, and targeting all these things. I think if you look at every layer of the stack, there's more innovation happening today, not just within each layer but also bridging multiple layers of the stack,
to come up with a huge improvement. Some of that is happening in the big companies, some of that is happening in startups. I think some of these startups will be acquired by bigger companies, some will actually find a product market fit. I think some of them will license technology. Whatever it is, I think there's a lot of hardware technologies that are interesting. And then I think the really fun area is
software infrastructure is also getting very diverse because not everyone can just build their own stack. I think that where people are very in tune with the hardware,
understand how to make things run fast, and are also in tune with what's going on with models, it's like: you bring your data and all your capabilities, and I bring you something that lets you build and serve your actual use case. So software infra was not always sexy; SaaS was more sexy, or, you know, some sort of
I mean, I guess software infra is to some extent SaaS, but not really. I think these are the sorts of businesses where there's a lot of innovation happening as well. The infra layer has just been so hard to invest in, because the models change so quickly that any kind of scaffolding or things you're building for the current generation of models, you're like, is that still relevant
in six, 12 months? This is why some companies have bet on things that were further out. Like I mentioned Adaptive ML, right? It's a few French guys who moved to the UAE and built the Falcon model there, then moved back to France, and now they're in New York. While they were building the Falcon model for the UAE, they were doing pre-training; they did a 180-billion-parameter model trained Chinchilla-optimal. It was a good model when it got released, it was just, you know, whatever; Llama had just come out too.
It was really good, but then they immediately said pre-training doesn't matter; we're going to work on synthetic data and reasoning. It's like, what? Okay, sure. And for a while I was like, what are you guys doing? And then all of a sudden it's like, oh wait, this is right. So you have to... Yeah, you saw something. There are a lot of companies that are
doing something that may not matter today, that doesn't seem to be the popular thing today, but could be and will be in the future. 100%. What do you think of these AI-for-hardware efforts? Folks that are using AI for chip design or circuit board layout or some of these other things?
I think it's really interesting that YC all of a sudden decided to invest in 12 different AI-for-chip-design companies all at once in the last few batches. A couple of them are cool. I actually even invested in one of them myself, personally, so just to be clear, I'm not hating on the idea. It's just very interesting that this happened. But...
You know, chip design is very expensive, right? It's very difficult. The number of engineers working on chip design has not grown that meaningfully over the last two decades, at least in the US. This sounds like the start of every AI app pitch: whatever, insert-your-profession-here is stagnating in growth. But the output has been dramatically improving, and that's because of EDA software, electronic design automation.
And EDA software, as much as you can hate on it, as much as you can say it's old, shitty software, is actually really awesome, because the productivity gains that chip designers are getting year on year are meaningfully large, right? And that's pre-AI. Now Cadence and Synopsys and Siemens, Mentor Graphics, are all investing in AI for chip design, but there are also a lot of other companies doing it. NVIDIA has written multiple papers about their internal tooling, and they've hidden a lot of it too; they haven't said a lot about it. Google, same way.
This is absolutely going to be huge. Now, is AI going to design chips end to end right away? No. But is it going to be a force multiplier for a very in-demand profession? Yeah. And the other thing is, today a lot of workloads just have to use CPUs, and then GPUs are the general-purpose parallel computing platform. Now, obviously NVIDIA is making more and more enhancements for certain types of AI models, but GPUs are not amazing at running CNNs, for example.
And this is a conscious design decision by NVIDIA. They're okay, they're good, but they're not amazing at running CNNs the way they are at transformers. And that, I believe, is a conscious design decision. So how do you go from the CPU, which is the catch-all general-purpose processor, to the GPU, which is general purpose for parallel processing, to whatever the next step is? NVIDIA is branching its GPU architecture out; they have different architectures for automotive versus gaming versus
pure data center AI. So they're not just sitting flat-footed; it's, how can we proliferate architectures to target use cases better? And today, for it to make sense to design a chip for a TAM, that TAM needs to be billions of dollars. Google made a YouTube chip because that was a multi-billion-dollar TAM for them alone internally; that's when an ASIC made sense. But chip design costs can come down, and AI for chip design will do this. I just think it won't be
zero to one; it's going to be a force multiplier. And it's one of the areas I'm super, super excited about, whether it be floor planning or RTL generation or many other things like verification. Verification is like half the spend of a chip design; it's quite funny that more than half the cost is making sure what you designed actually works. There's tons of stuff there that I think
could be done. And yeah, I'm very excited about that space, actually. I feel like this has been a fascinating and really wide-ranging conversation. We always like to end with a standard set of quick-fire questions. So maybe to start, what's one thing that's overhyped and one thing that's underhyped in the AI world today? Overhyped: RAG.
Under-hyped: various other forms of semantic search and retrieval; there are some interesting papers that came out recently. Will model progress this year, in '25, be more, less, or the same as in '24? More. What's your weirdest prediction on the implications of all this AI progress for the future? There are a lot of worries that there's going to be huge inequality. I'm actually on the flip side. I think that
blue-collar-type jobs that were generally hated on are going to do well, at least until robotics comes, which I think is pretty soon. But I think that generally the poorest people in the world will also improve in quality of living massively, and it's not going to be a haves-versus-have-nots, ultra-elite-versus-not future. I don't think that's going to be the future.
Yeah, I guess you'd better hope the inference costs on these reasoning models come down. Yeah, I like that. What is OpenAI's thing? Intelligence per dollar, maximizing intelligence per dollar. And right now it's really freaking expensive, right? And I expect the models to get even more expensive. A query from a reasoning version of GPT-5, or whatever it ends up being, is going to be
dozens of dollars, if not hundreds of dollars, right? But it's good; the work it puts out is going to be crazy. Do you have a go-to query you use when a new model comes out, to test it out? It's funny, I actually ask them about semiconductor manufacturing, because I know the training set quite well. I've read a lot of the papers, and then I see if it actually understands them. And a lot of times models get it wrong.
Even to this day, Claude is the only model that, once you prod it enough, can get the patterning stack for lithography somewhat accurate. And that's just because no one publishes on it, so it has to understand the papers and infer what people are actually doing, which I thought was very interesting.
That is a cool way to do it. We've talked about a bunch of different parts of the AI startup world, but are there any spaces within it we haven't talked about that you find particularly interesting or exciting? I think there's a lot of cool stuff happening on the distributed training side, right? Like with Nous, although there still needs to be a lot more to prove that what they did is good and real, but it's very exciting. And then Prime Intellect, same thing.
They're also doing really cool stuff. I think distributed training and inference is going to be really cool. And I think a lot of these startups that started a while ago and have actually been focused on reasoning the whole time, like SynthLabs and Adaptive ML and people like that, rather than being agent startups pivoting to reasoning because everyone's doing a reasoning startup now, that's a really cool area.
And I really just like people and startups that have been singularly focused forever on the same thing, and they're right; you just don't know when they're going to be right. It's sort of like me loving chips so much. I had a good business, but it did not pop off until AI; there were a few people working at my company, not 20. It's the same thing: being so internally focused and good at it, and then all of a sudden the time does come and you were right.
Those are the kinds of startups I like the most, not necessarily people who are outsiders to a field and just want to revolutionize it. Well, we always like to leave the last word to you. I can't imagine there's much of a Venn diagram of folks that listen to this podcast and aren't subscribed to Semi Analysis, so I assume folks are at least familiar with you at a high level, but...
Would love to know where folks can go to learn more about you and the work you're doing, any plugs you want to make; the floor is yours. Yeah, I would say if you want to see the serious side, go to the website, check out the newsletter, or check out the institutional products, which are the main business, the data sales and consulting.
And we have former industry experts who have worked everywhere from ASML and Lam Research and Intel all the way up to Microsoft and NVIDIA, and everything in the middle. And then half the company is ex-hedge fund people, so it's a cool mix. And then I would say, if you want to see the lighthearted bit, where I will
tweet jokes like, hey, the AI regulations came out on Monday, arsenal of democracy, oh look, an Israel-Palestine ceasefire two days later. That's an absolute joke, but that's what I post on my Twitter. There's a range, and that's Dylan522p. I think that's the shilling I'll do for myself. Amazing. Well, Dylan, thanks so much for coming on. This was awesome. Yeah, thank you. Thank you.