Hello and welcome to a free preview of Sharp Tech.
And hopefully a better job by the Commanders next year, but a hell of a ride, so I'm not mad about it. On that note... we're just mad about the Philadelphia Eagles. It's really upsetting. Our good friend Spike Eskin, I spent six days talking trash to him, and it really got out of hand in the second half there. So I'm ready to move on; let's get into this show. We'll begin with a question from Jeremy. He says: Hi, folks.
Ben discussed DeepSeek's shockingly efficient ability to train models on the last podcast and
But I haven't heard anyone discuss whether DeepSeek may simply be lying about the $5.5 million training cost. If they were forced to train their models in brute force fashion, as we originally assumed, I imagine they wouldn't want to report this and therefore indicate that the chip ban was working. Nor would they want people to know if they found a workaround to the ban, such as training through a U.S. shell company or foreign data center.
So I'm going to go back to the first line of Jeremy's email here.
DeepSeek's shockingly efficient ability to train models was discussed on the last pod, but I haven't heard anyone discuss whether DeepSeek may simply be lying. So Jeremy...
I just want to say: hey, you were on this story early, before everyone else. Indeed. I mean, Jeremy sent that email on Thursday, I believe, and people then spent about 72 hours arguing about whether to trust the representations made by DeepSeek. Do you want to weigh in further on the cost question after a weekend of heightened DeepSeek scrutiny?
Yeah, so I actually debated including this question when I wrote about it on Tuesday, because of Dylan Patel, who's kind of the authority on what resources companies have all over the world. SemiAnalysis has a whole model, and actually they make a lot of their money by selling these models, which are much higher priced than the subscription, that tell you down to the detail what all these data centers have in them. And he said back in November that they have something like 50,000 H100s, which were the most advanced NVIDIA chip at the time, and which theoretically they're not supposed to have. But I would say the reason I didn't include it is because,
At a high level, I think it's a moot point, and at a lower level, I think there's reason to believe them. But let's start with the chip ban question. So they claim they're using H800s. The H800 is the chip NVIDIA developed as a workaround to continue selling into China, but it's not as performant as the H100. The main limitation on the H800 is that it has similar processing speed but more limited memory bandwidth. The initial version of the chip ban targeted memory bandwidth, and I thought at the time this was a very clever approach, because where you really need the memory bandwidth, the ability to link tons and tons of chips together, is for training these very large models. And the breakthroughs they talked about are actually explicitly about this point: they figured out ways to need less overall memory bandwidth. This gets into their mixture-of-experts approach that I talked briefly about on the last episode. Basically, the idea is you have a super large model, but most of the time you're not using most of the model. You're only using some of the model.
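That sparse-activation idea can be sketched in a few lines. To be clear, this is a toy illustration, not DeepSeek's actual architecture: the expert count, the top-2 routing, and the replica counts for the most-used experts are all made up for the example.

```python
import random

random.seed(0)

NUM_EXPERTS = 8          # total distinct experts (real MoE models use far more)
TOP_K = 2                # experts actually consulted per token
REPLICAS = {0: 3, 1: 2}  # extra copies of the "hot" experts, for load balancing

def route(scores):
    """Pick the top-k experts for one token from its gating scores."""
    ranked = sorted(range(NUM_EXPERTS), key=lambda e: scores[e], reverse=True)
    return ranked[:TOP_K]

def least_loaded_copy(expert_id, load):
    """Send the token to whichever replica of the chosen expert is least busy."""
    copies = [(expert_id, r) for r in range(REPLICAS.get(expert_id, 1))]
    return min(copies, key=lambda c: load[c])

# Simulate routing a batch of tokens and tracking per-replica load.
load = {(e, r): 0 for e in range(NUM_EXPERTS) for r in range(REPLICAS.get(e, 1))}
for _ in range(1000):
    scores = [random.random() for _ in range(NUM_EXPERTS)]
    for expert in route(scores):
        load[least_loaded_copy(expert, load)] += 1

# Each token touched only TOP_K / NUM_EXPERTS of the experts (here 1/4),
# which is the sense in which most of the model sits idle most of the time.
```

The replica dictionary is the load-balancing trick: popular experts get extra copies so they never become the bottleneck.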
So GPT-4, for example, was a mixture-of-experts model, but even there the experts were fairly large. What DeepSeek has done is use way more experts, divided much more finely, and then they have a very clever approach to load balancing when deciding which experts to call: they keep multiple copies of the most-used experts, so that you're not constrained on the ones being called all the time. It's a very clever load-balancing approach that means they need a lot less memory at any given time, and this applies to inference as well. They also did a lot of very clever, very low-level work to optimize their communications layer. They actually went below CUDA, down to nearly the assembly-language level, to program this. So you have all these shaders on a chip, all these GPU units, and they have some number of them dedicated to managing communication, in a way that is independent of and more fine-grained than what you get from NVIDIA. It's a very interesting approach. They have a bit in their paper about V3, or it might have been V2, I can't remember, encouraging GPU makers: look, you should actually do this for us, we shouldn't have to go to this low a level to do it. They're basically encouraging GPU makers to change slightly how they architect their chips. This is legit stuff. It's super sophisticated work, and this is where the capabilities of their team are real. Everything they put forward about how they did this is very plausible, and it's very plausible that it was done using chips that were not a violation of the chip ban. So is it possible that they also have all these chips? Yes. Is this price the full price of creating one of these models? No. It doesn't include the R&D costs, and it doesn't include all the runs they did to figure out how to do this in the first place.
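For reference, the headline number is easy to reproduce from the figures DeepSeek themselves published: the V3 paper reports roughly 2.788 million H800 GPU-hours for training, priced at an assumed rental rate of $2 per GPU-hour.

```python
# Figures from the DeepSeek-V3 technical report; the $2/GPU-hour rental
# rate is the paper's own assumption, not a measured cost.
gpu_hours = 2_788_000       # total H800 GPU-hours for the training run
rate_per_gpu_hour = 2.00    # assumed rental price in dollars
final_run_cost = gpu_hours * rate_per_gpu_hour
print(f"${final_run_cost / 1e6:.2f}M")  # prints "$5.58M"
```

That is where the widely quoted $5.5 million comes from, and it covers only that one run.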
All they calculated was the marginal cost of the final run that actually produced the model, which, by the way, they were clear about in the paper. The claim isn't that you only need $5 million to make this model; it's that the final run alone cost about $5.5 million, which, again, is a plausible number. So I decided not to wade into it, because what they did is plausible. And also, they're selling inference here, which, again, takes advantage of a lot of these techniques, particularly the mixture-of-experts approach. There's another thing they're doing, too. A big part of the challenge is that you have to store all these parameters, particularly anything in the context, in what's called a key-value store, and you're storing all these numbers that you have to keep track of all the time. When I've talked about the challenges of inference, I've said it's really a memory issue. This is why Apple's chips are very compelling: they have a unified memory architecture, which means the amount of memory available to an Apple graphics chip and its neural processing unit is way higher than anything else, much higher than an NVIDIA gaming GPU, for example. So memory, and this key-value store in particular, is a big deal. DeepSeek figured out a way to, instead of storing the whole key-value store, store a compression of it that basically represents a bunch of these multiplications. You get about 85 to 90% of the accuracy, but it's good enough, and it lets you do way more with way less memory. And again, people are out there right now trying to reproduce all this, and they've reproduced some of the steps.
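Here's a toy sketch of that compression idea: instead of caching full key and value vectors per token, you cache a small latent vector and reconstruct keys and values from it on the fly. This is only in the spirit of DeepSeek's approach (multi-head latent attention); the dimensions and the random projection matrices are illustrative stand-ins for learned weights, not the real model's.

```python
import random

random.seed(1)
D_MODEL = 64   # per-token hidden size
D_LATENT = 8   # size of the compressed latent actually kept in the cache

def rand_matrix(rows, cols):
    """Random projection matrix, standing in for learned weights."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(M, x):
    """Multiply vector x (len == rows of M) through matrix M."""
    return [sum(M[i][j] * x[i] for i in range(len(x))) for j in range(len(M[0]))]

W_down = rand_matrix(D_MODEL, D_LATENT)   # compress: hidden -> latent
W_up_k = rand_matrix(D_LATENT, D_MODEL)   # reconstruct keys from the latent
W_up_v = rand_matrix(D_LATENT, D_MODEL)   # reconstruct values from the latent

hidden = [random.gauss(0.0, 1.0) for _ in range(D_MODEL)]

latent = matvec(W_down, hidden)   # only these 8 numbers are cached per token
k = matvec(W_up_k, latent)        # keys recomputed on the fly at attention time
v = matvec(W_up_v, latent)        # values likewise

# Cache cost per token: D_LATENT floats instead of 2 * D_MODEL (8 vs. 128 here).
```

The reconstruction is lossy, which is the "85 to 90% of the accuracy" trade-off: you pay a little fidelity for a much smaller cache.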
But it's all plausible. So is it possible that they have all these chips? Is Dylan right about that? I think probably yes. We know about all these sales to Singapore of chips that exceed the electricity capacity of Singapore. They're not staying in Singapore, right? They're probably going to China. So is China getting chips? Yes. Are they maybe using more powerful chips than they said? Yes.
Are there real, legitimate breakthroughs that make it plausible that they're drastically more efficient than the other providers? Also yes. And the other thing that goes into these comparisons is the inference bit I just mentioned. They are offering inference at very low prices. Now, are they being subsidized to make those prices much lower than they would be otherwise? Possibly. But they claim they're doing it profitably.
It's plausible they are. And also, the thing people are missing... Whether they are or not is a big question, though, right? I mean, that's why there's now more attention on whether this story is actually what it appears to be. Well, the problem with the comparisons that point to OpenAI's API pricing and Anthropic's API pricing is that OpenAI and Anthropic are, and this is not reporting, but as far as I understand it, making significant margin on inference, like software-type margins. They're charging way more than their costs. So if DeepSeek is charging just slightly above its marginal cost for inference, and OpenAI and Anthropic are charging way more than their inference costs, the comparison is misleading. And by the way, look at what happened after this announcement: suddenly o1-mini is going to be available to free users. So they clearly have some latitude in pricing, some margin that can be given away. I don't think it's fair to say DeepSeek is 30x or 45x more efficient, because we don't know the true costs at Anthropic and OpenAI. I think it's definitely more efficient; it's probably not by that much. And then there's Google...
which, by the way, still exists. They came out with their Gemini thinking model this week too, and its prices are actually comparable to DeepSeek's. And we know Google has a very efficient cost structure; they have all these TPUs, things along those lines.
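To make the pricing-versus-cost point concrete, here's the arithmetic with purely hypothetical numbers; none of these figures are real prices or margins for any of these companies.

```python
# Hypothetical: DeepSeek prices near cost, an incumbent prices 30x higher
# but with fat software-style margins. The price ratio then overstates
# the true efficiency gap.
deepseek_price = 1.0        # assume priced roughly at marginal cost
incumbent_price = 30.0      # 30x the price...
incumbent_margin = 0.80     # ...but suppose 80% of that price is margin
incumbent_cost = incumbent_price * (1 - incumbent_margin)
ratio = incumbent_cost / deepseek_price
print(f"true cost gap: about {ratio:.0f}x, not 30x")
```

Under these made-up numbers, a 30x price gap shrinks to roughly a 6x cost gap, which is the sense in which the headline multiples are unknowable from pricing alone.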
Is it plausible that they're lying? Yes, of course it's plausible. It's China. And we can get into some of that; I think the meta reaction is in some respects more interesting than the details here, which, again, Stratechery covered on Tuesday, so everyone listening should already know most of this. But the plausibility is there, and honestly, I think there's a lot of cope going on. No, I definitely think people are now over-correcting. At least over the weekend, there were people on Twitter basically treating the
idea that DeepSeek may be using 50,000 GPUs as evidence that all of this is overblown, and it's not. Because here's the thing: even if they're using 50,000 GPUs, they still created this model, which is excellent, right? That's the thing. Now, here's the anti-DeepSeek point, which I think all the people going crazy about China having passed the U.S. are missing.
All right, and that is the end of the free preview. If you'd like to hear more from Ben and I, there are links to subscribe in the show notes, or you can also go to sharptech.fm. Either option will get you access to a personalized feed that has all the shows we do every week, plus lots more great content from Stratechery and the Stratechery Plus bundle. Check it out, and if you've got feedback, please email us at email at sharptech.fm.