I was basically trying to help the organic search ranking of my little YouTube tool. And then, in the process, I may have inadvertently contributed to $2 trillion getting wiped off global equity markets. Because the fact is, all the news headlines came out saying the stock market crashed because of DeepSeek. I'd like to point out that the DeepSeek V3 technical paper came out December 27th.
A month ago.
So people had time to understand it. And I published my article in the middle of the night on Friday, and then it started taking off.
And then it got shared by Chamath, who has, whatever, 1.8 million followers, right? And his post has been viewed over 2 million times. Naval Ravikant's account has 2.5 million. And then Garry Tan and the Y Combinator account, between them, have millions of followers. And not only did they share it, but they were very effusive in their praise, like, this is really smart. And that went crazy.
Everyone is talking about this new DeepSeek AI model from China that is reportedly 45 times more cost-efficient than US-based AI models and charges 95% less to use than ChatGPT.
As a result, NVIDIA is down 20%, wiping out $600 billion in market value, and both OpenAI's and Meta's AI labs are scrambling to discover how a relatively unheard-of Chinese AI lab was able to outperform their very expensive models with a Chinese-grown model that reportedly cost just $5.6 million to train. The guest on the show today is Jeffrey Emanuel, who actually thinks that this part of the story, the DeepSeek AI model part, is over-indexed on, and that it's actually a confluence of other factors contributing to the unbundling of NVIDIA's market share.
And it's not the release of DeepSeek that triggered the 20% drawdown, but instead a 12,000-word article he wrote on his blog, which quickly went from a handful of readers to over 2 million readers over the weekend, coinciding with the 20% drop in NVIDIA's price when the market opened on Monday. In this episode, Jeffrey and I go through his article and his reasoning behind why NVIDIA is under threat of being unbundled by other chip suppliers, in addition to DeepSeek's impact on the entire resource supply chain for training and inference of LLMs. Let's get right into this episode with Jeffrey, but first, a moment to talk about some of the fantastic sponsors that make this show possible. Are you ready to swap smarter? Uniswap apps are simple, secure, and seamless tools that crypto users trust.
The Uniswap protocol has processed more than $2.5 trillion in all-time swap volume, proving it's the go-to liquidity hub for swaps. With support for a growing number of chains, including Ethereum Mainnet, Base, Arbitrum, Polygon, and zkSync, Uniswap apps are built for a multi-chain world. Uniswap syncs your transactions across its web interface, mobile apps, and Chrome browser extension, so you're never tied to one device.
And with self-custody for your funds and MEV protection, Uniswap keeps your crypto secure while you swap anywhere, anytime. Connect your wallet and swap smarter today with the Uniswap web app, or download the Uniswap wallet, available now on iOS, Android, and Chrome. Uniswap: the simple, secure way to swap in a multi-chain world. With over $1.5 billion in TVL, the mETH protocol is home to mETH, the fourth largest ETH liquid staking token, offering one of the highest APRs among the top 10 LSTs. And now cmETH takes things even further. This restaked version captures multiple yields across Karak, EigenLayer, Symbiotic, and many more, making cmETH the most efficient and most composable LRT solution on the market. Metamorphosis Season 1 dropped $7.7 million in COOK rewards to mETH holders. Season 2 is currently ongoing, allowing users to earn staking, restaking, and AVS yields, plus rewards in COOK, mETH protocol's governance token, and more. Don't miss out on the opportunity to stake, restake, and shape the future of the mETH protocol with COOK. Participate today at meeth.mantle.xyz. What if the future of Web3 gaming wasn't just a fantasy, but something you could explore today?
Ronin, the blockchain already trusted by millions of players and creators, is opening its doors to a new era of innovation starting February 12th. For players and investors, Ronin is home to a thriving ecosystem of games, NFTs, and live projects like Axie and Pixels. With its permissionless expansion, the platform is about to unleash new opportunities in gaming, DeFi, AI agents, and more.
Sign up for the Ronin Wallet now to join 17 million others exploring the ecosystem. And for developers, Ronin is your platform to build, grow and scale. With fast transactions, low fees and proven infrastructure, it's optimized for creativity at scale. Start building on the testnet today and prepare to launch your ideas, whether it's games, meme coins or an entirely new Web3 experience.
Bankless Nation, very excited to introduce Jeffrey Emanuel. He is both an investor and a technologist.
He, however, is a very specific flavor of both of those things. On the tech side, he is deeply informed about the research advances that come out of major AI labs like OpenAI, Meta, Google. And on the investing side, he plays in the markets as a value investor, one who dares to go short at times.
Jeffrey released an article on his blog called The Short Case for Nvidia Stock, which has been echoing across the tech industry as this new DeepSeek model has fired a shot all the way from China across the bow of the US AI industry, leaving US-based AI companies scrambling and both TradFi and crypto markets reeling as everyone digests DeepSeek's impact on the world. Jeffrey, welcome to Bankless. Thanks for having me.
Jeffrey, I really enjoyed your article. I want to start with the punchline; I want to read one of the last paragraphs in your article that I really felt summed up everyone's analysis of how the new DeepSeek model has impacted the market. This is actually the second-to-last paragraph in your article. You wrote: "Perhaps most devastating to NVIDIA's moat is DeepSeek's recent efficiency breakthrough, achieving comparable model performance at approximately 1/45th the compute cost."
"This suggests the entire industry has been massively over-provisioning compute resources. Combined with the emergence of more efficient inference architectures through chain-of-thought models, the aggregate demand for compute could be significantly lower than current projections assume. The economics here are compelling: when DeepSeek can match GPT-4-level performance while charging 95% less for API calls, it suggests either NVIDIA's customers are burning cash unnecessarily or margins must come down dramatically."
To me, Jeffrey, that was the punchline for what I think everyone felt in the markets on Monday, when NVIDIA stock fell 17%. I'm summing this up as: there is a tug of war between hardware and software, and with the emergence of DeepSeek, the software side of this tug of war just scored a very large W. That's my interpretation, that's my analysis; check me on that. How do you feel about that conclusion? You know, it's funny, because...
DeepSeek is the part that everybody's most focused on. But I actually think the whole short thesis still works pretty well without that, for all the other reasons we can discuss. And the one issue with the DeepSeek part is, it's funny, there's this thing, Jevons paradox. Nobody was talking about this until suddenly now everybody's saying Jevons every other word. It's something that comes from energy economics: you'd think that if you make things more energy efficient, great, we're going to use less energy. But what ends up happening is that the price of energy goes down, everybody wants to use more energy, and so it actually increases demand for energy. And so everyone's saying now, oh, this DeepSeek thesis is wrong because of Jevons.
You know, I am sympathetic to that to a degree, but it's not always so clear. And the Jevons stuff doesn't happen immediately. What causes booms and busts is these temporary dislocations between anticipated demand and realized demand. And really, what I think people miss is that the big decisions about CapEx come down to a couple of people, like Mark Zuckerberg. And a lot of it is gut feel, like Masayoshi Son: is this a good time to push down on the accelerator? And I think someone like Zuck has to take a step back and say, listen, I know my guys are really smart, but maybe the answer is not necessarily to spend another $3 billion on NVIDIA chips that are very expensive. I mean, literally, they're paying 40 grand for a GPU that's costing NVIDIA maybe, what, 3,500 bucks to make. So they're putting a lot of money in NVIDIA's pocket.
And maybe they can pump the brakes just a little bit, because they projected they needed a certain number of chips for their forecasted demand. And the DeepSeek stuff is all public, so they can look at the technical report and start making these changes themselves internally, theoretically, at least for the next generation of models they're training. And as a result, maybe they can trim back. Because I think there is still some skepticism on Wall Street about whether they're going to see a return on this money, since it's not like anyone's paying to use all this Meta AI stuff yet.
So I'm not convinced by the "oh yeah, well, Jevons" response. It's like, okay, let's see if that's actually the case. But then, really, separately from that, like I was saying, even if you remove DeepSeek entirely, I believe the case against NVIDIA in particular still holds. And I want to clarify: I'm such a bull on AI. I'm about as bullish on AI, like 99th percentile, as anyone you will ever meet. I live in the AI future all day, every day. I have three Claude accounts. I'm using this stuff nonstop, all day, every day. So I'm a huge believer. But NVIDIA as a company...
This just goes back to my training in investing: you see this over and over again that, with the one exception of a regulatory enforced monopoly, you do not have companies that just get to print infinite profits, with triple-digit revenue growth and 90% gross margins, without everyone and their brother trying to figure out a way to beat them. And that's what's happening. So you look at these companies, Cerebras and Groq, with a Q. These companies already have extremely compelling hardware that largely does get around the NVIDIA moat, at least for inference, and in the case of Cerebras, I think for training too. And there's all this other stuff. I mean, the other thing is, normal companies of the scale of NVIDIA tend to have extremely diversified revenue sources, whereas with NVIDIA...
All the high-margin data center revenue is coming from like five hyperscalers or something. It's very much a power-law distribution. And it's funny, because I started writing the article after my friend, who's a hedge fund guy, asked me about it on Friday, and as I was explaining it to him, I realized I should just write this up.
And it's funny, because it started out as: if I was forced to make the short case for NVIDIA, here's what it would be. And by the time I finished, I was like, shit, this actually is a short. Because I knew there was a lot of custom silicon in the works, but it was kind of eye-opening to me that every single hyperscaler customer is literally making their own custom silicon, in some cases for both training and inference. Amazon, Microsoft, OpenAI, Meta: they're all doing this. And as soon as they get this stuff to work... The other thing that's so important to remember is it doesn't necessarily have to be better than NVIDIA's stuff. Right. Because NVIDIA is charging 10x what it costs them. So if you can make it yourself for 1x what it costs, you can cut the price by 50% to your end customers and still make a huge margin. Right. And what matters to you as a hyperscaler is how many requests to your APIs you can handle per dollar. You don't care if you need more chips; that's fine, as long as you don't have to pay these inflated prices for them. And look, there are other parts of the thesis we can talk about, but I actually think all of that stuff should be just as much of a focal point as the DeepSeek news.
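To make that arithmetic concrete, here's a toy sketch. Only the rough $3,500 build cost, $40,000 sale price, and the "requests per dollar" framing come from this conversation; the throughput numbers and the "one-fifth as good" figure are assumptions for illustration.

```python
# Toy illustration of why in-house silicon can be far worse per chip
# and still win on cost per request served.
nvidia_build_cost = 3_500        # rough cost to make an H100-class GPU (speaker's figure)
nvidia_sale_price = 40_000       # rough hyperscaler purchase price (speaker's figure)
requests_per_sec_nvidia = 100    # assumed throughput of the NVIDIA chip

inhouse_build_cost = 3_500       # you pay cost, not price, for your own chip
inhouse_relative_perf = 0.2      # assume the custom chip is only 1/5 as good
requests_per_sec_inhouse = requests_per_sec_nvidia * inhouse_relative_perf

# Cost per unit of serving capacity (lower is better)
nvidia_cost_per_rps = nvidia_sale_price / requests_per_sec_nvidia
inhouse_cost_per_rps = inhouse_build_cost / requests_per_sec_inhouse

print(f"NVIDIA:   ${nvidia_cost_per_rps:.0f} per req/s of capacity")   # $400
print(f"In-house: ${inhouse_cost_per_rps:.0f} per req/s of capacity")  # $175
```

Even a chip that's five times worse comes out cheaper per unit of serving capacity, and that's the metric a hyperscaler actually optimizes.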
Yeah, maybe to go back and trace over your article: I see it in two parts. There's the moat of NVIDIA and how it's being unbundled at the margins by various companies, some of which you just mentioned. One of these moats is the fast GPU interconnect. NVIDIA has had this amazing ability to make their GPUs talk to each other with extreme bandwidth, as if they were one big unit, like one big GPU. And that is getting unbundled by another company that is just making very large chips, which reduces the need for interconnect. Well, not GPUs exactly. They're making custom... It's not really a GPU. It's this weird mega chip. It's funny, because the H100 is considered an absolute unit when it comes to chip size, this massive freaking package. But the Cerebras thing: they literally took an entire 300-millimeter wafer and made the whole thing one enormous chip. These chips are extremely expensive to make. But yeah, you don't need to worry about wiring things together if they're all on the same wafer, right?
And I actually just want to point out, too, that even NVIDIA didn't build that interconnect technology themselves. They bought Mellanox, this Israeli company, for $7 billion, which roughly doubled their size; I think they had about 10,000 employees by the time they bought Mellanox, and that brought in about the same again. So it was a
really smart move. I mean, if they hadn't bought that company, they would not be in the dominant position they're in today with data center stuff. But everyone has been relying on: oh yeah, but what about interconnect? Even if AMD could get their act together, come out with a decent driver, and come up with some alternative to CUDA, they don't have the interconnect, so you can't use it for this. You hear that argument a lot. And I think, well...
You're starting to see, on the training side, this company Cerebras with the wafer-scale chip. But then the other big news, which started before DeepSeek, was the o1 model from OpenAI. That unlocked this other new scaling law, which is about inference-time compute. It used to be that almost all the processing power was needed on the training side, and then inference was pretty fast. But nowadays, with these models that do chain of thought, the more they compute at the time you give them a request, the better the answer they can give. And so people are now saying, whoa, actually most of the compute might be on the inference side. But inference is a very different compute problem. Right now, they use the same GPUs for training and inference.
Can we just quickly define training and inference for the layman?
Sure. Training is where you take a massive amount of data and basically compress it down into the model's weights. And in the process, the model learns this coherent model of the world and how to understand things, because the only way to compress stuff that much without losing all the information is to understand it. Whereas inference is: you already have a trained model, and now you want to ask it to write you an essay or do a logic problem for you. And so inference is a very different problem.
You don't need thousands of GPUs to do it, because you've already got the trained model. You just need a couple of GPUs, maybe, and you can get the answers. So just to trace that over one more time: training is OpenAI creating ChatGPT, creating the models that I then go use. And when I type in a query, I am doing inference. So there's a weighting here: a ton of compute up front to make the model once, and then hopefully a small amount of compute to run inference on it, which is just the daily requests. And in theory, there's a trade-off between how much compute you spend initially to train the model, which hopefully makes all future inference as efficient as possible. But there's still compute on both sides. Spending more up front just makes the model smarter. Smarter, yeah. Yeah, and that way you get better answers. But what changed recently?
It used to be that basically all inference ran on a fixed, moderate compute budget. But now it's open-ended. o1 is the flagship model from OpenAI. If you pay $20 a month for ChatGPT Plus, you can use o1 for a certain number of requests per week. If you pay ten times as much, $200 a month for ChatGPT Pro, which I do and recommend to anyone who uses this stuff a lot, you get o1 Pro. It's the same model as regular o1. The only difference is that it takes much longer to respond, because while it's doing inference, it's using up far more of these intermediate logic tokens, as it were, this chain of thought, which is like the scratchpad of its internal thinking process. And then it gives you an answer, but the answer is better. Your code will work the very first time; you won't have any mistakes in your essay or whatever.
Can we go over this one more time? So the $200-a-month Pro version and the $20-a-month version are the same model, but there's this extra step where the Pro version is running that same model for longer, in chunks, and it's able to go back over previous work to check itself before it actually gives you an output.
Right. And you're saying that just because of this... It's not an additional layer; it's just that they do it for longer. Basically it's a dial: you say, how much money do I want to spend generating tokens before I give the final answer?
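A minimal sketch of that "dial," assuming a hypothetical model interface (next_reasoning_step and final_answer are invented names, and the budgets are made up); the point is just that Plus and Pro would differ only in how many intermediate tokens the same model is allowed to spend:

```python
# Illustrative sketch only (not OpenAI's actual API): the "dial" is a
# budget of intermediate reasoning tokens spent before answering.
def answer_with_budget(model, prompt, reasoning_budget):
    scratchpad = []                      # the chain-of-thought "scratchpad"
    for _ in range(reasoning_budget):    # bigger budget -> longer thinking
        step = model.next_reasoning_step(prompt, scratchpad)  # hypothetical call
        scratchpad.append(step)
        if step.is_conclusive:           # model decides it's done early
            break
    return model.final_answer(prompt, scratchpad)

# Same model, different dial settings (hypothetical tiers):
# plus_answer = answer_with_budget(o1, prompt, reasoning_budget=1_000)
# pro_answer  = answer_with_budget(o1, prompt, reasoning_budget=50_000)
```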
And with Pro, it would not be economical for them to use the number of tokens they use for Pro on the Plus tier. In fact, it's funny, because everyone on Hacker News, and all these developers in the industry, were like, $200 a month? Get real. How could that make sense? And Sam Altman came out later and said, believe it or not, we're actually losing money charging $200 a month, because people are using it and it uses insane amounts of compute. And so it really flips the equation in terms of how much compute is being used for inference versus training. And this is where it gets really relevant, because with the NVIDIA GPUs, you buy an H100 data center GPU for 40 grand from NVIDIA, and you use the same GPU to train the model
and do inference on it. But this company Groq, with a Q, and everyone gets confused, because Grok with a K is the Twitter one. Not the Twitter Grok. Right, exactly. But Groq with a Q should be better known, because this company has unbelievable technology. They basically said: we're not going to try to solve training at all, we only care about inference. So if you want to optimize the entire stack for inference only, how might you approach that? And the result is that they can do inference on a standard model, like Llama 3.3 70B, which until DeepSeek came out was the sort of leading-edge open-source model, right?
And if you get a fancy desktop computer with, let's say, one NVIDIA 4090 GPU, which you can get for under $1,000 now, you could get, I don't know, maybe 40 tokens per second, which is actually good enough that you could use it as your home version of ChatGPT, and it works pretty well.
But when you try it on Groq, and anyone can try this for free, you just sign up with your Google account, you can do inference on this model, and it's insane. Instead of 40 or 50 tokens per second, it's like 1,500 per second. You click the thing and boom, there's your answer. And it's like, whoa, that's pretty interesting. And so even though the Groq hardware costs millions of dollars for one server, if you have enough demand that you can keep it busy all the time, it's actually much cheaper to use. And most importantly, you're not giving your money to NVIDIA, you're giving it to Groq.
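Here's a back-of-the-envelope way to think about that. Only the rough per-user throughput figures (around 45 versus 1,500 tokens per second) come from this conversation; the hardware prices, aggregate throughput, utilization, and amortization window below are pure assumptions:

```python
# Amortized hardware cost per million tokens served. Everything but the
# per-user speed figures is an assumption for illustration.
def cost_per_million_tokens(hardware_cost, aggregate_tokens_per_sec,
                            utilization, years=3):
    seconds = years * 365 * 24 * 3600
    tokens = aggregate_tokens_per_sec * utilization * seconds
    return hardware_cost / (tokens / 1e6)

# A big inference server runs many user streams at once, so its aggregate
# throughput is far above the ~1,500 tok/s a single user sees.
groq_like = cost_per_million_tokens(2_000_000,                  # "millions of dollars"
                                    aggregate_tokens_per_sec=500_000,  # assumed
                                    utilization=0.8)            # kept busy
home_gpu = cost_per_million_tokens(1_000,                       # desktop 4090 setup
                                   aggregate_tokens_per_sec=45,
                                   utilization=0.1)             # mostly idle

print(f"Busy server: ${groq_like:.2f} per million tokens")   # ~$0.05
print(f"Home GPU:    ${home_gpu:.2f} per million tokens")    # ~$2.35
```

The exact numbers don't matter; the point is that specialized hardware kept busy around the clock can undercut general-purpose GPUs per token served.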
So it's an example of how people get around the moat. If you're trying to assault a castle that has a big moat, instead of trying to cross the moat and getting shot up by arrows, why not dig a tunnel under the moat, or use a catapult to go over it? You find creative ways to get around it. And that's what's happening: everybody's been focused on "a frontal assault is not going to work," and it's like, okay, but there are other ways to seize the castle. What you're seeing is all the ingenuity of the market, because the prize is so big. You too can make your company worth a trillion dollars if you can take a big piece of this pie. That was not true in 2016; back then it was a backwater. And the wheels take a lot of time to turn. Even if you're Amazon, with infinite money to spend, if you want to make your own chips,
well, what do you know about making silicon? First, you have to poach or hire the really brilliant people. Then it's going to take them probably two or three years to design a really good chip. And then you have to come to TSMC with giant sacks of cash and try to convince them to give you volume at their fabs, because they're already being inundated with money from NVIDIA and Apple. And it takes a while to get ramped up, but eventually the chips start coming out. And the irony is, even though none of these custom silicon chips are going to be as good as NVIDIA's chips, the way they're made is pretty similar: they're all going to be using TSMC as the fab, and they're all using the same machines from this Dutch company, ASML, that actually does the lithography. So yeah, they won't have the same brilliant design, maybe. But again, that's the thing people miss: it doesn't need to be as good. It could be one-fifth as good, and it still makes sense for Amazon to use it, because they don't have to pay a 90% gross margin to NVIDIA.
Because NVIDIA has the luxury of having very high margins. And what that creates is: if your product is 90% as good, but you only take 10% of the margins, then all of a sudden you're solving a lot of the market's problems. And when your margins are that high... just to put things into perspective: the semiconductor industry, companies that sell chips, is generally not such a great industry. It's very subject to boom-and-bust cycles of overcapacity. So look at another area, like memory, DRAM, which everyone has in their phones and computers.
You might think on the surface that this should be a great business, because there are basically only three companies in the world that do it: Micron, Samsung, and SK Hynix. There used to be like 15 memory companies, but they all either went bust or merged. So you would think it would be this oligopolistic thing with great pricing and margins. But if you look at the history over the last 10 or 15 years, it's very cyclical.
At the very peak, when the supply-demand mismatch is really out of whack and they can charge really high prices, they make like a 60% gross margin. But if you take the average over the cycle, it's closer to 20%. And at the bottom of the cycle, gross margins actually turn negative. Negative, right, right. And then you look at NVIDIA, and you're like: you have a 90-plus percent gross margin on data center products. Their overall gross margin is more like 75%, because they make much lower margins on the consumer side, the stuff for playing video games, and that's because they have competition there from AMD. That's what happens in a competitive market. But my point is that when your margins are that high, the alternative doesn't need to be 90% as good. It could be literally 40% as good, and it's still a no-brainer for Amazon to switch as many workloads as possible over to their own thing.
It's like when you buy a handbag from Hermes for 40 grand: how much do you think it costs them to make? Even though it's made by hand by some French guy, it's probably only two or three thousand bucks, tops. And then they're charging you $40,000 for it. And it's very similar margins for the GPUs from NVIDIA. And the users don't care. They're submitting requests; they want to use a model, Llama 3.3 70B, but they don't care whether an NVIDIA card is doing the inference on it. And so Amazon... Amazon made their own CPUs, called Graviton,
and they are very aggressive with the pricing of those, to try to switch people over: if you normally use an Intel or AMD CPU, try using one of ours and you'll save a lot of money. And you're going to see the same thing here, where they try to push people over to their product by basically splitting the savings with the customers. So all that stuff is death by a million cuts: the combination of competition from these different areas. And then, of course, AMD does compete with them effectively in consumer stuff, but they've been completely absent in this whole data center AI market, which is
just crazy. I mean, they're going to be writing business school case studies about how AMD squandered a trillion-dollar opportunity. You can't get too mad at them, though, because they also managed to kill Intel. Right, at the same time. It's not like they're not good, too. And it's so funny, because Lisa Su, the CEO of AMD, is first cousins with Jensen Huang of NVIDIA. I did not know that. Yeah. How good are the genes in this family? So, yeah, I mean...
But if they can get their act together... And it's so funny, because they're so out of it. I just don't understand it. There are literally people like George Hotz, the guy who's famous for jailbreaking the iPhone and all this stuff, who is literally, by himself, without any help from AMD, writing his own software stack that will make these GPUs usable for at least
some training and inference. So you might see even AMD coming up as a real competitor. Yeah. So, going back to tracing over the broad strokes of your article, I break it into two halves. There's the understanding
of NVIDIA's moat on the hardware side, via hardware competitors, as you've just traced over. But then the DeepSeek side of things is a rebalancing of the value of software and algorithm design, maybe is one way to put it. Maybe you can take us through the second half of that equation: how did DeepSeek impact people's understanding of the value of software, and its impact on the value of hardware?
So when you say, what is the software side of the thesis, it actually has very little to do with DeepSeek. What it has to do with is one of the biggest sources of NVIDIA's moat, because AMD has quite reasonably good chips. The reason is that NVIDIA basically
was very forward-thinking. When they noticed that this deep learning stuff was really taking off, back in like 2012,
they really figured out that they needed to make it easy to use their chips for this sort of thing. And so they have this system called CUDA. Because you have to understand, these GPUs are insanely complicated. In the old days, you'd have one CPU with one core. Now CPUs are pretty complicated; the CPU in my computer has 32 cores. But these NVIDIA GPUs have thousands of cores. Right.
Right, that's their whole deal: they have tons of cores. And if you were to try to naively write code that takes your problem, breaks it up, sends it to thousands of cores, and reassembles the results, basically no one can do that.
So instead, you describe the problem using much more abstract, high-level concepts, and then CUDA turns that into hyper-optimized code that runs really, really well on NVIDIA GPUs, but nowhere else. So CUDA is an NVIDIA-built software layer that lets developers get the best possible performance out of NVIDIA GPUs. Yeah, without being... Without being, yeah, Einstein. They can be very smart, but... Is it kind of like a driver? No, it's more like a framework. A framework, okay. The driver is a separate layer. But it allows the power of NVIDIA GPUs to be accessible to more people without them having to be highly specialized. Yeah, it's like the difference between writing code in Python versus writing code in Assembler, which is the lowest level.
And then, actually, most people don't even write CUDA directly. Most people use machine learning frameworks. It used to be TensorFlow, but that's been almost totally replaced by something called PyTorch, which is sponsored by Meta.
And so that's what most researchers use: PyTorch, which lets them think in terms of the math. As a researcher, you say, oh, I have this loss function, I have this optimizer, and everything's modular and plug-and-play. You write very high-level Python code, and then internally PyTorch runs it through CUDA on an NVIDIA GPU very, very efficiently.
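For a flavor of what that looks like, here's a minimal PyTorch sketch (a generic toy training loop, not anything DeepSeek- or NVIDIA-specific): you declare the loss and optimizer at a high level, and the framework dispatches the work to CUDA kernels on the GPU for you.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# A tiny toy model; real research models are built from the same pieces.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
loss_fn = nn.MSELoss()                                      # "I have this loss function"
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # "I have this optimizer"

x = torch.randn(32, 128, device=device)   # dummy batch of inputs
y = torch.randn(32, 1, device=device)     # dummy targets

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)   # forward pass runs as optimized GPU kernels
    loss.backward()               # autograd computes all gradients
    optimizer.step()              # parameter update
```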
But if you have an AMD GPU, it's not as easy to get your stuff running really, really fast using PyTorch and the rest of the stack. And so a lot of people were saying it doesn't matter what anyone else does in terms of chips: if they don't have CUDA, it's game over. And I think there are two big
assaults on that. One is that you're seeing the rise of these even more high-level frameworks for expressing highly parallelized programming. MLX is one; there's another one called Triton. And these are gaining momentum. For those, CUDA is just one target: you can write your stuff in MLX or Triton and run it on an NVIDIA GPU really, really fast, but you could also add another compilation target that runs on a completely different chip, like the Trainium chip Amazon is making internally. These are also very high-level languages. So maybe, instead of writing code targeting CUDA, you should target MLX or Triton. Then you can still run it using CUDA, but you can also run it using these other backends, and you're not locked into the really expensive NVIDIA chips. So that's one assault.
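As a flavor of what Triton code looks like, here's a standard toy vector-add kernel (in the style of Triton's own tutorials, not taken from the article): you write block-parallel logic once, and the compiler handles mapping it onto whatever GPU backend is available.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # which block am I?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)                # one program per block
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```

Notice there's nothing NVIDIA-specific in the kernel body; that's exactly why these frameworks threaten the CUDA lock-in.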
And then the other assault, which I haven't heard a lot of people talk about, is this: I use these models all the time for programming, and they're just stunningly good at that now. What they're really, really good at is when you already have a working prototype of code in Python or JavaScript or whatever, so the model can really understand what you're trying to do. They're unbelievably good at porting that to another language. If you have a Python algorithm and you want to turn it into Rust or Golang, they do that unbelievably well. Maybe not on the first shot, but with a couple of iterations, you can get it all working. And what that made me realize is this.
The thing with CUDA is that it's become a lingua franca. Everyone who's good at this kind of programming knows it, and so they think in terms of CUDA concepts. It's just the fastest way for them to express these algorithms.
And so I was thinking: they could write their code in CUDA like they normally do, but then, instead of running it on an NVIDIA GPU, use it almost as what's called a specification language, just for documenting the algorithm in a very efficient, elegant way. Then they could feed that into an LLM and say, all right, now port this into this other framework, which will work really well with
AMD GPUs, or with Cerebras, or something. And I think you explained this really well in the article when you illustrated that there's a job market for CUDA engineers that's insular from the rest of the engineering job market. If you're a CUDA engineer, there's this independent vertical of a job market, and the cost of these engineers is really high. And the way you illustrate it in the article is: well, those walls break down, and all of a sudden there's not really the same monopoly around CUDA. Well, no, it's not that... I think they'll still use CUDA, but the question is: can they use CUDA but then not use an NVIDIA GPU? Which is where the moat comes from, and where NVIDIA gets at least part of its value. Yeah. And now, you did bring up a point that DeepSeek,
in a sense, is software, because by writing smarter training software, they did reduce the demand. But I'd say that's sort of separate, kind of orthogonal, if you will, to this other stuff. Again, even if you took away the DeepSeek part of it, you can see the big threats to the moat, software and hardware.
Now, let me just say: right before we started talking, I saw somebody saying, here's why my thesis is all wrong. And they're saying the problem is TSMC, Taiwan Semiconductor, which builds all these chips. They're basically the only ones that can do it. I mean, not the only ones, because Samsung can also make pretty good chips, but for the most part, yeah, they make all the NVIDIA stuff and most of the Apple stuff. And by the way, I want to point out:
yes, it would be best to do something in a four-nanometer process node, which is about the smallest you can do. But you could use a bigger, older process node, and your chips won't be as fast and won't be as energy efficient, but you've got a lot of wiggle room, because you don't need them to be as good. You just need them to be cheap.
Anyway, the objection to my thesis is that these guys are booked solid. Even if you came to them with giant bags of money, they're booked solid. The manufacturer is booked solid; they're backed up, they have too many orders. Yeah, for the next couple of years, they don't care how much money you give them, because they're booked solid and they can't just instantly build new capacity. Although I will say...
TSMC built a fab in Arizona, and there was all this talk about how it was taking them so long and how they couldn't hire good people. But you know what? They finally did get it all up and running, and if there were enough money in it, they could literally copy-paste the blueprints, get another big chunk of land, and just replicate what they did. And it wouldn't take that long. So in any case, the objection is that even if everything I said is true, these companies, Cerebras and Groq and the hyperscalers like Amazon and Google and so on,
won't even be able to make these chips in enough volume to dent NVIDIA. And my response to that is: okay, your analysis is essentially conceding that this is a highly transitory circumstance, that they're only very temporarily going to have this advantage. As soon as additional capacity comes online or opens up, there's going to be a massive flood of alternative supply, which is going to pressure market share. Even if the pie grows, the market share is going to go down. But most importantly, some of this has nothing to do with technology. It's just basic economic, industrial-finance thinking about how markets work. And the difference between having basically a monopoly and having even one or two competitors is this:
the margins can fall really quickly. It's like two office buildings: if both are 98% occupied, nobody's racing to the bottom to cut rents. But if both of them start losing tenants, every day that goes by with a floor empty, they're losing money. So there's this race to the bottom, and there's a critical threshold: once the occupancy rate in an office market dips below, let's say, 80%, rents get very nonlinear. If occupancy falls another 5%, rents are going to fall a hell of a lot more than 5% to make the market clear. And I think you'll see that margins can fall very, very quickly once there are real competitors. And then the question is, again, this is not about technology; this is about how you rationally value a stock. And I mean...
One of my favorites: I mentioned in my piece that I once won a prize from the Value Investors Club website for a short idea. This was more than 10 years ago, but I'll quickly tell the story, because I think it's so relevant here. It was a company called PetroLogistics; PDH was the ticker. They had a single plant that took propane and turned it into propylene. And because the shale play happened, and I don't have to get into all the details, suffice it to say they were earning an unbelievably high spread, much, much higher than historical, and much higher than they ever expected to earn when they built the plant. They were earning so much that their profit in one year from running this plant was something like 80% of the cost of building a new plant. And it's not rocket science to build one of these plants. You can go to a big construction company like Bechtel and say, I want a conversion plant for propane to propylene, and they have off-the-shelf blueprints. They'll build it for you, guaranteed, in a couple of years. And sure enough, this company was earning these high returns, and people were putting a big multiple on the earnings, because they're like, look at this, the earnings have gone up so much.
But you could tell that all these other plants were already under construction, and you knew approximately when those plants would come online. So you could basically figure out: all right, even if I grant you that they're going to keep earning these massive margins, it's going to start stepping down in about a year. In 18 months it's really going to step down, and in 24 months it's going to be right back to normal. So if I want to value this as, say, the present value of the future cash flows, discounted because of the time value of money, I can do that. I can say: big, big profits this year, a little less next year, and then normal profits after that. Add up the discounted cash flows, and you realize you can't put a big multiple on earnings that are not sustainable.
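A toy version of that discounted-cash-flow logic, with made-up numbers; only the shape, a windfall that steps down to normal within two years, comes from the story:

```python
# Toy DCF in the spirit of the PDH story. All cash flows are invented
# for illustration; the point is the multiple, not the numbers.
def present_value(cash_flows, discount_rate):
    """Discount each year's cash flow back to today and sum."""
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

# Windfall year, step-down year, then "normal" profits for the long run.
cash_flows = [800, 500, 200] + [100] * 17
pv = present_value(cash_flows, discount_rate=0.10)

peak_earnings = 800
print(f"Fair value ~ {pv:.0f}, about {pv / peak_earnings:.1f}x peak earnings")
# Comes out around 2.4x peak earnings. A 30-40x multiple on the windfall
# year only makes sense if the windfall never fades.
```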
So right now, if you tell me, well, you're wrong because NVIDIA is going to keep earning these huge profits for the next two or three years, it's like, dude, you're putting a 30 or 40x multiple on those earnings. That's essentially implying they're going to sustain at this rate indefinitely. And that's just not how you should think about the value of a stock. And so it's really...
This is why I want to say: on a lot of the Jevons stuff, yeah, I am bullish on the aggregate. The total demand for inference is going to skyrocket. The pie is going to grow. But that's a totally separate question from whether NVIDIA will be able to keep growing revenues by triple-digit percentages year over year at these insanely high margins. And you need to answer that question if you want to feel comfortable putting such a high multiple on that earnings stream. You have to know it's going to sustain, and it actually seems quite likely that it won't.
I do want to dive headfirst into the DeepSeek efficiency-gains part of this conversation, because I think that's where we should go next. One thing you wrote in your article: "The sum total of all of these innovations," referring to the innovations of the lab that made DeepSeek, "when layered together, has led to the 45x efficiency improvement numbers that have been tossed around online, and I am perfectly willing to believe these are in the right ballpark." Maybe you can explain the significance of this new ChatGPT-like model, DeepSeek: how it got to be 45x more efficient, and what 45x efficiency means for the industries that make up the supply chain for training and using these models. Sure. So look, I mean...
It's funny: in the West, we have this sort of resource curse, almost, of having too much money. It's easier to just throw money at a problem than to be really clever. The joke, or the parallel I make, is to people's houses in Saudi Arabia, which are not very energy efficient. That's because power is subsidized; they have unlimited energy there. Energy's free, yeah. So there's no point spending extra construction cost on double-pane glass and so on. And it's a similar thing at Meta and Google. They have so much operating cash flow hitting every quarter, they're like, fuck it, let's just hire more... Money's no object? Yeah, yeah. Let's pay our people $5 million a year, or whatever, a million a year, and let's send Jensen another $3 billion. Whereas in
China, they're not getting paid that much, that's for sure. And then they have these export controls. Now, I know a lot of people say they're smuggling chips in through Singapore, and I'm sure that's happening, but... Smuggling chips. Yeah. First of all, under Biden, NVIDIA made basically a slightly crippled version of their GPU just for the China or export market, which is not as good as the H100. But also, what people point to, which I think makes a lot of sense, is that something between 15 and 20 percent of NVIDIA's revenue comes from the tiny nation-state of Singapore. It's like, really? They're using that many GPUs there? Everyone knows they're somehow getting laundered and smuggled into China. So we don't even know how many NVIDIA GPUs are in China. Yeah.
And so we don't really know how many DeepSeek used. But the point is, they don't have as many as we do, and it's not as easy for them to get them. Maybe the punchline you're making is that Tony Stark Iron Man meme: Tony Stark was able to build this in a cave. Exactly, and that's China. They don't have an abundance of chips. They have some chips, and they have plenty of capital, but they don't have the ability to just buy as many as they want. And by the way, that's a whole other story, but they're moving quickly: they poached some of the smartest people from TSMC to staff their national champion, SMIC. They're obviously not there yet, but they made a pretty good Huawei CPU. And I wouldn't be surprised... That's the other giant wild card nobody's really taking into account. Don't count them out. They got some of the smartest people from Taiwan Semi over there, and they'll buy the machines from ASML too. But anyway...
What I wanted to say is that, for their engineers, A, necessity is the mother of invention. But also, in the West, we tend to have this bifurcation in the market: you're either on the AI research track, in which case you have a PhD, you've written these papers, and you're the person doing stuff on the whiteboard. Right. And often these people are not very good engineers. There's a joke that these researchers are actually horrible at programming: good at math, horrible at writing optimized code. It's obviously not universally true; some people are great at both. But what usually happens is the researchers think at this high level, make a prototype, and then hand it over to the engineers, the high-performance optimization people,
people like John Carmack, or Jeff Dean at Google. They're not going to invent the new optimizer or some new loss function for AI models, but if you give them an algorithm, they know how to make it run really fast on a computer. So the way we do it in the West is this two-step process: the researchers design the thing and prototype it, then hand it off to the engineering department, which says, all right, we have this algorithm, how can we make it go fast? The DeepSeek guys are unbelievable at both. So instead of having two teams working one after the other, they kind of inverted it. They started with: how can we saturate every ounce of performance on these GPUs so that nothing is wasted? Because it almost doesn't matter how fast the GPU can calculate; if it's waiting for the data it needs to do the calculation, it's just sitting there idle, okay? And there's a lot of this interconnect, right? A lot of talking to each other.
Right. And normally you have to dedicate a big chunk of your processing power just to handling that communication overhead. So they did a lot of really clever work making the communication as efficient as possible, so there's very little overhead. They basically started with, rather than "how do I make this algorithm go fast," the question "how can I build a really, really fast system that keeps these GPUs as busy as possible," and then designed a smart training setup around that. So they inverted things. And there's just this whole collection of optimization tricks. And by the way, I want to point out: many of these ideas were not invented by them. Many were actually published by American and other researchers, like Noam Shazeer, who just got rehired by Google for a zillion dollars; they bought his startup just to get him back, because he's that smart. But DeepSeek implemented them in a clever way. So I'll give you just a couple of examples.
This whole ChatGPT thing really exploded because of a model design called the Transformer, which came out in 2017. The paper is probably the most cited in history now; it's called "Attention Is All You Need." It combined the sort of regular neural nets we'd been using for a while with something called the attention mechanism, which is a very clever way of contextualizing information: instead of always processing it the same way, the processing depends on the context, and the model automatically learns how to think about that context. And storing
all that data while you're running the model is one of the major things that uses up memory. And memory is very important, because you can't use the system memory on a computer; you have to do everything in what's called VRAM, the very fast memory on the GPU itself. Right. And that's pretty limited. So if you can save on the amount of memory you're using, that's huge, because not only can you do more with fewer GPUs, but you're also not transferring as much data, since it's just smaller. And so anyway, there's something called the KV cache, key-value caches and indices, that you need to keep in memory while you're running a transformer model.
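To see why this matters, here's a rough KV-cache sizing sketch under standard multi-head attention; the model dimensions are illustrative round numbers, not DeepSeek's actual configuration:

```python
# Rough KV-cache sizing for standard attention, with no compression.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_val=2):
    # 2x for keys AND values, stored per layer, per head, per token
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_val

gb = kv_cache_bytes(layers=60, kv_heads=64, head_dim=128,
                    seq_len=32_768, batch=8, bytes_per_val=2) / 1e9
print(f"~{gb:.0f} GB of VRAM just for the KV cache")   # ~515 GB
```

With illustrative dimensions like these, the cache alone dwarfs the VRAM of any single GPU, which is why shrinking it is such a big deal.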
And they came up with something incredibly smart. This is probably the coolest thing in the whole DeepSeek V3 technical paper: they realized that the way it's normally done is very wasteful. You're storing way more data than you need to; only some very small subset of that data is actually meaningful. And in fact, by storing more than you need to, you're almost overfitting to noise, and it's not necessary. And so they... Maybe a simple way to explain this for listeners who want some extra help: it's maybe closer to how your brain works with attention, where when you're paying attention to something, you're not thinking about every single thing under the sun at once; you're focusing on what's necessary.
You can't go too far with the anthropomorphizing, though. Attention in this context means something very specific mathematically, and I don't think the analogy is going to help people. Maybe... I can't remember if I heard this in your article or a different one, but it's like: if a house has 20 different rooms and the lights are on in every single room, even though a person is only in one room, this new model only keeps the lights on in the specific room the person is in at that given time. Some loose, broad-strokes
pattern like that. Sort of. I mean, it's basically like this: instead of just naively storing this massive amount of key-value data... Take the word "job." It's very different if you say "nice job" versus "I just got a new job" versus "are you going to be able to handle that job for me?" The word "job" has a certain representation in the model, but that representation has to be altered depending on its context. That's what attention brings to this. And it means that for every word, every token, you really have to store lots of different things depending on the context, and that's why it takes up so much memory. And they were able to store all that in a very efficient way, basically by storing just a subset of the data in a compressed representation. So that's one thing they did that saved a lot of memory.
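Here's a conceptual sketch of that compression idea, in the spirit of DeepSeek's multi-head latent attention but heavily simplified (the dimensions are illustrative, and the real method has more moving parts): cache one small latent vector per token instead of full per-head keys and values, and reconstruct them on the fly.

```python
import torch
import torch.nn as nn

d_model, n_heads, head_dim, d_latent = 4096, 32, 128, 512

down_proj = nn.Linear(d_model, d_latent, bias=False)             # compress
up_proj_k = nn.Linear(d_latent, n_heads * head_dim, bias=False)  # rebuild keys
up_proj_v = nn.Linear(d_latent, n_heads * head_dim, bias=False)  # rebuild values

hidden = torch.randn(1, 1024, d_model)   # (batch, seq_len, d_model)
latent = down_proj(hidden)               # this small tensor is all you cache

full = 2 * n_heads * head_dim            # floats per token for uncompressed K+V
print(f"cache floats per token: {full} -> {d_latent} "
      f"({full / d_latent:.0f}x smaller)")   # 8192 -> 512, 16x

k = up_proj_k(latent)                    # reconstructed only when needed
v = up_proj_v(latent)
```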
Another thing they did that's very smart is what's called multi-token prediction. Usually these models predict the next token, the next word basically, one at a time, based on the preceding tokens, so it becomes a bottleneck. And they said: well, what if we tried to do, let's say, two or three at a time? Now, the problem is that you can't really predict the second token without knowing the first token; how can you start on the second token before you know the first? You can do what's called speculative decoding, but your speculation might be wrong, in which case you've wasted the time spent computing that second token. But what they did is they got very good at guessing what that second token would be, such that 95% of the time they get it right. And just from that, you can almost double your throughput on inference. By the way, that's part of the reason they're able to charge so little for their API: it's about inference cost. They said that one trick let them almost double throughput for no additional cost. So that's a very clever trick.
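A toy simulation of why a high acceptance rate on the drafted second token nearly doubles throughput (the 95% figure is the speaker's; everything else here is illustrative):

```python
import random

def decode(num_tokens, accept_prob):
    """Count decoding steps needed when each step drafts a second token."""
    steps = 0
    produced = 0
    while produced < num_tokens:
        steps += 1
        produced += 1                      # the first token is always kept
        if random.random() < accept_prob:  # was the drafted 2nd token right?
            produced += 1                  # two tokens for the price of one step
    return steps

random.seed(0)
baseline = decode(10_000, accept_prob=0.0)   # plain one-token-at-a-time
mtp = decode(10_000, accept_prob=0.95)       # ~95% acceptance
print(f"speedup ~ {baseline / mtp:.2f}x")    # comes out close to 2x
```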
And then they did another very clever trick with the parameters themselves. These models are basically just a gigantic list of numbers, called the parameters of the model, and they figured out a way to store those parameters in a much more compressed form. Normally, the way these models are trained, they use more precision. You can think of it almost as more decimal places of accuracy; that's not actually how it works, but it's close enough to understand conceptually. And then often, once they've trained the model that way, to make it run on a cheaper GPU they do what's called quantization, where they truncate and round off the numbers a little bit. But that does hurt the accuracy, or rather the quality, the intelligence of the model. What the DeepSeek guys did is, instead of training at a higher precision and then quantizing to a lower precision at the end, they figured out how to do mostly the entire process end-to-end using the smaller representation.
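Here's a tiny numeric sketch of the quantization trade-off, using int8 just to show the idea of fewer bits per number; DeepSeek's actual recipe is FP8 mixed-precision training, which this does not reproduce:

```python
import numpy as np

weights = np.random.randn(5).astype(np.float32)

scale = np.abs(weights).max() / 127                      # map range onto int8
quantized = np.round(weights / scale).astype(np.int8)    # 1 byte per value
dequantized = quantized.astype(np.float32) * scale       # reconstruct

print("original:  ", weights)
print("round-trip:", dequantized)
print("max error: ", np.abs(weights - dequantized).max())
print("memory: 4 bytes -> 1 byte per parameter")
```

You lose a little accuracy on every number, which is exactly the quality hit he describes; the bet is that the memory, bandwidth, and speed wins are worth it.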
And again, it's one of those things where the efficiency gains pay for themselves many times over: not only do you use less memory, but the calculations go faster, and you don't need to do as much inter-GPU communication, because there's less data. So these gains pay off in multiple different ways. And there's just this whole laundry list of little tricks and optimizations, and when you add them all together, they're not additive, right? They're multiplicative. If this thing doubles it, and this one increases it by 40%, and this one doubles it also, you're multiplying those multipliers, if you will. And that's how you can get to a very big number like 45 times.
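The compounding is easy to see with made-up factors:

```python
# Multiplicative stacking with invented factors: a handful of individually
# modest optimizations compound into a very large overall gain.
gains = [2.0, 1.4, 2.0, 1.5, 1.3, 1.4]   # e.g., MTP, comms, precision, KV, ...

total = 1.0
for g in gains:
    total *= g

print(f"combined speedup: {total:.1f}x")   # 2*1.4*2*1.5*1.3*1.4 ~ 15.3x
# Summed, the same factors give only ~9.6 "points" of improvement;
# multiplied, a few more such factors lands you in 45x territory.
```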
Now, that number we don't really know for sure. They could have lied about the number of GPU hours they used. But one thing is clear: they are charging 95% less for inference. So either they're losing money on that, or they really can do at least the inference part much cheaper than we can here in the West.
That 95% less money for inference, I think, is really the sticker shock, the number that is sending companies like Meta and OpenAI reeling. Like, Sam Altman had to put out a tweet. No, actually, Meta was up, I think. Because, look, on the one hand, it's bad for Meta in that they have spent so many billions of dollars on GPUs and they're paying so much money
to their team to come up with the Llama models and stuff like that. And it does make them look a little foolish when these guys are able to beat them at their own game on a shoestring. But at the same time, what they really care about is how much it costs them to serve AI to their billions of users around the world. And so it's actually good for them if they can cut their costs 95%. That's great. Who it's bad for is OpenAI and Anthropic,
because it's going to put more pressure on their pricing. Right now, OpenAI charges a fortune for the o1 model API. And even GPT-4o is much more expensive. And so they're probably going to have to respond by cutting their API prices significantly.
Which is their profit. That's where they get their profit from, right? Well, they don't actually have profits. That's where they get their revenue from. Both companies are deeply unprofitable at the consolidated level. And I actually suspect that even at the incremental, marginal level, they're not all that profitable, because they're prioritizing revenue growth above everything else. I don't think it's a case where they lose money on every unit
sold on the margin. You know, any fast-growing company is going to post consolidated losses just because they're always spending on growth and the new models. So the real question is: what if OpenAI and Anthropic completely stopped doing R&D on new models and just tried to milk the business they have now for money? Would they be able to eke out a profit? And I think, yeah, the answer is probably yes.
But if they have to cut their pricing by 80%, then it's very unclear. So that's where it starts to be pretty relevant. The Arbitrum portal is your one-stop hub to entering the Ethereum ecosystem. With over 800 apps, Arbitrum offers something for everyone.
Dive into the epicenter of DeFi, where advanced trading, lending, and staking platforms are redefining how we interact with money. Explore Arbitrum's rapidly growing gaming hub, from immersive role-playing games and fast-paced fantasy MMOs to casual luck-battle mobile games.
Move assets effortlessly between chains and access the ecosystem with ease via Arbitrum's expansive network of bridges and on-ramps. Step into Arbitrum's flourishing NFT and creator space, where artists, collectors, and social apps converge, and support your favorite streamers all on-chain. Find new and trending apps and learn how to earn rewards across the Arbitrum ecosystem with limited-time campaigns from your favorite projects. Empower your future with Arbitrum. Visit portal.arbitrum.io to find out what's next on your Web3 journey.
As the home of stablecoins, Celo hosts 13 native stablecoins across seven different currencies, including native USDT on Opera MiniPay, with over 4 million users in Africa alone. In November, stablecoin volumes hit $6.8 billion, made for seamless on-chain FX trading. Plus, users can pay gas with ERC-20 tokens like USDT and USDC and send crypto to phone numbers in seconds. But why should you care about Celo's transition to a Layer 2? Layer 2s unify Ethereum, L1s...
So, Jeffrey, I just want to zoom out and sum everything up. We have this new model, this DeepSeek model, which is 45 times more efficient than
you know, ChatGPT or other competing models. That's caused a repricing in NVIDIA, because people think, oh, wow, 45 times more efficient, we just need much less hardware to make that outcome happen. We're getting more from less hardware, so maybe we've been overpricing the hardware. And that's what has shocked the market into a repricing of NVIDIA. And then also now OpenAI and Sam Altman are getting squeezed because DeepSeek is charging 95% less money for inference requests.
But my broad question to you is like...
Well, isn't this the expected outcome? Like AI and AI technology is on a very steep curve and we're seeing, you know, breakthrough efficiency gains across the complete tech stack, whether it's hardware or the models. We've always known like AI is going to accelerate very, very quickly. And isn't this just what this looks like? Isn't this kind of the expected outcome here? Like, of course, we're going to get more efficient. That's how technology works. Like, why is everyone surprised?
I mean, it's clearly not the expected outcome, because the stock wouldn't have moved so much. It was the expected outcome for me, which is why I wrote my article. And I think the answer is that everyone does expect progress. Progress on the hardware front, that every year the chips are going to get faster and bigger. Progress on the algorithmic front, that you're going to
come up with a better way to train the models or do inference that's going to make things faster. I mean, when these LLMs first really came out a couple of years ago, they had a much more limited context window, the amount of text you could put into them. That has gone up dramatically. And originally, everyone thought that was going to be really hard to scale up, because they thought it would dramatically increase the amount of memory required.
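For a sense of why memory was the worry: naive attention materializes an n-by-n score matrix per head, so the cost grows with the square of the context length. A back-of-the-envelope sketch, with made-up head counts and fp16 scores rather than any particular model's numbers:

```python
# Rough arithmetic for naive (pre-optimization) attention memory.
bytes_per_score = 2   # fp16
heads = 32            # assumed head count, illustrative

for n in [2_000, 32_000, 128_000]:
    naive = n * n * bytes_per_score * heads
    print(f"context {n:>7,} tokens: ~{naive / 1e9:,.1f} GB of scores per layer")
# Growing the window 64x grows this cost 4,096x, which is why
# memory-efficient attention algorithms mattered so much.
```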
But people came up with really brilliant inventions, new algorithms, to make it faster. And so people do expect some level of algorithmic improvement and some level of hardware improvement every year. But they expect it to be a Moore's Law type progression, where it's somewhat predictable. What really catches people off guard are step-
function changes, where things jump overnight. So if the news was that they tripled efficiency, I mean, can you imagine if you made an air conditioner that was three times more energy efficient? You'd crush the competition. You would get huge market share by tripling something. In any normal market,
if you had a car with triple the mileage, that would do great. But we've become so used to that pace in technology. But 45 times? Okay, now we're talking. That's really crazy. And so when that happens overnight, in a way people didn't anticipate, that's when you get the shock. And, you know, the thing is,
there's this expression of being priced to perfection. NVIDIA's share price only looked reasonable to people who extrapolated these curves out. And you have to be very careful when you extrapolate revenue growth that has been running at 120% year over year. And again, it's not just revenues, it's about the margins. They were basically assuming that the margins would hold
and the revenues would keep growing at this incredible rate. And that's why every single investment bank, basically, had a strong buy on NVIDIA. All of them. They all got caught completely offside with this thing. They were all scrambling, honestly, to read my article. And they're like...
you know, I got inbound requests from some investment banks to help, because nobody even wants to talk to their analysts about this. They want to talk to experts. And so they're scrambling to find experts. Not that I'm even an expert, but compared to the equity analysts on the sell side, apparently I am. And so it was not
expected at all that a step-function change like this would happen. And that's the body blow to the stock: this thing was pricing in clear skies, and all of a sudden it's like, oh, there actually are these threats. And again, it's not just DeepSeek. People were ignoring a lot of these other threats, and I don't know why, because
these are literally people whose full-time job is to cover NVIDIA for Goldman Sachs and Morgan Stanley. And I don't know what the hell they were doing. How come they weren't talking about the competitive threats to CUDA, or about Cerebras and Groq? And maybe they mentioned it, but they certainly didn't figure out that this actually was going to be really important.
With the step-function change, it's not just a step-function improvement, because it's also in a slightly different direction than what the market was thinking, correct? We aren't just skipping ahead on Moore's Law. We're also going in a different direction. Well, it's additive to everything else. You are still going to have faster chips next year. You are going to have more chips next year. You are going to have other algorithmic improvements on the margin. But on top of all that now,
every big AI lab in the world is going to be doing this, you know, the Llama team at Meta, the Anthropic guys. You better believe Zuck has brought these guys into his office and said...
We need to use every one of the tricks these guys are using for Llama 4. Yeah, so as a consumer of AI products, if you're not exposed to NVIDIA, if you don't have OpenAI equity, private equity, if you are just a consumer, you're stoked. Oh, God, this is the greatest thing ever. The products coming down the pipeline are going to be sick in very short order. Oh, and not only that, but from a standpoint of...
You'll be able to run this shit on your own computer. You get a $1,000 Mac laptop, and you're going to be able to have AGI on your computer, on tap, privately, and it's the most miraculous thing ever. No one would have believed this even a few years ago. Is that why Apple is up on the week? Because I think I saw Apple being up 3% or 4% when NVIDIA was down almost 20%. Apple is one of the guys that actually... It's so funny, because...
Amazon and Microsoft and OpenAI are all trumpeting these big press releases about the custom chips they're making. And Apple's so different. Apple's so secretive. But you know, they have one of the best silicon teams in the world.
They only announce something if they're ready to sell it to consumers. If they're making chips internally for their own uses, no one even freaking knows about it. And all the people who do know about it are signed up with NDAs and don't talk about it. For all we know, they have pretty fucking awesome chips already. But they're essentially users of
AI, you know, so it's good for them. It means they'll be able to use some of these tricks with some of these models. In fact, there's an app you can get, I think it's called Apollo, on the App Store that lets you download these models.
And if you have an iPhone 16 Pro or something, you can just run this thing. You could be on an airplane with no internet, or in a bunker somewhere, and have, you know, not quite AGI on tap, but certainly something smarter than most college students on a lot of topics.
And it's wild to see it go. You can switch into airplane mode and ask it all these questions about chemistry and physics and history, and it'll give you really good responses at a reasonable pace. So, yeah, it's good for Apple. I think it's ultimately good for Meta too, which is why Meta's stock wasn't down. Right. Yeah.
You know, so it's not a bad thing. It's just bad insofar as... Yeah, it's a recalibration. But I do think it was excessive, in one day, that the whole $2 trillion of capital got wiped out.
I'm not saying that you should be buying the dip in NVIDIA, though, because I think it did get ahead of itself, and it could still fall further. It could fall to $2 trillion, and, you know, $2 trillion is still a lot of money. This is a company that earned something like $5 billion a few years ago. So that's still quite a big valuation. Yeah.
Jeffrey, there's one last conversation before I let you go, and that's the conversation about synthetic data. This, I think, comes from having stronger and better models, which creates this notion of synthetic data. And it's also part of the equation of rebalancing how people value things historically.
Can you just walk us through this synthetic data conversation? What is synthetic data? What do different and stronger models have to do with synthetic data? And what does it mean for the overall supply chain of AI? Well, I'm not so sure that it... I mean, I think it's an important concept. I'm not sure how much it applies to those other things. What it is, is that when you're training these models, the pre-training that actually makes the model smart...
It's partly a function of how much compute you apply, you know, how many GPUs and how fast they are. But it's also the amount and quality of the data that you're training on. It's like, you know, when DeepSeek says we use 15 trillion tokens in our training set, that's what they're talking about. And the thing is, it's like there's only so much data that's
of high enough quality that you'd even want to use it to train a model. If you take all of Wikipedia, I don't know how many tokens that is, but it's not that many; it's measured in the low single-digit billions. And if you take all the books out there, we're talking
a couple trillion. If you took all the newspapers that have ever been written, it's a couple trillion. But we're talking about 15 trillion. So what you're saying is the quality data that's out there is a processable amount of data. No, no, I'm saying that we're running out of data. We're running out of data, yes. People aren't writing
smart books fast enough, basically, to keep supplying us with more and more data. And so that's a big wall we've been facing: how are we going to keep improving the models if we can't scale up the data they're using? And people say, oh, but you could just take every YouTube video. But have you seen most YouTube videos? It's not going to make your model smarter. No, it's going to make it dumber. But there is an exception to this rule.
So synthetic data is using an LLM to generate text and then turning around and training a new model on that text. And that sounds very circular, like me trying to teach myself in a room without a book, just talking to myself. How is that supposed to work, in terms of getting new information, right?
Isn't that, in a sense, getting high on your own supply, so it's not going to help you? And that is sort of true if, let's say, you're talking about the history of the Peloponnesian War or something. You're not going to get anything new by just regurgitating your own output. The exception to all that is if you're talking about logic, math, physics,
computer programs, because in those domains, the big difference is that you can verify that what you said is correct. Just like the rules of chess are very simple but the possible chess games are almost unlimited in complexity, it's the same thing here: there are so many possible simple Python programs of a hundred lines or less that we've only ever seen a tiny subset of them. So you could say, oh, I want to make a Python program that does X, Y, Z,
generate a candidate, and then test it: well, when I run it, did I get that output? And if you did, you know the program's right. And so now you can say, okay, let me add that to the training set. It wasn't in the training set originally, but it is correct and good. And so what you can do is start exploring the world of all possible math theorems
and working out all these math proofs, verifying that they're right, and then adding them to the training set. And in that way, you can basically come up with lots of data that's known to be super high quality.
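Here's a minimal sketch of that verify-then-keep loop. The generate_candidate() stub stands in for an LLM call and is entirely hypothetical; the real point is the mechanical check: run the program, compare the output, and only keep candidates that are provably correct.

```python
import os
import subprocess
import sys
import tempfile

def generate_candidate(task_description):
    # Stub for "ask an LLM to write a program that does X". Hardcoded so
    # the sketch is self-contained; in reality this is a model call.
    return "print(sum(i * i for i in range(10)))"

def passes_check(source, expected_output):
    # Execute the candidate in a subprocess and verify its output. This is
    # the cheap, mechanical correctness check that plain prose lacks.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, timeout=5)
        return result.stdout.strip() == expected_output
    finally:
        os.unlink(path)

training_set = []
task = "print the sum of squares of 0..9"
candidate = generate_candidate(task)
if passes_check(candidate, expected_output="285"):
    # Verified-correct program: safe to add as synthetic training data.
    training_set.append((task, candidate))

print(f"kept {len(training_set)} verified example(s)")
```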
And that's why these models are getting better at logic and math at a much faster rate than they're getting better at anything else. Because you can just keep cranking out this synthetic training data, and then the scaling can keep going. So it's sort of funny that the rich jobs are the ones most at risk from AI, right?
I think a lot of people thought, well, you're still going to need people who are really, really smart at quantitative stuff. And it's like, I've got news for you: that's the thing they're going to become superhuman at before anything else. Because you're still going to want to read the history book by a really smart human before you read the AI's history book. But the AI mathematician might
be pretty good two years from now, three years from now. Jeffrey, we talked a lot about your article, which I'll have linked in the show notes if people want to go read it for themselves firsthand. But also just tell us a little more about you: where you come from, what you do, what else you're working on. Sure. So in my day job for the last couple of years, I've been founder and CEO of Pastel Network, which is actually a crypto company.
PSL is our ticker. We trade on a few exchanges like MEXC and Gate. And we started out as sort of an NFT company.
It's an interesting project. It's based on the Bitcoin Core proof-of-work concept, but with all these additional layers on top. But in the last year, we've done a big pivot to decentralized AI inference. And so I've written a tremendous amount of code in the last year to essentially let you do inference across all sorts of modalities, all sorts of providers of
AI models, including totally uncensored models. And you don't have to dox yourself by giving an email address and a credit card and your IP address. You can just pay with crypto, and it's all pseudonymous and decentralized. All the inference is being handled by these supernodes that anyone can start up
themselves. And you can even, I mean, the example I like to joke about is you can use one of these uncensored
versions of the Llama models and say, how do I make meth at home? And they'll actually just tell you exactly the recipe, whereas, you know, good luck trying that on ChatGPT or Claude. Would you call this the sovereign AI sector? Yeah, yeah. It's really sort of decentralized. I mean, part of the thinking was that, for me, it's not necessarily at the consumer level like ChatGPT, although I did make something like that. If you go to inference.pastel.network, you can try it
all in a browser, and you can do inference across all these models. But the idea was also to make it an API, so that if you have another crypto project, let's say a prediction market, you can make it so anyone can, in a decentralized way, create their own prediction event.
But you want to have some rules around that. Like, you don't want people to make assassination markets, where they're predicting that somebody's going to die by a certain date. So you need to have some kind of moderation. But you don't necessarily want a moderator who has the power to delete stuff, right? Because how is that decentralized? So I think the better way to implement something like that is to have
an LLM do it in a totally impartial way, where you have this prompt that says: you're not allowed to do an event involving any of these subjects. And then when the user wants to create their prediction event, they have to describe what is being predicted.
At the time they're trying to create that event in the system, it's going to show it to an LLM. The LLM is going to say yes or no, and based on that, the system will say, no, you can't do this, you have to change it. Now, if you have a prediction market that's decentralized, you can't really use OpenAI or Claude for this, because that requires an API key hooked up to a credit card, which means it's not decentralized. It can't work like that. It has to actually be decentralized.
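A minimal sketch of that gate, assuming a generic llm() call. The function name and interface here are placeholders, not Pastel's actual API; in the real system the request would go out to a decentralized supernode.

```python
# Hypothetical LLM-as-impartial-moderator gate for prediction events.
BANNED_SUBJECTS = [
    "the death or injury of a specific person",
    "other illegal activity",
]

MODERATION_PROMPT = """You are a prediction-market moderator.
Reject any proposed event involving: {banned}.
Proposed event: "{event}"
Answer with exactly one word: YES (allowed) or NO (rejected)."""

def llm(prompt):
    # Placeholder for a decentralized inference request to a supernode.
    # Faked here with a trivial keyword check so the sketch runs.
    return "NO" if "die" in prompt.lower() else "YES"

def can_create_event(event_description):
    prompt = MODERATION_PROMPT.format(
        banned="; ".join(BANNED_SUBJECTS), event=event_description)
    return llm(prompt).strip().upper() == "YES"

print(can_create_event("Will ETH close above $4,000 on June 1?"))  # True
print(can_create_event("Will person X die before June 1?"))        # False
```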
That's the idea of Pastel: they can use Pastel and say with a straight face, in all honesty, that this is decentralized right down the line, fully decentralized,
and that it can never be shut down by just turning off one API key or credit card. So that's the basic idea. And then I have some other side projects, like my YouTube transcript optimizer, which is where I published the article. People are very confused about why it's not on Medium or Substack or something. And I'm like, sorry. It's so funny, because I basically was...
trying to, you know, help the organic search ranking of my little YouTube tool, which has generated maybe $1,000 of revenue. And then in the process, I may have inadvertently contributed to $2 trillion getting wiped off global equity markets.
Look, I don't want people to say I'm some megalomaniac here, but the fact of the matter is, all of the news headlines came out saying the stock market crashed because of DeepSeek. And I'd like to point out that the DeepSeek V3 technical paper that talked about these efficiency gains came out December 27th.
A month ago.
My article explained it so people could understand it. And I published it in the middle of the night on Friday, and then it started taking off.
And then it got shared by Chamath, who has, you know, whatever, 1.8 million followers, and his post has been viewed over 2 million times. Naval Ravikant has 2.5 million. And then Gary Tan and the Y Combinator account, between them, have millions of followers. And not only did they share it, but they were very effusive in their praise, like, this is really smart. And that...
went crazy. And I can tell you, I have been inundated by requests from huge funds that want to talk to me about this. And I believe that it did, in fact, as crazy as it sounds, precipitate the decline. Obviously, I didn't cause it. It was caused by the underlying situation. But in terms of highlighting it,
It didn't come from the investment banks. And I think part of the problem is just that people are talking in different circles. The people who are buying NVIDIA with billions of dollars at a big fund are not reading the technical papers, and they're not even necessarily reading the tweets from Andrej Karpathy.
You know, they're just relying on this consensus of where things are going. And all it took was a really in-depth
explanation that made sense to them, and they were like, holy shit, I didn't know this. And, you know, can I say one other funny thing? Because it's running on my own blog, I have Google Analytics. I can see, in real time, not who's reading it, but where they are. And it's so funny, because when it started going viral, at first I was so thrilled that 50 people were reading it at once. And then before I knew it, it was like 1,500 people at any given moment. And it's a 60-minute read, it's 12,000 words, so it's not short.
But at first it was mostly guys in New York, because that's where all the hedge funds are. And then I noticed, right before I fell asleep on Saturday night, that the biggest place people were reading it from was San Jose. And I'm like, hmm, that sounds like where NVIDIA is based. There were hundreds of people from San Jose reading the thing at the same time.
As of yesterday, when I last checked, over 2,000 people from San Jose had read my article. And, you know, the funny thing about NVIDIA is that the stock has gone up so much that something like 80% of the employees...
have more than $10 million worth of stock. And you know it's the main thing they talk about with their spouses and friends: man, I have a lot of this stock, should I keep staying on for this ride? And they understand the technology, but maybe they don't understand how to value a company. And they read this, and the thing started passing around like wildfire. And I was like, oh my God, I bet Jensen's reading this too. And I think...
There's a lot of stock that sort of never hit the market because it was awarded as RSUs and options to these people. And it only takes a little bit of that on the margin to start causing imbalances. And so I wouldn't be surprised if a lot of that sell pressure came from NVIDIA employees. But also, these big hedge funds control a lot of the fast money, and they suddenly got spooked.
So it's wild to think that it could have actually been this sort of Reichstag fire, if you will, setting off this whole course of events. But I actually do believe it played that role.
I mean, I'm sure there are people who say, no, this other guy wrote this and that other guy wrote that. And I'm like, yeah, but my thing went pretty freaking viral, and with the right people. And other people's stuff cited yours, your article. Sure. Or, you know, maybe they didn't. I mean, I saw the guy, Ben Thompson from Stratechery; it sort of sounded like he paraphrased my thing without giving me any credit, but whatever. But I just think it's really funny how there are
stories today from the New York Times and the Wall Street Journal that both said it was caused by DeepSeek.
You know, they're always trying to assign causality to stuff. And I was like, not really. Because it's a ludicrous concept: the 45x efficiency gains were known a month ago. So you have to explain why there was a one-month lag. Okay? Whereas this is very understandable, that it spread like wildfire from thought leaders like Chamath and Naval. I mean, Naval is put on such a pedestal
by the VC guys. And the tech hedge fund guys look up to the VC guys, like Andreessen Horowitz and the Y Combinator guys. They're the experts, right? And then you have those guys saying this is a great article.
And it's like, well, okay. And so, of course, that can very quickly convince people. And it's not like you have to convince everyone. You just have to convince the guys at, like, Coatue, who are managing $70 billion, that they should maybe sell a little bit to get in front of this. And that's all you need. And so, anyway, I emailed both of the journalists to say, at the very least, you should be aware that you may have gotten the causality a little wrong on this.
But anyway. Well, Jeffrey, it's an honor to have the original source of the information on the podcast. It was great to have you as a guest. And as these AI wars and NVIDIA chip wars progress, and we didn't even get a chance to talk about USA versus China, maybe we can get you back on to keep commentating. Yeah, I really appreciate you coming on. Great. Thanks a lot.
Bankless Nation, you guys know the deal. Crypto is risky. You could lose what you put in. But it sounds like the traditional market is also risky too. But we're headed west. This is the frontier. It's not for everyone, but we're glad that you are with us on the Bankless journey. Thanks a lot.