
What The Hell Is DeepSeek?

2025/1/31

Better Offline

People
Ed Zitron
A podcast host and creator focused on the technology industry's influence and manipulation.
Topics
Ed Zitron: I think the arrival of DeepSeek is a huge shock to the entire generative AI industry, and to American companies in particular. The models they've developed not only match giants like OpenAI on performance, and in some respects surpass them, but cost far less and run far more efficiently. That directly challenges the "bigger is better" strategy American companies have pursued for years, and it shakes the high-cost business model they depend on. DeepSeek's open-source strategy makes things even worse for them, letting anyone use and improve its models for free, which will accelerate iteration and further squeeze the American companies' room to survive. DeepSeek's success also exposes some problems with American generative AI companies: they have long relied on enormous funding and unlimited compute while neglecting efficiency and cost control, leaving them with little flexibility to respond to competition. DeepSeek, by contrast, achieved high performance and low cost with limited resources through clever algorithms and technical innovation, setting a new benchmark for other AI companies. Of course, DeepSeek's rise also raises some concerns, such as its funding sources, data security, and potential state backing. Either way, DeepSeek's emergence will profoundly change the competitive landscape of the generative AI industry and force American companies to rethink their strategies.


Chapters
This chapter introduces DeepSeek, a relatively unknown Chinese AI company that has disrupted the generative AI industry with its efficient and open-source models. These models are significantly cheaper to run and outperform existing models from major players like OpenAI. This has sent shockwaves through the market, challenging the established narrative of expensive AI development.
  • DeepSeek's models undercut OpenAI's in several meaningful ways
  • DeepSeek's models are open source and significantly more efficient
  • The AI bubble narrative is challenged by DeepSeek's cost-effectiveness

Transcript


Do you want to see into the future? Do you want to understand an invisible force that's shaping your life? Do you want to experience the frontiers of what makes us human? On Tech Stuff, we travel from the mines of Congo to the surface of Mars, from conversations with Nobel Prize winners to the depths of TikTok, to ask burning questions about technology, from

From high tech to low culture and everywhere in between, join us. Listen to Tech Stuff on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts. Welcome to Decisions Decisions, the podcast where boundaries are pushed and conversations get candid. Join your favorite hosts, me, Weezy WTF, and me, Mandy B, as we dive deep into the world of non-traditional relationships and explore the often taboo topics surrounding dating, sex, and

Every Monday and Wednesday, we both invite you to unlearn the outdated narratives dictated by traditional patriarchal norms. Tune in and join in the conversation. Listen to Decisions Decisions on the Black Effect Podcast Network, iHeartRadio app, Apple Podcasts, or wherever you get your podcasts.

We want to speak out and we want this to stop. Wow, very powerful. I'm Ellie Flynn, an investigative journalist, and this is my journey deep into the adult entertainment industry. I really wanted to be a player boy in my adult. He was like, I'll take you to the top, I'll make you a star. To expose an alleged predator and the rotten industry he works in. It's honestly so much worse than I had anticipated. We're an army in comparison to him.

From novel, listen to The Bunny Trap on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts. The OGs of uncensored motherhood are back and badder than ever. I'm Erica. And I'm Mila. And we're the hosts of the Good Moms Bad Choices podcast, brought to you by the Black Effect Podcast Network every Wednesday. Yeah, we're moms, but not your mommy. Historically, men talk too

Hello and welcome to Better Offline. I'm your host, Ed Zitron.

A lot of you have been getting in touch. Yes, you're getting your DeepSeek episode. In fact, this is the first of a two-parter. This one comes out on Friday, which is when you're listening to it, and the follow-up comes out on Monday. I apologize. I spent a lot of Monday writing this and also learning about a lot of this stuff in an attempt to distill it as best I could. This situation is extremely weird, and it's developing.

And I think even when I put out this episode, there will be new parts of it that I have yet to really get to. I will do my absolute best to explain in these episodes both what is happening with DeepSeek, what it means, what they've built, and what it's going to do in the future. But let's begin.

So as January came to a close, the entire generative AI industry found itself in a kind of chaos. In short, the recent AI bubble, and in particular the hundreds of billions of dollars being spent on it, hinged on this big idea that we need bigger models which are both trained and run on bigger and even larger GPUs, almost entirely sold by NVIDIA. And in turn, they're based in bigger and bigger data centers owned by companies like Microsoft, Oracle, Amazon, and Google.

Now, there was also this expectation that this would always be the case. Hubris within this industry is kind of part of the whole deal. And generative AI was always meant to be this way, at least for the American developers. It was always meant to be energy- and compute-hungry. Throwing entire zoos' worth of animals and boiling lakes was necessary to do this. There was never any other way to do it.

And I thought, at least I've thought for a while, that this was because they'd tried to make them more efficient but couldn't, that there was just something about transformer-based architecture, the stuff that underpins ChatGPT (the GPT models under ChatGPT), that made it impossible. That wasn't the case, though.

A Chinese artificial intelligence company that few people had really heard of, called DeepSeek, came along a few weeks ago with multiple models that aren't merely competitive with OpenAI's but actually undercut them in several meaningful ways. DeepSeek's models are both open source, which means that their source code and research are public,

And they're significantly more efficient as well, as much as 30 times cheaper to run in the case of their reasoning model R1, which is competitive with OpenAI's o1, and 15 or more times more efficient than GPT-4o. It's actually kind of crazy when you think about it. And as you're going to hear, this whole thing has jokified me all over again. And what's crazy is that some of them can be distilled, which I'll get to later, and run on local devices like a laptop. It's kind of crazy.

And as a result, the markets have kind of panicked because the entire narrative of the AI bubble has been that these models have to be expensive because they are the future. And that's why hyperscalers had to burn $200 billion in capital expenditures for infrastructure to support this wonderful boom, and specifically the ideas of OpenAI and Anthropic. The idea that there was another way to do this, that in fact we didn't need to spend all this money and that maybe we could find a more efficient way of doing it, well...

That would require them to have another idea other than throw as much money at the problem as possible. Yeah, they just didn't consider it, it turns out. And now along has come this outsider that's upended the whole conventional understanding and perhaps even dethroned a member of America's tech royalty. Sam Altman, a man who has crafted, if not a cult of personality, some sort of public image of an unassailable visionary that will lead the vanguard in the biggest technological change since the internet. Yeah.

He's wrong. He never was doing that. I've been saying it for a while. He's never been doing this. But DeepSeek isn't just an outsider. No, they're a company that's emerged as a side project from a tiny, tiny Chinese hedge fund, at least by the standards of hedge funds, with something like $5.5 billion in assets under management. And their founding team has nowhere near the level of fame and celebrity, or even the accolades, of Sam Altman. It's distinctly humiliating for everyone involved that isn't DeepSeek.

And on top of all of that, DeepSeek's biggest, ugliest insult is that its model, DeepSeek R1, is competitive, like I said, with OpenAI's incredibly expensive o1 reasoning model, yet significantly, and I mean 96%, cheaper to run. And it can even be run locally, like I said. Speaking to a few developers I know, one was able to run DeepSeek's R1 model on their 2021 MacBook Pro with an M1 chip. That is a four-year-old computer.

Not a $30,000 GPU in sight. It's kind of crazy. Worse still, DeepSeek's models are made freely available to use, with the source code published under the MIT license, along with the research on how they were made, although not the training data, which makes some people say it's not really open source. But for the sake of argument, I'm just going to say open source.

And this means, by the way, that DeepSeek's models can be adapted and used for commercial purposes without the need for royalties or fees. Anyone can take this and build their own. It's kind of crazy. By contrast, OpenAI is anything but open, and its last LLM to be released under the MIT license was 2019's GPT-2. No, no, wait, wait, shit. Let me correct that. DeepSeek's biggest, ugliest secret is actually that it's obviously taking aim at every element of OpenAI's portfolio.

As the company was already dominating headlines this week, it quietly dropped its Janus Pro 7B image generation and analysis model, which the company says outperforms both Stable Diffusion and OpenAI's DALL-E 3. And those are, by the way, image generation things. So you type in something like Garfield with boobs, and then out comes a Garfield with juicy cans. And that's probably the first time you've heard that on this podcast, but probably not the last. And as with its other code,

DeepSeek has made this freely available to both commercial and personal users alike, whereas OpenAI largely paywalls DALL-E 3.

This is really a truly crazy situation, and it's also this cynical, vulgar version of David and Goliath, where a tech startup backed by a shadowy Chinese hedge fund with $8 billion under management is somehow the plucky upstart against the lumbering, lossy, oafish $150 billion startup backed by multiple public tech companies with a market capitalization of over $3 trillion.

I realize, by the way, that I said $5.5 billion under management earlier. This is why you check your notes in advance. But I'm not cutting it; this is fresh. I am inside a closet in New York. The content must flow. Anyway, DeepSeek's V3 model is comparable and competitive with both OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet, which, by the way, has some reasoning features.

And like I said, R1 is 53 times cheaper to run when using the company's own cloud services. And as mentioned earlier, said model is effectively free for anyone to use locally or on their own cloud instances, and can be taken by any commercial enterprise and turned into a product of their own, should they desire to, say, compete with OpenAI, the loudest and most annoying startup of all time.

In essence, DeepSeek, and I'll get into its background and the concerns people might have about its Chinese origins, released two models that perform competitively with and even beat models from both OpenAI and Anthropic, undercut them on price, and then made them open, undermining not just the economics of the biggest generative AI companies but laying bare exactly how they work. The magic's gone. There's no more voodoo inside Sam Altman's soul. It's all out there.

And that last point is extremely important when it comes to OpenAI's reasoning model, which specifically hid its chain of thought, ostensibly for fear that these unsafe thoughts might manipulate the customer. And then they added, slightly under their breath, that the actual reason they did it was competitive advantage.

Now, to explain what that means: when you make a request with OpenAI's o1 model, say, give me all the states with the letter R in them, it actually shows you the thinking. And by the way, these things don't fucking think. They're computer bullshit. They don't think at all. But I'm going to use the word just for this. So you see it say: okay, here are all the American states. Which ones have that letter? I'm checking all of those. It's effectively having a large language model check a large language model.

Now, the thing is, the steps they were showing you were all cleaned up. They would look nice. They would be formatted nicely. DeepSeek's chain of thought is completely laid bare, which is very interesting because it really takes the wind out of OpenAI's sails. And on top of that...

It allows you to see how these things actually think through things. Again, not really thinking. But still, you can see things about how large language models work that these companies didn't want you to have. On top of this, OpenAI's o1 model has something even shittier to it, which is that all of these chain-of-thought steps cost money.

When you see it generate these thoughts, it's actually generating more tokens than you see, because they're hiding the chain of thought. So OpenAI is charging you an indeterminate amount of money, an insane amount of money as I'll get to later. But nevertheless, you don't know what you're being charged for. You don't even know what's really going on under the hood. Or you could use DeepSeek.
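To put rough numbers on that, here's a sketch of the billing math in Python. Only the $15 and $60 per-million-token o1 prices come from this episode; every token count below is a made-up example, because the whole point is that you can't see the real ones.

```python
# Illustrative only: how hidden reasoning tokens inflate a bill you can't audit.
# Prices are o1's quoted API rates; the token counts are invented for the example.

INPUT_PRICE = 15 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 60 / 1_000_000   # dollars per output token

prompt_tokens = 200              # what you sent (hypothetical)
visible_answer_tokens = 300      # what you got back (hypothetical)
hidden_reasoning_tokens = 4_000  # billed as output, never shown (hypothetical)

bill = (prompt_tokens * INPUT_PRICE
        + (visible_answer_tokens + hidden_reasoning_tokens) * OUTPUT_PRICE)

hidden_share = hidden_reasoning_tokens / (visible_answer_tokens + hidden_reasoning_tokens)
print(f"Total charge: ${bill:.4f}")
print(f"Share of the output bill you never see: {hidden_share:.0%}")
```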

And let's be completely clear, by the way: OpenAI's literal only competitive advantage against Meta and Anthropic was its reasoning models, o1 and o3. And o3, by the way, is currently in a research preview and is mostly just more of the same. Although I mentioned earlier in the show that Anthropic's Claude 3.5 Sonnet has some reasoning features, they're comparatively more rudimentary than those in o1 and o3, and I'd argue R1, which is DeepSeek's model.

In an AI context, reasoning works by breaking down a prompt into a series of different steps, with consideration of different approaches. Like I said earlier, it's effectively a large language model checking its own homework, with no thinking involved, because, like I said, they do not think or know things.
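For what "checking its own homework" looks like in practice, here's a minimal sketch of the pattern being described, not OpenAI's or DeepSeek's actual implementation; `call_llm` is a hypothetical stand-in for whatever model API you have access to.

```python
# A minimal sketch of "a large language model checking its own homework."
# Purely illustrative; call_llm is a hypothetical stand-in, not a real API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def answer_with_reasoning(question: str) -> str:
    # 1. Ask for explicit intermediate steps.
    steps = call_llm(f"Work through this step by step:\n{question}")
    # 2. Have the model check its own steps.
    critique = call_llm(
        f"Question: {question}\nProposed steps:\n{steps}\n"
        "Check each step for mistakes and list any corrections."
    )
    # 3. Produce a final answer that takes the corrections into account.
    return call_llm(
        f"Question: {question}\nSteps:\n{steps}\nCorrections:\n{critique}\n"
        "Give the final answer only."
    )
```

Every one of those intermediate strings gets generated, and on a paid API billed, whether or not the provider ever shows them to you.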

And OpenAI rushed to launch its o1 reasoning model last year because, and I quote Fortune from last October, Sam Altman was eager to prove to potential investors in the company's latest funding round that OpenAI remains at the forefront of AI development. And as I noted in my newsletter at the time, it was not particularly reliable, failing to accurately count the number of times the letter R appeared in the word strawberry, which was the code name for o1. Very funny stuff.

At this point, it's fairly obvious that OpenAI wasn't anywhere near the forefront of AI development. And now that its competitive advantage is effectively gone, there are genuine doubts about what comes next for the company. As I'll go into, there are many questionable parts of DeepSeek's story: its funding, what GPUs it has, and how much it actually spent training these models. But what we definitively understand to be true is bad news for OpenAI, and I would argue for every other large US tech firm that's jumped onto the generative AI bandwagon in the past few years.

Do you want to understand an invisible force that's shaping your life? I'm Oz Woloshyn, one of the new hosts of the long-running podcast Tech Stuff. I'm slightly skeptical, but obsessively intrigued. And I'm Karah Preiss, the other new host. And I'm ready to adopt early and often.

On Tech Stuff, we travel all the way from the mines of Congo to the surface of Mars to the dark corners of TikTok to ask and attempt to answer burning questions about technology. One of the kind of tricks for surviving Mars is to live there long enough so that people evolve into Martians. Like data is a very rough proxy for a complex reality. How is it possible that

The world's new energy revolution can be based in this place where there's no electricity at night. Oz and I will cut through the noise to bring you the best conversations and deep dives that will help you understand how tech is changing our world and what you need to know to survive the singularity. So join us. Listen to Tech Stuff on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts.

Hey, y'all, this is Reed from the God's Country Podcast. We had the one and only Bobby Bones in the studio this week, and we cover everything from his upbringing to his outdoor experiences with his stepdad, Arkansas Keith, to the state of country music. We may even end the episode with a little jam session led by Bobby himself. Y'all be sure and listen to this episode of God's Country with Bobby Bones on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts. Don't go shopping to Target.

With khaki pants and a red shirt on Don't go shopping at Target With khaki pants and a red polo shirt on An old lady came up to me She said how much for this cream of wheat

Hey, it's Alec Baldwin. This season on my podcast, Here's the Thing, I speak with musician, photographer, and philanthropist Julian Lennon. One of the really important things that happened to me in my relationship with photography and the images was that I would have people write to me, people that...

couldn't financially afford to travel the world or go anywhere, couldn't or were disabled and couldn't travel the world or go anywhere. And what they had all said to me is that you bring these stories to us, you bring the truth, you bring life to us of cultures that we would never necessarily know anything about.

Photography really does allow me to do that. Have empathy for people on the other side of the world that you'll never ever meet, but you'll at least have some understanding of what their life is and what they went through or are still going through. Listen to the new season of Here's the Thing on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts.

Jon Stewart is back at The Daily Show, and he's bringing his signature wit and insight straight to your ears with The Daily Show Ears Edition podcast. Dive into Jon's unique take on the biggest topics in politics, entertainment, sports, and more. Joined by the sharp voices of the show's correspondents and contributors.

And with extended interviews and exclusive weekly headline roundups, this podcast gives you content you won't find anywhere else. Ready to laugh and stay informed? Listen on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts. DeepSeek's models actually exist. They work, at least by the standards of hallucination-prone LLMs that don't, at the risk of repeating myself, know anything.

They've been independently verified to be competitive in performance, and they're magnitudes cheaper in price than those from both the hyperscalers (Google's Gemini, Meta's Llama, Amazon Q, and so on and so forth) and those released by OpenAI and Anthropic.

DeepSeek's models don't require massive new data centers. They run on the GPUs currently used to run services like ChatGPT, and even work on more austere hardware. Nor do they require an endless supply of bigger, faster NVIDIA GPUs every single year to progress. The entire AI bubble was inflated based on the premise that these models were simply impossible to build without burning massive amounts of cash, straining the power grid and blowing past emissions goals.

and that these costs were both necessary and really good because they'd lead to creating powerful AI, something that's yet to happen. And it's kind of obvious at this point that that wasn't true. Now the markets are sitting around, they're asking a very reasonable question. Shit, did we just waste $200 billion? Anyway, let's get into the nitty gritty. What is DeepSeek?

First of all, if you want a super deep dive into what it is, I can't recommend VentureBeat's write-up enough. I'll link to it in the show notes as I usually do. It's really good and it goes into a lot more detail than I will. But here's the too-long-didn't-read for you. DeepSeek is a spinoff from a Chinese hedge fund called High-Flyer Quant. It's a relatively small and young company, and from its inception, it went big on algorithmic and AI-driven trading. Later, it started building its own standalone chatbots, including a ChatGPT equivalent for the Chinese market.

This is what we know right now. I'm sure some of you will say, oh, well, who knows if that's really true? Sure, I think that that's fair. I also think that there are parts of Sam Altman's legend that we should question as well. I think the circumstances under which Sam Altman got made head of Y Combinator are extremely questionable. I'm saying you can question DeepSeek, and indeed you should. We should be more critical of these powerful companies. But don't do it halfway. If we're going to be worried, let's be worried about everyone.

Now, DeepSeek did a few things differently, like open sourcing its models, although it likely built upon tech from other companies like Meta's Llama and the ML library PyTorch. To train its models, it secured over 10,000 NVIDIA GPUs right before the US imposed export restrictions, which sounds like a lot, but it's a fraction of what the big AI labs like Google, OpenAI, and Anthropic have to play with. I think I've heard estimates of like 100,000 to 300,000 each, if not more.

Now, you've likely seen or heard that DeepSeek trained its latest model for $5.6 million, as opposed to the insane amounts that I'll get to later. And I want to be clear that any and all mentions of this number are estimates. In fact, the provenance of the $5.58 million figure appears to be a citation of a post made by an NVIDIA engineer in an article from the South China Morning Post, which links to another article from the South China Morning Post, which simply states that DeepSeek V3 comes with 671 billion parameters and was trained in around two months at a cost of $5.58 million, with no additional citations of any kind. So you should take it with a pinch of salt, but it's not totally ludicrous.

While some have produced their own estimates of the cost, DeepSeek's V3 model was allegedly trained using 2,048 NVIDIA H800 GPUs, according to its paper.

And Ben Thompson of Stratechery has made clear that the $5.5 million number only covers the literal cost of the official training run. And this is made fairly clear in the V3 paper, by the way; that's the model that's competitive with OpenAI's GPT-4o. Meaning that any costs related to prior research or experiments on how to build the model were left out. Now, big shout-out to Minimax here, the guy on Bluesky and Twitter. He's great. He is wonderful, and he also added that this is fairly standard for the industry. Again,

You choose how you feel about this, but I want to give you the information. And while it's safe to say that DeepSeek's models are cheaper to train, the actual costs get a little harder to guess at, especially as DeepSeek doesn't share its training data (which, as I said, some might argue means its models are not really open source).
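For a sense of where a figure like that comes from, here's the back-of-the-envelope version using the numbers in this episode (2,048 H800s, roughly two months). The roughly $2-per-GPU-hour rental rate is an assumption added for illustration, not something from the transcript.

```python
# Back-of-the-envelope check on the ~$5.58M training figure.
# GPU count and duration are from the episode; the rental rate is an assumption.

gpus = 2_048
days = 57                 # "around two months"
rate_per_gpu_hour = 2.0   # assumed H800 rental price, USD

gpu_hours = gpus * days * 24
cost = gpu_hours * rate_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours -> ~${cost / 1e6:.2f}M")  # ~2.8M GPU-hours -> ~$5.6M
```

Which is also why, as Thompson notes, the headline number can only ever cover the final training run: salaries, failed experiments, and prior research simply aren't in that multiplication.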

Thompson notes that DeepSeek had to craft a bunch of elegant workarounds to make the model perform, including writing code that ultimately changed how the GPUs actually communicated with each other, functionality that isn't otherwise possible using NVIDIA's standard developer tools. They really had to get in there. It's kind of cool.

DeepSeek's models, V3 and R1, are more efficient and, as a result, cheaper to run, and can be accessed via its API at prices that are astronomically cheaper than OpenAI's. DeepSeek Chat, running DeepSeek's GPT-4o-competitive V3 model, costs $0.07

per 1 million input tokens, as in the commands given to the model, and $1.10 per 1 million output tokens, as in the resulting output from the model. I know these numbers kind of just sound like numbers, like maybe you don't have context, so let me give you some. This is a dramatic price drop from the $2.50 per 1 million input tokens and $10 per 1 million output tokens that OpenAI charges for GPT-4o.

This isn't just undercutting. This is a bunker buster.

Now, there is an aside that I'll kind of get into a little bit later, in that you are using models hosted in a country that you don't know, probably China. There are data concerns. But again, you can put this on your own server. You could put this in Google Cloud. Both Microsoft and Google are apparently thinking about it. Now, The Information reported that Google had added it to Google Cloud. No, they did not. They didn't do that. They allowed you to connect to Hugging Face. This is a whole bunch of technical stuff that, if you understand it, you'll be like, yeah, Ed, I know.

Long story short, the hyperscalers are already rolling DeepSeek out.

And I'll get to why that's bad later in detail, but it's also very funny. Now here's something else that's funny. DeepSeek Reasoner, its reasoning model, costs 55 cents per 1 million input tokens and $2.19 per 1 million output tokens. Now, that sounds expensive. Maybe it is, whatever. That's goddamn nothing compared to the $15 per 1 million input tokens and $60 per 1 million output tokens of OpenAI's o1. Woof.
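To put those per-million-token prices in single-request terms, here's a small comparison using the prices quoted in this episode. The request size (a 2,000-token prompt producing a 1,000-token reply) is a made-up example, and the exact multiple shifts with the input/output mix.

```python
# Cost of one hypothetical request at the per-million-token prices quoted above.

def request_cost(input_tokens, output_tokens, price_in, price_out):
    """Cost in dollars, with prices given per 1 million tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

prompt, reply = 2_000, 1_000  # made-up request size

comparisons = {
    "DeepSeek V3 vs GPT-4o": (
        request_cost(prompt, reply, 0.07, 1.10),
        request_cost(prompt, reply, 2.50, 10.00),
    ),
    "DeepSeek R1 vs OpenAI o1": (
        request_cost(prompt, reply, 0.55, 2.19),
        request_cost(prompt, reply, 15.00, 60.00),
    ),
}

for label, (cheaper, pricier) in comparisons.items():
    print(f"{label}: ${cheaper:.5f} vs ${pricier:.5f} (~{pricier / cheaper:.0f}x)")
```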

If I'm Sam Altman, I'm shitting myself. But there's an obvious "but" here. We do not know where DeepSeek is hosting its models, who has access to that data, or where that data is coming from or going to. We don't know who funds DeepSeek, other than that it's connected to High-Flyer, the hedge fund I mentioned earlier, which it split from in 2023. There are concerns that DeepSeek could be state-funded, and that DeepSeek's low prices are a kind of geopolitical weapon, breaking the back of the generative AI industry in America.

I'm not really sure whether that's the case or not. It's certainly true that China has long treated AI as a strategic part of its national industrial policy and is reported to help companies in sectors where it wants to catch up with the Western world.

The Made in China 2025 initiative reportedly saw hundreds of billions of dollars provided to Chinese firms working in industries like chipmaking, aviation and, yeah, AI. The extent of that support isn't exactly transparent, surprise, surprise, and so it's not entirely out of the realm of possibility that DeepSeek is also the recipient of state aid.

The good news is that we're going to find out fairly quickly. American AI infrastructure company Groq is already bringing DeepSeek's model online, meaning that we'll get at least some sort of confirmation of whether these prices are realistic or whether they're heavily subsidized by whoever it is that backs DeepSeek. It's also true that DeepSeek is owned in part by a hedge fund, which likely isn't short of cash to pump into it.

But as an aside, given that OpenAI is the beneficiary of billions of dollars of cloud compute credits and gets reduced pricing on Microsoft's Azure cloud services to run its

AI models, it's a bit tough for them to complain about a rival being subsidized by a larger entity with the ability to absorb the costs of doing business, should that be the case. Same goes for Anthropic, by the way. And yes, I know Microsoft isn't a state, but with a market cap of $3.2 trillion and quarterly revenues larger than the combined GDPs of some EU and NATO nations, it's kind of the next best thing.

But I digress. Whatever concerns there may be about malign Chinese influence are bordering on irrelevant, outside, of course, of the low prices offered by DeepSeek itself. And even that is speculative at this point. Once these models are hosted elsewhere, and once DeepSeek's methods, which I'll get to in a little bit, are recreated, and by the way that's not really going to take very long, I believe we're going to see that these prices are indicative of how cheap these models are to run.

So you might be wondering, how the hell is this so much cheaper? And that's a bloody good question. And because I'm me, I have a hypothesis.

I do not believe that the companies making these foundation models, such as OpenAI and Anthropic, have actually been incentivized to do more with less. And because their chummy little relationships with hyperscalers like Amazon, Google, and Microsoft were focused almost entirely on making the biggest, most hugest models possible, using the biggest, even huger-er-est chips, and because the absence of profitability didn't stop them from raising more money,

Well, they've never had to be fucking efficient, have they? They've never had to try. Maybe they should buy less avocado fucking toast anyway. Let me put it in simpler terms. Imagine living on $1,500 a month, and then imagine how you'd live on $150,000 a month, and that you have to, like Brewster's Millions, spend as much of it as you can to complete a mission, a very simple mission: live.

In the former example, you're concerned with survival. You have a limited amount of money and must make it go as far as possible, with real sacrifices to be made with every dollar you spend. If you want to have fun, you're potentially going to have to eat less. All the food you eat will have to be cheaper. You have to live on a budget. You have to make decisions, and indeed you might learn to cook at home. You might walk more. You might do things that will help you not spend all your money.

In the latter example, where you have $150,000 a month that you must spend, you're incentivized to splurge, to lean into excess, to pursue this vague idea of living your life. Your actions are dictated not by any existential threats, or indeed any kind of future planning, but by whatever you perceive to be an opportunity to live. OpenAI and Anthropic are emblematic of what happens when survival takes a backseat to living.

They have been incentivized by frothy venture capital and public markets, desperate for the next big thing, the next big growth, to build bigger models and sell even bigger dreams, like Dario Amodei of Anthropic saying that AI, and I quote, could surpass almost all human beings at almost everything shortly after 2027. And I just want to take a fucking second. Journalists, if you're listening to this, stop fucking quoting this bullshit!

Stop it! You're doing nothing! You are failing at your goddamn job! Every single time you quote this bullshit, this nonsense. Shortly after 2027? What the fuck does that mean? 2028? 2029? 2030? What does surpassing humans at almost everything even mean? This shit doesn't work! This shit is not good! Oh my god! Anyway, back to the podcast. Ed, calm down...

Both OpenAI and Anthropic have effectively lived their existence with the infinite money cheat from The Sims. And I know some of you might say, by the way, that it's not infinite money, you just... you go into the console, you get my point. And both companies have been bleeding billions of dollars a year after revenue. And that, by the way, making billions of dollars and then still losing billions, is insane. And they still operated as if money would never run out, because it wouldn't.

If they were actually worried about that happening, they would have certainly tried to do what DeepSeek has done, except they didn't have to, because both of them had endless cash and access to GPUs from either Microsoft, Amazon, or Google. And the Stargate thing is just... I will mention it later. Long story short, they're not going to put $500 billion into it. It was "up to" $500 billion. I'm so tired of this shit.

OpenAI and Anthropic have never been made to sweat, unlike me in this closet where I'm recording this. And they've received endless amounts of free marketing from a tech and business media happy to print whatever vapid bullshit they spout. And it's just very frustrating. They've raised money at will. Anthropic, by the way, is currently raising another $2 billion, valuing the company at $60 billion, and this was, I think, happening while the DeepSeek situation was unfolding, which is really funny. And they've done all of this off of a narrative of:

We need more money than any company has ever needed, ever. Because the things we're doing have to cost this much. There is no other way. You must give us more money. My name is Sam Altman. I need more money than has ever been made from my huge, beautiful company that sucks and needs money to train it. Help me, please. My big, beautiful, sick company is dying, but the best and most important company of all time. It's also normal. Now...

Do I think that they were aware that there were methods to make their models more efficient? Sure. OpenAI tried and failed in 2023 to deliver a more efficient model to Microsoft called Arrakis. I'm sure there are teams at both Anthropic and OpenAI that are specifically dedicated to making things kind of more efficient, but they didn't have to do it, and so they didn't.

And as I've written before in my newsletter and argued on this very podcast, OpenAI simply burns money and has been allowed to burn money, and up until recently would likely have been allowed to burn even more money, because everybody, all of the American model developers, appeared to agree that the only way to develop large language models was to make them as big as humanly possible and work out troublesome stuff, like making them profitable or turning them into a useful thing, later.

which is, I presume, when AGI happens, a thing that they're still in the process of defining, let alone doing. DeepSeek, on the other hand, had to work out a way to make its own large language models within the constraints of the hamstrung NVIDIA chips that can be legally sold to China.

While there's a whole cottage industry of selling chips into China, using resellers and other parties to get restricted silicon into the country, the entire way in which DeepSeek went about developing its models suggests that it was working around very specific memory bandwidth constraints, meaning limits on the amount of data that could be fed into and out of the chips. In essence, doing more with less wasn't something it chose, but something it had to do.

I've touched already on the technical how of these models in greater depth, and you can read that in my newsletter. You can go to Where's Your Ed At, that's Ed, not ad, it's at the end of the episode. But I'll also have show notes linking to articles like Ben Thompson's from Stratechery, because there are lots of things to read here. I know there are some really technical listeners, and I'm sure you're going to flay me in my emails. Please go and read it. I'm not wrong. I've checked with a lot of people too. And by the way, all of this austerity stuff seems to have worked.

There's also the training data situation, and another mea culpa. I previously discussed the concept of model collapse, and how feeding synthetic data, which is training data created by a generative model, into another model could end up teaching it bad habits, which in turn would destroy the model. But it seems that DeepSeek has succeeded in training its models using synthetic data.

Specifically, though, it did so in domains, and I'm quoting GeekWire's John Theroux, like mathematics, where correctness is unambiguous, and using, and I quote again, highly efficient reward functions that could identify which new training examples would actually improve the model, avoiding wasted compute on redundant data. And it seems to have worked. Though model collapse may still be a possibility, this approach, the extremely precise use of synthetic data, is in line with some of the defenses against model collapse I've heard from the LLM developers I've talked to.
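To make "correctness is unambiguous" concrete, here's a toy version of that kind of reward function. It's purely illustrative of the idea in the quote, not DeepSeek's actual pipeline; the data format and the exact check are assumptions.

```python
# Toy illustration: for math-style problems, a generated answer can be scored
# mechanically, so synthetic examples can be filtered before they reach training.
# Not DeepSeek's actual method; the format here is invented for the example.

def reward(problem: str, model_answer: str, reference_answer: str) -> float:
    """1.0 if the model's final answer matches the checkable reference, else 0.0."""
    # `problem` would matter to a richer reward model; this toy check only compares answers.
    try:
        return float(abs(float(model_answer) - float(reference_answer)) < 1e-9)
    except ValueError:
        return 0.0  # unparseable answers earn nothing

candidates = [
    {"problem": "17 * 24", "model_answer": "408", "reference_answer": "408"},
    {"problem": "17 * 24", "model_answer": "398", "reference_answer": "408"},
]
# Keep only the synthetic examples the reward function can verify as correct.
training_set = [c for c in candidates if reward(**c) == 1.0]
print(f"kept {len(training_set)} of {len(candidates)} candidates")
```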

This is also a situation where we don't know the exact training data, and it doesn't negate any of the previous points I've made about model collapse. Now, we'll see what happens there, but synthetic data might work where the output is something that you could figure out using a calculator. But when you get into anything a bit more fuzzy, like written text or anything with an element of analysis, you'll likely encounter some unhappy side effects. But I don't know if that's really going to change how good these things are.

There's also a little scuttlebutt about where DeepSeek got its data. Ben Thompson at Stratechery suggests that DeepSeek's models are potentially distilling other models' outputs, by which I mean having another model, say Meta's Llama or OpenAI's GPT-4o, spit out outputs specifically to train parts of DeepSeek, which is why DeepSeek identified itself as ChatGPT at one point. This obviously violates the terms of service of these tools, as OpenAI and its rivals would much rather you not use their technology to create their next rival.
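For what "distilling another model's outputs" looks like mechanically, here's the general shape of the technique being alleged: prompt a stronger "teacher" model, keep its answers, and fine-tune your own model on the pairs. This is a generic sketch, not a claim about what DeepSeek actually did; `query_teacher` and the file format are hypothetical.

```python
# Generic shape of output distillation: collect a teacher model's answers and use
# them as supervised fine-tuning data for your own model. Illustrative only.

import json

def query_teacher(prompt: str) -> str:
    raise NotImplementedError("call whatever teacher model you have access to")

def build_distillation_set(prompts: list[str], path: str = "distill.jsonl") -> None:
    """Write (prompt, teacher answer) pairs in a typical SFT-style JSONL format."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            f.write(json.dumps({"prompt": prompt,
                                "response": query_teacher(prompt)}) + "\n")
```

The resulting file is ordinary supervised fine-tuning data, which is exactly why providers write "don't use our outputs to train competing models" into their terms of service.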

And OpenAI, by the way, has recently, reportedly, found evidence that DeepSeek used OpenAI's models to train its own rival models. This is from the Financial Times. OpenAI failed to make any formal allegations, but it did say that using ChatGPT to train a competing model violates its terms of service. And David Sacks, the investor and the Trump administration's AI and crypto czar, says it's possible that this occurred, although he failed to provide evidence. I just want to say how fucking funny it is that OpenAI is going, oh,

waah, you're stealing my stuff? Don't steal my things. Waah. Fucking coward, pansy, bastard bitches. Fucking hell, what a bunch of whiny babies. Oh, no, my plagiarism machine got plagiarized. Waah. Kiss my entire asshole, Sam Altman, you little worm, you fucking embarrassment to Silicon Valley. You should be ashamed of yourself for many reasons, but so much this, though. Waah.

Oh, no, you stole from me. My plagiarism machine, the one that requires me to steal from literally every artist and author on the internet. The thing where we went on YouTube and transcribed everything and fed it into the machine, that's not stealing. That's good. But you using our model to generate answers, that's just not fair. What a bunch of babies. You guys, Sam Altman is worth billions of dollars. He has a $5 million car. Cry more, you little worm.

Personally, I genuinely want OpenAI to point a finger at DeepSeek and accuse it of IP theft, mostly for the yucks, but also for the hypocrisy factor. This is a company that, as I've just very cleanly said, exists purely from the wholesale industrial larceny of content produced by literally fucking everyone. And now they're crying. I'm Sam Altman. I'm a big baby. I filled my diaper because someone stole from my plagiarism machine. Kiss my arse. Kiss my arse.

These companies haven't got shit. OpenAI doesn't have shit. They don't have anything. They don't have a next product. Without reasoning, they haven't got anything. And now they don't have that disgusting justification, that overspending, the fat, ugly American startup culture of spending as much as you can to build America's next top monopoly. They should be fucking ashamed of themselves. They shouldn't be billionaires. They should be poverty-stricken. They should have to pay everyone they stole from.

And it's just, it sickens me seeing the reaction from some people on this, seeing the xenophobia, but seeing this level of defensiveness of a company like OpenAI or Anthropic. And as I'll get into next episode, we are really running out of time here. And I think DeepSeek is really...

I think it could be really the end of days for these companies. I don't know how much they've got left time-wise or even money-wise, and I'm not sure how they even raise money. But in the next episode, I'm going to deep dive into DeepSeek and I'll tell you how they sent the US tech market into a panic and what it actually means for the future of OpenAI, Anthropic, and the hyperscalers backing them. This has been a crazy few days. I hope this has helped.

And on Monday, you'll find out more. Thank you so much for listening. The support I've got for the show has been incredible. And the emails I've got about DeepSeek, I've been trying, okay? I've really been trying. It's the fastest I could do it. But I'm so happy to do this show, and I'm so grateful for all of you. Thank you.

Thank you for listening to Better Offline. The editor and composer of the Better Offline theme song is Matt Osowski. You can check out more of his music and audio projects at mattosowski.com. M-A-T-T-O-S-O-W-S-K-I dot com.

You can email me at ez at betteroffline.com or visit betteroffline.com to find more podcast links and, of course, my newsletter. I also really recommend you go to chat.wheresyoured.at to visit the Discord and go to r slash betteroffline to check out our Reddit. Thank you so much for listening. Better Offline is a production of Cool Zone Media. For more from Cool Zone Media, visit our website, coolzonemedia.com or check us out on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts.
