Welcome to today's episode of Lexicon. I'm Christopher McFadden, contributing writer for Interesting Engineering. Today we're joined by Alfredo Esposito, a digital rights advocate, copyright expert, and AI law specialist, to explore the rise of DeepSeek, China's open-source answer to OpenAI.
From AI regulation to the global tech arms race, Alfredo breaks down the legal, ethical, and geopolitical challenges shaping the future of artificial intelligence. So join us as we dive into AI's open-source revolution, the battle over copyright, and what DeepSeek's rise means for the future of innovation, markets, and digital sovereignty.
Gift yourself knowledge. IE Plus is a premium subscription that unlocks exclusive access to cutting-edge stories, expert insights, and breakthroughs in science, technology, and innovation. Stay ahead with the knowledge that shapes the future. Alfredo, thanks for joining us. How are you today? Oh, thanks for having me. I'm really, really good today. Thank you. Excellent. Our pleasure. For our audience's benefit, can you tell us a little bit about yourself, please?
Well, I'm a lawyer. Some would call me a nerd lawyer or a geek lawyer, because I'm very into new technologies. I founded my first music label before starting with all this law stuff, around 2010. I started with copyright issues related to Creative Commons licenses and open source, and then my career developed along a more legal path. So now I'm really into studying generative AI and everything related to content creation, copyright, and, of course, all the generative AI models. Fantastic. We've gone down a bit of a rabbit warren then.
That's an interesting start to finish. Okay. Can you explain what DeepSeek is and how it differs from other well-known AI products like OpenAI's ChatGPT? Well, DeepSeek, to start, is a generative AI model, and it has been developed by a Chinese startup over the past few years.
At first sight, it could look quite similar to ChatGPT, which is probably the most famous generative AI model. But that's just a surface-level observation, because the first thing to say is that DeepSeek is an open-source model. This means that any developer can see, modify, and run it.
Just to give you an example, Sam Altman, the CEO of OpenAI, during a Reddit Ask Me Anything session a few weeks ago, said: "I personally think that we've been on the wrong side of history here and we need to figure out a different open source strategy." That gives you the differentiation between OpenAI, which has been quite closed, and DeepSeek. There are several sides to this story, and I hope it's going to be interesting for our listeners. The first part starts from the irony that OpenAI is not open at all. Elon Musk himself, one of the founders of OpenAI, started a lawsuit against OpenAI, saying that the non-profit idea of the company has been completely cut off.
The other side is that the rise of strong open-source AI models such as DeepSeek, with its R1 and V3 models, demonstrates what the potential of collaborative AI development could bring. This is probably the biggest difference between OpenAI's ChatGPT and DeepSeek. And then there are differences between the two models themselves, which we can dig into a bit later: how one model is cheap and the other is quite expensive. That has brought a lot of discussion around generative AI models. Okay, brilliant. And so
what is the difference between the R1 and V3 versions of DeepSeek? So basically, there are a few differences between the models.
One of the interesting things is that DeepSeek uses only a fraction of the whole model that has been trained. Technically, this is called a mixture of experts, or MoE: for each query, it activates only some of the parameters, depending on the tokens used in the request and the prompt. There is also natural language processing, NLP, machine learning, and large language model technology involved, but what happens is that, allegedly, using DeepSeek should be more cost-effective than the other model. Okay, fair enough. So obviously it was developed in China, but presumably it's multilingual. Do we know if it's better in native Mandarin, or does it not really matter what language you talk to it in? Well, that's really interesting, because, you know... I can't even answer that.
Yeah, thank you. Mandarin and the Chinese language are really, really hard for these kinds of models to handle. There are also a lot of graphic characters. So we could say that, of course, it's optimized for native Chinese speakers.
That's a kind of regional optimization. But on the other hand, I don't think this comes at the expense of all the other languages. We have to say that this mixture-of-experts system has, allegedly, 671 billion parameters. So that gives you the idea that it could give really, really good performance in other languages too. If we think about other models being developed, like DeepL, probably one of the most used translation models all over the world, they are quite good. And I have to say I'm not really enthusiastic about ChatGPT's translations either. So working with the Mandarin language is probably really, really hard, but I don't think this comes, as I told you before, at the expense of other languages. We probably don't have the same level of sophistication as with the Chinese language, but I don't think it represents an issue for DeepSeek at all. It wouldn't be a fair metric to use, really, either. OpenAI's ChatGPT is presumably trained primarily in English.
Yeah, absolutely. It's like asking the question in reverse, really, isn't it? So that's fair enough. I'm Italian, and I could say that sometimes if you just translate into Italian straight away, of course you lose a lot of nuances. So wherever the model has been built, it's probably going to be better in that language. But I can say that I think they are strong enough for translation, and they probably all have the same kinds of shortcomings when it comes to really good translation. Probably not the best, but I don't think the models are so different in that way. Fair enough. I presume you're a native Italian speaker. Yes. So if you've used ChatGPT to translate into or from Italian, how do you find it? Is it too mechanical? Yeah, I have to say I don't really use ChatGPT for translation.
At least, I mostly work with English and Spanish, and I could say that I'm probably better at translating than ChatGPT, not because I'm a really good translator, but because, as I told you, there are too many nuances that get cut off from the language. There are a few models I think are quite a bit better than ChatGPT. That's fair enough. Just out of interest. I wouldn't know the difference. Yeah.
So, reportedly, DeepSeek's training costs were a lot lower than those of other models like GPT-4. Do we know how this was achieved, and can these claims be trusted? Well, I think it's kind of misleading to say the training was so cheap. First of all because the cost they promoted was probably just the cost of the final pre-training run. There are so many expenses around it: infrastructure, workers, data acquisition, energy consumption, and the chips they would have needed to develop everything. I have to say I don't really trust that number. And you could say that there are probably cheaper ways of developing AI, but I don't trust these figures, simply from my experience of how much it costs to build up a model, and compared to all the other models. So yeah, that's my answer. Of course, as you know, we don't have official numbers, and we can't really verify the numbers, but I'm quite skeptical about it. Fair enough. Again, the same is probably true for, say, ChatGPT. How can you verify any costs that they give? Also, with China,
some of the costs are a lot lower, aren't they? I don't know how much programmers make in China, but I'm guessing it's considerably less than in the United States. So that would factor in, presumably. Presumably, yeah. Well, on the same subject, have things like the US sanctions on China, such as those on NVIDIA chips, kind of backfired in a way? If the claims are true, it is cheaper to train up the AI. Well, I don't know if it's backfiring. Looking at what happened to the market as well, which maybe we can dig into a bit later, these kinds of sanctions are probably making access to advanced hardware a bit more difficult and expensive for Chinese companies. That's also why I don't think the initial figures on how cheap DeepSeek was are true. But I think this feeds into the long-term strategic implications of this kind of cyber cold war that we are looking at.
That's fair enough, fair enough. So, switching tracks a bit: with regards to the training of the model, there are claims that DeepSeek kind of stole OpenAI's models to help speed up its training. Do we know if there's any truth to this? And is it a case of hypocrisy? OpenAI basically scrapes the internet to train its models. Well, here is my take: copyright, or digital copyright, is dead, at least in the AI training space. And that's not just my opinion. If you look at the recent court decisions regarding Meta's and OpenAI's training practices, they've essentially validated large-scale data scraping for AI training. The reality is that enforcement of traditional copyright in the age of artificial intelligence and generative AI is nearly impossible, given the scale and the nature of how these models learn. Sam Altman, Mira Murati, and everyone from OpenAI have said it's going to be impossible to have a generative AI model while respecting copyright rules. So that's why my take is: copyright is dead, or we are not going to have generative AI, or at least we have to rethink how copyright has been handled until today. So then, when OpenAI accuses DeepSeek of training on their models, yeah, they're probably right. I can't say anything; they're probably right. But that's the irony: OpenAI itself built GPT and all the other models on an incredibly vast amount of scraped content.
That included, as I was saying before, every kind of copyrighted material. You can just look at the lawsuit from The New York Times and all the other newspaper companies. That's clear, and this is going to happen. In the end, we're just going to see if, at least under US law, this is going to be considered fair use. I don't think so. Or, simply, we're going to have other kinds of models;
we should probably shift to another kind of paradigm, where traditional intellectual property concepts, as we're seeing right now, are breaking down, and this is probably going to change. We have to rethink the way we think about copyright all over the world. And I can understand that some would say, you know, they're not copying, so you can't just say they are infringing copyright. But you're scraping, so you're not being inspired by something; you're just taking things from the web and making money out of it. So, and you know, that's my field, so it's something I've been trying to study as much as possible over the last three years, a proper conversation about this shouldn't be about who is copying whom, but about how to establish, and it's going to be a really, really difficult challenge, new frameworks for fair compensation
and attribution. That's another thing that probably everyone forgets about in this kind of era, where information is all over the web and it's going to be really hard to stop it. I also just published an article on this, about a lawsuit in India against OpenAI, and the enforcement question. Let's say DeepSeek copied and scraped ChatGPT and OpenAI, and OpenAI scraped the web: who is going to enforce the law?
We don't have a global framework. Everything is really, really theoretical. That's why everything is shifting toward geopolitical war, or, let's say, cyber war.
Well, that's fair enough. Google, for example, has started to do AI summaries of your search terms, hasn't it? But it kind of attributes where it's getting it from; you get the sources, don't you, to the right of it. Maybe that's the future: forcing things like OpenAI to give their sources and attribute them, in effect.
Yeah. Are you talking about Google Search? Yeah, Google Search, yeah. But with Google Search, basically, you're just getting a list of things, and then you click on the link and go straight to the source. In this case, you're taking everything yourself, putting it in your model, and making money from subscriptions. So, legally, from that point of view, it's quite different. On one side you can have fair use, looking at US law, and on the other side you don't have fair use. Direct profit is probably one of the axes on which the two legal visions differ. Okay. Yeah, that's fair enough. Okay.
Okay, next question. Since DeepSeek's servers are mainly based in China, this kind of raises concerns that the technology could be leveraged for foreign influence campaigns, data siloing, or potentially cyber operations. How would you respond to these concerns? Or how could we?
Yeah, we're looking at this, of course, from a Western point of view, and I could say that from the other side they could say the same. The reality is that an advanced AI system always becomes what I would call a dual-use technology. Whether we're talking about DeepSeek servers in China or systems hosted in the US, Russia, or anywhere else, these are basically tools that can be deployed for strategic national interests in what I mentioned before, this ongoing digital cold war. Just think about it: language models can be used for everything from analyzing intelligence to generating propaganda, from defensive cybersecurity to offensive operations. That's not unique to Chinese companies or to Western ones. It's simply the nature of the technology itself when it's used by humans for geopolitical strategies. If we want to go back to something that is not generative-AI-based, we can see this kind of historical pattern in every kind of transformative technology,
from nuclear power to space capabilities. Now it's coming back to space with SpaceX and Elon Musk. And we could say the same of the internet itself.
These kinds of tools are always used to try to acquire dominance in the strategic competition between nations. Concerns about server location and data governance are really, really valid, of course. But I think we should recognize that this is not just a core issue related to DeepSeek.
It's that AI, artificial intelligence, has become another theater in this kind of modern great-power competition. And I think this is just the nature of international relations around any technological advancement. So we shouldn't blame
anyone, because either everyone is to blame or no one is. Yeah, that's fair enough. I think what I was kind of driving at is that the American legal system, their constitution, protections, things like that, are presumably very different from those in China, same with their ideology and philosophy on things like this. I'm just wondering if that would in some way impact users' data security, especially if they bring sensitive information into DeepSeek, not that you should be doing that. That's true. That's true. But on the other side, you see at this moment this great discussion between how the European Union treats data,
how the US treats, or will treat, data, and how China might as well. But just in the last few days, I think we've been looking at, I don't know if this sounds like the right word, an Americanization of Europe. Because Europe regulates, as we know, but now they are trying to soften this kind of regulation. So I'm not 100% sure which is going to be the model, the ideal model to follow. I always thought that the European one was the good one, but now, with this kind of pressure from the US and also the new administration, probably we won't have the data democratization that we were aiming for, I think. Okay, all right. On the subject of the EU, there's obviously a right to be forgotten for your personal data, isn't there? I don't know how they would be able to enforce that with something like an AI model, but theoretically there are mechanisms they could use, right? Theoretically. Theoretically, all right. Fair enough then. Yeah.
The release of DeepSeek's R1 model kind of rocked the stock market a bit, and particularly impacted NVIDIA and the NASDAQ. Some have said that this is a sign that the so-called AI bubble might be about to burst, or that it's approaching a market correction. Do you have any views on that? Well, yeah, the market reaction was really, really strong. I don't remember exactly, but I think NVIDIA alone lost around 500 to 600 billion dollars in market value, with the stock dropping around 20 to 25 percent. I'm not 100% sure, but that was the scale.
And not just NVIDIA, of course, but everything related to US tech. The thing is, it also influenced the crypto market. It works the same way it worked with Dogecoin, Elon Musk, and all the meme coins: everything is driven by announcements. Okay, now we're going to develop this system, and the system is going to be really cheap. No one checks whether this is true, and as we can see from the numbers, DeepSeek probably wasn't as cheap as they were promoting. But just this kind of communication, this kind of news, hit the stock market in such a way that, of course, everything dropped. So probably
I think that to develop all these AI models we still have to look at massive computing resources and investment. The idea of shifting to an open-source model could demonstrate that somewhat lower resources are needed than we were thinking, but on the other side, I don't think this kind of stock market crash reveals a bubble in the AI market. That's not financial advice, of course. I simply don't think it has been overvalued. I think that right now this is how the market is. We know that a lot of resources are required for this kind of model, especially if we want to build models that also comply with what we in the European Union think are good regulations. And then, of course, you can have a DeepSeek-style model that doesn't use the whole model, so it could be cheaper in terms of tokens, in terms of how many resources are consumed day by day, or for each prompt or query. But I don't think the market was overvalued at that time. Or at least that's my view, looking simply at this kind of influential news that already put the market down. And I think NVIDIA is going to recover a bit, because right now it's probably the only huge company that can do this. Maybe other companies are going to come up, so we won't have a monopoly anymore; we'll have more companies. But right now, none of the other competitors produce something like NVIDIA does. Stock markets are notoriously flighty.
If you have a position in these and you're watching it daily, you'll go crazy. Yes, absolutely. And plus, you know, that's the sentiment of the market; it gets emotional. When you see that NVIDIA went up, I think it was like 200 or 300 percent over a few years, and all of a sudden you see it drop 20 percent, you say, okay, this is going to keep happening, it's going to go straight to zero. So investors will probably start selling because everyone is selling. But this is not a meme coin. We're talking about a company that has built up quite a huge structure. So we should be careful about this. Absolutely. Okay.
Thinking about the more geopolitical side of things now: given the potential threat that DeepSeek poses to OpenAI, Microsoft, and, arguably, foreign governments, could we be, well, you've mentioned it a few times, but could we be heading towards a cyber cold war between AI superpowers? One which could potentially lead to outages of these chatbots, these AIs, or bans of certain products in some countries?
As I said before, we are already in this cyber cold war, and it involves not just the US, not just China; it probably involves Russia, and the European Union with its idea of regulation.
I wouldn't call it a nuclear arms race like in the previous century but, to use a milder term, companies like DeepSeek or OpenAI, internally in their countries and externally, are kind of proxies in this geopolitical struggle. This is also why I think the lawsuits in the US against OpenAI are going slowly, and probably copyright, as I was saying before, is going to change. They are going to change internal law as well, because these are now tools for this kind of geopolitical war.
There is this really famous saying: the US innovates, China replicates and sometimes scales, and Europe regulates.
But now, and this is where it gets really interesting, and why everything is kind of tricky these days, there is Europe's strict regulatory approach, particularly around privacy. DeepSeek is not working in Italy anymore unless you use a VPN; it's not working in Belgium anymore unless you use a VPN. So this kind of regulatory approach is functional to the US in blocking DeepSeek in Europe. The US doesn't want European regulation, because they say it's too strict, but this same European regulation is now blocking DeepSeek from taking over the market for generative AI models. So in this case, the AI Act or the GDPR no longer appear, in their eyes, as bureaucratic hurdles, but rather as a defensive barrier against the expansion of Chinese AI in Europe. But on the other side, we don't know how long this is going to last, because they also want Europe to deregulate.
So it's caught in the middle, and now everything is probably becoming too political. That's why I'm talking about this cyber cold war.
And less about law: it's less about consumers' rights, less about the rights of content creators, so less about copyright and everything. We are probably witnessing a shift away from democracy and the rights, and the enforcement of rights, of the previous century, because I think the stakes are too high to keep democracy as we've been living it until today. That's kind of sad, but honestly, I can't foresee anything that could really make users' and consumers' rights enforceable. The stakes are too high now, really too high, and probably Europe is going to move much closer to the US.
So probably we will see a bit less regulation. The US will innovate, China will innovate as well, and Europe will follow. That doesn't make me happy, because law and regulation are probably what made our continent a superpower. We don't have the innovation of China, we don't have the innovation of the US, but we've got rules to make a better world, environmentally and otherwise. And if we lose that, well, I don't know. What's regulation going to be? It's a very blunt tool. It tends to be either ineffective or, as you mentioned with the VPNs, easy to bypass. You could pass a regulation saying DeepSeek is banned here, but you can bypass that with a VPN.
It just doesn't work, and it's too slow to catch up. So the secret is going to be innovation. I mean, philosophically, Europe and Britain differ. As you say, Europe regulates; in Britain, for example, everything is legal until it's illegal, but in Europe it's the opposite philosophy. Yeah, but in the UK, plenty of authors and content creators are opposing the laws that would allow this, so the end of copyright. It depends on what you want to privilege. If you want to privilege the authors, of course, regulation is going to be good. If you want innovation at any cost, someone is going to pay a price. So that's why it becomes political. That's a fair enough answer. Okay, last one. With DeepSeek proving that AI can be developed on a small budget, if that's to be believed, could we see an influx of cost-efficient AI startups potentially challenging big tech over the next few years, or not?
Yeah, as I said before, that's a bit misleading, because we need to challenge the premise here. DeepSeek absolutely hasn't proved that AI can be developed on a smaller budget; I would almost say quite the opposite. In the end, we probably won't see a huge evolution in the competitive landscape right now. Maybe something's going to change in the future.
The market is probably going to develop in a more niche direction. There are going to be generative AI models specialized in particular applications rather than general-purpose AI, as we're seeing right now. Take DeepL, the translation model: something that just works really well at translating, so not general purpose.
We will have to see how it goes, with more open-source models in which this kind of niche is going to be built up, and how these models are going to be trained. Because if you have to think about compensation for the scraping that has been done, that is going to have a cost. If tomorrow, let's say, some court decides that you can train on and scrape the whole internet, so copyright is dead forever, that is going to reduce costs. So there are still a few parameters that we have to look at. I don't think there are going to be many more startups that could directly challenge OpenAI or Google or Microsoft in building these kinds of foundational models. But we're probably going to have more niche models, more specialized applications. There are still too many parameters to look at, and we don't have all the information.
I can't say right now how it's going to be. Things are going to get cheaper, as always, but there is also the idea that it's not going to be as cheap as we think, looking at DeepSeek's development. Fair enough. I don't know if you know the answer, I don't know the answer either, but
there must be ways, say you do publish your own work online, there must be a way to block the training models from accessing it without paying for it, or some way around it like that. Like a kind of AI paywall or something. That's one of the discussions going on in the UK as well: how much is it worth? If your work represents 0.00001% of the training data, you're probably going to receive the kind of revenue you'd receive on Spotify as a singer. So not so much. Yeah.
Fair enough. That's all of my questions. Is there anything else you'd like to add that you think is important and we haven't touched on? I think we've touched on quite a few things. I'm just going to take up the Spotify idea. I founded my Creative Commons music label 15 years ago because I thought there was no longer any way of making people pay for downloading music. Music is available everywhere: you can listen on YouTube, you can download it, you can put it on your iPod. How can you force people to pay for music?
That's why I decided to simply release the music under Creative Commons licenses. Having copyright would have been useless, not because I don't trust or believe in copyright, but if you have a law and you don't have any way to enforce it, of course,
this is going to change the idea a bit. So I wouldn't say we're going to see exactly a Spotify-style revenue model for generative AI data training, but I think something similar is going to move in that direction.
So probably, let's say, the biggest players, The New York Times, huge authors, huge newspaper companies, huge graphics or illustration companies, will receive a bit of out-of-court compensation, enough or not, we don't know, to keep producing content.
But if we look at what happened in music, we don't have small artists anymore, we don't have small production houses or firms anymore. So I'm worried the same is going to happen with content creation, with editors and with authors. We're probably going to have a Spotify model, a compensation scheme for generative AI. But this is not going to be run by a private company; it's probably going to be established by a federal law in the US or a law in the UK. And probably later on, slowly, slowly, Europe will follow as well.
Okay, excellent. Great. And with that, Alfredo, our time is up. Thank you for your time. And don't forget to subscribe to IE Plus for premium insights and exclusive content.