News Companies Sue OpenAI for Allegedly Violating Contract Terms

2024/5/3
AI Education

People
Host
A podcast host and content creator focused on electric vehicles and energy.
Topics
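Today I'm reporting on big news in AI: five major US newspapers are suing OpenAI for copyright infringement, and they are also suing Microsoft. I think suing only the startup isn't enough; you also need to sue Microsoft, which is funding it. This is a very interesting lawsuit. OpenAI has recently struck deals with several news organizations, such as Axel Springer and the Financial Times, along with its earlier lawsuit and settlement with the New York Times. I think OpenAI signing deals with many news publishers and paying them for content has prompted other companies to follow suit, because they want a piece of the action too. OpenAI's training data may include data from every news company, which gives every news company grounds to sue. Even if OpenAI excluded a specific news organization's data, it could not fully keep that organization's content out of its training data, because content is easily copied and reposted. Copying and quoting snippets of articles on the internet may constitute fair use, which makes copyright infringement harder to define. Many news organizations also quote content from other sources, which further blurs the boundaries of copyright. Many major news organizations directly quoted Marques Brownlee's YouTube videos and social media comments, which may constitute fair use. Proving that OpenAI used a specific news organization's data requires OpenAI to output a direct quote from that organization's article, but that does not necessarily mean OpenAI used the original article directly. The quoted content OpenAI outputs may come from other sources that quoted it, which makes it hard to judge whether it amounts to infringement. Early AI tools were already being used to rewrite articles, which is not fundamentally different from people reading an article and then writing commentary. There is a gray area between using AI tools to rewrite articles and people commenting after reading them.

The eight newspapers suing OpenAI all belong to the same investment company, Alden Global Capital. The New York Times previously filed a similar lawsuit against OpenAI, but it acted independently. Alden Global Capital's main motive for suing OpenAI is financial. Many news organizations have reached paid deals with OpenAI, while Microsoft has stayed silent on the matter. The eight newspapers suing OpenAI are all local papers owned by Alden Global Capital. Alden Global Capital hired the same law firm as the New York Times and filed in the same district. Its litigation strategy may be to piggyback on the outcome of the New York Times case, and it hopes the judge will consolidate the two cases. Alden Global Capital chose to sue OpenAI and Microsoft rather than negotiate a deal, possibly because negotiations did not yield the compensation it wanted. The outcome of this case will have major implications for how AI companies use news content. Alden Global Capital may add more of its newspapers to the suit, and it may add all 60 of them.

The core of the suit is that OpenAI used news articles to train its models without authorization, which constitutes copyright infringement. The suit also alleges that OpenAI and Microsoft removed copyright management information and diluted trademarks. It alleges that OpenAI and Microsoft used the news organizations' trademarks in ChatGPT and Copilot. It also alleges that OpenAI and Microsoft damaged the news organizations' reputations through AI hallucinations. AI hallucinations may harm news organizations' reputations, but this can be addressed by technical means. OpenAI's use of news organizations' data may actually have raised their profile. News organizations worry that AI tools will replace search engines and ad revenue. The outcome of this case will have major implications for how AI companies use news content and for news publishers' business models. Generative AI tools are disrupting traditional search engines and the way people get news. AI news briefings are replacing traditional news reading. News publishers worry that AI tools will affect their ad revenue. OpenAI denies that ChatGPT deliberately copies and pastes news content and says this is just a rare bug.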

Chapters
Major US newspapers are suing OpenAI and Microsoft for copyright infringement, alleging that OpenAI's training data included copyrighted material without permission. The lawsuits raise complex issues around fair use, data scraping, and the impact of AI on traditional news publishing.
  • Five major US newspapers are suing OpenAI and Microsoft.
  • The lawsuit alleges copyright infringement and the unauthorized use of copyrighted articles to train AI models.
  • The case questions the boundaries of fair use in the context of AI training data.

Shownotes Transcript

Today we have some big news in AI: five major US newspapers are currently suing OpenAI for copyright infringement, and they're also suing Microsoft. I think, you know, you don't want to just sue the new startup. You want to sue the guy that's actually giving them all the money, which of course is Microsoft. So this is kind of an interesting deal and lawsuit, and I want to break it down. We've seen similar lawsuits in the past, but all of this is coming on the back of OpenAI making a bunch of new deals with news corporations.

There's a deal with Axel Springer. Just last week they signed a deal with the Financial Times. And of course, they've had this whole lawsuit and a deal conclusion with the New York Times as well in the past. So in my opinion, what's really going on right here is probably that OpenAI is doing deals with a lot of news publishing companies.

They're paying them out essentially for their content or for whatever. And that's totally cool. But I think because of that, of course we have a bunch of other companies that are like, hey, look, we want a little piece of the action, right? They don't want just the New York Times to get paid out from the lawsuits. They all want to jump on. And you pretty much see this where if there's a piggy bank, everyone wants to smash it. The more money that OpenAI gives out to other news corporations, the more new ones are going to jump on. And I think because OpenAI essentially trained on the entire internet, yeah, you can make a case for every single news company that they were included in the training set.

And something important I want to say about these lawsuits, and we'll break down more of the specifics, is about how you conclude that OpenAI trained off of, let's say, the New York Times. Say OpenAI blacklisted the New York Times' URL, which I don't actually think they did because I think they wanted all that data, but let's say they did because they didn't want to get into some sort of lawsuit with them.

That doesn't mean that they would be completely exempt from having trained off the New York Times data, because it's super easy for anyone on the internet to copy and paste the article and repost it on their own website. Now that's, you know, step one, and maybe the New York Times could get mad at that website, especially if the article is supposed to be behind their paywall.

It gets a little bit trickier, though, with something that's pretty much fair use: you go to any article on the internet, you copy a paragraph or a couple of sentences, and you quote them somewhere, right? You're like, as the New York Times said in their article, this blah, blah, blah happened. And also, lest you think there's some wholly protected class of New York Times articles, I've seen Bloomberg and the New York Times quote things from other people as well, or from other blogs.

Most recently, we had this huge tech debacle with Marques Brownlee, who's been reviewing a bunch of AI tech and has given it bad reviews. It's this big viral moment. It's on Twitter. And every major news company is quoting direct quotes from his YouTube videos. They're also using direct quotes from his X account and from people commenting on his X posts.

I don't know. In my opinion, it's all fair use, right? We have things happening and there's commentary going on. I say all this because it makes it a little bit trickier when you're doing these lawsuits and saying OpenAI trained off of a New York Times snippet, because the way they actually prove that is they go to OpenAI and say, hey, give me an exact quote from this article about this thing that happened.

And if OpenAI spits out a direct quote from the New York Times article, the New York Times is like, ha ha, we caught you, right? But it doesn't necessarily mean it's actually from the original article. It could be someone quoting it. And is that even bad? Because it's kind of fair use for us to grab quotes from things that are happening and talk about them. Obviously it would be bad if you said, what happened in this specific event, and it literally copied and pasted the whole New York Times article, but it's not even doing that.

And I think it goes one step further. Sorry for all of my analogies here, but the last thing that is important is that these AI tools were becoming more and more popular even before ChatGPT came out. We had things like Jasper AI, which I was using a ton a couple of years ago, back in September. And that was essentially DaVinci, which was an older version of ChatGPT, before ChatGPT was released.

The point is, people were using these AI tools to rewrite articles. And today you can go stick a New York Times article into ChatGPT and get it to rewrite it, even though the New York Times might have had exclusive data on that. And then maybe the New York Times could get mad about that. But at the same time, what's the difference between that and me reading a New York Times article and writing my opinion piece on what they said, right? Everyone's kind of doing it.

So there's a lot of gray areas, and these lawsuits are a little interesting. Let's dive into exactly what's going on here. So who are the newspapers? They're eight big newspapers owned by one company, the same investment company, which is Alden Global Capital. I think this is what's important.

I think it's important because it's not like all these guys are friends and they all got together to sue OpenAI. Really, it's Alden Global Capital that's suing OpenAI. So let's make that clear. The second thing is the New York Times had a similar lawsuit, and they had a bunch of publishing claims.

Up until now, they were the only one that really took legal action against them. What's interesting is that it seems like they might still be the only independent one that took legal action, versus this, which is an investment group. So, you know, they are an investment group, so they probably want money. This is definitely about the money. A lot of other newspapers, including the Financial Times, Associated Press, and Axel Springer, have all made paid deals with OpenAI and other AI companies where they're getting millions of dollars annually to essentially get their content included.

Microsoft isn't talking about this. They're not commenting on this whole story, this whole lawsuit, whereas OpenAI is a little bit more chatty about it. But who are these companies that essentially are one big conglomerate? We have the New York Daily News, Chicago Tribune, Orlando Sentinel, South Florida Sun Sentinel, San Jose Mercury News, Denver Post, Orange County Register, and St. Paul Pioneer Press. OK, these are local newspapers, if we're being honest, right? I guess the Chicago Tribune is probably one of the bigger ones in there. But otherwise, yeah, these are local city news outlets, all under really one big investment firm.

So they're all being represented by Rothwell, Figg, Ernst & Manbeck, which is one of two law firms that was supporting the New York Times in its lawsuit against OpenAI and Microsoft. So it's kind of like we get a big investment firm that owns a bunch of these newspapers, and they go hire the same law firm that was used by the New York Times to sue OpenAI. The lawsuit was filed in the same district as the Times lawsuit, which is interesting because if the same judge is chosen to oversee both of the cases, they could actually choose to combine the two complaints.

Was this done on purpose? Was this random? Obviously this was done on purpose. They see this lawsuit from the New York Times is probably going well or has a high chance of going well. So this investment firm grabs their conglomerate of newspapers. They file a similar lawsuit in the same district, hoping the same judge gets it. If the same judge gets it, he could combine the two cases. And essentially, this investment firm would be piggybacking on the New York Times' case. So you can obviously see some pretty solid financial motives for all of this.

But it's interesting. There's someone who apparently was familiar with this, as reported by Axios. So, you know, on the "familiar with the case" part, I put a caveat on that because I hate those kinds of sources when there's not an actual person's name behind it. And you can get fake stuff in news all the time on this. But in any case, according to Axios, someone familiar with the Alden subsidiaries that own the newspapers said that the papers are right now opting to sue the two firms instead of attempting to negotiate a deal.

Right, so the New York Times tried to negotiate a deal with OpenAI and Microsoft for a bunch of months before their lawsuit, which OpenAI said was a surprise, because they were like, I thought we were going to negotiate a deal. But, you know, it took them a long time. Evidently what it means when they say they tried to negotiate a deal and it didn't work is that they asked for way too much money, or according to OpenAI, way too much money. OpenAI didn't want to pay the price they were asking, so they're like, fine, we'll sue you.

So evidently, same thing. They tried to negotiate a deal and then now comes a lawsuit because they're not getting the price they want. It's going to be interesting to see if they ever get the price. I think it's going to be a precedent that, you know, essentially has a lot of implications for these AI companies, what they're able to do. So.

For now, Alden isn't ruling out having more of its newspapers, and I think they own 60, eventually join the lawsuit. They're starting with eight right now. I think what that means is they found eight newspapers that had really high odds of, you know, being included in the data set. And they're like, well, maybe we don't 100% know or can't prove that all 60 are.

But they're not ruling out adding all 60 of their newspapers to this lawsuit. I guess if it comes down to it where, per news site, they get a certain amount of money, then they're like, well, 60 is better than eight, that's a multiple on that. So they might just dump them all in. It'll be interesting to see what happens.

So how does all of this work? Similar to the New York Times lawsuit, I think at the very center of this complaint are copyright infringement claims around OpenAI using their articles to train the model. The newspapers right now, or this investment firm, are accusing OpenAI and Microsoft of, quote, purloining millions of the publishers' copyrighted articles without permission and without payment. And this is, of course, to make money from ChatGPT.

The newspapers are also claiming that OpenAI and Microsoft removed copyright management information, like the journalists' names and titles, from the work when the information was cited. The lawsuit also includes trademark dilution claims, which allege essentially that OpenAI and Microsoft didn't have the authorization to use this.

And they also say that they used the newspapers' trademarks in the answers on ChatGPT and Copilot. You can kind of join Microsoft and OpenAI together in this complaint because Copilot is essentially ChatGPT running in the back end too, right?

So one other thing that the newspapers have also accused both of these companies of is reputational damage because of the AI's hallucinations. So essentially they're like, look, you're using our info to give people answers, but then it can also hallucinate and say the wrong thing, and that's bad for us. I think this point is kind of moot. OK, maybe, I guess if I steel man this, what they're saying is that ChatGPT could say, according to the New York Times article, this thing happened, right? And then it had some erroneous fact. And then they're like, it's going to damage our reputation because now it's associating our brand name with that.

The easy solution is that they could just hard code ChatGPT so that it's never allowed to say the New York Times or the Washington Post or whatever. There are obviously pros and cons to that. It'd make the output feel less high quality. I think that if they want to play hardball, OpenAI does that.

And as far as stealing their copyright and branding, obviously it's like Google, in my opinion. It's helping their branding. It's making them relevant to cite them. The last time I went to the New York Times to read an article was, frankly, never. Probably because it has a paywall, but also because...

I don't know. You know, I get my news in other ways. I get it from newsletters. I get it from different places. So it seems like it's becoming less relevant. If you want to stay relevant, I think it's probably great to be, sort of, cited. You know, there's a higher chance that I'm going to go read a New York Times article if I see it quoted in something, even in ChatGPT, than if it never says the name. So it's kind of interesting, because they're mad they're using their data, but they don't want them to use their name, whatever.

Let's talk about the big picture here. I think the outcome of this lawsuit is going to be really big for how AI companies incorporate news into their content. News publishers up until now have obviously just relied on ad revenue that has come from search traffic. It's been 20 years where we've seen this. Generative AI tools can essentially wipe that out because I'm not always going to Google.

I mean, honestly, I use Perplexity. I use You.com. I use a lot of other tools instead of Google. So Google's getting disrupted in this, but I'm getting a concise roundup of answers. I also use newsletters, which have kind of replaced my... like, AI newsletters are fantastic. Shout out.

And so I use those to get a lot of my AI news, and they cite the sources, but I'm not going to click through if they give me a really concise summary, and I know they just use AI tools to summarize an article or whatever. If they're just going to summarize it right there, I don't need Google to find it and I don't need to read the article to get the information. So, yeah, between the two of those, it's quite disruptive.

That's been the model, and it's definitely getting disrupted. They're all concerned about that, and they don't want this to happen. So it's going to be interesting to see how this plays out. OpenAI right now has already opposed both of these lawsuits. They're fighting both of them in court. The New York Times specifically cited how OpenAI's tools regurgitated verbatim copies of New York Times articles, so they're saying it's literally just spitting out the same thing.

And OpenAI says that this is a rare bug that they're working to drive to zero. In other words, ChatGPT isn't designed to copy and paste, so this is probably not something that happens a ton. If it does, it's a rare bug they're working to drive to zero, as they've said.

And so, yeah, obviously everything's just going to get rewritten. In any case, it's going to be fascinating to see how this goes. If you enjoyed the podcast today, if you learned something new about these lawsuits and kind of how the landscape is going to play out in the future, I would really appreciate it if you like the video or give us a review. If you're listening on Apple or Spotify, subscribe and I will catch you in the next video.